萊恩Ryan's 生活筆記簿 (Ryan's Life Notebook): [Python] Liking and Crawling a Facebook Page

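I was looking online for ways to do web crawling on Facebook. I remembered trying a Coursera course that taught crawling on Twitter, using the service's own API and an authentication token. Coursera has lessons covering Twitter and Google Maps, but for Facebook I could only find scattered tutorials online. The one I found back then was a script by an overseas vlogger that likes Emma Watson's posts. Once I understood it, I saved the code for reference, and to my surprise it came in handy this week. It uses the third-party SDK below; after installing it, load it with import facebook.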

facebook SDK for Python: https://github.com/pythonforfacebook/facebook-sdk

More Facebook SDKs: https://developers.facebook.com/docs/apis-and-sdks

How the underlying FB Graph API works: https://developers.facebook.com/docs/graph-api/overview

Fetching and liking a page's posts:

- The access token assigned to accessToken can be obtained from the Graph API Explorer (keep it secret!!)

- "ejmonthly" is the Facebook name of 信報財經月刊 (Hong Kong Economic Journal Monthly); you can find it in the page's URL.

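For the ejmonthly profile, once get_connections has connected to its posts, you can start working with that data. The fetched content is basically a structure of nested dictionaries. To perform an action, use a function from the SDK, such as put_object() to like a post.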

import facebook
import json

accessToken = 'token here'  # paste your own token; keep it secret
graph = facebook.GraphAPI(accessToken)

# Grab the page's posts and *like* them
profile = graph.get_object("ejmonthly")
# print(profile)
# print(json.dumps(profile, indent=4))
posts = graph.get_connections(profile['id'], "posts")

count = 0
for post in posts['data']:
    if count >= 10:  # Option - only the top 10 posts
        break
    try:
        graph.put_object(post['id'], "likes")  # Action here - Like Post
    except facebook.GraphAPIError:
        continue  # skip posts that cannot be liked
    count = count + 1
    print("Liked post #%d" % count)
    print("Creation time: %s" % post['created_time'])
    # Some posts have no message, so fall back to an empty string
    print(post.get('message', '')[:50] + "...\n")
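
Since each post comes back as a dictionary nested inside the response dictionary, a small helper can make that structure easier to see. This is only a minimal sketch and not part of the original script: the sample response is made up for illustration (its ids, timestamps, and messages are hypothetical), and summarize_posts is a helper name introduced here. It reads the message with dict.get() so that a post without a message does not raise KeyError.

```python
# Hypothetical sample shaped like a Graph API "posts" response.
sample_posts = {
    "data": [
        {"id": "111_1", "created_time": "2016-01-14T10:00:00+0000",
         "message": "First sample message"},
        # This post deliberately has no "message" key.
        {"id": "111_2", "created_time": "2016-01-13T09:30:00+0000"},
    ],
    "paging": {"next": "https://example.invalid/next-page"},
}


def summarize_posts(response, width=50):
    """Return (id, created_time, shortened message) for each post."""
    rows = []
    for post in response.get("data", []):
        message = post.get("message", "")  # empty string if missing
        rows.append((post["id"], post.get("created_time"), message[:width]))
    return rows


for row in summarize_posts(sample_posts):
    print(row)
```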

Facebook's data exists as a "Social Graph" structure, which the overview describes like this:

nodes - basically "things" such as a User, a Photo, a Page, a Comment.

edges - the connections between those "things", such as a Page's Photos, or a Photo's Comments

fields - info about those "things", such as a person's birthday, or the name of a Page
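
To make the three terms concrete, here is a rough sketch of how they map onto a Graph API request URL: the node is the first path segment, the edge is an optional second segment, and the fields go into the fields query parameter. build_graph_url is a hypothetical helper written for this note and is not part of the facebook SDK. The version string v2.5 is simply the version mentioned in this post, and a real request would also need an access token.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

GRAPH_ROOT = "https://graph.facebook.com"


def build_graph_url(node, edge=None, fields=None, version="v2.5"):
    """Build a Graph API URL from a node, an optional edge and fields."""
    path = [GRAPH_ROOT, version, node]
    if edge:
        path.append(edge)
    url = "/".join(path)
    if fields:
        # Fields are sent as one comma-separated query parameter.
        url += "?" + urlencode({"fields": ",".join(fields)})
    return url


print(build_graph_url("ejmonthly"))                         # a node
print(build_graph_url("ejmonthly", edge="posts"))           # an edge of that node
print(build_graph_url("ejmonthly", fields=["id", "name"]))  # selected fields
```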

(GET) Calls that feel like nodes, for example:

profile = graph.get_object("me")

(GET) Calls that look like edges, for example:
friends = graph.get_connections("me", "friends")
feed = graph.get_connections("me", "feed")

(GET) Fields are the contents returned by the calls above, carried in dictionaries, for example:
post = feed["data"][0]
If you don't know what content a node offers, it is best to print it and take a look first:
print(profile) or print(json.dumps(profile, indent=4))

(POST) There are also put_comment() and put_like(), but graph.put_object() is more general-purpose:
graph.put_object(post['id'],"likes")
graph.put_object("me", "feed", message="Hello, world")
graph.put_object(post["id"], "comments", message="First!")

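Facebook currently provides version 2.5 of the Graph API. Since October 2015, you can no longer reach a friend's data unless that friend also uses the Graph and has authorized this Graph "App" (most people won't...). In that situation, it seems mainly useful for people or companies that run a Page and want to understand their own Page. There is also the limit (Paging: next), which I haven't figured out how to use yet, so I can't fetch more than 25 records...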
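
One possible way past the 25-record limit is to follow the paging.next URL that Graph API list responses carry. The sketch below has not been tried against the live API: iter_all_items and its fetch parameter are names made up here, and fetch is assumed to be any function that takes a URL and returns the decoded JSON dictionary. The fake pages stand in for real responses so the loop can be run offline.

```python
def iter_all_items(first_page, fetch, max_items=None):
    """Yield items from first_page and every page reachable via paging.next."""
    page = first_page
    seen = 0
    while page:
        for item in page.get("data", []):
            yield item
            seen += 1
            if max_items is not None and seen >= max_items:
                return
        next_url = page.get("paging", {}).get("next")
        # Stop when the response carries no link to a next page.
        page = fetch(next_url) if next_url else None


# Fake pages keyed by URL, standing in for real HTTP responses.
fake_pages = {
    "page2": {"data": [{"id": "3"}, {"id": "4"}], "paging": {"next": "page3"}},
    "page3": {"data": [{"id": "5"}]},
}
first = {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}}

ids = [item["id"] for item in iter_all_items(first, fake_pages.get)]
print(ids)
```

With the real API, first_page would be the result of graph.get_connections(profile['id'], "posts"), and fetch could be something like lambda url: requests.get(url).json(), assuming the requests library is installed.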

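Afterword.... Teaching myself, I feel I'm really far behind the technical level outside Hong Kong. I wanted to write one post a week, but once I sit down to do it I'm painfully slow. Yesterday I heard the saying 「我們都是這樣浸大的」 ("this is how we all grew up, by soaking in it"). I'll keep at it, and hopefully I'll slowly soak my way to the level I want...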

Friday, January 15, 2016
