Data Mining Facebook

I was trying to figure out how hard it would be to tap into the Facebook APIs to do some data mining on social data. It turns out it couldn’t be easier: you don’t even have to register an application with Facebook. All you need is your web browser and an environment where you can consume JSON objects.

The key to this trick is to use Facebook’s public Graph API Explorer. The first step is to log in to your Facebook account, go to the Graph API Explorer, and click the ‘Get Access Token’ button (make sure to select the permissions you’d like to associate with the token). The token expires after a limited time, but that is long enough to run queries against the API, and it should remain valid as long as you stay logged in.

As you can see on the page, you can run test queries against the API. For example, to get a list of all my friends I simply issue the query

https://graph.facebook.com/MY_FB_ID/friends

and I receive a JSON object containing a list of all my friends.

To query the API from outside the explorer tool (say, from Python or with curl), you have to supply the access token that you obtained earlier as a query parameter. For example,

https://graph.facebook.com/MY_FB_ID/friends?access_token=ACCESS_TOKEN
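The URL pattern above can be assembled programmatically. Here is a minimal sketch, written for Python 3; the helper name build_graph_url and the GRAPH_ROOT constant are my own and not part of any Facebook library. It uses urlencode so that any special characters in the token are escaped properly.

```python
from urllib.parse import urlencode

# Base address taken from the example URLs above.
GRAPH_ROOT = 'https://graph.facebook.com'


def build_graph_url(object_id, connection, access_token):
    """Return the URL for one connection of an object, e.g. 'friends'.

    Hypothetical helper: it only mirrors the URL pattern shown above.
    """
    query = urlencode({'access_token': access_token})
    return '%s/%s/%s?%s' % (GRAPH_ROOT, object_id, connection, query)


# Placeholders: substitute your own ID and the token from the explorer.
print(build_graph_url('MY_FB_ID', 'friends', 'ACCESS_TOKEN'))
```

The resulting string can be pasted into a browser, passed to curl, or opened from a script.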

You can read JSON data using the simplejson and urllib2 libraries in Python. Here’s a script that uses these libraries to print my ‘Likes’ list. Notice that you have to paste your own access token and user ID into the accessToken and userId variables:

# First go to http://developers.facebook.com/tools/explorer
# make sure to get an access token and paste it in below
# This code prints items that I've liked.

from urllib2 import urlopen
from simplejson import loads

accessToken = '#####'  #INSERT YOUR ACCESS TOKEN
userId = '#####'           #INSERT YOUR USER ID

# Read my likes as a json object
url='https://graph.facebook.com/' + userId + '/likes?access_token=' + accessToken
jsonContent = loads(urlopen(url).read())

# Data looks like
#{'data': [{'category': 'Website',
#           'created_time': '2011-06-27T06:45:35+0000',
#           'id': '218822511478396',
#           'name': 'Only in El Paso'},
#         {'category': 'Tv show',
#          'created_time': '2011-06-13T03:24:31+0000',
#          'id': '229113353772249',
#          'name': 'My Cat From Hell'},
# ...

for item in jsonContent['data']:
    print item['name']
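The script above targets Python 2. For readers on Python 3, where urllib2's functionality lives in urllib.request and the standard json module can stand in for simplejson, a rough equivalent might look like the sketch below. The function names fetch_likes and like_names are my own, and the sample payload is simply the response excerpt from the comments above, so the parsing step can be tried without a token.

```python
import json
from urllib.request import urlopen


def fetch_likes(user_id, access_token):
    """Download the likes of user_id and decode the JSON response."""
    url = ('https://graph.facebook.com/' + user_id +
           '/likes?access_token=' + access_token)
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))


def like_names(json_content):
    """Return the 'name' of every record in the response's 'data' list."""
    return [item['name'] for item in json_content.get('data', [])]


# Response excerpt copied from the comments in the script above.
sample = {'data': [{'category': 'Website',
                    'created_time': '2011-06-27T06:45:35+0000',
                    'id': '218822511478396',
                    'name': 'Only in El Paso'},
                   {'category': 'Tv show',
                    'created_time': '2011-06-13T03:24:31+0000',
                    'id': '229113353772249',
                    'name': 'My Cat From Hell'}]}

for name in like_names(sample):
    print(name)
```

With a real token, like_names(fetch_likes(userId, accessToken)) would take the place of the sample data.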

I used a similar script to help me visualize the distribution of all my friends’ Likes. (It follows a Zipf’s law, or power-law, distribution, with the most liked items being President Obama followed by the True Blood TV show!) Check out the Graph API documentation and you’ll see that you can also batch multiple requests to the Graph API if you’d like. Happy data mining.
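The counting step behind that kind of visualization could be sketched as follows (Python 3). This is not the script I actually used, just one plausible way to do it: it assumes you have already collected each friend's likes into a dictionary keyed by friend ID. The function names and the tiny sample data are invented for illustration.

```python
from collections import Counter


def count_likes(likes_by_friend):
    """Count how many friends like each item.

    likes_by_friend maps a friend ID to a list of like records,
    each a dict with at least a 'name' key (as in the API response).
    """
    counts = Counter()
    for likes in likes_by_friend.values():
        # A set ensures an item counts at most once per friend.
        counts.update({item['name'] for item in likes})
    return counts


def rank_frequency(counts):
    """Return (rank, name, count) tuples, most liked item first."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, name, count)
            for rank, (name, count) in enumerate(ordered, start=1)]


# Made-up sample data, only to show the shape of the input.
sample_likes = {
    'friend_a': [{'name': 'Only in El Paso'}, {'name': 'My Cat From Hell'}],
    'friend_b': [{'name': 'Only in El Paso'}],
    'friend_c': [{'name': 'Only in El Paso'}, {'name': 'My Cat From Hell'}],
}

for rank, name, count in rank_frequency(count_likes(sample_likes)):
    print(rank, name, count)
```

Plotting count against rank on log-log axes is the usual check: data that follows a power law falls roughly on a straight line.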
