e11.0 tweets around the world

was discussing the streaming twitter api with steve and though i knew about the private firehose i didn’t know there was a lighter weight public gardenhose interface!

since discovering this my pvr has basically been running

curl -u mat_kelcey:XXX http://stream.twitter.com/1/statuses/sample.json |\
   gzip -9 - > sample.json.gz

but what am i going to do with all this data?

while poking around i noticed there was a fair number of iPhone: and ÜT: lat long tagged locations (eg iPhone: 35.670086,139.740766) so as a first hack let’s do some work extracing lat longs and displaying them as heat map points on a map.

all the code is on github

as a test then let’s take a sample.bz2 of 1,300 tweets between Oct 14 22:01:41 and 22:03:24

from this let’s just extract the location part of the tweet

bzcat sample.bz2 | ./extract_locations.pl > locations

of these 1,300 there are 30 examples of iphone lat longs (eg iPhone: -23.492420,-46.846916)

cat locations | ./extract_lat_longs_from_locations.rb iphone > locations.iphone

and 36 examples of ut lat longs (eg UT: 51.503212,5.478329)

cat locations | ./extract_lat_longs_from_locations.rb ut > locations.ut

on a side note, does anyone have any idea what ÜT is ? a phone type, maybe a carrier?

we need to convert these lat/longs to x/y points so we can plot onto a map and we’ll use the standard mercator projection to do this

cat locations.{ut,iphone} | ./lat_long_to_merc.rb > x_y_points

for the heat map we want to aggregate into buckets so the pixels are nice and big. finally we’ll output some simple javascript we can cut and paste into some map html

cat x_y_points | ./bucket.rb | sort | uniq -c | ./as_draw_square.rb

the final result is this map !

a good start. next to do the same over a much larger sample using hadoop streaming and pig and then work towards an animation by aggregating on time slices

Tags: , ,

4 Responses to “e11.0 tweets around the world”

  1. Nice article! What’s interesting is that some regions are devoid of location information (like South Africa). Hopefully this will change as GPSes become the norm for cellphones and PDAs.

  2. matpalm says:

    Hopefully with a bigger sample there will be a better representation. Just looking colour schemes for a decent heat map then I’ll do sample over 1e6 instead of just 1e3

  3. How about using “track” keywords to see who’s talking about what? Something like “driving to” or “just arrived”

  4. matpalm says:

    yeah interesting idea. i was going to start with just looking at a user’s location but you should see the crazy stuff people put in there!

Leave a Reply