brain of mat kelcey


e11.0 tweets around the world

October 16, 2009

i was discussing the streaming twitter api with steve, and though i knew about the private firehose, i didn't know there was a lighter-weight public gardenhose interface!

since discovering this my pvr has basically been running

curl -u mat_kelcey:XXX http://stream.twitter.com/1/statuses/sample.json |\
   gzip -9 - > sample.json.gz
but what am i going to do with all this data?

while poking around i noticed there were a fair number of iPhone: and ÜT: lat long tagged locations (eg iPhone: 35.670086,139.740766), so as a first hack let's do some work extracting lat longs and displaying them as heat map points on a map.

all the code is on github

as a test then let's take a sample.bz2 of 1,300 tweets between Oct 14 22:01:41 and 22:03:24

from this let's just extract the location part of the tweet

bzcat sample.bz2 | ./extract_locations.pl > locations
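as a rough idea of what this step involves, here is a minimal ruby sketch (not the perl script above). it assumes each line of the stream is one json status whose free-text profile location sits under user.location; the method name extract_location is made up for illustration.

```ruby
require 'json'

# Return the profile location from one line of the stream, or nil.
# Assumption: each line is a single JSON status with a user.location field.
def extract_location(line)
  status = JSON.parse(line)
  user = status.is_a?(Hash) ? status['user'] : nil
  location = user.is_a?(Hash) ? user['location'] : nil
  text = location.to_s.strip
  text.empty? ? nil : text
rescue JSON::ParserError
  nil # skip truncated or non-JSON lines
end

sample = '{"text":"hi","user":{"location":"iPhone: 35.670086,139.740766"}}'
puts extract_location(sample)
```

lines without a user (delete notices, for example) or with an empty location come back as nil, so they can simply be skipped.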
of these 1,300 there are 30 examples of iphone lat longs (eg iPhone: -23.492420,-46.846916)
cat locations | ./extract_lat_longs_from_locations.rb iphone > locations.iphone
and 36 examples of ut lat longs (eg UT: 51.503212,5.478329)
cat locations | ./extract_lat_longs_from_locations.rb ut > locations.ut
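the matching itself can be sketched with a regular expression. this is only an illustration under assumptions: the patterns below are guessed from the examples above (a prefix, a colon, then two signed decimals), and extract_lat_long is a hypothetical name, not the script on github.

```ruby
# Hypothetical prefix patterns keyed by the type given on the command line.
# "ut" accepts both "ÜT" (U+00DC) and plain "UT".
PREFIX_PATTERNS = {
  'iphone' => /iPhone/,
  'ut'     => /[\u00DCU]T/
}.freeze

NUMBER = /-?\d+(?:\.\d+)?/

# Return [lat, long] as floats, or nil if the location doesn't match.
def extract_lat_long(location, type)
  prefix = PREFIX_PATTERNS.fetch(type)
  match = /\A\s*#{prefix}:\s*(#{NUMBER})\s*,\s*(#{NUMBER})/.match(location.to_s)
  return nil unless match

  lat = match[1].to_f
  long = match[2].to_f
  # Drop values that can't be real coordinates.
  return nil unless lat.between?(-90, 90) && long.between?(-180, 180)

  [lat, long]
end

p extract_lat_long('iPhone: 35.670086,139.740766', 'iphone')
p extract_lat_long('somewhere in brazil', 'iphone')
```

the range check is an addition of this sketch; it just guards against numbers that happen to follow the prefix but aren't coordinates.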
on a side note, does anyone have any idea what ÜT is? a phone type, maybe a carrier?

we need to convert these lat/longs to x/y points so we can plot them onto a map; we'll use the standard mercator projection to do this

cat locations.{ut,iphone} | ./lat_long_to_merc.rb > x_y_points
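for reference, the spherical mercator projection maps longitude linearly to x and latitude to y via ln(tan(pi/4 + lat/2)). a minimal sketch follows; scaling the output to a unit square with y growing downwards, and clamping latitude near the poles, are assumptions of this example and may differ from what lat_long_to_merc.rb actually does.

```ruby
# Latitude at which a square mercator map is conventionally cut off.
MAX_LAT = 85.05112878

# Project degrees of latitude/longitude to [x, y] in a unit square,
# with (0, 0) at the top left (north west) corner.
def lat_long_to_merc(lat, long)
  lat = [[lat, -MAX_LAT].max, MAX_LAT].min # y diverges at the poles, so clamp
  x = (long + 180.0) / 360.0
  lat_rad = lat * Math::PI / 180.0
  y = 0.5 - Math.log(Math.tan(Math::PI / 4 + lat_rad / 2)) / (2 * Math::PI)
  [x, y]
end

p lat_long_to_merc(35.670086, 139.740766)  # tokyo: right of centre, above the equator
p lat_long_to_merc(-23.492420, -46.846916) # sao paulo: left of centre, below the equator
```

multiplying x and y by the pixel width and height of the map image then gives plot coordinates.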
for the heat map we want to aggregate into buckets so the pixels are nice and big. finally we'll output some simple javascript we can cut and paste into some map html
cat x_y_points | ./bucket.rb | sort | uniq -c | ./as_draw_square.rb
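the bucket, sort and uniq -c steps amount to snapping each point to a coarse grid cell and counting points per cell. here is a sketch of that idea in one place, assuming x/y points in a unit square and a hypothetical grid size of 64 cells per axis; the real bucket.rb may choose its buckets differently.

```ruby
GRID = 64 # hypothetical number of buckets along each axis

# Snap a unit-square point to its integer grid cell.
def bucket(x, y, grid = GRID)
  bx = [[(x * grid).floor, 0].max, grid - 1].min
  by = [[(y * grid).floor, 0].max, grid - 1].min
  [bx, by]
end

# Count how many points land in each cell (the sort | uniq -c part).
def bucket_counts(points, grid = GRID)
  counts = Hash.new(0)
  points.each { |x, y| counts[bucket(x, y, grid)] += 1 }
  counts
end

points = [[0.1, 0.1], [0.11, 0.11], [0.9, 0.9]]
bucket_counts(points, 10).each do |(bx, by), count|
  puts "#{count} #{bx} #{by}"
end
```

each count can then drive the colour or opacity of the square drawn for that cell.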
the final result is this map !

a good start. next up: do the same over a much larger sample using hadoop streaming and pig, and then work towards an animation by aggregating on time slices
