since discovering this my pvr has basically been running
curl -u mat_kelcey:XXX http://stream.twitter.com/1/statuses/sample.json |\ gzip -9 - > sample.json.gzbut what am i going to do with all this data?
while poking around i noticed there was a fair number of iPhone: and ÜT: lat long tagged locations (eg iPhone: 35.670086,139.740766) so as a first hack let's do some work extracing lat longs and displaying them as heat map points on a map.
all the code is on github
as a test then let's take a sample.bz2 of 1,300 tweets between Oct 14 22:01:41 and 22:03:24
from this let's just extract the location part of the tweet
bzcat sample.bz2 | ./extract_locations.pl > locationsof these 1,300 there are 30 examples of iphone lat longs (eg iPhone: -23.492420,-46.846916)
cat locations | ./extract_lat_longs_from_locations.rb iphone > locations.iphoneand 36 examples of ut lat longs (eg UT: 51.503212,5.478329)
cat locations | ./extract_lat_longs_from_locations.rb ut > locations.uton a side note, does anyone have any idea what ÜT is ? a phone type, maybe a carrier?
we need to convert these lat/longs to x/y points so we can plot onto a map and we'll use the standard mercator projection to do this
cat x_y_points | ./bucket.rb | sort | uniq -c | ./as_draw_square.rbthe final result is this map !
a good start. next to do the same over a much larger sample using hadoop streaming and pig and then work towards an animation by aggregating on time slices