
brain of mat kelcey


e11.1 from bash scripts to hadoop

October 18, 2009 at 02:10 PM | categories: e11, maps, twitter, hadoop, pig

let's rewrite v1 using hadoop tooling; the code is on github.

we'll run hadoop in non-distributed standalone mode. in this mode everything runs in a single jvm, so it's nice and simple to dev against.

in v1 it was

    bzcat sample.bz2 | ./extract_locations.pl > locations

using the awesome hadoop streaming interface it's not too different. this interface allows you to specify any app as the mapper or reducer. the main difference is that it works on directories, not just files.

for the mapper we'll use exactly the same script as before, extract_locations.pl, and since there is no reduce component to this job we...
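
as a rough illustration of the streaming contract described above (a mapper is just any program that reads lines on stdin and writes lines to stdout, and a job is pointed at a directory rather than a single file), here's a minimal local sketch in python. it is not hadoop and not the code from the post; `run_map_only` and its arguments are hypothetical names, and all it does is pipe every file in an input directory through a mapper command with no reduce step, writing one output file per input.

```python
import subprocess
from pathlib import Path


def run_map_only(mapper_cmd, input_dir, output_dir):
    """pipe each file in input_dir through mapper_cmd, with no reduce step.

    a local stand-in for a map-only streaming job, not a hadoop api.
    mapper_cmd is an argv list for any executable that reads stdin
    and writes stdout.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # a directory of inputs, not a single file
    inputs = sorted(p for p in Path(input_dir).iterdir() if p.is_file())

    written = []
    for i, path in enumerate(inputs):
        # output naming loosely follows the part-NNNNN convention
        part = out / f"part-{i:05d}"
        with path.open("rb") as src, part.open("wb") as dst:
            # the mapper sees the file on stdin; its stdout becomes the output
            subprocess.run(mapper_cmd, stdin=src, stdout=dst, check=True)
        written.append(part)
    return written
```

assuming the inputs have already been decompressed into a directory called `input`, something like `run_map_only(["./extract_locations.pl"], "input", "output")` would mimic the v1 pipeline one file at a time.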

e11.0 tweets around the world

October 16, 2009 at 08:47 PM | categories: e11, maps, twitter

was discussing the streaming twitter api with steve, and though i knew about the private firehose i didn't know there was a lighter weight public gardenhose interface!

since discovering this my pvr has basically been running

    curl -u mat_kelcey:XXX http://stream.twitter.com/1/statuses/sample.json |\
      gzip -9 - > sample.json.gz

but what am i going to do with all this data?

while poking around i noticed there were a fair number of iPhone: and ÜT: lat long tagged locations (eg iPhone: 35.670086,139.740766), so as a first hack let's do some work extracting lat longs and displaying them as heat map points on a map.

all the code is...
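
to make the lat long extraction concrete, here's a minimal sketch in python. it's not the code from the post; `LOCATION_RE` and `extract_lat_long` are hypothetical names, and it assumes the location string has already been pulled out of the tweet json and looks like the `iPhone: 35.670086,139.740766` example above.

```python
import re

# matches the "iPhone: lat,long" and "ÜT: lat,long" style locations
LOCATION_RE = re.compile(
    r"^(?:iPhone|ÜT):\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)$"
)


def extract_lat_long(location):
    """return (lat, long) as floats, or None if this isn't a tagged lat long."""
    match = LOCATION_RE.match(location.strip())
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # drop anything outside the valid coordinate ranges
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon
```

free text locations (eg a city name) come back as `None`, so they can simply be skipped before plotting.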
