e11, maps, twitter, hadoop, pig | View Comments
let's rewrite v1 using hadoop tooling, code is on githubwe'll run hadoop in non distributed standalone mode. in this mode everything runs in a single jvm so it's nice and simple to dev against.in v1 it wasbzcat sample.bz2 | ./extract_locations.pl > locationsusing the the awesome hadoop streaming interface it's not too different. this interface allows you to specify any app as the mapper or reducer. the main difference is that it works on directories not just files.for the mapper we'll use exactly the same script as before; extract_locations.pl and since there is no reduce component of this job so we...
e11, maps, twitter | View Comments
was discussing the streaming twitter api with steve and though i knew about the private firehose i didn't know there was a lighter weight public gardenhose interface!since discovering this my pvr has basically been runningcurl -u mat_kelcey:XXX http://stream.twitter.com/1/statuses/sample.json |\ gzip -9 - > sample.json.gzbut what am i going to do with all this data?while poking around i noticed there was a fair number of iPhone: and ÜT: lat long tagged locations (eg iPhone: 35.670086,139.740766) so as a first hack let's do some work extracing lat longs and displaying them as heat map points on a map.all the code is...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment