brain of mat kelcey
first hadoop experiment
September 16, 2009 at 07:26 PM | categories: ec2, big data, hadoop
just finished my first hadoop experiment: matpalm.com/sip. not fantastic results, but heaps of feedback from the hadoop mailing group. more results coming soon...
the median of a trillion numbers
November 15, 2008 at 11:31 AM | categories: erlang, algorithms, ec2
i got asked in an interview once “how would you find the median of a trillion numbers across a thousand machines?” the question has haunted me, until now. here’s my ruby and erlang implementation, with a bit of running on amazon ec2 thrown in for good measure: matpalm.com/median/ grab the code from github...
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment