brain of mat kelcey...
e11.3 at what time does the world tweet?
October 28, 2009 at 09:22 PM | categories: Uncategorized
consider the graph below which shows the proportion of tweets per 10 min slot of the day (GMT0). it compares 4.7e6 tweets with any location vs 320e3 tweets with identifiable lat lons. some interesting observations with unanswered questions... the ebb and flow is...
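as a rough illustration of the "proportion of tweets per 10 min slot" idea, here's a minimal python sketch. it is not the code behind the graph; the names `slot_of` and `slot_proportions` are made up for this example, and it assumes each tweet has already been parsed into a timezone aware `datetime`.

```python
from collections import Counter
from datetime import datetime, timezone

SLOT_MINUTES = 10
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 144 slots in a day


def slot_of(ts: datetime) -> int:
    """Index (0..143) of the 10 minute slot a timestamp falls in, in GMT.

    Assumption: ts is timezone aware; it is converted to UTC first.
    """
    ts = ts.astimezone(timezone.utc)
    return (ts.hour * 60 + ts.minute) // SLOT_MINUTES


def slot_proportions(timestamps):
    """Proportion of timestamps in each slot of the day (hypothetical helper).

    Returns a list of 144 floats that sum to 1 when there is any input.
    """
    counts = Counter(slot_of(ts) for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * SLOTS_PER_DAY
    return [counts[s] / total for s in range(SLOTS_PER_DAY)]
```

computing proportions rather than raw counts is what lets two samples of very different sizes (4.7e6 vs 320e3) sit on the same axis.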
e11.2 aggregating tweets by time of day
October 24, 2009 at 01:02 PM | categories: Uncategorized
for v3 let's aggregate by time of the day, should make for an interesting animation. browsing the data there are lots of other lat longs in the data, not just iPhone: and ÜT: there are also ones tagged with Coppó:, Pre:, etc...
e11.1 from bash scripts to hadoop
October 18, 2009 at 02:10 PM | categories: Uncategorized
let's rewrite v1 using hadoop tooling, code is on github. we'll run hadoop in non distributed standalone mode. in this mode everything runs in a single jvm so it's nice and simple to dev against. in v1 it was
bzcat sample.bz2 | ./extract_locations.pl...
e11.0 tweets around the world
October 16, 2009 at 08:47 PM | categories: Uncategorized
was discussing the streaming twitter api with steve and though i knew about the private firehose i didn't know there was a lighter weight public gardenhose interface! since discovering this my pvr has basically been running
curl -u mat_kelcey:XXX http://stream.twitter.com/1/statuses/sample.json |\ ...
e10.4 communities in social graphs
October 06, 2009 at 08:05 PM | categories: Uncategorized
social graphs, like twitter or facebook, often follow the pattern of having clusters of highly connected components with an occasional edge joining these clusters. these connecting edges define the boundaries of communities in the social network and can be identified by...
simple statistics with R
October 03, 2009 at 03:43 PM | categories: Uncategorized
i'm learning a new statistics language called R and it's pretty cool.
make a vector ...
> c(3,1,4,1,5,9,2,6,5,3,5,8)
[1] 3 1 4 1 5 9 2 6 5 3 5 8
turn it into a frequency table ...
> table(c(3,1,4,1,5,9,2,6,5,3,5,8))
1 2 3 4 5...
do a degree via youtube
October 01, 2009 at 08:40 PM | categories: Uncategorized
i'm amazed by how much great content is on youtube, how could you NOT learn something!?
13 x 1hr Statistical Aspects of Data Mining (Stats 202)
20 x 1hr Machine Learning...
e10.3 twitter crawl progress
September 29, 2009 at 08:43 PM | categories: Uncategorized
since the twitter api is rate limited it's quite slow to crawl twitter and after most of a week i've still only managed to get info on 8,000 users. i probably should subscribe to get a 20,000 an hr...
e10.2 tgraph crawl order example
September 21, 2009 at 09:58 PM | categories: Uncategorized
let's consider an example of the crawl order for tgraph... we seed our frontier with 'a' and a bootstrap cost of 0. fetching the info for 'a' shows 2 outedges to 'b' and 'c', from our cost formula these all have cost 0...
e10.1 crawling twitter
September 19, 2009 at 09:31 PM | categories: Uncategorized
our first goal is to get some data and the twitter api makes getting the data trivial. i'm focused mainly on the friends stuff but because it only gives user ids i'll also get the user info so i can...
e10.0 introducing tgraph
September 19, 2009 at 02:41 PM | categories: Uncategorized
so e9 sip is on hold for a bit while i kick off e10 tgraph. was looking for another problem to try hadoop with and came across a classic graph one, pagerank. a well understood algorithm like pagerank will...
first hadoop experiment
September 16, 2009 at 07:26 PM | categories: Uncategorized
just finished my first hadoop experiment.
matpalm.com/sip
not fantastic results but heaps of feedback from the hadoop mailing group. more results coming soon...
how using compressed data can make your app faster
June 28, 2009 at 11:32 AM | categories: Uncategorized
when working with larger data sets (ie more than can fit in memory) there are two important resources to juggle…
cpu. how quickly can you process the data.
disk io. how quickly can you get data to the cpu.
i remember reading once...
erlang profiling
April 22, 2009 at 11:32 AM | categories: Uncategorized
i just found fprof, the erlang profiler, by randomly clicking around the erlang man page list. try
fprof:apply(Module, Function, Args).
fprof:profile().
fprof:analyse().
for an interesting breakdown of a call...
bin packing
December 14, 2008 at 11:31 AM | categories: Uncategorized
how to decide what next to back up onto a dvd? when is brute force good enough? will a random walk get a good enough result faster?
matpalm.com/burn.it...
the median of a trillion numbers
November 15, 2008 at 11:31 AM | categories: Uncategorized
i got asked in an interview once “how would you find the median of a trillion numbers across a thousand machines?” the question has haunted me, until now. here’s my ruby and erlang implementation with a bit of running amazon ec2 thrown in...
fastmap and the jaccard distance
October 31, 2008 at 11:31 AM | categories: Uncategorized
given a set of pairwise distances how do you determine what points correspond to those distances? my latest experiment considers this problem in relation to jaccard distances, a resemblance measure similar to jaccard coefficients used in a previous experiment. by using the...
openmp = easy multi threading
October 13, 2008 at 11:30 AM | categories: Uncategorized
openmp is a compiler library, available in gcc since v4.2, for giving hints to a compiler about where code can be parallelized. say we have some code
for(int i=0; i<HUGE_NUMBER; ++i)
    deadHardCalculation(i);
we can make this run multi threaded by simply...
shingling and the jaccard index
October 06, 2008 at 11:30 AM | categories: Uncategorized
on the weekend i did another experiment using shingling and the jaccard index to try to determine if two sets of data were “duplicates”. it works quite well and includes a ruby and c++ version with low level bit operations. project page...
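for a feel of how shingling and the jaccard index fit together, here's a minimal python sketch. it is not the ruby or c++ version from the project (no bit operations here); character level shingles of length 3 and the function names are assumptions made for this example.

```python
def shingles(text: str, n: int = 3) -> set:
    """Set of all n character substrings (shingles) of text.

    Assumption: character level shingles; text shorter than n
    becomes a single shingle, and empty text gives an empty set.
    """
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection over size of the union, in [0, 1]."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def resemblance(x: str, y: str, n: int = 3) -> float:
    """Jaccard index of the shingle sets of two strings (hypothetical helper)."""
    return jaccard_index(shingles(x, n), shingles(y, n))
```

two strings would then be treated as likely “duplicates” when `resemblance` is above some threshold; what threshold works is something to tune against real data.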
popular posts...
ensemble nets : training ensembles as a single model using jax on a tpu pod slice (sept 2020)
bnn : counting bees with a rasp pi (may 2018)
drivebot : learning to do laps with reinforcement learning and neural nets (feb 2016)
wikipedia philosophy : do all first links on wikipedia lead to philosophy? (aug 2011)
cartpole++ : deep RL hacking with a complex 3d cart pole environment (aug 2016)
malmomo : deep RL hacking on minecraft with malmo (jan 2017)
some papers from my time at google research / brain...
- Natural Questions: a Benchmark for Question Answering Research
- Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
- WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
my honours thesis
the co-evolution of cooperative behaviour (1997) evolving neural nets with genetic algorithms for communication problems.
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment