brain of mat kelcey...
pseudocounts and the good-turing estimation (part1)
April 03, 2011 at 03:04 PM | categories: Uncategorized
say we are running the bar at a sold out bad religion concert. the bar serves beer, scotch and water and we decide to record orders over the night so that we can know how much to order for tomorrow's gig...

drink    #sales
beer     1000
scotch   300
water    200

using...
visualising the consistent hash
September 26, 2010 at 04:00 PM | categories: Uncategorized
consider the problem of allocating N resources across M servers (N >> M)
a common approach is the straightforward modulo hash...
if we have 4 servers; servers = [server0, server1, server2, server3] we can allocate a resource to a server by...
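a minimal sketch of the modulo hash idea from the excerpt above, in python. the excerpt is cut off before it shows the actual allocation rule, so this is an assumption about the usual form (hash the key, take it modulo the number of servers), not the post's own code. the `allocate` function, the use of `zlib.crc32` as the hash and the `resource0`..`resource999` keys are all hypothetical.

```python
import zlib
from collections import Counter


def allocate(resource_key, servers):
    """Pick a server for a resource using a plain modulo hash."""
    # hash the key to an integer, then take it modulo the number of servers.
    # zlib.crc32 is an assumption, chosen because it gives the same value on
    # every run (python's built-in hash() for strings is randomised per process).
    index = zlib.crc32(resource_key.encode("utf-8")) % len(servers)
    return servers[index]


servers = ["server0", "server1", "server2", "server3"]

# hypothetical resource keys, just to have something to allocate
resources = ["resource%d" % i for i in range(1000)]

# count how many resources land on each server
allocation = Counter(allocate(r, servers) for r in resources)
for server in servers:
    print(server, allocation[server])
```

the same key always maps to the same server as long as the `servers` list does not change, since the index depends on `len(servers)`.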
simple text search in ruby using ferret
September 12, 2010 at 09:28 PM | categories: Uncategorized
ferret is a lightweight text search engine for ruby, a bit like lucene but with less (ie no) java.
i've been looking at it today as part of my named entity extraction prototype which needs to be able to fuzzily match...
my list of cool machine learning books
August 06, 2010 at 06:35 PM | categories: Uncategorized
for the last month or so i've had my head down and have been focusing more on theory (ie reading) than on practice (ie coding)
so rather than write no blog post here's mats-list-of-cool-machine-learning-books in the order i think you should...
brutally short intro to weka
July 03, 2010 at 05:35 PM | categories: Uncategorized
weka is a java based machine learning workbench that i've found useful to play with to help understand some standard machine learning algorithms. in this quick demo i show how to build a classifier for three simple datasets; two of...
friend clustering by term usage
June 25, 2010 at 11:39 PM | categories: Uncategorized
recently signed up to the infochimps api and wanted to do something quick and dirty to get a feel for it.
so here's a little experiment
- get the people i follow on twitter
- look up the words that "represent" them according to the...
country codes in world cup tweets - viz1
June 21, 2010 at 07:43 PM | categories: Uncategorized
#worldcup tweet viz1 from Mat Kelcey on Vimeo.
here's a simple visualisation of the use of official country codes (eg #aus) in a week's worth of tweets from the search stream for #worldcup.
rate is about 2 hours of tweets per sec. orb...
moving average of a time series in R
June 15, 2010 at 04:15 PM | categories: Uncategorized
in this a sliding window of 3 elements

    > x = c(3,1,4,1,5,9,2,6,5,3,5,8)
    > ra_x = filter(x, rep(1,3)/3)
    > ra_x
    Time Series:
    Start = 1
    End = 12
    Frequency = 1
     [1]       NA 2.666667 2.000000 3.333333 5.000000 5.333333 5.666667...
#worldcup twitter analytics
June 14, 2010 at 10:06 PM | categories: Uncategorized
since the world cup started i've spent more time looking at twitter data about the games than the actual games themselves. what a sad data nerd i am!
anyways, here's the first few days analysis based on the use of official country...
a quick study in tf/icf
June 09, 2010 at 09:58 PM | categories: Uncategorized
while doing some more research on trending algorithms i came across a cool little paper about term frequency normalisation for streaming data: TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams.
i'm finding streaming related algorithms quite interesting lately...
5 minute ggobi demo
June 04, 2010 at 11:12 PM | categories: Uncategorized
brutally short demo of ggobi from Mat Kelcey on Vimeo.
note: non embedded version has higher res at full screen...
how many terms in a trend?
May 11, 2010 at 07:46 PM | categories: Uncategorized
i've been poking around with a simple trending algorithm over the last few weeks and have uncovered a problem that, like most interesting ones, i'm not sure how to solve. the question revolves around discovering multi term trends. a sensible...
trending topics in tweets about cheese; part2
May 01, 2010 at 04:54 PM | categories: Uncategorized
prototyping in ruby was a great way to prove the concept but my main motivation for this project was to play some more with pig.
the main approach will be
- maintain a relation with one record per token
- fold 1 hour's worth of...
trending topics in tweets about cheese; part1
April 27, 2010 at 11:42 PM | categories: Uncategorized
what does it mean for a topic to be 'trending'? consider the following time series (430e3 tweets containing cheese collected over a month period bucketed into hourly timeslots)
without a formal definition we can just look at this and say that the...
latent semantic analysis via the singular value decomposition (for dummies)
April 19, 2010 at 08:50 PM | categories: Uncategorized
i've been trying to get a deeper understanding of latent semantic analysis for a while now.
last week i came to the conclusion the other way to truly understand would be to start from the ground up
so here goes; mat's guide to...
cool bash stuff; mkfifo
April 15, 2010 at 09:33 PM | categories: Uncategorized
mkfifo is one of those shell commands provided as part of coreutils that not many people seem to know about.
here's a (semi contrived) example close to something i did the other day to show how awesome it is
say you have...
e10.6 community detection for my twitter network
April 04, 2010 at 12:58 PM | categories: Uncategorized
last night i applied my network decomposition algorithm to a graph of some of the people near me in twitter.
first i build a friend graph for 100 people 'around' me (taken from a crawl i did last year). by 'friend'...
e10.5 revisiting community detection
March 30, 2010 at 08:42 PM | categories: Uncategorized
i've decided to switch back to some previous work i did on community detection in (social) graphs
the last chunk of code i wrote which tried to deal with weighted directed graphs was terribly, terribly broken but it seems that simplifying...
brutally short intro to collaborative filtering
March 18, 2010 at 08:38 PM | categories: Uncategorized
my favourite recommendations system is the collaborative filter; it gives good results and is easy to understand and extend as required.
it works on the intuition that
if i like coffee, chocolate and ice cream
and you like coffee and chocolate
you might also like...
sentiment analysis training data using mechanical turk
March 12, 2010 at 09:57 PM | categories: Uncategorized
want to try doing some sentiment analysis work on tweets but i need some good training data.
i could label a heap of tweets myself as being positive, neutral or negative but instead this seems to be the perfect job for...
popular posts...
FPGA wavenets : eurorack audio processing neural nets running at ~200,000 inferences/sec (oct 2023)
dithernet very slow movie player : a GAN that slowly plays a movie over a year on an eink screen (oct 2020)
evolved channel selection : neural networks robust to any subset of input channels, at any resolution (mar 2021)
ensemble nets : training ensembles as a single model using jax on a tpu pod slice (sept 2020)
bnn : counting bees with a rasp pi (may 2018)
drivebot : learning to do laps with reinforcement learning and neural nets (feb 2016)
wikipedia philosophy : do all first links on wikipedia lead to philosophy? (aug 2011)
some papers from my time at google research / brain...
- Natural Questions: a Benchmark for Question Answering Research
- Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
- WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
my honours thesis
the co-evolution of cooperative behaviour (1997) evolving neural nets with genetic algorithms for communication problems.
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment





