brain of mat kelcey...
float32 wavenet on a microcontroller at (almost) 50,000 inferences / sec
September 09, 2023 at 09:00 PM | categories: mcu, wavenet, eurorack
wavenet can run at audio rates on an mcu, if you cache carefully, and be used for fun eurorack effects.
high performance ML with JAX
September 12, 2021 at 12:30 PM | categories: jax, talk
did a talk at pycon on jax. check out the recording!
evolved channel selection
March 01, 2021 at 10:20 PM | categories: projects, ga, jax
rather than use all 13 channels in a multi spectral image for classification can we train a model that is robust to all combos, at all resolutions, and use a genetic algorithm to choose which are the most valuable? (spoiler; yes)
crazy large batch sizes
February 14, 2021 at 10:30 PM | categories: quick_hack, tpu, jax
a quick hack to see how fast we can get a v3-32 pod slice cranking with a global batch size of 170,000; tl;dr pretty fast!
solving y=mx+b... with jax on a tpu pod slice
February 07, 2021 at 01:00 PM | categories: tpu, ensemble_nets, jax, projects, haiku
a 4 (and a bit) part tutorial / colab / screencast series starting with jax fundamentals and working up to a data parallel approach to running on a cloud tpu pod slice... all focused on solving the toughest problem in machine learning; 1d y=mx+b
develomentor.com podcast interview
December 07, 2020 at 05:00 PM | categories: talk
was a guest on the develomentor podcast talking about random parts of my career
out of distribution detection using focal loss
December 02, 2020 at 01:00 PM | categories: objax, jax, projects
a series of small experiments on using focal loss to do out of distribution detection
my updated list of cool machine learning books
November 01, 2020 at 09:40 PM | categories: Uncategorized
it's been ten years so it's probably time to update my list of cool machine learning books.
dithernet very slow movie player
October 21, 2020 at 10:30 PM | categories: gan, jax, projects, objax
a GAN experiment to generate dithers for an eink screen minimising pixel change between frames for a very slow movie player.
ensemble networks
September 17, 2020 at 06:30 AM | categories: objax, projects, ensemble_nets, jax
ensemble nets; using jax vmap to batch over not just the inputs of a model but also sets of multiple models' parameters.
metric learning for image similarity search in objax
September 02, 2020 at 12:00 PM | categories: objax, metric_learning, jax
an objax tutorial on using metric learning for image similarity.
objax on honeysuckle farm
August 30, 2020 at 02:45 PM | categories: talk
i think high level short explainer videos on jax frameworks while doing farm chores is going to be a growing genre.
the map interpretation of attention
August 19, 2020 at 10:30 AM | categories: talk, three_strikes_rule
a talk i did at melbourne ml/ai on how the attention mechanism can be interpreted as a form of differentiable map. check out the recording!
self supervised learning and making use of unlabelled data
July 02, 2020 at 05:00 PM | categories: talk
a recording of a talk i did on self supervised learning at yow data.
a jax random embedding ensemble network
June 15, 2020 at 06:30 AM | categories: ensemble_nets, jax
random embedding networks can be used to generate weakly labelled data for metric learning and they see a large benefit from being run in ensembles. can we represent these ensembles as a single forward pass in jax? why yes! yes we can!
keras.io post on metric learning for image similarity search
June 05, 2020 at 12:00 PM | categories: metric_learning, keras
a keras.io tutorial on using metric learning for image similarity.
an illustrative einsum example
May 27, 2020 at 12:00 AM | categories: talk, short_tute
code (and youtube walkthrough) of a port of some numpy code i did recently to einsum that i thought was illustrative.
using cross entropy for metric learning
May 19, 2020 at 06:00 PM | categories: metric_learning, talk
youtube link and slides of a talk i did recently on metric learning at the melbourne ml and ai meetup
measuring baseline random performance for an N way classifier
April 11, 2020 at 12:34 PM | categories: short_tute, three_strikes_rule
quick example code of a simple way to gauge baseline random performance.
deriving class_weights from validation data
March 03, 2020 at 06:00 PM | categories: short_tute, three_strikes_rule
quick example code demoing a way to derive class_weights from performance on validation data. this can often speed training up.
initing the biases in a classifier to closer match training data
February 27, 2020 at 12:00 PM | categories: short_tute, three_strikes_rule
some code showing how you can init the bias of a classifier to match the base distribution of your training data.
minimal example of running pybullet under google cloud dataflow
January 29, 2020 at 12:00 AM | categories: short_tute
some code that demos how to run pybullet for generating a truck load of synthetic training data under google cloud dataflow.
data engineering concerns for machine learning products
September 26, 2019 at 06:00 PM | categories: talk
slides of a talk i did at the melbourne data engineering meetup.
solving cartpole... by evolving the raw bytes of a 1.4KB tflite microcontroller serialised model
September 13, 2019 at 12:00 AM | categories: projects
evolving a controller for cartpole using an evolutionary algorithm that operates directly on the byte level of a serialised tf lite microcontroller model.
brutally short introduction to learning to learn
August 07, 2019 at 12:00 AM | categories: talk
recording of a talk i did on meta learning at yow data
a half baked pix2pix experiment for road trip videos with teacher forcing
June 26, 2019 at 01:00 PM | categories: gan, projects
a half baked attempt to train a pix2pix model on dash cam videos from a roadtrip around the eastern states of the u.s.
pybullet grasping with time contrastive network embeddings
June 11, 2019 at 01:00 PM | categories: projects
an example of using time contrastive networks to learn embeddings for the pose of a kuka arm in a pybullet simulated grasping environment.
natural questions a benchmark for question answering research
February 01, 2019 at 12:00 AM | categories: paper
the last paper i was involved in at google has been released! congrats to tom and the team.
counting bees on a rasp pi with a conv net
May 17, 2018 at 12:30 PM | categories: projects
training a fully convolutional unet to count bees from a raspberry pi stuck to the side of a hive.
fully convolutional networks
April 06, 2018 at 06:00 PM | categories: short_tute
a short walkthrough explainer on fully convolutional networks.
using simulation and domain adaptation to improve efficiency of deep robotic grasping
September 22, 2017 at 12:00 AM | categories: paper
a paper i helped with at google robotics has been released! congrats to konstantinos and the team!
deep reinforcement learning for robotics
September 21, 2017 at 07:12 PM | categories: talk
a recording of a talk i did at the melbourne ml ai meetup.
simple tensorboard visualisation for gradient norms
June 27, 2017 at 09:45 PM | categories: Uncategorized
some cook book examples of gradient norm visualisation in tensorboard.
after 2,350 days in america we are moving home...
June 14, 2017 at 10:00 PM | categories: Uncategorized
i'm leaving google brain and we're moving back to australia.
cartpole++
August 11, 2016 at 10:00 PM | categories: projects
a pybullet 3d version of cartpole where the pole isn't connected to the cart and you have to learn from pixels.
wikireading a novel large-scale language understanding task over wikipedia
August 11, 2016 at 12:00 AM | categories: paper
our wikireading paper is out! congrats to daniel and the team! check it out on arxiv... WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia...
learning to do laps with reinforcement learning and neural nets
February 13, 2016 at 10:00 PM | categories: projects
using reinforcement learning to train neural nets for driving a simulated robot around a track.
brutally short intro to theano word embeddings
March 28, 2015 at 01:00 PM | categories: Uncategorized
one thing in theano i couldn't immediately find examples for was a simple embedding lookup table, a critical component for anything with NLP. turns out that it's just one of those things that's so simple no one bothered writing it...
hallucinating softmaxs
March 15, 2015 at 10:00 PM | categories: Uncategorized
language modelling is a classic problem in NLP; given a sequence of words such as "my cat likes to ..." what's the next word? this problem is related to all sorts of things, everything from autocomplete to...
theano and the curse of GpuFromHost
February 22, 2015 at 10:00 PM | categories: Uncategorized
i've been reviving some old theano code recently and in case you haven't seen it theano is a pretty awesome python library that reads a lot like numpy but provides two particularly interesting features. symbolic...
dead simple pymc
December 27, 2012 at 09:00 PM | categories: Uncategorized
PyMC is a python library for working with bayesian statistical models, primarily using MCMC methods. as a software engineer who has only just scratched the surface of statistics this whole...
smoothing low support cases using confidence intervals
December 08, 2012 at 10:50 PM | categories: Uncategorized
say you have three items; item1, item2 and item3 and you've somehow associated a count for each against one of five labels; A, B, C, D, E. > data A ...
item similarity by bipartite graph dispersion
August 20, 2012 at 08:00 PM | categories: Uncategorized
the basis of most recommendation systems is the ability to rate similarity between items. there are lots of different ways to do this. one model is based on the idea of an interest graph where the nodes of the graph are...
finding names in common crawl
August 18, 2012 at 08:00 PM | categories: Uncategorized
the central offering from common crawl is the raw bytes they've downloaded and, though this is useful for some people, a lot of us just want the visible text of web pages. luckily they've done this extraction as...
fuzzy jaccard
July 31, 2012 at 08:00 PM | categories: Uncategorized
the jaccard coefficient is one of the fundamental measures for doing set similarity. ( recall jaccard(set1, set2) = |intersection| / |union|. when set1 == set2 this evaluates to 1.0 and when set1 and set2 have no intersection it evaluates to...
ggplot posixct cheat sheet
March 18, 2012 at 08:00 PM | categories: Uncategorized
after having to google this stuff three times in the last few months i'm writing it down here so i can just cut and paste next time... > d = read.delim('data.tsv',header=F,as.is=T,col.names=c('dts_str','freq')) > # YEAR MONTH DAY HOUR > head(d,3) ...
collocations in wikipedia, part 1
January 01, 2012 at 08:00 PM | categories: Uncategorized
hmmm. did you mean collocations in wikipedia?...
tokenising the visible english text of common crawl
December 10, 2011 at 04:00 PM | categories: Uncategorized
Common crawl is a publicly available 30TB web crawl taken between September 2009 and September 2010. As a small project I decided to extract and tokenise the visible text of the web pages in this dataset. All...
finding phrases with mutual information
November 15, 2011 at 11:00 PM | categories: Uncategorized
continuing on with my series of mutual information experiments how might we extend the technique to identify sequences longer than just two terms? one novel way is to identify the bigrams of interest, replace them with a single...
collocations in wikipedia, part 2
November 05, 2011 at 05:00 PM | categories: Uncategorized
in my last post we went through mutual information as a way of finding collocations. the astute reader may have noticed that for the list of top bigrams i only showed ones that had a frequency above 5,000. why this...
collocations in wikipedia, part 1
October 19, 2011 at 08:00 PM | categories: Uncategorized
collocations are combinations of terms that occur together more frequently than you'd expect by chance. they can include proper noun phrases like 'Darth Vader', stock/colloquial phrases like 'flora...
an exercise in handling mislabelled training data
October 03, 2011 at 08:00 PM | categories: Uncategorized
as part of my diy twitter client project i've been using the twitter sample streams as a source of unlabelled data for some mutual information analysis. these streams are a great source...
do all first links on wikipedia lead to philosophy?
August 13, 2011 at 03:00 PM | categories: projects
using hadoop and a wikipedia dump to test the hypothesis that all first links on pages eventually lead to philosophy.
dimensionality reduction using random projections.
May 10, 2011 at 08:31 PM | categories: Uncategorized
previously i've discussed dimensionality reduction using SVD and PCA but another interesting technique is using a random projection. in a random projection we project A (a NxM matrix) to A' (a NxO, O < M) by the transform AP=A' where P...
pseudocounts and the good-turing estimation (part1)
April 03, 2011 at 03:04 PM | categories: Uncategorized
say we are running the bar at a sold out bad religion concert. the bar serves beer, scotch and water and we decide to record orders over the night so that we can know how much to order for tomorrow's gig... drink / #sales: beer 1000, scotch 300, water 200. using...
visualising the consistent hash
September 26, 2010 at 04:00 PM | categories: Uncategorized
consider the problem of allocating N resources across M servers (N >> M). a common approach is the straight forward modulo hash... if we have 4 servers; servers = [server0, server1, server2, server3] we can allocate a resource to a server...
simple text search in ruby using ferret
September 12, 2010 at 09:28 PM | categories: Uncategorized
ferret is a lightweight text search engine for ruby, a bit like lucene but with less (ie no) java. i've been looking at it today as part of my named entity extraction prototype which needs to be able to fuzzily match...
my list of cool machine learning books
August 06, 2010 at 06:35 PM | categories: Uncategorized
for the last month or so i've had my head down and have been focusing more on theory (ie reading) than on practice (ie coding). so rather than write no blog post here's mats-list-of-cool-machine-learning-books in the order i think you should...
brutally short intro to weka
July 03, 2010 at 05:35 PM | categories: Uncategorized
weka is a java based machine learning workbench that i've found useful to play with to help understand some standard machine learning algorithms. in this quick demo i show how to build a classifier for three simple datasets; two of...
friend clustering by term usage
June 25, 2010 at 11:39 PM | categories: Uncategorized
recently signed up to the infochimps api and wanted to do something quick and dirty to get a feel for it. so here's a little experiment: get the people i follow on twitter, look up the words that "represent" them...
country codes in world cup tweets - viz1
June 21, 2010 at 07:43 PM | categories: Uncategorized
#worldcup tweet viz1 from Mat Kelcey on Vimeo. here's a simple visualisation of the use of official country codes (eg #aus) in a week's worth of tweets from the search stream for #worldcup. rate is about 2hours of tweets per sec. orb...
moving average of a time series in R
June 15, 2010 at 04:15 PM | categories: Uncategorized
in this a sliding window of 3 elements > x = c(3,1,4,1,5,9,2,6,5,3,5,8) > ra_x = filter(x, rep(1,3)/3) > ra_x Time Series: Start = 1 End = 12 Frequency = 1 [1] NA 2.666667 2.000000 3.333333 5.000000 5.333333 5.666667...
#worldcup twitter analytics
June 14, 2010 at 10:06 PM | categories: Uncategorized
since the world cup started i've spent more time looking at twitter data about the games than the actual games themselves. what a sad data nerd i am! anyways, here's the first few days analysis based on the use of official country...
a quick study in tf/icf
June 09, 2010 at 09:58 PM | categories: Uncategorized
while doing some more research on trending algorithms i came across a cool little paper about term frequency normalisation for streaming data: TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams. i'm finding streaming related algorithms quite interesting lately...
5 minute ggobi demo
June 04, 2010 at 11:12 PM | categories: Uncategorized
brutally short demo of ggobi from Mat Kelcey on Vimeo. note: non embedded version has higher res at full screen...
how many terms in a trend?
May 11, 2010 at 07:46 PM | categories: Uncategorized
i've been poking around with a simple trending algorithm over the last few weeks and have uncovered a problem that, like most interesting ones, i'm not sure how to solve. the question revolves around discovering multi term trends. a sensible...
trending topics in tweets about cheese; part2
May 01, 2010 at 04:54 PM | categories: Uncategorized
prototyping in ruby was a great way to prove the concept but my main motivation for this project was to play some more with pig. the main approach will be to maintain a relation with one record per...
trending topics in tweets about cheese; part1
April 27, 2010 at 11:42 PM | categories: Uncategorized
what does it mean for a topic to be 'trending'? consider the following time series (430e3 tweets containing cheese collected over a month period bucketed into hourly timeslots). without a formal definition we can just look at this and...
latent semantic analysis via the singular value decomposition (for dummies)
April 19, 2010 at 08:50 PM | categories: Uncategorized
i've been trying to get a deeper understanding of latent semantic analysis for a while now. last week i came to the conclusion the only way to truly understand would be to start from the ground up ...
cool bash stuff; mkfifo
April 15, 2010 at 09:33 PM | categories: Uncategorized
mkfifo is one of those shell commands provided as part of coreutils that not many people seem to know about. here's a (semi contrived) example close to something i did the other day to show how awesome it is. say you have...
e10.6 community detection for my twitter network
April 04, 2010 at 12:58 PM | categories: Uncategorized
last night i applied my network decomposition algorithm to a graph of some of the people near me in twitter. first i build a friend graph for 100 people 'around' me (taken from a crawl i did last year). by 'friend'...
e10.5 revisiting community detection
March 30, 2010 at 08:42 PM | categories: Uncategorized
i've decided to switch back to some previous work i did on community detection in (social) graphs. the last chunk of code i wrote which tried to deal with weighted directed graphs was terribly, terribly, broken but it seems that simplifying...
brutally short intro to collaborative filtering
March 18, 2010 at 08:38 PM | categories: Uncategorized
my favourite recommendations system is the collaborative filter; it gives good results and is easy to understand and extend as required. it works on the intuition that if i like coffee, chocolate and ice cream ...
sentiment analysis training data using mechanical turk
March 12, 2010 at 09:57 PM | categories: Uncategorized
want to try doing some sentiment analysis work on tweets but i need some good training data. i could label a heap of tweets myself as being positive, neutral or negative but instead this seems to be the perfect job for...
mongodb + twitter + yahoo term extractor = fun!
March 07, 2010 at 09:38 PM | categories: Uncategorized
ran a little experiment in using yahoo term extraction yesterday and it worked well enough. here's some code to pass some text to yahoo and get back an array of terms. i've got to say mongodb is such an easy tool...
what to do with a week off?
February 22, 2010 at 06:42 PM | categories: Uncategorized
this week i'm between jobs so i have (a little) more time than usual to hack. i've got a list of pending things to do but can't decide what to do next, here's my list in (sort of) priority order... ...
semi supervised naive bayes for text classification
February 14, 2010 at 09:46 PM | categories: Uncategorized
experiment 13; a test of semi supervised naive bayes for text classification is complete. semi supervised algorithms seem to work pretty well and i can see how they are a huge benefit for text classification where you can have an enormous...
e12.3 stat syns FAIL!
February 05, 2010 at 08:31 PM | categories: Uncategorized
after quite a bit of hacking the statistical synonyms idea doesn't seem to give terribly interesting results so i'm going on to do something else. for the record here's what I did do though.... generate 3grams from 800e3 tweets, collect n-grams...
an intro to semi supervised document classification
January 31, 2010 at 02:02 PM | categories: Uncategorized
here's a great lecture from tom mitchell about document classification using a semi supervised version of naive bayes. semi supervised algorithms only require some of the training examples to be labeled and are able to make use of any unlabelled ones,...
e12.2 entity set expansion
January 28, 2010 at 08:18 PM | categories: Uncategorized
i've been doing some reading for my statistical synonyms project and have uncovered a heap of cool papers. most of them are around an idea (from the 1950's!) called the distributional hypothesis that simply states that words that appear in...
e12.1 statistical synonyms
January 23, 2010 at 12:54 PM | categories: Uncategorized
i've had an idea brewing in my head for a while now seeded by a great talk by peter norvig about statistical approaches to find patterns in data. one thing he alludes to is the generation of synonyms based on n-gram models. the...
a pig screencast
January 17, 2010 at 02:22 PM | categories: Uncategorized
pig demo from Mat Kelcey on Vimeo. based on a talk i gave at work recently...
tweets about cheese
November 15, 2009 at 08:45 PM | categories: Uncategorized
people tweet about all sorts of stuff. sometimes it's really important ground breaking world changing stuff... but most of the time it's ridiculous waste of time stuff like 'i ate some cheese'. in fact how much do people actually tweet...
xargs parallel execution
November 06, 2009 at 09:57 PM | categories: Uncategorized
just recently discovered xargs has a parallelise option! i have 20 files, sample.01.gz to sample.20.gz, each ~100mb in size that i need to run a script over. one option is zcat sample*gz | ./script.rb > output but this...
e11.3 at what time does the world tweet?
October 28, 2009 at 09:22 PM | categories: Uncategorized
consider the graph below which shows the proportion of tweets per 10 min slot of the day (GMT0). it compares 4.7e6 tweets with any location vs 320e3 tweets with identifiable lat lons. some interesting observations with unanswered questions... ...
e11.2 aggregating tweets by time of day
October 24, 2009 at 01:02 PM | categories: Uncategorized
for v3 let's aggregate by time of the day, should make for an interesting animation. browsing the data there are lots of other lat longs in data, not just iPhone: and ÜT: there are also ones tagged with Coppó:, Pre:, etc...
e11.1 from bash scripts to hadoop
October 18, 2009 at 02:10 PM | categories: Uncategorized
let's rewrite v1 using hadoop tooling, code is on github. we'll run hadoop in non distributed standalone mode. in this mode everything runs in a single jvm so it's nice and simple to dev against. in v1 it was bzcat sample.bz2 | ./extract_locations.pl...
e11.0 tweets around the world
October 16, 2009 at 08:47 PM | categories: Uncategorized
was discussing the streaming twitter api with steve and though i knew about the private firehose i didn't know there was a lighter weight public gardenhose interface! since discovering this my pvr has basically been running curl -u mat_kelcey:XXX...
e10.4 communities in social graphs
October 06, 2009 at 08:05 PM | categories: Uncategorized
social graphs, like twitter or facebook, often follow the pattern of having clusters of highly connected components with an occasional edge joining these clusters. these connecting edges define the boundaries of communities in the social network and can be identified by...
simple statistics with R
October 03, 2009 at 03:43 PM | categories: Uncategorized
i'm learning a new statistics language called R and it's pretty cool. make a vector ... > c(3,1,4,1,5,9,2,6,5,3,5,8) [1] 3 1 4 1 5 9 2 6 5 3 5 8 turn it into a frequency table ... > table(c(3,1,4,1,5,9,2,6,5,3,5,8)) 1 2 3 4 5...
do a degree via youtube
October 01, 2009 at 08:40 PM | categories: Uncategorized
i'm amazed by how much great content is on youtube, how could you NOT learn something!? 13 x 1hr Statistical Aspects of Data Mining (Stats 202), 20 x 1hr Machine Learning...
e10.3 twitter crawl progress
September 29, 2009 at 08:43 PM | categories: Uncategorized
since the twitter api is rate limited it's quite slow to crawl twitter and after most of a week i've still only managed to get info on 8,000 users. i probably should subscribe to get a 20,000 an hr...
e10.2 tgraph crawl order example
September 21, 2009 at 09:58 PM | categories: Uncategorized
let's consider an example of the crawl order for tgraph... we seed our frontier with 'a' and bootstrap cost of 0. fetching the info for 'a' shows 2 outedges to 'b' and 'c', from our cost formula these all have cost 0...
e10.1 crawling twitter
September 19, 2009 at 09:31 PM | categories: Uncategorized
our first goal is to get some data and the twitter api makes getting the data trivial. i'm focused mainly on the friends stuff but because it only gives user ids i'll also get the user info so i can...
e10.0 introducing tgraph
September 19, 2009 at 02:41 PM | categories: Uncategorized
so e9 sip is on hold for a bit while i kick off e10 tgraph. was looking for another problem to try hadoop with and came across a classic graph one, pagerank. a well understood algorithm like page rank will...
first hadoop experiment
September 16, 2009 at 07:26 PM | categories: Uncategorized
just finished my first hadoop experiment. matpalm.com/sip not fantastic results but heaps of feedback from hadoop mailing group. more results coming soon...
how using compressed data can make your app faster
June 28, 2009 at 11:32 AM | categories: Uncategorized
when working with larger data sets (ie more than can fit in memory) there are two important resources to juggle… cpu. how quickly can you process the data. disk io. how...
erlang profiling
April 22, 2009 at 11:32 AM | categories: Uncategorized
i just found fprof, the erlang profiler by randomly clicking around the erlang man page list. try fprof:apply(Module, Function, Args). fprof:profile(). fprof:analyse(). for an interesting breakdown of a call...
bin packing
December 14, 2008 at 11:31 AM | categories: Uncategorized
how to decide what next to backup onto a dvd? when is brute force good enough? will a random walk get a good enough result faster? matpalm.com/burn.it...
popular posts...
ensemble nets : training ensembles as a single model using jax on a tpu pod slice (sept 2020)
bnn : counting bees with a rasp pi (may 2018)

drivebot : learning to do laps with reinforcement learning and neural nets (feb 2016)

wikipedia philosophy : do all first links on wikipedia lead to philosophy? (aug 2011)

cartpole++ : deep RL hacking with a complex 3d cart pole environment (aug 2016)

malmomo : deep RL hacking on minecraft with malmo (jan 2017)

some papers from my time at google research / brain...
- Natural Questions: a Benchmark for Question Answering Research
- Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping
- WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
my honours thesis
the co-evolution of cooperative behaviour (1997) evolving neural nets with genetic algorithms for communication problems.
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment