my nerd blog

e14: latent semantic analysis via the singular value decomposition (for dummies)

  1. introduction
  2. example 1: two clear features
  3. example 2: two less clear features
  4. example 3: two less clear features (revisited)
  5. using real data
  6. using real data (with document length normalisation)
  7. conclusions

e13: semi supervised naive bayes

  1. what is a semi supervised algorithm?
  2. v1: a semi supervised version of naive bayes
  3. does it do any better?
  4. v2: rewriting for scale
  5. v2: does it do any better?

e12: statistical synonyms

  1. what are statistical synonyms?

e11: round the world tweets

  1. tweets around the world
  2. from bash scripts to hadoop
  3. aggregating tweets by time of day
  4. at what time does the world tweet?

e10: decomposing social graphs on twitter

  1. introducing tgraph
  2. crawling twitter
  3. tgraph crawl order example
  4. twitter crawl progress
  5. communities in social graphs
  6. community detection for my local twitter network

e9: do it yourself statistically improbable phrases

  1. introduction
  2. take 1: trigram frequency
  3. take 2: term frequency
  4. take 3: markov chains
  5. part 4: but does it scale?

e7: should i burn it?

  1. the problem
  2. brute force
  3. random walk

e6: the median of a trillion numbers

  1. the question
  2. the base algorithm
  3. distributing
  4. generating test data
  5. ruby implementation
  6. brutally short introduction to erlang
  7. single process erlang implementation
  8. multiple process erlang implementation
  9. performance comparisons
  10. running on amazon ec2
  11. conclusion

e5: deduping with resemblance metrics

  1. the jaccard coefficient
  2. fastmap projection using jaccard distances
  3. simhash and sketching

e4: rss feed stuff / should i read it?

  1. the m-algorithm
  2. considering word occurrences
  3. the naive bayes method
  4. multinomial bayes
  5. markov chains

e3: audioscrobbler experiments

  1. bands like the bands i like
  2. the distance from celine dion to napalm death
  3. multi dimensional tag distance

e2: chaoscope experiment

e1: particle swarm optimisation experiment

ye olde pics

  • 2008 03 espen
  • 2006 06 around the world (east to west)
  • 2004 12 wedding
  • 2004 09 skydiving
  • 2004 01 the northern territory
  • 2003 01 kyoto
  • 2002 12 pool party
  • 2002 07 old arcade games
  • 2002 04 johnny walker visits lambert ave
  • 2002 02 three uni students in adelaide
  • 2001 12 new kitten
  • 2001 12 around the world (west to east)
  • 2001 09 top secret rocket experiments
  • 2000 pre perth poker phase page
  • 1999 tokyo
  • 1996 usa
  • other

  • send me an email! matthew.kelcey at gmail dot com