brain of mat kelcey
how many terms in a trend?
May 11, 2010 at 07:46 PM | categories: e15, trending, algorithms, puzzled | View Comments
i've been poking around with a simple trending algorithm over the last few weeks and have uncovered a problem that, like most interesting ones, i'm not sure how to solve. the question revolves around discovering multi terms trends. a sensible place to start when looking for trends is to consider single terms but what if though we ended up with three equally trending terms 'happy', 'new' and 'year'? it's pretty obvious that the actual trend is 'happy new year' but what is the best way to express this as a single trend in an algorithmic sense?one approach i've been playing...
trending topics in tweets about cheese; part2
May 01, 2010 at 04:54 PM | categories: twitter, trending, e15, pig | View Comments
prototyping in ruby was a great way to prove the concept but my main motivation for this project was to play some more with pig.the main approach will bemaintain a relation with one record per tokenfold 1 hours worth of new data at a time into the modelcheck the entries for the latest hour for any trendsthe full version is on github. read on for a line by line walkthrough!the ruby impl used the simplest approach possible for calculating mean and standard deviation; just keep a record of all the values seen so far and recalculate for each new value.for...
trending topics in tweets about cheese; part1
April 27, 2010 at 11:42 PM | categories: cheese, twitter, trending, e15 | View Comments
what does it mean for a topic to be 'trending'? consider the following time series (430e3 tweets containing cheese collected over a month period bucketed into hourly timeslots)without a formal definition we can just look at this and say that the series was trending where there was a spike just before time 600. as a start then let's just define a trend as a value that was greater than was 'expected'.one really nice simple algorithm for detecting a trend is to say a value, v, is trending if v > mean + 3 * standard deviation of the data seen...
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment