brain of mat kelcey
e12.3 stat syns FAIL!
February 05, 2010 at 08:31 PM | categories: fail, e12
    
  after quite a bit of hacking the statistical synonyms idea doesn't seem to give terribly interesting results, so i'm going on to do something else. for the record, here's what i did do though...

 - generate 3-grams from 800e3 tweets
 - collect n-grams together that share the same first and last term; eg 'the blue cat', 'the green cat', 'the red cat'
 - for each set generate all the combos of the middle terms; eg 'blue green', 'blue red', 'green red'
 - count the occurrences of each pair
 - draw a graph of the 150 top occurring pairs

  voila! click this image for a bigger version

  some interesting results. few of the more complex...
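
  for anyone wanting to poke at the idea, here's a minimal python sketch of the counting steps listed in this post (everything up to, but not including, drawing the graph). it's not the original code: the function names, the whitespace tokenising, the lowercasing, and the toy tweets are all assumptions made for illustration. it also assumes a pair is counted once per distinct (first term, last term) context, which is only one reading of "count the occurrences of each pair".

```python
from collections import Counter, defaultdict
from itertools import combinations


def trigrams(text):
    """Return the word 3-grams of a string (assumed: lowercase + whitespace split)."""
    tokens = text.lower().split()
    return zip(tokens, tokens[1:], tokens[2:])


def middle_term_pairs(tweets, top_n=150):
    """Return the top_n most frequent pairs of interchangeable middle terms."""
    # collect the middle terms seen for each (first term, last term) context
    middles = defaultdict(set)
    for tweet in tweets:
        for first, mid, last in trigrams(tweet):
            middles[(first, last)].add(mid)

    # for each context, generate every unordered pair of middle terms
    # and count how many distinct contexts each pair shares (an assumption)
    pair_counts = Counter()
    for terms in middles.values():
        for pair in combinations(sorted(terms), 2):
            pair_counts[pair] += 1

    return pair_counts.most_common(top_n)


# toy stand-in for the 800e3 tweets; hypothetical data
tweets = [
    "the blue cat sat",
    "the green cat sat",
    "the red cat ran",
    "a blue dog ran",
    "a red dog sat",
]
top_pairs = middle_term_pairs(tweets)
```

  `top_pairs` is a list of `((term_a, term_b), count)` tuples, which can be fed to whatever graph drawing tool is handy as a weighted edge list.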
  
old projects...
 - latent semantic analysis via the singular value decomposition (for dummies)
 - semi supervised naive bayes
 - statistical synonyms
 - round the world tweets
 - decomposing social graphs on twitter
 - do it yourself statistically improbable phrases
 - should i burn it?
 - the median of a trillion numbers
 - deduping with resemblance metrics
 - simple supervised learning / should i read it?
 - audioscrobbler experiments
 - chaoscope experiment