fail, e12
after quite a bit of hacking, the statistical synonyms idea doesn't seem to give terribly interesting results, so i'm going on to do something else.

for the record, here's what I did do though:

- generate 3grams from 800e3 tweets
- collect n-grams together that share the same first and last term; eg 'the blue cat', 'the green cat', 'the red cat'
- for each set generate all the combos of the middle terms; eg 'blue green', 'blue red', 'green red'
- count the occurrences of each pair
- draw a graph of the 150 top occurring pairs

voilà! click this image for a bigger version

some interesting results. few of the more complex...
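
the steps above could be sketched roughly like this in python. this is a minimal sketch, not the original code: the function names (`trigrams`, `middle_term_pairs`), the lowercase whitespace tokenisation, and the choice to count each pair once per distinct first/last context are all assumptions made for illustration. the graph drawing step is left out.

```python
from collections import Counter, defaultdict
from itertools import combinations


def trigrams(tokens):
    """Yield consecutive (first, middle, last) triples from a token list."""
    return zip(tokens, tokens[1:], tokens[2:])


def middle_term_pairs(texts, top_n=150):
    """Return the top_n most common pairs of interchangeable middle terms.

    Assumption: a pair is counted once for every (first, last) context
    in which both middle terms were seen.
    """
    # group the distinct middle terms by their (first, last) context
    middles = defaultdict(set)
    for text in texts:
        tokens = text.lower().split()  # assumed tokenisation: lowercase + whitespace
        for first, mid, last in trigrams(tokens):
            middles[(first, last)].add(mid)

    # for each context, count every unordered pair of middle terms
    pair_counts = Counter()
    for mids in middles.values():
        for pair in combinations(sorted(mids), 2):
            pair_counts[pair] += 1

    return pair_counts.most_common(top_n)
```

calling `middle_term_pairs(tweets)` on a list of tweet strings gives back `((term_a, term_b), count)` tuples, which could then be fed to whatever graph tool is at hand as weighted edges.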
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment