e12.3 stat syns FAIL!

February 05, 2010

after quite a bit of hacking, the statistical synonyms idea doesn't seem to give terribly interesting results, so i'm going on to do something else.

for the record, here's what I did do though...

- generate 3-grams from 800e3 tweets
- collect n-grams together that share the same first and last term; eg 'the blue cat', 'the green cat', 'the red cat'
- for each set, generate all the combos of the middle terms; eg 'blue green', 'blue red', 'green red'
- count the occurrences of each pair
- draw a graph of the 150 top occurring pairs

voila! click this image for a bigger version

some interesting results. few of the more complex...
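for anyone wanting to try the same thing, the steps listed above can be sketched roughly like this. it's a minimal sketch and not the original code: the function names, the whitespace tokenising and the tiny sample of "tweets" are all made up for illustration, and it assumes each (first, last) context counts a given pair of middle terms only once. the graph drawing step is left out.

```python
from collections import Counter, defaultdict
from itertools import combinations


def trigrams(tokens):
    """Yield every run of three consecutive tokens."""
    return zip(tokens, tokens[1:], tokens[2:])


def middle_term_pairs(texts):
    """Count pairs of middle terms that share the same first and last term."""
    # Group the distinct middle terms by their (first, last) context.
    middles = defaultdict(set)
    for text in texts:
        tokens = text.lower().split()  # assumption: naive whitespace tokenising
        for first, middle, last in trigrams(tokens):
            middles[(first, last)].add(middle)

    # Assumption: each context contributes one count per pair of middle terms.
    pair_counts = Counter()
    for terms in middles.values():
        # Sorting makes ('blue', 'red') and ('red', 'blue') the same key.
        pair_counts.update(combinations(sorted(terms), 2))
    return pair_counts


# Hypothetical sample data standing in for the real tweets.
tweets = [
    "the blue cat sat",
    "the green cat sat",
    "the red cat ran",
    "a blue car",
    "a red car",
]

# Print the top occurring pairs (the post graphs the top 150).
for pair, count in middle_term_pairs(tweets).most_common(150):
    print(pair, count)
```

on real data you'd probably want a proper tweet tokeniser and some filtering of very common contexts, since a context like ('the', 'of') will pair up a huge number of unrelated middle terms.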
