brain of mat kelcey
item similarity by bipartite graph dispersion
August 20, 2012 at 08:00 PM | categories: graph, similarity
the basis of most recommendation systems is the ability to rate similarity between items. there are lots of different ways to do this. one model is based on the idea of an interest graph where the nodes of the graph are users and items and the edges of the graph represent an interest, whatever that might mean for the domain. if we only allow edges between users and items the graph is bipartite.

let's consider a simple example of 3 users and 3 items; user1 likes item1, user2 likes all three items and user3 likes just item3.

fig1: user / item interest graph

one way...
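the 3 user / 3 item example above can be written down as a small bipartite graph. this is only a minimal python sketch of the graph representation, not the similarity method the post goes on to describe; the names `edges` and `build_adjacency` are made up for illustration.

```python
from collections import defaultdict

# (user, item) interest edges from the example:
# user1 likes item1, user2 likes all three items, user3 likes just item3.
edges = [
    ("user1", "item1"),
    ("user2", "item1"),
    ("user2", "item2"),
    ("user2", "item3"),
    ("user3", "item3"),
]

def build_adjacency(edges):
    """Return (user -> items, item -> users) maps for a bipartite edge list.

    Hypothetical helper: every edge joins one user to one item, so there are
    never user-user or item-item edges, which is what makes the graph bipartite.
    """
    user_to_items = defaultdict(set)
    item_to_users = defaultdict(set)
    for user, item in edges:
        user_to_items[user].add(item)
        item_to_users[item].add(user)
    return dict(user_to_items), dict(item_to_users)

user_to_items, item_to_users = build_adjacency(edges)
```

keeping both directions of the mapping makes it cheap to walk from an item to its users and back out to other items, which is the kind of traversal an item similarity measure over this graph would need.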
fuzzy jaccard
July 31, 2012 at 08:00 PM | categories: text, similarity
the jaccard coefficient is one of the fundamental measures for doing set similarity. (recall jaccard(set1, set2) = |intersection| / |union|. when set1 == set2 this evaluates to 1.0 and when set1 and set2 have no intersection it evaluates to 0.0)

one thing that's always annoyed me about it though is that it loses any sense of partial similarity. as a set based measure it's all or nothing.

so consider the sets set1 = {i1, i2, i3} and set2 = {i1, i2, i4}

jaccard(set1, set2) = 2/4 = 0.5 which is fine given you have no prior info about the relationship between...
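the plain jaccard coefficient from the definition above is only a few lines of python. this is a minimal sketch of the standard measure, not the fuzzy variant the post is about; returning 1.0 for two empty sets is an assumption made here, since the definition leaves 0/0 undefined.

```python
def jaccard(set1, set2):
    """Jaccard coefficient: |intersection| / |union|."""
    union = set1 | set2
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(set1 & set2) / len(union)

set1 = {"i1", "i2", "i3"}
set2 = {"i1", "i2", "i4"}

# intersection is {i1, i2}, union is {i1, i2, i3, i4}
score = jaccard(set1, set2)  # 2 / 4 = 0.5
```

note that i3 and i4 contribute nothing to the score however closely related they might be, which is the all or nothing behaviour described above.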
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment