
brain of mat kelcey


item similarity by bipartite graph dispersion

August 20, 2012 at 08:00 PM | categories: graph, similarity

the basis of most recommendation systems is the ability to rate the similarity between items. there are lots of different ways to do this. one model is based on the idea of an interest graph, where the nodes of the graph are users and items and the edges represent an interest, whatever that might mean for the domain. if we only allow edges between users and items, the graph is bipartite.

let's consider a simple example of 3 users and 3 items: user1 likes item1, user2 likes all three items and user3 likes just item3.

[fig 1: user / item interest graph]

one way...
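as a concrete starting point, here's a minimal python sketch of that example graph. it assumes the graph is stored as a dict mapping each user to the set of items they like; the helper names `item_to_users` and `is_bipartite_interest_graph` are hypothetical. it only builds the graph, inverts it to get the item side, and checks the bipartite property; the excerpt stops before the dispersion-based similarity itself, so that part isn't attempted here.

```python
from collections import defaultdict

# hypothetical layout: user -> set of liked items.
# the example: user1 likes item1, user2 likes all three items, user3 likes just item3
likes = {
    "user1": {"item1"},
    "user2": {"item1", "item2", "item3"},
    "user3": {"item3"},
}

def item_to_users(likes):
    """invert user -> items into item -> users (the other side of the bipartite graph)."""
    users_of = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            users_of[item].add(user)
    return dict(users_of)

def is_bipartite_interest_graph(likes):
    """edges only run between users and items, so no node may be both a user and an item."""
    users = set(likes)
    items = set().union(*likes.values())
    return users.isdisjoint(items)

users_of = item_to_users(likes)
print(sorted(users_of["item1"]))            # ['user1', 'user2']
print(sorted(users_of["item3"]))            # ['user2', 'user3']
print(is_bipartite_interest_graph(likes))   # True
```

storing both directions (user -> items and item -> users) makes it cheap to walk from an item to its users and back out to other items, which is the kind of traversal an item-to-item measure over this graph needs.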

fuzzy jaccard

July 31, 2012 at 08:00 PM | categories: text, similarity

the jaccard coefficient is one of the fundamental measures for set similarity. (recall jaccard(set1, set2) = |intersection| / |union|; when set1 == set2 it evaluates to 1.0, and when set1 and set2 have no intersection it evaluates to 0.0.)

one thing that's always annoyed me about it, though, is that it loses any sense of partial similarity. as a set-based measure it's all or nothing.

so consider the sets set1 = {i1, i2, i3} and set2 = {i1, i2, i4}: jaccard(set1, set2) = 2/4 = 0.5, which is fine given you have no prior info about the relationship between...
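here's a minimal python sketch of plain (non-fuzzy) jaccard using built-in sets, reproducing the worked example. the function name `jaccard` is just for illustration, and the 1.0 returned for two empty sets is an assumed convention, since |intersection| / |union| is 0 / 0 there.

```python
def jaccard(set1, set2):
    """plain jaccard coefficient: |intersection| / |union|."""
    union = set1 | set2
    if not union:
        # assumed convention: two empty sets count as identical (otherwise 0 / 0)
        return 1.0
    return len(set1 & set2) / len(union)

set1 = {"i1", "i2", "i3"}
set2 = {"i1", "i2", "i4"}
print(jaccard(set1, set2))          # 0.5  (intersection {i1, i2}, union {i1, i2, i3, i4})
print(jaccard(set1, set1))          # 1.0  (identical sets)
print(jaccard(set1, {"i5", "i6"}))  # 0.0  (no intersection)
```

note that i3 and i4 contribute nothing to the score however related they might be; that's the all-or-nothing behaviour described above.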
