
brain of mat kelcey


fuzzy jaccard

July 31, 2012 at 08:00 PM | categories: text, similarity

the jaccard coefficient is one of the fundamental measures for doing set similarity. (recall jaccard(set1, set2) = |intersection| / |union|. when set1 == set2 this evaluates to 1.0, and when set1 and set2 have no intersection it evaluates to 0.0.)

one thing that's always annoyed me about it, though, is that it loses any sense of partial similarity. as a set-based measure it's all or nothing.

so consider the sets set1 = {i1, i2, i3} and set2 = {i1, i2, i4}. jaccard(set1, set2) = 2/4 = 0.5, which is fine given you have no prior info about the relationship between...
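as a rough illustration, here's a minimal python sketch of the plain (non-fuzzy) jaccard coefficient that reproduces the 2/4 = 0.5 example. the function name `jaccard` is just for illustration, and returning 1.0 for two empty sets is an assumed convention, since the ratio is undefined when the union is empty.

```python
def jaccard(set1, set2):
    """plain jaccard coefficient: |intersection| / |union|."""
    union = set1 | set2
    if not union:
        # assumed convention: two empty sets count as identical
        return 1.0
    return len(set1 & set2) / len(union)

set1 = {"i1", "i2", "i3"}
set2 = {"i1", "i2", "i4"}

print(jaccard(set1, set2))    # 0.5  ({i1, i2} shared out of {i1, i2, i3, i4})
print(jaccard(set1, set1))    # 1.0  (identical sets)
print(jaccard(set1, {"i5"}))  # 0.0  (no intersection)
```

note that i3 and i4 contribute nothing to the intersection no matter how "close" they might be. that's the all-or-nothing behaviour described above.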
