brain of mat kelcey


shingling and the jaccard index

October 06, 2008 at 11:30 AM

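on the weekend i did another experiment, using shingling and the jaccard index to try to determine whether two sets of data are “duplicates”. shingling breaks each piece of data into a set of overlapping fragments, and the jaccard index measures how much two of those sets overlap (size of the intersection divided by size of the union).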

the approach works quite well, and the project includes ruby and c++ versions with low level bit operations.
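
to make the idea concrete, here is a minimal python sketch of shingling plus the jaccard index. it is not the code from the project (that is ruby and c++); the function names, the choice of character shingles, and the shingle length `k=3` are all assumptions made for illustration.

```python
def shingles(text, k=3):
    """return the set of all k-character substrings (shingles) of text.

    k=3 is an arbitrary choice for this sketch, not a value from the project.
    """
    if len(text) < k:
        # too short to shingle: treat the whole string as one shingle
        return {text} if text else set()
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """jaccard index of two sets: |a & b| / |a | b|.

    two empty sets are treated as identical (1.0) to avoid dividing by zero;
    that convention is an assumption of this sketch.
    """
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resemblance(s1, s2, k=3):
    """score in [0, 1]; values near 1 suggest the two strings are near duplicates."""
    return jaccard(shingles(s1, k), shingles(s2, k))
```

calling `resemblance(a, b)` on two strings gives a score between 0 and 1; deciding how high a score has to be before two records count as “duplicates” depends on the data and needs tuning.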

project page is www.matpalm.com/resemblance

code at github.com/matpalm/resemblance
