brain of mat kelcey...
shingling and the jaccard index
October 06, 2008 at 11:30 AM | categories: Uncategorized
on the weekend i did another experiment using shingling and the jaccard index to try to determine if two sets of data were “duplicates”.
it works quite well, and the project includes both a ruby and a c++ version with low level bit operations.
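
for anyone who hasn't seen the idea before, here's a rough ruby sketch of it. this is not the code from the project, just a minimal illustration: the names `shingles`, `jaccard` and `resemblance`, the shingle length of 3 and the whitespace / case normalisation are all assumptions made for the example. it uses plain sets rather than any bit level tricks.

```ruby
require 'set'

# hypothetical helper: break a string into the set of its overlapping
# n-character substrings (shingles). n = 3 is an assumed default.
def shingles(text, n = 3)
  # assumed normalisation: lowercase and collapse runs of whitespace
  normalised = text.downcase.gsub(/\s+/, ' ').strip
  # strings no longer than n just become a single shingle
  return Set[normalised] if normalised.length <= n
  (0..normalised.length - n).map { |i| normalised[i, n] }.to_set
end

# jaccard index of two sets: size of intersection / size of union
def jaccard(a, b)
  union = a | b
  return 1.0 if union.empty? # treat two empty sets as identical
  (a & b).size.to_f / union.size
end

# hypothetical convenience wrapper: resemblance of two strings
def resemblance(x, y, n = 3)
  jaccard(shingles(x, n), shingles(y, n))
end
```

a score of 1.0 means the two shingle sets are identical and 0.0 means they share nothing, so two records might be called “duplicates” when the score is above some cutoff that you'd have to tune for your own data. for example `resemblance("abcd", "bcde")` compares {abc, bcd} with {bcd, cde}, which share 1 shingle out of 3 distinct ones.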
project page is www.matpalm.com/resemblance
code at github.com/matpalm/resemblance