brain of mat kelcey...


shingling and the jaccard index

October 06, 2008 at 11:30 AM | categories: Uncategorized

over the weekend i ran another experiment, using shingling and the jaccard index to try to determine whether two sets of data were “duplicates”.
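for anyone new to the terms: shingling breaks a piece of text into the set of its overlapping n-grams, and the jaccard index is the size of the intersection of two sets divided by the size of their union. here's a minimal ruby sketch of the idea, not the project's actual code. the method names, the use of character shingles and the shingle size of 3 are all assumptions for illustration.

```ruby
require 'set'

# split a string into the set of its overlapping character n-grams ("shingles").
# n = 3 is an assumed default, not a value taken from the project.
def shingles(text, n = 3)
  # text shorter than n is treated as a single shingle
  return Set.new([text]) if text.length < n
  (0..text.length - n).map { |i| text[i, n] }.to_set
end

# jaccard index: |a ∩ b| / |a ∪ b|
# ranges from 0.0 (nothing in common) to 1.0 (identical sets)
def jaccard(a, b)
  union = a | b
  return 1.0 if union.empty?
  (a & b).size.to_f / union.size
end

# hypothetical helper: resemblance of two strings via their shingle sets
def resemblance(s1, s2, n = 3)
  jaccard(shingles(s1, n), shingles(s2, n))
end
```

for example, "abcd" gives the shingles {abc, bcd} and "bcde" gives {bcd, cde}. they share one shingle out of three distinct ones, so `resemblance("abcd", "bcde")` comes out at about 0.33. a pair would count as “duplicates” once the score passes some threshold you pick.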

it works quite well, and the project includes both ruby and c++ versions with low-level bit operations.
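one plausible way bit operations come into it (a guess at the general technique, not a description of what the project does): hash each shingle to a bit position in a fixed-width bitmask, then compute the jaccard index with AND, OR and bit counts. the sketch below is ruby again. the 1024-bit width, the toy hash and every method name are assumptions, and hash collisions make the score an approximation.

```ruby
BITS = 1024  # assumed signature width; wider means fewer collisions

# toy polynomial hash mapping a shingle to a bit position in 0...BITS
def bit_index(shingle)
  shingle.each_byte.reduce(0) { |h, b| (h * 31 + b) % BITS }
end

# build a bitmask signature: one bit set per character n-gram of the text
def signature(text, n = 3)
  return 0 if text.length < n
  (0..text.length - n).reduce(0) { |sig, i| sig | (1 << bit_index(text[i, n])) }
end

# count the set bits of a non-negative integer
def popcount(x)
  x.to_s(2).count('1')
end

# jaccard index over bitmasks: bits in common / bits set in either
def bit_jaccard(a, b)
  union = popcount(a | b)
  return 1.0 if union.zero?
  popcount(a & b).to_f / union
end
```

the appeal is that the set intersection and union collapse into single `&` and `|` operations, which in c++ map onto word-sized machine instructions.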

project page is www.matpalm.com/resemblance

code at github.com/matpalm/resemblance