brain of mat kelcey


fastmap and the jaccard distance

October 31, 2008 at 11:31 AM | categories: algorithms, deduplication, c++

given a set of pairwise distances, how do you determine what points correspond to those distances?

my latest experiment considers this problem in relation to jaccard distances, a resemblance measure similar to the jaccard coefficients used in a previous experiment.

by using the fastmap algorithm we get points from distances, and once you have points you have visualisation!...
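
to make the idea concrete, here is a minimal c++ sketch of the textbook fastmap projection, not the code from the experiment itself. it assumes the input is a full symmetric matrix of pairwise distances (for example 1 minus the jaccard index for each pair). the names `Matrix`, `farthest` and `fastmap` are made up for illustration. fastmap treats the distances as if they were euclidean, which jaccard distances need not be, so the sketch clamps residual squared distances at zero.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// index of the object farthest from `from` under the squared distances d2
// (hypothetical helper)
static std::size_t farthest(const Matrix& d2, std::size_t from) {
    std::size_t best = from;
    for (std::size_t i = 0; i < d2.size(); ++i)
        if (d2[from][i] > d2[from][best]) best = i;
    return best;
}

// map n objects, described only by an n x n distance matrix, to points
// with k coordinates each
Matrix fastmap(const Matrix& dist, std::size_t k) {
    const std::size_t n = dist.size();

    // work on squared distances so the residual update needs no sqrt
    Matrix d2(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            d2[i][j] = dist[i][j] * dist[i][j];

    Matrix coords(n, std::vector<double>(k, 0.0));
    for (std::size_t dim = 0; dim < k && n > 0; ++dim) {
        // pivot heuristic: start anywhere, hop to the farthest object,
        // then hop once more to get a far-apart pair (a, b)
        std::size_t a = 0;
        const std::size_t b = farthest(d2, a);
        a = farthest(d2, b);

        const double dab2 = d2[a][b];
        if (dab2 <= 0.0) break;  // nothing left to explain
        const double dab = std::sqrt(dab2);

        // cosine-law projection of every object onto the line through a and b
        for (std::size_t i = 0; i < n; ++i)
            coords[i][dim] = (d2[a][i] + dab2 - d2[b][i]) / (2.0 * dab);

        // remove what this axis explained; clamp because the input
        // distances may not be perfectly euclidean
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                const double diff = coords[i][dim] - coords[j][dim];
                d2[i][j] = std::max(0.0, d2[i][j] - diff * diff);
            }
    }
    return coords;
}
```

calling `fastmap(dist, 2)` gives one (x, y) pair per object, which is enough to feed a scatter plot.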

shingling and the jaccard index

October 06, 2008 at 11:30 AM | categories: ruby, algorithms, deduplication, c++

on the weekend i did another experiment using shingling and the jaccard index to try to determine if two sets of data were “duplicates”.

it works quite well and includes a ruby and c++ version with low level bit operations.

project page is www.matpalm.com/resemblance

code at github.com/matpalm/resemblance...
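
as a rough illustration of the idea, and not the code from the resemblance project, here is a small c++ sketch that shingles two strings into character n-grams and computes the jaccard index |A ∩ B| / |A ∪ B| over the resulting sets. the function names `shingles` and `jaccard_index` are made up for this example, and it uses plain `std::set` rather than the low level bit operations mentioned above.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// all overlapping character n-grams ("shingles") of text
// (hypothetical helper)
std::set<std::string> shingles(const std::string& text, std::size_t n) {
    std::set<std::string> out;
    if (n == 0 || text.size() < n) return out;
    for (std::size_t i = 0; i + n <= text.size(); ++i)
        out.insert(text.substr(i, n));
    return out;
}

// jaccard index: size of the intersection over size of the union
double jaccard_index(const std::set<std::string>& a,
                     const std::set<std::string>& b) {
    // convention chosen for this sketch: two empty sets count as identical
    if (a.empty() && b.empty()) return 1.0;

    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));

    const std::size_t union_size = a.size() + b.size() - common.size();
    return static_cast<double>(common.size()) /
           static_cast<double>(union_size);
}
```

a value near 1 suggests the two inputs are near duplicates and a value near 0 suggests they share almost nothing. `1.0 - jaccard_index(a, b)` gives the corresponding jaccard distance.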

old projects...