
brain of mat kelcey


fastmap and the jaccard distance

October 31, 2008 at 11:31 AM | categories: algorithms, deduplication, c++

given a set of pairwise distances how do you determine what points correspond to those distances?

my latest experiment considers this problem in relation to jaccard distances, a resemblance measure similar to jaccard coefficients used in a previous experiment.

by using the fastmap algorithm we get points from distances and once you have points you have visualisation!...
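as a rough illustration of getting points from distances, here is a minimal sketch of the first fastmap coordinate. it assumes the distances arrive as a full symmetric matrix; the names `DistanceMatrix`, `furthestFrom` and `fastmapFirstCoordinate` are made up for this example and are not taken from the experiment's code. two far apart pivots a and b are picked with a simple "hop to the furthest object" heuristic, then every object is projected onto the line through them using the cosine law.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// symmetric matrix of pairwise distances; dist[i][j] is the distance between objects i and j
using DistanceMatrix = std::vector<std::vector<double>>;

// index of the object furthest from `from` (hypothetical helper)
std::size_t furthestFrom(const DistanceMatrix& dist, std::size_t from) {
    std::size_t best = from;
    for (std::size_t i = 0; i < dist.size(); ++i)
        if (dist[from][i] > dist[from][best]) best = i;
    return best;
}

// coordinate of every object along the line through pivots a and b:
//   x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 * d(a,b))
std::vector<double> fastmapFirstCoordinate(const DistanceMatrix& dist) {
    assert(!dist.empty());
    // pivot heuristic: start at object 0, hop to the furthest object, then hop once more
    const std::size_t b = furthestFrom(dist, 0);
    const std::size_t a = furthestFrom(dist, b);
    const double dab = dist[a][b];

    std::vector<double> coords(dist.size(), 0.0);
    if (dab == 0.0) return coords;  // every object coincides, nothing to project

    for (std::size_t i = 0; i < dist.size(); ++i) {
        const double dai = dist[a][i];
        const double dbi = dist[b][i];
        coords[i] = (dai * dai + dab * dab - dbi * dbi) / (2.0 * dab);
    }
    return coords;
}
```

further coordinates would come from repeating the same step on the residual distances, d'(i,j)^2 = d(i,j)^2 - (x_i - x_j)^2, which this sketch leaves out.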

openmp = easy multi threading

October 13, 2008 at 11:30 AM | categories: openmp, multicore, c++

openmp is a set of compiler directives (plus a small runtime library), available in gcc since v4.2, for giving hints to a compiler about where code can be parallelized.

say we have some code

    for(int i=0; i<HUGE_NUMBER; ++i)
        deadHardCalculation(i);

we can make this run multi threaded by simply adding some pragmas

    #pragma omp parallel num_threads(4)
    {
        #pragma omp for
        for(int i=0; i<HUGE_NUMBER; ++i)
            deadHardCalculation(i);
    }

compiling with -fopenmp will generate an app that splits the work of the for loop across 4 threads.

there’s support for dynamic / static scheduling, accumulators (reductions), all sorts.

this tute is awesome.

it increased the speed of my shingling code by 350% on a quad...
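to make the scheduling and accumulator support a bit more concrete, here is a small sketch using the standard `schedule` and `reduction` clauses. the body of `deadHardCalculation` is a made up stand-in (it just squares its argument) and `sumOfCalculations` is a hypothetical name; neither comes from the shingling code.

```cpp
#include <cassert>
#include <cstdint>

// hypothetical stand-in for an expensive per item calculation
std::uint64_t deadHardCalculation(int i) {
    return static_cast<std::uint64_t>(i) * static_cast<std::uint64_t>(i);
}

// sums deadHardCalculation(i) for i in [0, n)
std::uint64_t sumOfCalculations(int n) {
    assert(n >= 0);
    std::uint64_t total = 0;
    // schedule(dynamic, 64): idle threads grab the next chunk of 64 iterations
    // reduction(+:total): each thread sums into a private copy, merged at the end
    #pragma omp parallel for schedule(dynamic, 64) reduction(+:total)
    for (int i = 0; i < n; ++i)
        total += deadHardCalculation(i);
    return total;
}
```

built without -fopenmp the pragma is ignored and the loop runs on one thread with the same result, which makes it easy to compare timings.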

shingling and the jaccard index

October 06, 2008 at 11:30 AM | categories: ruby, algorithms, deduplication, c++

on the weekend i did another experiment using shingling and the jaccard index to try to determine if two sets of data were “duplicates”.

it works quite well and includes a ruby and c++ version with low level bit operations.

project page is www.matpalm.com/resemblance

code at github.com/matpalm/resemblance...
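for a feel of the idea, here is a minimal sketch of character shingling and the jaccard index. it uses plain `std::set` rather than the low level bit operations mentioned above, so treat it as an illustration and not as the project's implementation; the names `shingles` and `jaccardIndex` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// every run of n consecutive characters in text, e.g. "abcd" with n=2 gives ab, bc, cd
std::set<std::string> shingles(const std::string& text, std::size_t n) {
    assert(n > 0);
    std::set<std::string> result;
    for (std::size_t i = 0; i + n <= text.size(); ++i)
        result.insert(text.substr(i, n));
    return result;
}

// size of the intersection divided by size of the union; 1.0 means identical sets
double jaccardIndex(const std::set<std::string>& a, const std::set<std::string>& b) {
    if (a.empty() && b.empty()) return 1.0;  // convention assumed for this sketch
    std::size_t shared = 0;
    for (const auto& s : a) shared += b.count(s);
    const std::size_t unionSize = a.size() + b.size() - shared;
    return static_cast<double>(shared) / static_cast<double>(unionSize);
}
```

two records could then be flagged as likely duplicates when `jaccardIndex(shingles(x, n), shingles(y, n))` is above some threshold; both the shingle length and the threshold would need tuning per dataset.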

java is the new c++

October 05, 2008 at 11:29 AM | categories: rant, java, c++

this year would have been my ten year anniversary of commercially coding in java. it’s not going to be though since the last six months have been ruby. even with my huge investment in java i’d be quite happy to never write a line of it again.

i remember when java was first moving in. it was not as performant as c/c++ but it was much easier to write good clean code. and who really cares about performance? scalability is what matters and it’s decided by design and architecture, not language choice. as a new language java made sure it had...

old projects...