brain of mat kelcey...

brutally short intro to collaborative filtering

March 18, 2010 at 08:38 PM | categories: Uncategorized

my favourite recommendation system is the collaborative filter; it gives good results and is easy to understand and extend as required.

it works on the intuition that if i like coffee, chocolate and ice cream, and you like coffee and chocolate, then you might also like ice cream.

so we need a little bit of terminology: users (me and you) and items (coffee, chocolate and ice cream).

in a user based collaborative filter the process is

to calculate recommendations for user1
 for each other user (user2)
  calculate user_similarity_score between user1 and user2 (a value from 0 to 1)
  if the user_similarity_score is non zero
   for each item user2 has that user1 doesn't
    add the item to user1's recommendations, weighted by the user_similarity_score
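
the steps above can be sketched in python roughly as follows. this is only a minimal sketch, not a definitive implementation: the function name `recommend`, the `likes` dict (user name -> set of items) and the `similarity` argument are all assumptions made for illustration, and any function returning a value from 0 to 1 for two item sets can be plugged in.

```python
from collections import defaultdict


def recommend(target, likes, similarity):
    """score the items the target user doesn't have, weighted by user similarity.

    likes:      dict of user name -> set of items (hypothetical layout)
    similarity: function taking two item sets, returning a value from 0 to 1
    """
    scores = defaultdict(float)
    for other, other_items in likes.items():
        if other == target:
            continue
        sim = similarity(likes[target], other_items)
        if sim == 0:
            continue  # nothing in common, so this user contributes nothing
        # items the other user has that the target doesn't
        for item in other_items - likes[target]:
            scores[item] += sim
    # strongest recommendations first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

summing the similarity scores means an item liked by several similar users ends up ranked above one liked by a single user.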

e.g. say alice, bob, charlie and dave have listed the things they like...

alice likes coffee and chocolate
bob likes coffee, chocolate and ice cream
charlie likes coffee, ice cream and carob
dave likes carob and fruit cake

to recommend items for alice we first need a way to calculate a similarity between users

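a reasonable measure when we just have these sets of items is the jaccard coefficient, defined simply as the number of items two users have in common divided by the total number of distinct items across both users (the size of the intersection over the size of the union).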

Jaccard(alice,bob) = 2/3
Jaccard(alice,charlie) = 1/4
Jaccard(alice,dave) = 0/4 = 0
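
these three values can be checked with a few lines of python. a small sketch, assuming each user's likes are stored as a set; the name `jaccard` and the use of `Fraction` (to keep the exact ratios rather than floats) are illustrative choices, not part of any particular library.

```python
from fractions import Fraction


def jaccard(a, b):
    """size of the intersection divided by the size of the union of two sets."""
    union = a | b
    if not union:
        return Fraction(0)  # assumption: two empty sets are treated as similarity 0
    return Fraction(len(a & b), len(union))


alice = {"coffee", "chocolate"}
bob = {"coffee", "chocolate", "ice cream"}
charlie = {"coffee", "ice cream", "carob"}
dave = {"carob", "fruit cake"}

print(jaccard(alice, bob))      # 2/3
print(jaccard(alice, charlie))  # 1/4
print(jaccard(alice, dave))     # 0
```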

we can then build up a list of alice's recommendations based on the items the others like that alice hasn't listed.

from bob we get ice cream for a value of 2/3
from charlie we get ice cream for 1/4 and carob for 1/4
we can ignore dave since alice had nothing in common with them.

so ice cream is the most strongly recommended item with a score of 2/3 + 1/4 = 11/12 (about 0.92)
carob is also recommended but with a much lower strength of 1/4 = 0.25
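
putting the whole worked example together, here is a short end to end sketch for alice. the variable names are made up for illustration, and `Fraction` is used only so the arithmetic stays exact.

```python
from collections import defaultdict
from fractions import Fraction

likes = {
    "alice": {"coffee", "chocolate"},
    "bob": {"coffee", "chocolate", "ice cream"},
    "charlie": {"coffee", "ice cream", "carob"},
    "dave": {"carob", "fruit cake"},
}


def jaccard(a, b):
    # items in common divided by distinct items across both users
    return Fraction(len(a & b), len(a | b))


scores = defaultdict(Fraction)  # a missing item starts at 0
for user, items in likes.items():
    if user == "alice":
        continue
    sim = jaccard(likes["alice"], items)
    if sim == 0:
        continue  # dave has nothing in common with alice
    for item in items - likes["alice"]:
        scores[item] += sim

for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(item, score, round(float(score), 2))
# ice cream 11/12 0.92
# carob 1/4 0.25
```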

easy peasy and a great place to start!