August 20, 2012 at 08:00 PM | categories: graph, similarity | View Comments

the basis of most recommendation systems is the ability to rate similarity between items. there are lots of different ways to do this. one model is based the idea of an interest graph where the nodes of the graph are users and items and the edges of the graph represent an interest, whatever that might mean for the domain. if we only allow edges between users and items the graph is bipartite.let's consider a simple example of 3 users and 3 items; user1 likes item1, user2 likes all three items and user3 likes just item3.fig1user / item interest graphone way...

do all first links on wikipedia lead to philosophy?

August 13, 2011 at 03:00 PM | categories: graph, wikipedia | View Comments

(update: like all interesting things it turns out someone else had already done this :D)a recent xkcd posed the idea...wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at Philosophy.this raises a number of questionsQ: though i wouldn't be surprised if it's true for most articles it can't be true for all articles. can it?Q: what's the distribution of distances (measured in "number of clicks away") from 'Philosophy'?Q: by this same measure what's the furthest article from 'Philosophy'?Q: are there...

Read and Post Comments

e10.6 community detection for my twitter network

April 04, 2010 at 12:58 PM | categories: e10, twitter, betweenness, social network, graph | View Comments

last night i applied my network decomposition algorithm to a graph of some of the people near me in twitter.first i build a friend graph for 100 people 'around' me (taken from a crawl i did last year). by 'friend' i mean that if alice follows bob then bob also follows alice.here the graph, some things to note though; it was an unfinished crawl (can a crawl of twitter EVER be finished) and was done october last year so is a bit out of date.moreand here is the dendrogram decompositionsome interesting clusterings come out..right at the bottom we have a...

Read and Post Comments

e10.5 revisiting community detection

March 30, 2010 at 08:42 PM | categories: e10, betweenness, social network, graph | View Comments

i've decided to switch back to some previous work i did on community detection in (social) graphsthe last chunk of code i wrote which tried to deal with weighted directed graphs was terribly, terribly, broken but it seems that simplifying to undirected graphs is giving me much saner results. yay!here's an example of my work in progress generated from the new version of the codeconsider the graphand it's corresponding decompositionthe results are reasonable; the initial breaking of clusters [1,2,3,4,5,6] and [7,8,9,10,11,12] is the most obvious but some of the others are not as intuitive[1,2,5] and [7,8,10] remain as unbreakable cliques...

Read and Post Comments

e10.4 communities in social graphs

October 06, 2009 at 08:05 PM | categories: e10, twitter, social network, betweenness, algorithms, graph | View Comments

social graphs, like twitter or facebook, often follow the pattern of having clusters of highly connected components with an occasional edge joining these clusters.these connecting edges define the boundaries of communities in the social network and can be identified by algorithms that measure betweenness.the girvan-newman algorithm can be used to decompose a graph hierarchically based on successive removal of the edges with the highest betweenness.the algorithm is basicallycalculate the betweenness of each edge (using an all shortest paths algorithm)remove the edge(s) with the highest betweennesscheck for connected components (using tarjan's algorithm)repeat for graph or subgraphs if graph was split...

Read and Post Comments

e10.2 tgraph crawl order example

September 21, 2009 at 09:58 PM | categories: e10, graph | View Comments

let's consider an example of the crawl order for tgraph...we seed our frontier with 'a' and bootstrap cost of 0.fetching the info for 'a' shows 2 outedges to 'b' and 'c', from our cost formula these all have cost 0 + 1 + Log10(2+1) = 1.6our frontier becomes [ {b,1.6}, {c,1.6} ]next is 'b' and see it has an outdegree of 3, these nodes, b1 -> b3, all have a cost of 1.6 + 1 + Log10(3+1) = 3.2our frontier becomes [ {c,1.6}, {b1,3.2}, {b2,3.2}, {b3,3.2} ]next is 'c' with an outdegree of 15. these 15 nodes, c1 -> c15,...

Read and Post Comments

e10.1 crawling twitter

September 19, 2009 at 09:31 PM | categories: e10, twitter, algorithms, graph | View Comments

our first goal is to get some data and the twitter api makes getting the data trivial. i'm focused mainly on the friends stuff but because it only gives user ids i'll also get the user info so i can put names to ids.a depth first crawl makes no sense for this one experiment, we're unlikely to get the entire graph and are more interested in following edges "close" to me. instead we'll use a breadth first search.since any call to twitter is expensive (in time that is, they rate limit their api calls) instead of a plain vanilla breadth...

Read and Post Comments

old projects...

brain of mat kelcey

item similarity by bipartite graph dispersion

do all first links on wikipedia lead to philosophy?

e10.6 community detection for my twitter network

e10.5 revisiting community detection

e10.4 communities in social graphs

e10.2 tgraph crawl order example

e10.1 crawling twitter