<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>brain of matpalm &#187; betweenness</title>
	<atom:link href="http://matpalm.com/blog/tag/betweenness/feed/" rel="self" type="application/rss+xml" />
	<link>http://matpalm.com/blog</link>
	<description>thoughts from a data scientist wannabe</description>
	<lastBuildDate>Mon, 16 Aug 2010 11:38:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>e10.6 community detection for my twitter network</title>
		<link>http://matpalm.com/blog/2010/04/04/375/</link>
		<comments>http://matpalm.com/blog/2010/04/04/375/#comments</comments>
		<pubDate>Sun, 04 Apr 2010 02:58:28 +0000</pubDate>
		<dc:creator>matpalm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[betweenness]]></category>
		<category><![CDATA[e10]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[social network]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://matpalm.com/blog/?p=375</guid>
		<description><![CDATA[last night i applied my network decomposition algorithm to a graph of some of the people near me in twitter.
first i built a friend graph for 100 people &#8216;around&#8217; me (taken from a crawl i did last year). by &#8216;friend&#8217; i mean that if alice follows bob then bob also follows alice.
here&#8217;s the graph, some [...]]]></description>
			<content:encoded><![CDATA[<p>last night i applied my network decomposition algorithm to a graph of some of the people near me in twitter.</p>
<p>first i built a friend graph for 100 people &#8216;around&#8217; me (taken from a <a href="http://matpalm.com/blog/2009/09/29/e10-3-twitter-crawl-progress/">crawl</a> i did last year). by &#8216;friend&#8217; i mean that if alice follows bob then bob also follows alice.</p>
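<p>as a rough illustration of that &#8216;friend&#8217; definition, here is a minimal sketch that keeps only reciprocated follows. the <code>follows</code> dict shape and the function name are made up for this example; the crawl data itself is stored differently.</p>
<pre>
```python
def mutual_friend_edges(follows):
    """follows maps a user to the set of users they follow (hypothetical shape)."""
    edges = set()
    for user, followed in follows.items():
        for other in followed:
            # keep the edge only when the follow is reciprocated
            if user != other and user in follows.get(other, set()):
                edges.add(frozenset((user, other)))
    return edges


follows = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"dave"},
}
friends = mutual_friend_edges(follows)
```
</pre>
<p>with this toy input only alice and bob end up joined, since carol never follows alice back.</p>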
<p>here&#8217;s the graph. some things to note though: it was an unfinished crawl (can a crawl of twitter EVER be finished?) and was done in october last year so it&#8217;s a bit out of date.</p>
<p><a href="http://matpalm.com/blog/wp-content/uploads/2010/04/friends.jpg"><img class="aligncenter size-large wp-image-377" title="friends" src="http://matpalm.com/blog/wp-content/uploads/2010/04/friends-1024x204.jpg" alt="friends" width="1024" height="204" /></a><span id="more-375"></span></p>
<p>and here is the dendrogram decomposition</p>
<p><a href="http://matpalm.com/blog/wp-content/uploads/2010/04/dendrogram.vert_.600.jpg"><img class="aligncenter size-full wp-image-391" title="dendrogram.vert.600" src="http://matpalm.com/blog/wp-content/uploads/2010/04/dendrogram.vert_.600.jpg" alt="dendrogram.vert.600" width="600" height="1500" /></a>some interesting clusterings come out..</p>
<p>right at the bottom we have a small clique (ie everyone following everyone else) of people i&#8217;ve known from when i was in <em>sydney</em></p>
<p><a href="http://matpalm.com/blog/wp-content/uploads/2010/04/sydney.nokia_.jpg"><img class="aligncenter size-full wp-image-387" title="sydney.nokia" src="http://matpalm.com/blog/wp-content/uploads/2010/04/sydney.nokia_.jpg" alt="sydney.nokia" width="185" height="98" /></a></p>
<p>this small group connects to the group i&#8217;m in; <a href="http://twitter.com/tinybuddha">tinybuddha</a> down to <a href="http://twitter.com/evanbottcher">evanbottcher</a>; which roughly describes the group of people i&#8217;ve met in <em>melbourne</em>.</p>
<p>the order of the single breakaways in the melbourne group is pretty arbitrary. i get quite different ordering if i run the decomposition multiple times due to the random tie breaking involved. i could either run the decomposition multiple times and work out some kind of averaging or choose another more granular way of deciding how to break ties.</p>
<p>the next connector after <em>sydney</em> and <em>melbourne</em> are unified is <a href="http://twitter.com/deanemorrow">deanemorrow</a>, a coworker when i was at <a href="http://twitter.com/distra">distra</a>. this one sticks out for me as being the biggest flaw in the clustering since it would have made more sense to have him placed near distra at the bottom.</p>
<p>another interesting clique is near me..</p>
<p><a href="http://matpalm.com/blog/wp-content/uploads/2010/04/twers.jpg"><img class="aligncenter size-full wp-image-393" title="twers" src="http://matpalm.com/blog/wp-content/uploads/2010/04/twers.jpg" alt="twers" width="115" height="123" /></a>it has four thoughtworkers; <a href="http://twitter.com/markryall">mark</a>, <a href="http://twitter.com/grillp">gill</a>, <a href="http://twitter.com/debbiecheong">debs</a> and <a href="http://twitter.com/evanbottcher">evan</a> and one sensiser; <a href="http://twitter.com/kornys">korny</a>. did korny perhaps work for thoughtworks in a previous life ;)</p>
<p>another interesting note is there exists a path from me to <a href="http://twitter.com/norvig">peter norvig</a> (who is too busy for twitter it seems) but only because of the huge connector nodes that exist in twitter. an example in this case is <a href="http://twitter.com/tuaw">TUAW</a> who follow 30,000+ people and have even more followers. these nodes cause a bit of noise in the system since they are slightly false representations of what a &#8216;friend&#8217; means in my mind. not sure how to take these numbers into account&#8230;</p>
<p>things to do&#8230;</p>
<ul>
<li>the biggest oversimplification in this system is how i break ties for deciding which edge to cut out next if multiple exist with the same betweenness. currently it chooses the one that would make the most even sized break (based on smallest standard deviation of the connected components). though this is good for breaking a group into even sizes it&#8217;s bad since it favours breaking a single element off a large group. this is what has caused the &#8216;laddering&#8217; we see in the melbourne group.</li>
<li>the shortest path algorithm used to calculate edge betweenness is stochastic and if multiple shortest paths exist only one of them is chosen. it&#8217;d be better if all were considered with a weighting scheme.</li>
<li>it might be better to consider vertex betweenness instead of edge betweenness since one person could exist in multiple groups. if i started down this path though i think i&#8217;d rather just rewrite the lot using something like the <a href="http://en.wikipedia.org/wiki/Clique_percolation_method">clique percolation method</a></li>
</ul>
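<p>to make the second point concrete, here is a small sketch of one possible weighting scheme: find every shortest path for a pair of nodes and split one unit of credit evenly between them. it assumes an unweighted, undirected graph stored as a dict of neighbour sets, and the function names are invented for the example; it is not what the current code does.</p>
<pre>
```python
from collections import deque


def all_shortest_paths(adj, source, target):
    """list every shortest path from source to target in an unweighted graph."""
    dist = {source: 0}
    parents = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                parents[nxt] = []
                queue.append(nxt)
            if dist[nxt] == dist[node] + 1:
                # node sits on a shortest path to nxt
                parents[nxt].append(node)
    if target not in dist:
        return []
    # unwind the parent links from target back to source
    paths, stack = [], [[target]]
    while stack:
        partial = stack.pop()
        if partial[-1] == source:
            paths.append(partial[::-1])
            continue
        for parent in parents[partial[-1]]:
            stack.append(partial + [parent])
    return paths


def pair_edge_credit(adj, source, target):
    """split one unit of credit evenly across all shortest paths for a pair."""
    paths = all_shortest_paths(adj, source, target)
    credit = {}
    for path in paths:
        for u, v in zip(path, path[1:]):
            edge = frozenset((u, v))
            credit[edge] = credit.get(edge, 0.0) + 1.0 / len(paths)
    return credit


# a four node ring: a to d can go via b or via c
square = {"a": {"b", "c"}, "b": {"a", "d"}, "c": {"a", "d"}, "d": {"b", "c"}}
credit = pair_edge_credit(square, "a", "d")
```
</pre>
<p>in the ring both routes from a to d are equally short, so each of the four edges gets half a unit instead of one route arbitrarily taking all of it.</p>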
<p><a href="http://github.com/matpalm/tgraph">all the code is on github</a></p>
]]></content:encoded>
			<wfw:commentRss>http://matpalm.com/blog/2010/04/04/375/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>e10.5 revisiting community detection</title>
		<link>http://matpalm.com/blog/2010/03/30/e10-5-revisiting-community-detection/</link>
		<comments>http://matpalm.com/blog/2010/03/30/e10-5-revisiting-community-detection/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 10:42:43 +0000</pubDate>
		<dc:creator>matpalm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[betweenness]]></category>
		<category><![CDATA[e10]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[social network]]></category>

		<guid isPermaLink="false">http://matpalm.com/blog/?p=357</guid>
		<description><![CDATA[i&#8217;ve decided to switch back to some previous work i did on community detection in (social) graphs
the last chunk of code i wrote which tried to deal with weighted directed graphs was terribly, terribly, broken but it seems that simplifying to undirected graphs is giving me much saner results. yay!
here&#8217;s an example of my work [...]]]></description>
			<content:encoded><![CDATA[<p>i&#8217;ve decided to switch back to <a href="http://matpalm.com/blog/2009/10/06/e10-4-communities-in-social-graphs/">some previous work</a> i did on community detection in (social) graphs</p>
<p>the <a href="http://github.com/matpalm/tgraph/tree/master/girvan_newman">last chunk of code</a> i wrote which tried to deal with weighted directed graphs was terribly, terribly, broken but it seems that simplifying to undirected graphs is giving me much saner results. yay!</p>
<p>here&#8217;s an example of my work in progress generated from <a href="http://github.com/matpalm/tgraph/tree/master/girvan_newman_2">the new version of the code</a></p>
<p>consider the graph</p>
<p><img class="aligncenter size-medium wp-image-358" title="p97" src="http://matpalm.com/blog/wp-content/uploads/2010/03/p97-214x300.png" alt="p97" width="214" height="300" /></p>
<p>and its corresponding decomposition</p>
<p><img class="aligncenter size-full wp-image-360" title="p97.dendrogram" src="http://matpalm.com/blog/wp-content/uploads/2010/03/p97.dendrogram.jpg" alt="p97.dendrogram" width="400" height="400" /></p>
<p>the results are reasonable; the initial breaking of clusters [1,2,3,4,5,6] and [7,8,9,10,11,12] is the most obvious but some of the others are not as intuitive</p>
<p>[1,2,5] and [7,8,10] remain as unbreakable <a href="http://en.wikipedia.org/wiki/Clique_(graph_theory)">cliques</a> though it&#8217;s arbitrary that 11 was broken off from [7,8,10] instead of 10 (arbitrary but an artifact related to my shortest path calculation for the edge betweenness)</p>
<p>the idea of identifying the edge to remove using <a href="http://en.wikipedia.org/wiki/Betweenness#Betweenness_centrality">edge betweenness</a> works well but it is often the case that there are many edges with the same maximal betweenness and you have to choose only one. i think my current implementation of picking one is a bit naive and i&#8217;m not sure if i should move to a stochastic / <a href="http://en.wikipedia.org/wiki/Monte_Carlo_method">monte carlo style approach</a> or focus more on <a href="http://en.wikipedia.org/wiki/Community_structure#Modularity_maximization">modularity maximisation</a></p>
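<p>for what it&#8217;s worth, the stochastic option could start out as simple as the sketch below: collect every edge tied for the top score and pick one with a seeded random generator so runs can be repeated. the scores here are made up (they are not from the p97 graph) and the helper names are hypothetical.</p>
<pre>
```python
import math
import random


def tied_top_edges(scores):
    """return every edge whose betweenness matches the maximum (within float tolerance)."""
    top = max(scores.values())
    return sorted(edge for edge, s in scores.items() if math.isclose(s, top))


def pick_edge(scores, rng):
    """break ties uniformly at random; rng is a random.Random instance."""
    return rng.choice(tied_top_edges(scores))


# made-up scores purely for illustration
scores = {(1, 2): 4.0, (2, 3): 9.0, (3, 4): 9.0, (4, 5): 2.5}
rng = random.Random(0)
counts = {}
for _ in range(1000):
    edge = pick_edge(scores, rng)
    counts[edge] = counts.get(edge, 0) + 1
```
</pre>
<p>repeating the whole decomposition with different seeds and counting how often each split shows up would be one way to get at the monte carlo idea.</p>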
]]></content:encoded>
			<wfw:commentRss>http://matpalm.com/blog/2010/03/30/e10-5-revisiting-community-detection/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>e10.4 communities in social graphs</title>
		<link>http://matpalm.com/blog/2009/10/06/e10-4-communities-in-social-graphs/</link>
		<comments>http://matpalm.com/blog/2009/10/06/e10-4-communities-in-social-graphs/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 10:05:01 +0000</pubDate>
		<dc:creator>matpalm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[betweenness]]></category>
		<category><![CDATA[e10]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[social network]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://matpalm.com/blog/?p=83</guid>
		<description><![CDATA[social graphs, like twitter or facebook, often follow the pattern of having clusters of highly connected components with an occasional edge joining these clusters.
these connecting edges define the boundaries of communities in the social network and can be identified by algorithms that measure betweenness.
the girvan-newman algorithm can be used to decompose a graph hierarchically based [...]]]></description>
			<content:encoded><![CDATA[<p>social graphs, like twitter or facebook, often follow the pattern of having clusters of highly connected components with an occasional edge joining these clusters.</p>
<p>these connecting edges define the boundaries of communities in the social network and can be identified by algorithms that measure <a href="http://en.wikipedia.org/wiki/Betweenness#Betweenness_centrality">betweenness</a>.</p>
<p>the <a href="http://en.wikipedia.org/wiki/Girvan-Newman_algorithm">girvan-newman algorithm</a> can be used to decompose a graph hierarchically based on successive removal of the edges with the highest betweenness.</p>
<p>the algorithm is basically</p>
<ol>
<li>calculate the betweenness of each edge (using an all shortest paths algorithm)</li>
<li>remove the edge(s) with the highest betweenness</li>
<li>check for connected components (using <a href="http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm">tarjan&#8217;s</a> algorithm)</li>
<li>repeat for graph or subgraphs if graph was split</li>
</ol>
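<p>the steps above can be sketched roughly as follows. this is a minimal version under some assumptions: the graph is undirected and unweighted, betweenness is computed brandes-style with a breadth-first search from every node, connected components come from a plain depth-first search rather than tarjan&#8217;s algorithm, and all edges tied for the highest betweenness are removed together. the function names are invented for the example.</p>
<pre>
```python
from collections import deque
import math


def edge_betweenness(adj):
    """step 1: edge betweenness; adj maps a node to a set of neighbours."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # breadth-first search, counting shortest paths from source
        dist, paths, parents = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    paths[nxt] = 0.0
                    parents[nxt] = []
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    paths[nxt] += paths[node]
                    parents[nxt].append(node)
        # walk back from the farthest nodes, sharing credit over all shortest paths
        credit = {node: 0.0 for node in order}
        for node in reversed(order):
            for parent in parents[node]:
                share = paths[parent] / paths[node] * (1.0 + credit[node])
                score[frozenset((parent, node))] += share
                credit[parent] += share
    # every pair was counted once from each end
    return {edge: total / 2.0 for edge, total in score.items()}


def connected_components(adj):
    """step 3: plain depth-first search (swapped in for tarjan's here)."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in part:
                part.add(node)
                stack.extend(adj[node] - part)
        seen |= part
        parts.append(part)
    return parts


def split_once(adj):
    """steps 1 to 3: remove top edges until the graph breaks into more pieces."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}
    start_count = len(connected_components(adj))
    while len(connected_components(adj)) == start_count:
        scores = edge_betweenness(adj)
        if not scores:
            break
        top = max(scores.values())
        # step 2: remove every edge tied for the highest betweenness
        for edge in [e for e, s in scores.items() if math.isclose(s, top)]:
            u, v = tuple(edge)
            adj[u].discard(v)
            adj[v].discard(u)
    return connected_components(adj)


def decompose(adj):
    """step 4: recurse into each piece; nested lists act as a crude dendrogram."""
    if len(adj) == 1:
        return next(iter(adj))
    parts = split_once(adj)
    return [decompose({n: adj[n].intersection(p) for n in p}) for p in parts]


# two triangles joined by the single edge 3-4
two_triangles = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
parts = split_once(two_triangles)
tree = decompose(two_triangles)
```
</pre>
<p>in the toy graph the 3-4 edge carries every shortest path between the two triangles, so it has the highest betweenness and is the first to go, leaving the two triangles as separate communities.</p>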
]]></content:encoded>
			<wfw:commentRss>http://matpalm.com/blog/2009/10/06/e10-4-communities-in-social-graphs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
