<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>brain of matpalm &#187; big data</title>
	<atom:link href="http://matpalm.com/blog/tag/big-data/feed/" rel="self" type="application/rss+xml" />
	<link>http://matpalm.com/blog</link>
	<description>thoughts from a data scientist wannabe</description>
	<lastBuildDate>Mon, 16 Aug 2010 11:38:22 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>e10.0 introducing tgraph</title>
		<link>http://matpalm.com/blog/2009/09/19/e10-0-introducing-tgraph/</link>
		<comments>http://matpalm.com/blog/2009/09/19/e10-0-introducing-tgraph/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 04:41:45 +0000</pubDate>
		<dc:creator>matpalm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[algorithms]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[e10]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[pig]]></category>
		<category><![CDATA[twitter]]></category>

		<guid isPermaLink="false">http://matpalm.com/blog/?p=47</guid>
		<description><![CDATA[so e9 sip is on hold for a bit while i kick off e10 tgraph. i was looking for another problem to try hadoop with and came across a classic graph one: pagerank. a well-understood algorithm like pagerank will be a great chance to try pig, the query language that sits on top of [...]]]></description>
			<content:encoded><![CDATA[<p>so <a href="http://matpalm.com/sip/">e9 sip</a> is on hold for a bit while i kick off e10 tgraph. i was looking for another problem to try hadoop with and came across a classic graph one: <a title="pagerank" href="http://en.wikipedia.org/wiki/PageRank">pagerank</a>. a well-understood algorithm like pagerank will be a great chance to try <a href="http://hadoop.apache.org/pig/">pig</a>, the query language that sits on top of hadoop mapreduce.</p>
<p>so we need a graph to work on. my first thought was to use one of the <a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=2596">wikipedia linkage dumps</a>, but that feels a bit sterile. instead it&#8217;s a good excuse to do a little crawl of twitter&#8217;s follow graph (who follows whom).</p>
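<p>to make the plan concrete, here&#8217;s a minimal single-machine sketch of pagerank by power iteration over a tiny follow graph. it is plain python, not the pig/hadoop version this project is heading towards, and it assumes edges point from follower to followed, a damping factor of 0.85, and that users who follow nobody share their rank evenly with everyone. the users and the <code>pagerank</code> function are hypothetical, for illustration only.</p>

```python
# minimal pagerank sketch (power iteration) on a made-up follow graph.
# assumptions: edge follower -> followed, damping 0.85, and rank held by
# users who follow nobody ("dangling" nodes) is spread evenly over all users.

def pagerank(follows, damping=0.85, iterations=50):
    # collect every user, including ones who only appear as someone's target
    nodes = set(follows)
    for targets in follows.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # total rank sitting on users with no outgoing edges
        dangling = sum(rank[node] for node in nodes if not follows.get(node))
        # every user gets the teleport share plus an even cut of dangling rank
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # each user passes its rank evenly to everyone it follows
        for node, targets in follows.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# hypothetical users: alice follows bob and carol, and so on
follows = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}

ranks = pagerank(follows)
for user, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{user}\t{score:.3f}")
```

<p>the rank values always sum to 1, and a user nobody follows (dave here) ends up with just the teleport share. the mapreduce version would do the same per-iteration &#8220;spread rank along edges, then sum per user&#8221; step as a join and group.</p>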
<p>this will also be a chance to try documenting a project via a blog. <a href="http://www.skorks.com/">skorks</a>&#8217; incessant blog rambling has convinced me to give it a go.</p>
]]></content:encoded>
			<wfw:commentRss>http://matpalm.com/blog/2009/09/19/e10-0-introducing-tgraph/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>first hadoop experiment</title>
		<link>http://matpalm.com/blog/2009/09/16/first-hadoop-experiment/</link>
		<comments>http://matpalm.com/blog/2009/09/16/first-hadoop-experiment/#comments</comments>
		<pubDate>Wed, 16 Sep 2009 09:26:00 +0000</pubDate>
		<dc:creator>matpalm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[ec2]]></category>
		<category><![CDATA[hadoop]]></category>

		<guid isPermaLink="false">http://matpalm.com/blog/?p=43</guid>
		<description><![CDATA[just finished my first hadoop experiment.
http://matpalm.com/sip
not fantastic results but heaps of feedback from the hadoop mailing list
more results coming soon
]]></description>
			<content:encoded><![CDATA[<p>just finished my first hadoop experiment.</p>
<p><a href="http://matpalm.com/sip">http://matpalm.com/sip</a></p>
<p>not fantastic results but heaps of feedback from the hadoop mailing list</p>
<p>more results coming soon</p>
]]></content:encoded>
			<wfw:commentRss>http://matpalm.com/blog/2009/09/16/first-hadoop-experiment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>how using compressed data can make your app faster</title>
		<link>http://matpalm.com/blog/2009/06/28/how-using-compressed-data-can-make-you-app-faster/</link>
		<comments>http://matpalm.com/blog/2009/06/28/how-using-compressed-data-can-make-you-app-faster/#comments</comments>
		<pubDate>Sun, 28 Jun 2009 11:32:43 +0000</pubDate>
		<dc:creator>matpalm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[big data]]></category>
		<category><![CDATA[gzip]]></category>
		<category><![CDATA[sys admin]]></category>

		<guid isPermaLink="false">http://matpalm.com/blog/?p=25</guid>
		<description><![CDATA[when working with larger data sets (i.e. more than fits in memory) there are two important resources to juggle…


cpu. how quickly can you process the data.
disk io. how quickly can you get data to the cpu.

i remember reading once that depending on your situation you might be better off using data compressed on disk. [...]]]></description>
			<content:encoded><![CDATA[<p>when working with larger data sets (i.e. more than fits in memory) there are two important resources to juggle…</p>
<div>
<ol>
<li>cpu. how quickly can you process the data.</li>
<li>disk io. how quickly can you get data to the cpu.</li>
</ol>
<p>i remember reading once that, depending on your situation, you might be better off keeping your data compressed on disk. why? because if the app is waiting on disk io, the extra cpu time spent decompressing can be less than the time saved by reading far fewer bytes off disk.</p>
<p>i’ve recently been working with a number-crunching app (it burns 100% cpu of a quad-core machine for an hour over a 7.2gb working dataset) and thought it’d be a good chance to test this theory.</p>
<p>quite surprisingly it actually worked; the 7.2gb dataset came down to 1.3gb (roughly 5.5x smaller) and the runtime dropped from 1hr 5m to 56m (about 14% faster). cool.</p></div>
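<p>a rough sketch of how you might try this yourself: write the same dataset both plain and gzipped, then stream each back line by line and time the pass. this is illustrative python using the standard <code>gzip</code> module, not the app from this post; the row format, <code>ROWS</code> and the helper names are made up. note that a small test file like this one will sit in the os page cache, so the gzip pass will usually come out <em>slower</em>; the win only shows up when the job is genuinely waiting on disk.</p>

```python
import gzip
import os
import tempfile
import time

ROWS = 200_000  # hypothetical size; real gains need data bigger than the page cache


def write_dataset(path, rows, compress):
    # same tab-separated rows either way; only the on-disk encoding differs
    opener = gzip.open if compress else open
    with opener(path, "wt") as f:
        for i in range(rows):
            f.write(f"{i}\t{i % 97}\t{i * 3 % 1000}\n")


def sum_column(path, compress):
    # stream the file line by line; gzip.open decompresses on the fly
    opener = gzip.open if compress else open
    total = 0
    with opener(path, "rt") as f:
        for line in f:
            total += int(line.split("\t")[2])
    return total


results = {}
with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "data.tsv")
    gz_path = os.path.join(tmp, "data.tsv.gz")
    write_dataset(plain_path, ROWS, compress=False)
    write_dataset(gz_path, ROWS, compress=True)
    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

    for label, path, compress in (("plain", plain_path, False), ("gzip", gz_path, True)):
        start = time.perf_counter()
        total = sum_column(path, compress)
        results[label] = (total, time.perf_counter() - start)

print(f"plain: {plain_size} bytes, {results['plain'][1]:.2f}s")
print(f"gzip:  {gz_size} bytes, {results['gzip'][1]:.2f}s")
print(f"compression ratio: {plain_size / gz_size:.1f}x")
```

<p>the trade is visible in the two numbers it prints per format: bytes read (disk io) versus seconds spent (cpu, including decompression). when reading is the bottleneck, the smaller file can win even though each byte costs more cpu.</p>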
]]></content:encoded>
			<wfw:commentRss>http://matpalm.com/blog/2009/06/28/how-using-compressed-data-can-make-you-app-faster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
