brain of mat kelcey
how using compressed data can make your app faster
June 28, 2009 at 11:32 AM | categories: gzip, big data, sys admin
when working with larger data sets (ie more than can fit in memory) there are two important resources to juggle…

- cpu: how quickly can you process the data.
- disk io: how quickly can you get the data to the cpu.

i remember reading once that, depending on your situation, you might be better off keeping your data compressed on disk. why? because the extra cpu time spent decompressing it can be worth the time saved getting it off disk (see the sketch below).

i’ve recently been working with a number crunching app (it burns 100% cpu of a quadcore machine for an hour over a 7gb working dataset) and thought...
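the idea is easy to sanity check. here's a minimal python sketch (working_set.csv and working_set.csv.gz are made-up file names standing in for your data) that times streaming through a plain file versus a gzipped copy of the same data:

```python
# time how long it takes to stream every line of a file, plain vs gzipped.
# working_set.csv / working_set.csv.gz are hypothetical stand-ins for real data.
import gzip
import time

def time_read(open_fn, path):
    start = time.time()
    with open_fn(path) as f:
        for line in f:  # stream line by line, as a scanning app would
            pass
    return time.time() - start

plain = time_read(open, 'working_set.csv')           # uncompressed on disk
zipped = time_read(gzip.open, 'working_set.csv.gz')  # same data, gzipped

print('plain read took %.1fs' % plain)
print('gzip read took  %.1fs' % zipped)
```

note that for a fair comparison you'd want to drop the os page cache between runs (on linux, `echo 3 > /proc/sys/vm/drop_caches` as root) so the second read isn't served straight from memory. if the read is io bound you'd expect the gzipped version to win, since text data often compresses several fold and so needs far fewer bytes pulled off disk.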