brain of mat kelcey
cool bash stuff; mkfifo
April 15, 2010 at 09:33 PM | categories: unix, bash
mkfifo is one of those shell commands, provided as part of coreutils, that not many people seem to know about. here's a (semi-contrived) example, close to something i did the other day, to show how awesome it is.
say you have a number of largish presorted files, run-00 to run-03, and you want to find the most frequent lines. you could do something like the following...

sort -m run-* | uniq -c | sort -nr | head

however, as you'll know from previous posts, i just loooove keeping all my data compressed on disk, so instead i've got run-00.gz to run-03.gz. without having to...
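one way to run the same merge straight off the compressed files with mkfifo is sketched below. it's a guess at the general technique, not necessarily what the full post does: make one named pipe per compressed run, have zcat write each run into its pipe in the background, then hand the pipes to sort -m as if they were plain presorted files. the tiny sample runs, the .fifo names and top.txt are all made up for illustration.

```shell
#!/usr/bin/env bash
# sketch: merge gzipped, presorted runs through named pipes (mkfifo)
# assumption: GNU coreutils + gzip; sample data below is hypothetical
set -eu

workdir=$(mktemp -d)   # scratch dir, left in place so the result can be inspected
cd "$workdir"

# hypothetical stand-ins for run-00.gz .. run-03.gz (each already sorted)
printf 'a\nb\nc\n' | gzip > run-00.gz
printf 'a\nc\n'    | gzip > run-01.gz
printf 'b\nc\n'    | gzip > run-02.gz
printf 'c\nd\n'    | gzip > run-03.gz

# one named pipe per run; each background zcat blocks until a reader opens its pipe
for f in run-*.gz; do
  fifo="${f%.gz}.fifo"
  mkfifo "$fifo"
  zcat "$f" > "$fifo" &
done

# sort -m reads the pipes like ordinary sorted files; nothing is decompressed to disk
sort -m run-*.fifo | uniq -c | sort -nr | head > top.txt

wait                 # reap the background zcat jobs
rm -f run-*.fifo     # named pipes persist on disk until removed
cat top.txt
```

for comparison, bash's process substitution, sort -m <(zcat run-00.gz) <(zcat run-01.gz) ..., sets up the same plumbing with anonymous pipes; mkfifo just makes the pipes explicit, named files that any command can open.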
xargs parallel execution
November 06, 2009 at 09:57 PM | categories: unix, bash
just recently discovered xargs has a parallelise option!
i have 20 files, sample.01.gz to sample.20.gz, each ~100mb in size, that i need to run a script over. one option is

zcat sample*gz | ./script.rb > output

but this will process the files sequentially on a single core.
to get some parallel action going i could generate a temp script that produces

zcat sample.01.gz | ./script.rb > sample.01.out &
zcat sample.02.gz | ./script.rb > sample.02.out &
...
zcat sample.20.gz | ./script.rb > sample.20.out &

and run that, but this will have all 20 running at the same time and cause contention (though with only 20 files this might not be a problem). instead...
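the parallelise option in question is xargs's -P (max procs), which GNU and BSD xargs both support, though it isn't part of POSIX. here's a sketch of using it to cap the zcat | script pipelines at 4 at a time; the 4 is an arbitrary pick (roughly one per core), and script.sh is a hypothetical stand-in for ./script.rb that just counts lines so the example runs on its own. the generated sample files are made up too.

```shell
#!/usr/bin/env bash
# sketch: one zcat | script pipeline per file, at most 4 running at once (xargs -P)
# assumption: GNU or BSD xargs (-P and -0 are not POSIX); inputs are hypothetical
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# hypothetical inputs: sample.01.gz .. sample.20.gz, 100 lines each
for i in $(seq -w 1 20); do
  seq 1 100 | gzip > "sample.$i.gz"
done

# hypothetical stand-in for ./script.rb: reads stdin, prints its line count
cat > script.sh <<'EOF'
#!/bin/sh
wc -l
EOF
chmod +x script.sh

# -0: NUL-separated names; -n 1: one file per sh; -P 4: at most 4 sh processes at once
# the trailing _ fills $0 inside sh -c, so each filename arrives as $1
printf '%s\0' sample.*.gz |
  xargs -0 -n 1 -P 4 sh -c 'zcat "$1" | ./script.sh > "${1%.gz}.out"' _
```

xargs only exits once every child has finished, so all 20 .out files are complete when the pipeline returns; raising or lowering -P trades throughput against the contention mentioned above.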
old projects...
- latent semantic analysis via the singular value decomposition (for dummies)
- semi supervised naive bayes
- statistical synonyms
- round the world tweets
- decomposing social graphs on twitter
- do it yourself statistically improbable phrases
- should i burn it?
- the median of a trillion numbers
- deduping with resemblance metrics
- simple supervised learning / should i read it?
- audioscrobbler experiments
- chaoscope experiment