Posts Tagged ‘bash’

cool bash stuff; mkfifo

Thursday, April 15th, 2010

mkfifo is one of those commands, shipped as part of coreutils, that not many people seem to know about.

here’s a (semi-contrived) example, close to something i did the other day, to show how awesome it is

say you have a number of largish presorted files, run-00 to run-03, and you want to find the most frequent lines. you could do something like the following…

sort -m run-* | uniq -c | sort -nr | head
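here's a sketch of one way mkfifo can slot into this kind of job (not necessarily how the post itself uses it). it assumes the runs are stored gzipped (run-00.gz to run-03.gz are hypothetical names) and uses a made-up helper, merge_gz_runs, that decompresses each run into its own named pipe so sort -m can merge them without the uncompressed data ever being written to disk.

```shell
# merge_gz_runs is a hypothetical helper, assuming each run is a presorted,
# gzipped file. each run is decompressed into its own named pipe (fifo) and
# sort -m merges the pipes, so no uncompressed copy is written to disk.
merge_gz_runs() {
  fifo_dir=$(mktemp -d) || return 1
  n=0
  for run in "$@"; do
    mkfifo "$fifo_dir/run-$n"
    # the redirection happens inside the background job, so only that job
    # blocks (until sort opens the fifo for reading), not this loop
    zcat "$run" > "$fifo_dir/run-$n" &
    n=$((n + 1))
  done
  # sort -m merges inputs that are already sorted; each input here is a fifo
  sort -m "$fifo_dir"/run-*
  wait                 # reap the background zcat jobs
  rm -rf "$fifo_dir"   # remove the fifos and their temp directory
}
```

with that, the pipeline above becomes merge_gz_runs run-0*.gz | uniq -c | sort -nr | head. sort -m only merges correctly if the runs were sorted under the same locale, so running both the original sorts and the merge with LC_ALL=C is a common safeguard.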


xargs parallel execution

Friday, November 6th, 2009

just recently discovered xargs has a parallelise option!

i have 20 files, sample.01.gz to sample.20.gz, each ~100MB in size, that i need to run a script over

one option is

zcat sample*gz | ./script.rb > output

but this will process the files one after another, with a single copy of the script doing all the work on one core.

to get some parallel action going i could generate a temp script containing

zcat sample.01.gz | ./script.rb > sample.01.out &
zcat sample.02.gz | ./script.rb > sample.02.out &
...
zcat sample.20.gz | ./script.rb > sample.20.out &

and run that, but this will have all 20 running at the same time, competing for cpu and disk

(though with only 20 files this might not be a problem)
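for what it's worth, the generated temp script can also be written as a plain loop. a minimal sketch (run_all_at_once is a hypothetical name) that starts one zcat | ./script.rb pipeline per file in the background and then waits for all of them; like the temp script, it puts no cap on how many run at once.

```shell
# run_all_at_once is a hypothetical name: one background pipeline per file,
# all started at the same time (no limit on concurrency)
run_all_at_once() {
  for f in "$@"; do
    # sample.01.gz -> sample.01.out, matching the listing above
    zcat "$f" | ./script.rb > "${f%.gz}.out" &
  done
  wait   # block until every background pipeline has finished
}
```

e.g. run_all_at_once sample*gz, then cat the .out files together as before.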

instead i can make a temp script, parse.sh

zcat "$1" | ./script.rb > "$1.out"

and run

find sample*gz | xargs -n1 -P4 sh parse.sh
cat *out > output

what is this xargs command doing?

  • -n1 passes one arg at a time to the command being run (instead of the xargs default of packing as many args as will fit onto each command line)
  • -P4 says have at most 4 commands running at the same time
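the temp script isn't strictly needed either. a sketch of the same xargs run with the body of parse.sh passed inline via sh -c (parse_all_parallel is a hypothetical name): the word after the quoted script, sh, becomes $0, so the filename xargs appends lands in $1. it also swaps the find for printf '%s\0' plus xargs -0 (supported by GNU and BSD xargs) so filenames with spaces survive the trip.

```shell
# parse_all_parallel is a hypothetical name: same job as parse.sh + xargs,
# but with the per-file command given inline to sh -c
parse_all_parallel() {
  # NUL-separated names + xargs -0 keep filenames with spaces intact;
  # -n1: one file per command, -P4: at most 4 commands at once.
  # the trailing "sh" fills $0, so each filename arrives as $1
  printf '%s\0' sample*gz |
    xargs -0 -n1 -P4 sh -c 'zcat "$1" | ./script.rb > "$1.out"' sh
  cat ./*out > output   # stitch the per-file results together, as before
}
```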

the result: 100% cpu on all cores (and that's only because the disk can keep up)

awesome!