just recently discovered xargs has a parallelise option!
i have 20 files, sample.01.gz to sample.20.gz, each ~100mb in size that i need to run a script over
one option is
zcat sample*gz | ./script.rb > output
but this will process the files sequentially on a single core.
to get some parallel action going i could generate a temp script that produces
zcat sample.01.gz | ./script.rb > sample.01.out &
zcat sample.02.gz | ./script.rb > sample.02.out &
...
zcat sample.20.gz | ./script.rb > sample.20.out &
and run that but this will have all 20 running at the same time and produce contention
(though with only 20 files this might not be a problem)
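rather than generating that temp script, the same "everything at once" approach can be sketched as a loop. this is only an illustration under assumptions: the tiny sample files are made up, cat is a stand-in for ./script.rb, and gzip -dc is used as a portable equivalent of zcat.

```shell
# scratch directory with three tiny stand-in inputs (not the real samples)
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
for i in 01 02 03; do
  printf 'sample %s\n' "$i" | gzip > "sample.$i.gz"
done

# start one background job per file, all at the same time
# (cat is a placeholder for ./script.rb)
for f in sample.*.gz; do
  gzip -dc "$f" | cat > "${f%.gz}.out" &
done

# block until every background job has finished
wait
```

there is no limit on concurrency here, so with 20 files all 20 jobs start together.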
instead i can make a temp script, parse.sh
zcat "$1" | ./script.rb > "$1.out"
and run
find sample*gz | xargs -n1 -P4 sh parse.sh
cat *out > output
what is this xargs command doing?
- -n1 passes one arg at a time to the command being run (instead of the xargs default of packing as many args as it can into each invocation)
- -P4 says have at most 4 commands running at the same time
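
to see the two flags working together without the real data, here is a small self-contained sketch. the assumptions: four tiny made-up sample files, tr standing in for ./script.rb, gzip -dc as a portable equivalent of zcat, and -P2 instead of -P4 since there are only four files.

```shell
# scratch directory so nothing real gets overwritten
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# four tiny gzipped inputs in place of the real 100mb samples
for i in 01 02 03 04; do
  printf 'line from sample %s\n' "$i" | gzip > "sample.$i.gz"
done

# same shape as parse.sh, with tr as a placeholder for ./script.rb
cat > parse.sh <<'EOF'
gzip -dc "$1" | tr 'a-z' 'A-Z' > "$1.out"
EOF

# -n1: one filename per invocation of parse.sh
# -P2: at most two invocations running at the same time
find sample*gz | xargs -n1 -P2 sh parse.sh

# the glob expands in sorted order, so the combined file is
# ordered by sample number regardless of which job finished first
cat *out > output
```

each job writes to its own .out file, so the parallel runs never interleave their output; the ordering is only decided at the final cat.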