So I have a while loop:

cat live_hosts | while read host; do \
    sortstuff.sh -a "$host" > sortedstuff-"$host"; done

But this can take a long time. How would I use GNU Parallel for this while loop?

agc
Proletariat

3 Answers

You don't use a while loop.

parallel "sortstuff.sh -a {} > sortedstuff-{}" <live_hosts

Note that this won't work if live_hosts contains paths (e.g. /some/dir/file), because the command would expand to sortstuff.sh -a /some/dir/file > sortedstuff-/some/dir/file and fail with "No such file or directory". For those cases use {//} (the directory part of the input line) and {/} (its basename); see the GNU Parallel manual for details:

parallel "sortstuff.sh -a {} > {//}/sortedstuff-{/}" <live_hosts
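
To see what those two replacement strings amount to, here is a rough plain-shell sketch. It assumes `{//}` behaves like `dirname` and `{/}` like `basename` applied to the input line, which is how the GNU Parallel manual describes them. The path is the made-up one from the example above, and the variable names are only for illustration.

```shell
#!/bin/sh
# Plain-shell approximation of the output path that
# "{//}/sortedstuff-{/}" produces for one input line.
path=/some/dir/file          # hypothetical line from live_hosts

dir=$(dirname "$path")       # stands in for {//}  -> /some/dir
base=$(basename "$path")     # stands in for {/}   -> file

out="$dir/sortedstuff-$base"
echo "$out"                  # prints /some/dir/sortedstuff-file
```

So the output file lands next to the input file instead of in a non-existent sortedstuff-/some/dir directory.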
don_crissti
  • Is it possible to use `tee` with `parallel` when putting the output into `sortedstuff`? So I can see the output as it goes. – Proletariat Sep 22 '15 at 14:16
  • @Proletariat - do you want the output on the terminal too? Just replace `>` with `| tee`, e.g. the first command becomes `parallel "sortstuff.sh -a {} | tee sortedstuff-{}" <live_hosts` – don_crissti Sep 22 '15 at 15:10

As an old-school "do one thing and do it well" Unix guy, I'd put the string substitution stuff into a wrapper script:

#!/bin/sh
sortstuff.sh -a "$1" > sortedstuff-"$1"

If you call it wrapper.sh, the parallel command to call it would be:

parallel wrapper.sh < live_hosts

Note that you don't need cat for this kind of thing, which saves an external program invocation.

Warren Young

You don't need parallel, since the body of the loop doesn't depend on previous iterations. Just start a new background process for each host.

while read -r host; do
    sortstuff.sh -a "$host" > sortedstuff-"$host" &
done < live_hosts
wait    # Optional, to block until the background tasks are done

parallel does make it easier to manage certain aspects, though; you can limit the number of jobs running in parallel more easily.
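
If you want a cap without installing anything, one rough approach is to start the jobs in batches and `wait` between batches. This is only a sketch under some assumptions: `max_jobs`, the stand-in `sortstuff.sh` and the sample `live_hosts` are made up here so the snippet runs on its own. Batching is also cruder than what parallel does, because each batch has to wait for its slowest job before the next one starts.

```shell
#!/bin/sh
# Scratch setup so the sketch is runnable by itself (all hypothetical):
# a temporary directory, a five-line live_hosts and a dummy sortstuff.sh.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' alpha beta gamma delta epsilon > live_hosts
cat > sortstuff.sh <<'EOF'
#!/bin/sh
# Stand-in for the real script: just reports the host passed after -a.
echo "sorted $2"
EOF
chmod +x sortstuff.sh

max_jobs=2     # assumed cap; choose it to match your cores or disks
running=0
while read -r host; do
    ./sortstuff.sh -a "$host" > "sortedstuff-$host" &
    running=$((running + 1))
    # Once a full batch is in flight, block until all of it has finished.
    if [ "$running" -ge "$max_jobs" ]; then
        wait
        running=0
    fi
done < live_hosts
wait           # pick up the last, possibly partial, batch
```

In real use you would drop the scratch setup and call your own sortstuff.sh against your own live_hosts.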

chepner
  • If `wc -l live_hosts` reports more lines than you have disk spindles or CPU cores (depending on whether the task is I/O-bound or CPU-bound), you're going to eat up a lot of the advantage you get from parallelism with a solution like that. The ability of `parallel` to limit the number of jobs isn't just nice, it's near-essential if processing speed is your goal. – Warren Young Sep 11 '15 at 20:47