
In this script, which pulls all git repositories:

#!/bin/bash

find / -type d -name .git 2>/dev/null | 
while read gitFolder; do
    if [[ $gitFolder == *"/Temp/"* ]]; then
        continue;
    fi
    if [[ $gitFolder == *"/Trash/"* ]]; then
        continue;
    fi
    if [[ $gitFolder == *"/opt/"* ]]; then
        continue;
    fi
    parent=$(dirname $gitFolder);
    echo "";
    echo $parent;
    (git -C $parent pull && echo "Got $parent") &
done 
wait
echo "Got all"

The wait does not wait for all the git pull subshells.

Why is that, and how can I fix it?

Marcus Müller
Saeed Neamati
  • It's unlikely, but still possible, that one of your paths contains a newline at some point in time. And that will break your `read`. Using `find … | ` is practically never a good way of dealing with these things. Also, it feels very awkward that instead of specifying the directories you actually want to search in you exclude some. Congratulations for searching `/sys/` and `/dev/` and `/var/run/` … for git repos! – Marcus Müller Jan 27 '22 at 17:32
  • @MarcusMüller The issue is that the background tasks are associated with the subshell at the end of the pipe. Putting the `wait` in the same subshell helps. – Kusalananda Jan 27 '22 at 17:41
  • @MarcusMüller, thanks for notifying me about those directories. I will exclude them. – Saeed Neamati Jan 27 '22 at 17:41
  • @SaeedNeamati that's the **opposite** of what I wanted to achieve. Instead of excluding directories, you should search only those that you care about (instead of `/`, which really makes no sense). – Marcus Müller Jan 27 '22 at 17:46
  • As [they's answer](https://unix.stackexchange.com/a/688214/170373) says, this is similar to [Why is my variable local in one 'while read' loop, but not in another seemingly similar loop?](https://unix.stackexchange.com/q/9954/170373). The answers there have some other workarounds. – ilkkachu Jan 27 '22 at 18:27
  • Marcus, when we become dynamic, there is no "inclusive" approach unless you configure stuff. Thus we search everywhere, *that makes sense*. A git repo might be cloned inside the root, or in home, or in any other place. If I give you my laptop, how do you know where my git clones are? – Saeed Neamati Jan 28 '22 at 10:04
  • Also re. `read line`, if you do that, you probably want `IFS= read -r line` instead, see [Understanding "IFS= read -r line"](https://unix.stackexchange.com/q/209123/170373). (Still only matters if your filenames are naughty.) – ilkkachu Jan 28 '22 at 12:46

1 Answer


The issue is that the wait is run by the wrong shell process. In bash, each part of a pipeline runs in a separate subshell by default. The background tasks are children of the subshell executing the while loop, not of the main script, so the wait at the end of the script has nothing to wait for and returns immediately. Moving the wait into that subshell makes it work as expected:

find ... |
{
    while ...; do
        ...
        ( git -C ... && ... ) &
    done
    wait
}

echo 'done.'
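
A minimal sketch of the same structure that can be run safely, assuming a hypothetical `run_jobs` function in which each "job" just sleeps and appends a line to a temporary log instead of calling git. It is only meant to show that the wait inside the braces blocks until every background job has finished:

```shell
#!/bin/bash
# Hypothetical stand-in for the real loop: each job sleeps, then logs a line.
run_jobs() {
    local log
    log=$(mktemp)
    printf '%s\n' "$@" |
    {
        while IFS= read -r item; do
            ( sleep 1; printf 'got %s\n' "$item" >>"$log" ) &
        done
        wait    # same subshell that started the jobs, so this blocks until they finish
    }
    # Count the jobs that had completed by the time the pipeline returned.
    grep -c '^got ' "$log"
    rm -f "$log"
}

run_jobs repo-a repo-b repo-c
```

With the wait inside the group, the count equals the number of names passed in. Moving the wait to after the closing brace would let the count be taken before the jobs have written anything.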

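You also have some unquoted variables: `$gitFolder` and `$parent` should be double-quoted (for example `"$parent"`) so that paths containing whitespace or glob characters are passed along intact.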
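
To illustrate the quoting point, here is a small sketch using a hypothetical helper `count_args` and a made-up path containing a space. It assumes the default `IFS` and only shows how many arguments a command receives in each case:

```shell
# Hypothetical helper: prints how many arguments it was given.
count_args() { echo "$#"; }

gitFolder='/srv/my projects/site/.git'   # made-up path with a space

count_args $gitFolder      # unquoted: the shell splits the path into two words
count_args "$gitFolder"    # quoted: the path stays a single argument

# The quoted form is what dirname and git -C need to see.
parent=$(dirname "$gitFolder")
printf '%s\n' "$parent"
```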

I would get rid of the pipe entirely and instead run the loop from find directly, which gets rid of the need to parse the output from find.

find / -type d -name .git \
    ! -path '*/Temp/*' \
    ! -path '*/opt/*' \
    ! -path '*/Trash/*' \
    -exec sh -c '
    for gitpath do
        git -C "$gitpath"/.. pull &
    done
    wait' sh {} +
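
The `sh -c '...' sh {} +` idiom may look odd at first. As a rough sketch, with a hypothetical wrapper `show_batch` and literal strings standing in for the pathnames that find would supply: the first word after the script becomes `$0` (the name used in error messages), the remaining words become the positional parameters, and `for gitpath do` without an `in` list loops over those parameters.

```shell
# Hypothetical wrapper: passes its arguments to an inline sh script the same
# way that find's "-exec sh -c '...' sh {} +" would pass a batch of pathnames.
show_batch() {
    sh -c '
    for gitpath do
        printf "%s got: %s\n" "$0" "$gitpath"
    done' sh "$@"
}

show_batch '/srv/a/.git' '/home/u/my repo/.git'
```

Because each pathname arrives as its own argument, names with spaces or newlines need no special handling inside the loop.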

Or, using -prune to avoid even entering any of the subdirectories we don't want to deal with,

find / \( -name Temp -o -name Trash -o -name opt \) -prune -o \
    -type d -name .git -exec sh -c '
    for gitpath do
        git -C "$gitpath"/.. pull &
    done
    wait' sh {} +
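
To see what the `-prune` variant selects without touching any real repository, one could try it on a throwaway tree. This is only a sketch: the function name `list_git_dirs` and the directory layout are made up, and `-print` replaces the `-exec` part so that nothing is pulled.

```shell
# Hypothetical helper: same expression as above, but printing instead of pulling.
list_git_dirs() {
    find "$1" \( -name Temp -o -name Trash -o -name opt \) -prune -o \
        -type d -name .git -print
}

# Made-up tree: one repository we want, two under pruned directory names.
root=$(mktemp -d)
mkdir -p "$root/work/app/.git" "$root/opt/tool/.git" "$root/Trash/old/.git"

list_git_dirs "$root"    # only the .git directory under work/app is listed

rm -rf "$root"
```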

As mentioned in comments, you could also use xargs to have greater control over the number of concurrently running git processes. The -P option (for specifying the number of concurrent tasks) used below is non-standard, as are -0 (for reading \0-delimited pathnames) and -r (for avoiding running the command when there's no input). GNU xargs and some other implementations of this utility have these options though. Also, the -print0 predicate of find (to output \0-delimited pathnames) is non-standard, but commonly implemented.

find / \( -name Temp -o -name Trash -o -name opt \) -prune -o \
    -type d -name .git -print0 |
xargs -t -0r -P 4 -I {} git -C {}/.. pull
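
A dry-run sketch of the xargs part, assuming an xargs that supports the non-standard `-0` and `-P` options (GNU and BSD implementations do). The function name `preview_pulls` and the paths are hypothetical, and `echo` stands in for git so that the generated commands are only printed. The output is sorted because the order in which parallel jobs finish is not fixed.

```shell
# Hypothetical helper: shows the command xargs would run for each pathname.
preview_pulls() {
    printf '%s\0' "$@" |
    xargs -0 -P 2 -I {} echo git -C {}/.. pull |
    sort
}

preview_pulls '/srv/app one/.git' '/home/u/proj/.git'
```

With `-I {}`, xargs runs one command per input item, so `-P` caps how many of those commands run at the same time.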

I'm sure GNU parallel could also be used in a similar way, but since this is not the main focus of this question I'm not pursuing that train of thought.

Kusalananda