
I am having problems adapting a bash script that handles simple parallel execution from Ubuntu 20.04 to CentOS Linux 8. In the script I spawn multiple "readers" that each read a string from a FIFO and append it to a common file. The strings are written to the FIFO directly by the main script. I use file descriptors and locks to keep the process clean, and I wait for all the readers to start before writing to the FIFO. The whole script is included at the end.

The script works flawlessly on Ubuntu, and the output is this (the first column is the reader ID, the second is what the reader got from the FIFO):

2 1
4 2
1 3
...
3 4998
2 4999
4 5000

The reading is complete and the messages were transmitted one by one.

Using the same script on CentOS, I get this:

4 1
1 7
2 8
...
3 153
4 154
1 155

There is an evident jump: the messages from 2 to 6 are lost completely. Moreover, the process stops very prematurely, at 155.

I really don't know what's going on. Any idea?

The script:

#!/bin/bash

readers=4
objs=5000

echo "" > output.txt

# Temporary files and fifo
FIFO=$(mktemp -t fifo-XXXX)
START=$(mktemp -t start-XXXX)
START_LOCK=$(mktemp -t lock-XXXX)
FIFO_LOCK=$(mktemp -t lock-XXXX)
OUTPUT_LOCK=$(mktemp -t lock-XXXX)
rm $FIFO
mkfifo $FIFO

# Cleanup trap
cleanall() {
    rm -f $FIFO
    rm -f $START
    rm -f $START_LOCK
    rm -f $FIFO_LOCK
    rm -f $OUTPUT_LOCK
}
trap cleanall exit

# Reader process
reader() {
    ID=$1    
    exec 3<$FIFO
    exec 4<$FIFO_LOCK
    exec 5<$START_LOCK
    exec 6<$OUTPUT_LOCK
    
    # Signal the reader has started
    flock 5                
    echo $ID >> $START
    flock -u 5
    exec 5<&- 

    # Reading loop
    while true; do
        flock 4  
        read -su 3 item
        read_status=$?
        flock -u 4  
        if [[ $read_status -eq 0 ]]; then
            flock 6
            echo "$ID $item" >> output.txt
            flock -u 6  
        else
            break # EOF reached
        fi
    done

    exec 3<&-
    exec 4<&-
    exec 6<&-
}

# Spawn readers
for ((i=1;i<=$readers;i++)); do
    reader $i &
done

exec 3>$FIFO

# Wait for all the readers
exec 5<$START_LOCK
while true; do
        flock 5
        started=$(wc -l $START | cut -d \  -f 1)
        flock -u 5
        if [[ $started -eq $readers ]]; then
            break
        else
            sleep 0.5s
        fi
done
exec 5<&-

# Writing loop
for ((i=1;i<=$objs;i++)); do
    echo $i 1>&3
done

exec 3<&- 
wait

echo "Script done"

exit 0
Sine
  • Probably not the issue, but you have a couple of bad practices there: i) don't use CAPS for your variable names, that can lead to naming collisions with global environment variables which can cause very hard to debug problems; ii) [***always quote your variables***](https://unix.stackexchange.com/q/171346/22222), especially when using them with destructive commands such as `rm`. – terdon Apr 13 '22 at 12:36
  • @terdon Noted that. Thanks. – Sine Apr 13 '22 at 14:12
  • Note: as a workaround I used a temporary file to store all the messages. Then each reader reads and deletes one line from the top until the list is over. The effect is similar to a FIFO, but of course it is not a pipe, so it is less flexible and less performant. – Sine Apr 13 '22 at 15:19
  • Here's what I observed while running your script on RedHat 8: It works as expected when `output.txt` is on a local filesystem, but not over NFS. – Fravadona Apr 13 '22 at 19:02
  • moving `>> output.txt` from inside to outside the `while` loop ( `done >> output.txt` ) seems to fix the problem – Fravadona Apr 13 '22 at 21:48
  • @Fravadona nice catch. In the actual script that I am adapting I'm not writing a file, but I am launching a program instead. Same concept, but instead of the `>> output.txt` I have a call to a program, and the messages are the arguments to be passed. Do you have any idea if a similar workaround could be implemented there? – Sine Apr 14 '22 at 06:42
  • In that case the concurrent reading part is enough – Fravadona Apr 14 '22 at 12:42
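For illustration, the temporary-file workaround Sine describes in the comments might look roughly like this. This is only a sketch, not the actual adapted script: `MSGS` and its `.lock` companion are hypothetical names, the message count is cut down to 100 for brevity, and the output redirection is applied to the whole loop.

```shell
# Sketch of the temp-file workaround from the comments: instead of a
# FIFO, readers pop (read and delete) the top line of a shared file
# under a lock until the list is exhausted.
MSGS=$(mktemp)           # hypothetical message list replacing the FIFO
: > "$MSGS.lock"         # hypothetical lock file guarding the list
seq 1 100 > "$MSGS"      # the messages to distribute (100 for brevity)
: > output.txt

reader() {
    ID=$1
    exec 4<"$MSGS.lock"
    while true; do
        flock 4
        item=$(head -n 1 "$MSGS")   # take the top line...
        sed -i '1d' "$MSGS"         # ...and delete it from the list
        flock -u 4
        [[ -z $item ]] && break     # list exhausted
        echo "$ID $item"
    done >> output.txt              # single append-mode open per reader
    exec 4<&-
}

for ((i=1;i<=4;i++)); do reader $i & done
wait
```

As noted in the comment, this is not a pipe: the whole list sits on disk and every pop forks `head` and `sed`, so it is less flexible and slower, but it sidesteps the FIFO behavior entirely.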
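Fravadona's fix amounts to redirecting the whole `while` loop once (`done >> output.txt`) instead of re-opening `output.txt` on every `echo`. A minimal sketch of the reworked reader, using the variable names from the script above; the per-line `OUTPUT_LOCK` is dropped here on the assumption that small `O_APPEND` writes to a local file do not interleave:

```shell
# Sketch: the reader loop with the output redirection moved from the
# per-line echo to the loop itself, per Fravadona's comment.
reader() {
    ID=$1
    exec 3<"$FIFO"       # shared read end of the FIFO
    exec 4<"$FIFO_LOCK"  # lock serializing the reads
    while true; do
        flock 4
        read -su 3 item
        read_status=$?
        flock -u 4
        if [[ $read_status -eq 0 ]]; then
            echo "$ID $item"
        else
            break # EOF reached
        fi
    done >> output.txt   # opened once, in append mode, per reader
    exec 3<&-
    exec 4<&-
}
```

Each reader now opens `output.txt` exactly once for the lifetime of its loop, rather than performing an open/append/close cycle per message, which is the part Fravadona observed misbehaving over NFS.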

0 Answers