Comparing a text file's unique content to expected string not registered as equal

Question

I wrote a shell script to check which ".err" text files are empty. Some files have a specific repeated phrase, like this example file fake_error.err (blank lines intentional):


WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

I also want to remove such files, in addition to the empty ones. I wrote the following script to do so:

#!/bin/bash

for file in *error.err; do
    if [ ! -s $file ]
    then
        echo "$file is empty"
        rm $file
    else
        # Get the unique, non-blank lines in the file, sorted and ignoring blank space
        lines=$(grep -v "^$" "$file" | sort -bu "$file")
        echo $lines

        EXPECTED="WARNING: reaching max number of iterations"
        echo $EXPECTED

        if [ "$lines" = "$EXPECTED" ]
        then
            # Remove the file that only has iteration warnings
            echo "Found reached max iterations!"
            rm $file
        fi

    fi
done

However, the output of this script when run on the fake_error.err file is

WARNING: reaching max number of iterations
WARNING: reaching max number of iterations

from the two echo statements in the loop, but the file itself is not deleted and the string "Found reached max iterations!" is not printed. I think the issue is in if [ "$lines" = "$EXPECTED" ]. I've tried using double brackets [[ ]] and ==, but neither worked. I can't see any difference between the two printed lines.

Why are the two variables not equal?

see if there are any trailing blanks? `sort -bu` won't delete them. e.g. `echo ":$lines:"` or something like that — ilkkachu, Mar 29 '23 at 20:00
@ilkkachu that's it. The output is `: WARNING: reaching max number of iterations: WARNING: reaching max number of iterations`. How can I remove the blanks from this? — m13op22, Mar 29 '23 at 20:24
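A minimal sketch of what that check reveals, assuming a file laid out like the sample in the question (recreated here in a temporary directory; the name `demo.err` is hypothetical). Because `sort` is given "$file" as an argument, it reads the file itself and ignores what `grep` pipes in, so one blank line survives and `$lines` starts with a newline. The unquoted `echo $lines` hides it through word splitting. Letting `sort` read from stdin is one possible fix:

```shell
#!/bin/bash
# Recreate a small sample file (hypothetical name) with blank and warning lines.
tmpdir=$(mktemp -d)
file="$tmpdir/demo.err"
printf '\nWARNING: reaching max number of iterations\n\nWARNING: reaching max number of iterations\n' > "$file"

EXPECTED="WARNING: reaching max number of iterations"

# Pipeline from the question: sort reads "$file" directly, so grep's
# filtering is discarded and the blank line is kept.
lines_orig=$(grep -v "^$" "$file" | sort -bu "$file")

# Possible fix: no file operand, so sort reads grep's output from stdin.
# The pattern also drops lines that contain only whitespace.
lines_fixed=$(grep -v '^[[:space:]]*$' "$file" | sort -u)

# Quoting the variable between delimiters makes the stray newline visible.
printf ':%s:\n' "$lines_orig"
printf ':%s:\n' "$lines_fixed"

if [ "$lines_fixed" = "$EXPECTED" ]; then
    echo "fixed pipeline matches"
fi

rm -r "$tmpdir"
```

This sketch does not strip trailing blanks on the warning lines themselves; if the real files have those, an extra trimming step would be needed.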
Since you're already using grep, I wonder if it would not be simpler to use the exit status of `grep -qv -e '^$' -e 'WARNING: reaching max number of iterations'` directly? — steeldriver, Mar 29 '23 at 20:30
Ooh, that's an idea, @steeldriver! Something like ```if ! (grep -qv -e '^$' -e 'WARNING: reaching max number of iterations' $file)``` since it would return exit status 0 for matches found? — m13op22, Mar 29 '23 at 20:48
@m13op22 yes it "succeeds" if it finds any line that is neither empty nor the ignorable phrase - don't think you need the parentheses though — steeldriver, Mar 29 '23 at 20:54
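The exit-status idea from this exchange could be sketched roughly as follows. The function name `only_ignorable`, the temporary files, and the "Segmentation fault" line are hypothetical and only there for illustration, and `echo` stands in for `rm` so nothing is deleted. The grep status is checked explicitly rather than negated with `!`, on the assumption that a grep error (status 2) should not lead to a deletion:

```shell
#!/bin/bash
PHRASE='WARNING: reaching max number of iterations'

# Return 0 only when grep finds no line other than empty lines and lines
# containing the warning phrase (grep status 1). Status 0 means other content
# exists; status 2 means grep failed, and the file is then left alone.
only_ignorable() {
    local status=0
    grep -qv -e '^$' -e "$PHRASE" -- "$1" || status=$?
    [ "$status" -eq 1 ]
}

# Demo on throwaway files; echo stands in for rm.
demo_dir=$(mktemp -d)
printf '\n%s\n\n%s\n' "$PHRASE" "$PHRASE" > "$demo_dir/warn_error.err"
printf '%s\nSegmentation fault\n' "$PHRASE" > "$demo_dir/real_error.err"

for file in "$demo_dir"/*error.err; do
    if only_ignorable "$file"; then
        echo "would remove: $file"
    else
        echo "keeping: $file"
    fi
done

rm -r "$demo_dir"
```

A zero-byte file also gives grep status 1 here, so in this sketch the same test would cover the empty files that the `-s` check handles in the original script.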
Unless you have files containing ONLY blank lines, you shouldn't need to grep for them - grepping for just "WARNING: reaching maximum...." should be enough. And if you do need to grep for only blank lines, that should be a separate command, perhaps comparing the outputs from `wc -l "$file"` and `grep -c '^[[:blank:]]*$' "$file"`. BTW, you can't combine an inverted match `-v` with a normal match in the same grep command, the `-v` applies to all `-e` options in that command. Use awk or perl if you need to do boolean logic with regex matches like `! /^$/ && /WARNING: reaching.../`. — cas, Mar 30 '23 at 01:54
also, **[quote](https://unix.stackexchange.com/q/131766) your [variables](https://unix.stackexchange.com/q/4899)** — cas, Mar 30 '23 at 01:56
@cas good point about the quotes, thanks for making sure I'm not being sloppy! Some files are only blank lines, so it's better to use a separate command that uses `awk` instead? — m13op22, Mar 30 '23 at 15:22
i don't think so. as far as i can tell from your script above, you want to delete empty files (your -s test works well for that) AND files that contain "WARNING: reaching maximum....". grepping for blank lines isn't needed for that. If you **also** want to delete files containing ONLY blank lines then yes, compare the total line count of each file against the count of empty lines in that file - if equal, then delete it. you'd only need to use awk or perl if you needed more than a simple regex match. — cas, Mar 30 '23 at 16:07
BTW, my awk example was bogus because the `! /^$/` test is redundant, it's always going to be true if the line contains the warning. I just wanted a quick example and didn't think that one through. better would be if you wanted to check if a file contained both foo and bar on the same line, then you'd use `awk '/foo/ && /bar/ { ... }'` — cas, Mar 30 '23 at 16:09
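The line-count comparison suggested in the comments (`wc -l` against `grep -c '^[[:blank:]]*$'`) might look something like this. The function name `only_blank_lines` and the demo file are hypothetical, and zero-byte files are deliberately excluded on the assumption that the `-s` test already handles them:

```shell
#!/bin/bash
# Return 0 when the file has at least one line and every line is blank
# (empty, or only spaces and tabs).
only_blank_lines() {
    local total blank
    total=$(wc -l < "$1") || return 1
    # grep -c prints 0 but exits with status 1 when nothing matches.
    blank=$(grep -c '^[[:blank:]]*$' -- "$1" || true)
    [ "$total" -gt 0 ] && [ "$total" -eq "$blank" ]
}

# Demo on a throwaway file; echo stands in for rm.
demo_dir=$(mktemp -d)
printf '\n \t\n\n' > "$demo_dir/blank_error.err"

if only_blank_lines "$demo_dir/blank_error.err"; then
    echo "would remove: $demo_dir/blank_error.err"
fi

rm -r "$demo_dir"
```

One caveat: `wc -l` counts newline characters while `grep -c` counts lines, so the two can disagree for a file whose last line has no trailing newline, and such a file would be kept.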
