I'm a big fan of Stack Overflow. I'm a beginner and have found a lot of help on this site, but I have now run into a problem.

I currently have a script like the one below.

It reads a text file (data.txt), handling each new line written to it. If a line contains any word from the array "pets", the line is written to another text file (petslist.log); all other lines are ignored.

How do I invert that function?

I want to block bad words using an array (badword), so that lines containing them are not written to the file petslist.log.

pets.filter contains

pets=(
'Dog'
'Cat'
'Mouse'
'Horse'
)

badword.filter contains

badword=(
'Stupid'
'Dumb'
'Bad'
)

script.sh contains

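#!/bin/bash
source /home/pi/pets.filter
source /home/pi/badword.filter

while IFS='' read -r line
do
    count=0    # restart at the first pet for every line
    while [ "${pets[count]}" != "" ]
    do
        # Removing the pet name changes the line only if the name occurs in it
        if [ "${line/${pets[count]}}" != "$line" ] ; then
            echo "$line" >> /logs/petslist.log
            break    # one match is enough; avoid writing the line twice
        fi
        count=$(( count + 1 ))
    done
done < data.txt    # assumed: the data.txt mentioned above; adjust the path as needed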

2 Answers

If badword is an array of actual words, then you might want to use grep -w:

-w, --word-regexp

Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.
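To illustrate the difference -w makes, here is a minimal sketch. The sample lines are invented for the demonstration and are not from the question; it assumes bash (for the here-strings) and any POSIX-like grep that supports -w.

```shell
#!/bin/bash
# Hypothetical sample lines, invented for illustration
sample=$'This dog is bad\nHe wears a badge\nBAD weather today'

# Without -w, "bad" also matches inside "badge"
substring_hits=$(grep -i 'bad' <<< "$sample")

# With -w, only whole-word matches count
word_hits=$(grep -iw 'bad' <<< "$sample")

printf 'substring match:\n%s\n\n' "$substring_hits"
printf 'whole-word match:\n%s\n' "$word_hits"
```

With plain substring matching, the "badge" line would be treated as containing a bad word; with -w it is left alone.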

So in your case

# Declare some constants
readonly bad_words_list="stupid dumb bad" \
         out_file="out_file" \
         in_file="in_file"


# The function you want
function filter_bad_words() {
    # Loop for reading line-by-line (IFS= preserves leading/trailing whitespace)
    while IFS= read -r line
    do
        # Loop through the list
        # Unquoted on purpose: word splitting turns the string into separate words
        for bad_word in $bad_words_list
        do
            # Check if there is a bad word
            # Options in grep: quiet, ignore case, word
            if grep -qiw "$bad_word" <<< "$line"
            then
                # Print the line with bad word to stderr
                echo "Line contains bad word: $line" 1>&2

                # Exit from this loop, continue the main one
                continue 2
            fi
        done

        # Save line into the out file
        # This will not be called if line contains bad word
        echo "$line" >> "$out_file"

    # Read from file
    done < "$in_file"
}

I'm not sure this is the most efficient solution (it might also be possible with sed or awk), but at least it works and uses only Bash and grep.

Edit: if you just want to filter out the lines containing these words, without any other kind of processing, you can also use grep -v, as here:

# Read file into a variable
filtered="$(< "$in_file")"

# Go through each bad word
for word in ${bad_words_list[@]}
do
    # Drop lines containing the word (ignore case, whole words only)
    filtered="$(grep -ivw "$word" <<< "$filtered")"
done

# Save final result
echo "$filtered" > "$out_file"
xezo360hye
  • given that grep is exactly a tool that operates a test condition on each line of a stream, why by God do you kill that capability and make one call for one line ? – Thibault LE PAUL May 27 '23 at 16:43
  • Somehow I probably forgot the easiest `grep -o` solution and concentrated on the line-by-line processing as in the original code (maybe OP wants to do something more than just filtering?). I'll edit the answer – xezo360hye May 28 '23 at 08:10

You're overcomplicating things (and you really should not use a shell loop to process text).

pets='Dog
Cat
Mouse
Horse'

badword='Stupid
Dumb
Bad'

grep  -Fe "$pets"    < input.txt > pets.txt
grep -vFe "$badword" < input.txt > input-without-badword.txt

Or combining the two:

grep -Fe "$pets" < input.txt |
  grep -vFe "$badword" > pets-without-badword.txt

grep accepts a multi-line pattern (each line being a regular expression, or a fixed string with -F), in which case it selects the input lines that match any of those pattern lines.
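As a quick check of how the multi-line pattern behaves, the following sketch runs the combined pipeline on a few invented lines. The input text and the variable names `input` and `result` are assumptions for illustration; here-strings require bash, ksh93 or zsh.

```shell
#!/bin/bash
# Hypothetical input lines, invented for illustration
input=$'My Dog is great\nStupid Cat\nA Horse ran by\nNothing here'

# One fixed string per line, as in the answer above
pets=$'Dog\nCat\nMouse\nHorse'
badword=$'Stupid\nDumb\nBad'

# Keep lines naming any pet, then drop lines containing any bad word
result=$(grep -Fe "$pets" <<< "$input" | grep -vFe "$badword")

printf '%s\n' "$result"
```

Note that -F matching is case-sensitive and works on substrings, so "Bad" would also match "Badger"; adding -w and -i changes that if whole-word, case-insensitive matching is wanted.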

If you have to use an array instead of a multi-line string, you can do:

# fish / rc / zsh -o rcexpandparam
grep -F -e$array < input > output

# zsh
grep -F -e$^array < input > output

# mksh / bash / zsh
grep -F "${array[@]/#/-e}" < input > output

# ksh93
grep -F "${array[@]/*/-e\0}" < input > output
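For the bash form, it may help to see what the expansion produces. This is a small sketch assuming bash; the array contents come from the question, while `args`, `input` and `clean` are hypothetical names introduced here.

```shell
#!/bin/bash
array=('Stupid' 'Dumb' 'Bad')

# An empty pattern anchored at the start (#) inserts -e before every element,
# yielding one "-eWORD" option per array entry
args=("${array[@]/#/-e}")
printf '%s\n' "${args[@]}"

# Hypothetical input, invented for illustration
input=$'You are Dumb\nAll good'

# Each -eWORD is passed to grep as a separate pattern option
clean=$(grep -vF "${args[@]}" <<< "$input")
printf '%s\n' "$clean"
```

Because every element becomes its own argument, this form keeps working even if a word contains spaces.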

Though in mksh / ksh93 / zsh / bash, you can also join the elements of the array with newlines using:

IFS=$'\n'
grep -Fe "${array[*]}" < input > output
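Applied to the arrays from the question, the newline-joining approach could look like the sketch below. It assumes bash; `join_lines` is a hypothetical helper (not part of any tool) that confines the IFS change to a function so the rest of the script is unaffected, and the input lines are invented.

```shell
#!/bin/bash
# Arrays as they appear in the question's pets.filter and badword.filter
pets=('Dog' 'Cat' 'Mouse' 'Horse')
badword=('Stupid' 'Dumb' 'Bad')

# Hypothetical helper: join all arguments with newlines.
# "local IFS" keeps the change from leaking into the rest of the script.
join_lines() {
    local IFS=$'\n'
    printf '%s' "$*"
}

# Hypothetical input, invented for illustration
input=$'Dog barks\nBad Dog\nCat sleeps\nCar honks'

# Keep lines with a pet, then drop lines with a bad word
result=$(grep -Fe "$(join_lines "${pets[@]}")" <<< "$input" |
         grep -vFe "$(join_lines "${badword[@]}")")

printf '%s\n' "$result"
```

In a real script the here-string would be replaced by the actual input file, and the output redirected to the log file.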

Or in zsh:

grep -Fe ${(pj[\n])array} < input > output
Stéphane Chazelas