2

I have to find how many times the word shell is used in a file. I used grep "shell" test.txt | wc -w to count the occurrences, but the result is 4 instead of 3. The file content is:

this is a test file
for shell_A
shell_B
sh
shel
and 
shell_C
script project
Braiam
J.Doe

4 Answers

18

The wc command is counting the words in the output from grep, which includes "for":

> grep shell test.txt
for shell_A
shell_B
shell_C

So there really are 4 words.

If you only want to count the number of lines that contain a particular word in a file, you can use the -c option of grep, e.g.,

grep -c shell test.txt
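The two counts can be reproduced with a throwaway copy of the file from the question. This is only a sketch: the temporary file and the variable names `words` and `lines` are illustrative and not part of the question, and `mktemp` is assumed to be available.

```shell
# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
printf '%s\n' 'this is a test file' 'for shell_A' 'shell_B' 'sh' 'shel' 'and ' 'shell_C' 'script project' > "$sample"

# wc -w counts every word on the matching lines, including "for".
# tr -d ' ' strips the padding some wc implementations add.
words=$(grep shell "$sample" | wc -w | tr -d ' ')

# grep -c counts matching lines, not words.
lines=$(grep -c shell "$sample")

echo "wc -w: $words, grep -c: $lines"
rm -f "$sample"
```

The two numbers differ only because one matching line carries an extra word.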

Neither of those actually counts words; both match anything that contains the string. Most implementations of grep (GNU grep, the modern BSDs, as well as AIX, HPUX and Solaris) provide a -w option for words, although that is not in POSIX. They also recognize word boundaries in a regular expression, e.g.,

grep -e '\<shell\>' test.txt

which corresponds to the -w option. Again, that is not in POSIX. Solaris does document this, while AIX and HPUX describe -w without mentioning the regular expression. These all appear to be consistent, treating a "word" as a sequence of alphanumerics plus underscore.
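A small sketch of what that definition of a "word" means in practice, assuming a grep that supports -w (not guaranteed by POSIX). Because the underscore is a word character, shell_A is one word, so -w finds nothing in lines like those in the question. The appended line and the variable names `before` and `after` are made up for illustration.

```shell
sample=$(mktemp)
printf '%s\n' 'for shell_A' 'shell_B' 'shell_C' > "$sample"

# No line has "shell" as a whole word, so the count is 0.
# grep exits non-zero when nothing matches, hence the "|| true".
before=$(grep -cw shell "$sample" || true)

# Append a line where "shell" stands alone (illustrative only).
echo 'a shell script' >> "$sample"
after=$(grep -cw shell "$sample")

echo "before: $before, after: $after"
rm -f "$sample"
```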

You could use a POSIX regular expression with grep to match words (separated by blanks, etc), but your example has none which are just "shell": they all have some other character touching the matches. Alternatively, if you care only about alphanumerics (and no underscore) and do not mind matching substrings, you could do

tr -c '[:alnum:]' '\n' < test.txt | grep -c shell
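As a rough check of that pipeline, the sketch below runs it against a temporary copy of the question's file, and against the "shell_shell shell_shell" line raised in the comments. It reads the file on standard input because tr takes no file operand. The names `count` and `twice` are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'this is a test file' 'for shell_A' 'shell_B' 'sh' 'shel' 'and ' 'shell_C' 'script project' > "$sample"

# Turn every non-alphanumeric character (including "_") into a newline,
# so "shell_A" becomes the two lines "shell" and "A".
count=$(tr -c '[:alnum:]' '\n' < "$sample" | grep -c shell)

# Repeated matches on one line end up on separate lines and are all counted.
twice=$(printf 'shell_shell shell_shell\n' | tr -c '[:alnum:]' '\n' | grep -c shell)

echo "file: $count, repeated line: $twice"
rm -f "$sample"
```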

The -o option suggested in another answer is non-POSIX, and since OP did not limit the question to Linux or the BSDs, it is not what I would recommend. In either case, it matches strings rather than words (which was OP's expectation).
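If a count of every occurrence is wanted without relying on -o, one possibility is awk's gsub(), which returns the number of replacements it made. This is not something proposed in the thread, so treat it as a sketch under that caveat. The sample lines and the name `total` are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'for shell_A' 'shell_B' 'shell_C' 'shell_shell shell_shell' > "$sample"

# gsub() returns how many matches it replaced on the current line;
# "&" puts the matched text back, so the line itself is unchanged.
# "n + 0" prints 0 rather than an empty string when nothing matched.
total=$(awk '{ n += gsub(/shell/, "&") } END { print n + 0 }' "$sample")

echo "$total"
rm -f "$sample"
```

Like -o, this counts substrings, not words.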


Thomas Dickey
  • That's what I was thinking too, but I tried to count a word that wasn't in the file, thinking it would output a 1 but it said 0 so I didn't give it much thought after. Any advice on how to fix it? – J.Doe Feb 06 '16 at 16:24
  • 2
    And you'd have to be careful with (theoretical) input lines like "shell shell" – Jeff Schaller Feb 06 '16 at 20:56
  • 1
    The explanation for the wrong answer from "wc" is correct. The solution will not work if the text contains "shell_shell shell_shell", for which "grep -c" will incorrectly count 1. Only "grep -o" seems to handle that. – Prem Feb 07 '16 at 17:45
16

The grep command outputs the entire lines on which "shell" appears, not just the word "shell", as can be seen below:

grep shell test.txt
for shell_A
shell_B
shell_C

I would recommend using the option

-o, --only-matching

So:

grep -o "shell" test.txt | wc -w
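A quick sketch of how -o behaves, assuming a grep that supports it (GNU and BSD grep do). Each match is printed on its own line, so counting lines with wc -l gives the number of occurrences. For this pattern wc -w gives the same result, since the match contains no whitespace. The variable names are illustrative.

```shell
sample=$(mktemp)
printf '%s\n' 'for shell_A' 'shell_B' 'shell_C' > "$sample"

# -o prints only the matched text, one match per output line.
matches=$(grep -o shell "$sample" | wc -l | tr -d ' ')

# Two matches on a single input line are printed as two output lines.
double=$(printf 'shell shell\n' | grep -o shell | wc -l | tr -d ' ')

echo "file: $matches, 'shell shell': $double"
rm -f "$sample"
```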
terdon
Dylan
5

Since the word "shell" can appear multiple times on a line, I would start by breaking the text up into one word per line and then do the grep:

< test.txt tr -s "[:blank:]" "\n" | grep "shell" | wc -w

You can also use wc -l, or do away with wc and use grep -c "shell".

And you can even remove the need for tr on the file that you have and use:

grep -c "shell" test.txt

Anthon
  • 2
    I find your answer erotic and love it. But for someone new to shell scripting that involves a lot of complex subjects. Still got my vote – Dylan Feb 06 '16 at 16:32
  • I should have thought of grep -c. I tried grep | c but i forgot c itself is not a command. I don't think I can get away with the other commands, those are not in the instructions. – J.Doe Feb 06 '16 at 16:33
  • 3
    @J.Doe Then leave out the `tr` as I showed in my update answer. `grep -c "shell" test.txt` gives you 3, but only because there are no double "shell"s on a line – Anthon Feb 06 '16 at 16:37
  • Does not work if the text contains "shell_shell shell_shell", because only blanks are converted to "\n". Even "grep -c" will be incorrect. Only "grep -o" seems to handle that. – Prem Feb 07 '16 at 17:33
1

You should use wc -l for that, i.e. grep shell test.txt | wc -l. That returns 3.

Tolga Ozses
  • This suffers from the same issue as '-c' of not counting the occurrences and only counts the lines. If "shell" appears two times or more on a line the answer will be incorrect – Dylan Feb 07 '16 at 18:11