How to grep the number of occurrences of two different words in a file on Linux?

Question

How can I use grep to count the occurrences of two different words, e.g. 'register' and 'evn', in a file on Linux?

The output should look like the following:

registered:20

You should clarify the requirements. 1) Should multiple occurrences of a word on the same line be counted once (as in [McNisse's answer](http://unix.stackexchange.com/a/60738)) or each separately (as in [my answer](http://unix.stackexchange.com/a/60739))? 2) Should all words with the same base be counted (as in [dchirikov's answer](http://unix.stackexchange.com/a/60729)) or only exact matches (as in [my answer](http://unix.stackexchange.com/a/60739))? – manatwork, Jan 09 '13 at 11:37

manatwork · Answer 1 · 2013-01-09T11:43:16.580

5

If the reversed output format (count first, word second) is also acceptable, this works too and makes it easy to add more words:

tr -c '[:alpha:]' '\n' < /path/to/file | sort | uniq -c | grep -w 'register\|evn'

Counts each word occurrence, even if there are multiple occurrences in the same line.
Counts exact matches of the words, not including the suffixed variants.
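To get the question's word:count format while keeping this approach (exact tokens, every occurrence counted), the uniq -c output can be reshaped with awk. This is a sketch under stated assumptions: the helper name count_exact_words is hypothetical, and a "word" is taken to be a run of letters, as in the tr command above.

```shell
# count_exact_words FILE WORD...   (hypothetical helper name)
# Prints WORD:COUNT for each WORD. Every occurrence is counted, including
# several on the same line, but only exact alphabetic tokens match, so
# "registered" does not count towards "register".
count_exact_words() {
    file=$1
    shift
    # Turn every run of non-letters into a single newline (one token per
    # line), then let uniq -c tally identical tokens.
    counts=$(tr -cs '[:alpha:]' '\n' < "$file" | sort | uniq -c)
    for word in "$@"; do
        # uniq -c prints "<count> <token>"; print the count of the row whose
        # token is exactly $word, or 0 if the word never appears.
        n=$(printf '%s\n' "$counts" |
            awk -v w="$word" '$2 == w { print $1; found = 1 } END { if (!found) print 0 }')
        printf '%s:%s\n' "$word" "$n"
    done
}
```

For example, `count_exact_words /path/to/file register evn` prints a register:COUNT line followed by an evn:COUNT line. Running tr, sort and uniq once and filtering afterwards keeps the cost the same no matter how many words are requested.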

edited Jan 09 '13 at 11:43

answered Jan 09 '13 at 11:35


McNisse · Answer 2 · 2013-01-09T11:35:42.947

4

Use awk

awk '/register/ {r++} /evn/ {e++} END {printf("register:%d\nevn:%d\n", r, e)}' /path/to/file
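The awk command above counts lines that contain each pattern, and /register/ also matches inside words such as "registering". If every occurrence should be counted instead, one option (a variation, not McNisse's original) is to use the return value of awk's gsub(), which is the number of substitutions it made; replacing each match with "&" (the match itself) leaves the line unchanged. The helper name is hypothetical, and substrings are still counted.

```shell
# count_occurrences_awk FILE   (hypothetical helper name)
count_occurrences_awk() {
    awk '
        # gsub() returns how many matches it replaced; "&" re-inserts each
        # match, so the line is unchanged but every occurrence is counted.
        { r += gsub(/register/, "&"); e += gsub(/evn/, "&") }
        END { printf("register:%d\nevn:%d\n", r, e) }
    ' "$1"
}
```

On a file containing the two lines "registering evn register evn" and "evn register-evnt register", this prints register:4 and evn:4, whereas the line-counting version prints 2 for each.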

edited Jan 09 '13 at 11:35

answered Jan 09 '13 at 11:30


dchirikov · Answer 3 · 2013-01-09T10:46:37.020

3

You can calculate it separately:

$ word=register; count=$(grep -o "$word" /path/to/file | wc -l); echo "$word:$count"
$ word=evn; count=$(grep -o "$word" /path/to/file | wc -l); echo "$word:$count"
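The two commands differ only in the word, so they can be folded into one loop. This is a sketch assuming a grep that supports -o (GNU and BSD grep do); the helper name is hypothetical. It also adds -w, which the commands above do not use, so only whole words are counted and "registering" no longer counts as "register". grep treats letters, digits and underscore as word characters, so "register-evnt" still counts once for "register".

```shell
# count_words_grep FILE WORD...   (hypothetical helper name)
count_words_grep() {
    file=$1
    shift
    for word in "$@"; do
        # -o prints each match on its own line, so several hits on one line
        # are all counted; -w keeps whole-word matches only; -F treats the
        # word as a fixed string, not a regex. tr strips the padding some
        # wc implementations add.
        count=$(grep -owF -- "$word" "$file" | wc -l | tr -d ' ')
        printf '%s:%s\n' "$word" "$count"
    done
}
```

Usage: `count_words_grep /path/to/file register evn`.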

edited Jan 09 '13 at 10:46

answered Jan 09 '13 at 10:39



You don't need to `wc -l`. `grep -c` gives the count directly. – McNisse Jan 09 '13 at 11:35
@McNisse Actually, you do, because `grep -c` counts matching lines, and a word may occur more than once on a line. – mchid Oct 16 '17 at 13:41
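The difference described in these two comments is easy to see side by side. A small sketch (the function name is hypothetical) that prints both counts for one word:

```shell
# compare_counts WORD FILE   (hypothetical helper name)
compare_counts() {
    word=$1
    file=$2
    # grep -c: number of LINES containing at least one match.
    lines=$(grep -c -- "$word" "$file")
    # grep -o | wc -l: number of MATCHES, since -o prints each on its own line.
    matches=$(grep -o -- "$word" "$file" | wc -l | tr -d ' ')
    printf '%s: lines=%s matches=%s\n' "$word" "$lines" "$matches"
}
```

For a file with the lines "evn evn evn" and "register evn", `compare_counts evn FILE` prints `evn: lines=2 matches=4`.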

mchid · Answer 4 · 2017-10-16T16:29:56.277

example file ./filename:

registering evn register evn
evn register evn.register. register.evn evn evn register. 
evn register-evnt register

command:

echo register:$(grep -oP "(^|\s)\Kregister(?=\s|$)|(^|\s)\Kregister\.(?=\s|$)" ./filename | wc -l) && echo evn:$(grep -oP '(^|\s)\Kevn(?=\s|$)|(^|\s)\Kevn\.(?=\s|$)' ./filename | wc -l)

example output:

register:4
evn:6

This should count only the exact words "register" and "evn", omitting words that merely contain "register" and/or "evn", such as "registering", "evnt", or "register-evn".

Each word must start at the beginning of a line or after whitespace. A word immediately followed by another character such as a dash is not counted, but a word followed by a single period (e.g. at the end of a sentence or line) is.
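The same rule (whitespace-separated tokens, optionally ending in one period) can also be expressed without PCRE, which helps where grep lacks -P. This is a swapped-in awk technique, not mchid's command, and the helper name is hypothetical:

```shell
# count_words_ws FILE WORD...   (hypothetical helper name)
count_words_ws() {
    file=$1
    shift
    for word in "$@"; do
        awk -v w="$word" '
            # Fields are whitespace-separated tokens. Count a token when it is
            # exactly the word, or the word followed by a single period.
            { for (i = 1; i <= NF; i++) if ($i == w || $i == w ".") n++ }
            END { printf("%s:%d\n", w, n) }
        ' "$file"
    done
}
```

Run as `count_words_ws ./filename register evn` on the example file above, it prints register:4 and evn:6, the same as the grep -P command.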

This linked answer gave me the info I needed for the grep syntax.

score -1 · Answer 5 · answered Jan 09 '13 at 22:21

-1

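word="registered"
echo "$word:$(grep -wc "$word" /path/to/file)"

Works with Bash/Ksh and GNU grep. Note that -c counts matching lines, so several occurrences of the word on one line are counted once.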


Soumyadip DM

