How can I use grep to count the number of occurrences of two different words, e.g. 'register' and 'evn', in a file on Linux?

The output should look like the following:

registered:20
Gilles 'SO- stop being evil'
Supriya
    You should clarify the requirements. 1) Should multiple occurrences of a word on the same line be counted once (as in [McNisse's answer](http://unix.stackexchange.com/a/60738)), or should each one count (as in [my answer](http://unix.stackexchange.com/a/60739))? 2) Should all words with the same base be counted (as in [dchirikov's answer](http://unix.stackexchange.com/a/60729)), or only exact matches (as in [my answer](http://unix.stackexchange.com/a/60739))? – manatwork Jan 09 '13 at 11:37

5 Answers

If the reversed output format (count first, then the word) is also acceptable, this works too and makes it easy to add more words:

tr -c '[:alpha:]' '\n' < /path/to/file | sort | uniq -c | grep -w 'register\|evn'
  • Counts each word occurrence, even if there are multiple occurrences in the same line.
  • Counts exact matches of the words, not including the suffixed variants.
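A minimal sketch of the same pipeline that prints the question's `word:count` format instead, assuming GNU coreutils; `word_counts` is a hypothetical helper name. Here `tr -cs` also squeezes runs of non-letters into a single newline, `grep -x` keeps only lines that are exactly one of the words, and awk swaps the two columns that `uniq -c` prints. Words that never occur are not printed at all.

```shell
# word_counts FILE: hypothetical helper; prints WORD:COUNT for "register" and "evn"
word_counts() {
  # -c complements the set (every non-letter), -s squeezes runs of the
  # replacement newline, so each word ends up on its own line
  tr -cs '[:alpha:]' '\n' < "$1" |
    grep -xE 'register|evn' |   # keep only lines that are exactly one of the words
    sort | uniq -c |            # "  COUNT WORD"
    awk '{ print $2 ":" $1 }'   # swap the columns to "WORD:COUNT"
}
```

Usage: `word_counts /path/to/file`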
manatwork

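Use awk. Note that this counts lines containing each pattern, so repeated occurrences on one line are counted once and substrings such as "registering" also match: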

awk '/register/ {r++} /evn/ {e++} END {printf("register:%d\nevn:%d\n", r, e)}' /path/to/file 
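A sketch of an awk variant that counts every exact occurrence instead of matching lines, assuming words are separated by anything other than ASCII letters; `awk_counts` is a hypothetical helper name.

```shell
# awk_counts FILE: hypothetical helper; counts each exact occurrence of "register" and "evn"
awk_counts() {
  # Split each line on runs of non-letters, then compare every field exactly,
  # so "registering" is not counted and repeats on one line all count
  awk -F '[^A-Za-z]+' '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "register") { r++ } else if ($i == "evn") { e++ }
      }
    }
    END { printf("register:%d\nevn:%d\n", r, e) }
  ' "$1"
}
```

Usage: `awk_counts /path/to/file`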
McNisse

You can count each word separately:

$ word=register; count=$(grep -o "$word" /path/to/file | wc -l); echo "$word:$count"
$ word=evn; count=$(grep -o "$word" /path/to/file | wc -l); echo "$word:$count"
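The two commands can be folded into one loop. A minimal sketch, assuming GNU grep; `count_words` is a hypothetical helper name. Adding `-w` restricts matches to whole words, so "registering" is no longer counted, `-F` treats each word as a fixed string, and `-o` still reports every occurrence on a line.

```shell
# count_words FILE WORD...: hypothetical helper; prints WORD:COUNT for each word
count_words() {
  file=$1
  shift
  for word in "$@"; do
    # -o prints each match on its own line, -w requires whole-word matches,
    # -F disables regex interpretation, -- protects words starting with a dash
    printf '%s:%s\n' "$word" "$(grep -owF -- "$word" "$file" | wc -l)"
  done
}
```

Usage: `count_words /path/to/file register evn`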
dchirikov
    You don't need `wc -l`; `grep -c` gives the count directly. – McNisse Jan 09 '13 at 11:35
    @McNisse Actually, you do: `grep -c` counts matching lines, and a single line may contain more than one occurrence of a word. – mchid Oct 16 '17 at 13:41
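A small self-contained check of the difference described in these comments, using a temporary file from `mktemp`: `grep -c` counts matching lines, while `grep -o ... | wc -l` counts individual matches.

```shell
# Two occurrences of "evn" on the first line, one on the second
tmp=$(mktemp)
printf 'evn evn\nevn\n' > "$tmp"

lines=$(grep -c evn "$tmp")            # matching lines: 2
matches=$(grep -o evn "$tmp" | wc -l)  # individual matches: 3
echo "lines=$lines matches=$matches"   # prints: lines=2 matches=3

rm -f "$tmp"
```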

Example file ./filename:

registering evn register evn
evn register evn.register. register.evn evn evn register. 
evn register-evnt register

Command:

echo register:$(grep -oP "(^|\s)\Kregister(?=\s|$)|(^|\s)\Kregister\.(?=\s|$)" ./filename | wc -l) && echo evn:$(grep -oP '(^|\s)\Kevn(?=\s|$)|(^|\s)\Kevn\.(?=\s|$)' ./filename | wc -l)

Example output:

register:4
evn:6

This should accurately count only the words "register" and "evn", omitting words that merely contain "register" and/or "evn", such as "registering", "evnt", or "register-evn".

This assumes that neither word is immediately followed by special characters like dashes (those occurrences are skipped), but an occurrence followed by a period at the end of a line or sentence is still counted.

This linked answer gave me the info I needed for the grep syntax.
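Since `X(?=\s|$)|X\.(?=\s|$)` matches exactly what `X\.?(?=\s|$)` matches, the two alternatives in the command can be merged and the words handled in a loop. A sketch, assuming GNU grep built with PCRE support for `-P`; `count_exact` is a hypothetical helper name, and, as in the original command, each word is inserted into the pattern unescaped, so it must not contain regex metacharacters.

```shell
# count_exact FILE WORD...: hypothetical helper; prints WORD:COUNT for each word
count_exact() {
  file=$1
  shift
  for word in "$@"; do
    # (^|\s)\K    : the word must start the line or follow whitespace;
    #               \K drops that whitespace from the reported match
    # \.?(?=\s|$) : an optional trailing period, then whitespace or end of line
    n=$(grep -oP "(^|\s)\K${word}\.?(?=\s|\$)" "$file" | wc -l)
    echo "$word:$n"
  done
}
```

With the example file above, `count_exact ./filename register evn` should print the same register:4 and evn:6 as the original command.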

mchid

word="registered"
echo "$word:$(grep -wc -- "$word" /path/to/file)"

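Works with Bash/Ksh and GNU grep. Note that `grep -c` counts matching lines, so several occurrences on the same line are counted once.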

Soumyadip DM