How to find the percentage of a word's appearances in a file

Question

I have a word and I want to find out what percentage of the words in a file it accounts for (relative to the total number of words in the file). For example, if I have the word "you" and it appears 2 times in a file with 8 words, the output should be 25%.

I tried: fgrep -ow

lese · Answer 1 · 2015-11-11T23:32:23.853

2

You can get the total number of words in your file as follows:

nw=`wc -w < /path/to/file`

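Then count the occurrences of a certain word/pattern. Note that grep -c counts matching lines rather than individual matches, so a word that appears twice on one line would be counted once; counting the matches printed by -o with wc -l avoids that (-w restricts matches to whole words):

occurrences=`grep -ow 'pattern' /path/to/file | wc -l`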

Then you can easily calculate the percentage and put the result in a variable:

result=`echo "scale=2; $occurrences*100/$nw" | bc`

To add the % sign you can, e.g., do as follows:

echo "$result%"
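The steps above can be put together in one small function. This is only a sketch: the name word_percent and its two arguments are made up for illustration, it assumes a grep that supports -o (GNU and BSD grep do), and it swaps bc for awk's printf to do the division, purely so that it does not depend on bc being installed.

```shell
# word_percent WORD FILE  (hypothetical helper, not part of the original answer)
word_percent() {
    word=$1
    file=$2
    # Total number of whitespace-separated words in the file.
    nw=$(wc -w < "$file")
    # -o prints every match on its own line, -w matches whole words only,
    # -F treats the word as a fixed string; wc -l then counts the matches.
    occurrences=$(grep -o -w -F -e "$word" "$file" | wc -l)
    # awk does the decimal arithmetic here (bc would work just as well).
    awk -v c="$occurrences" -v n="$nw" 'BEGIN {
        if (n + 0 == 0) print "0.00%"          # empty file: avoid dividing by zero
        else printf "%.2f%%\n", c * 100 / n
    }'
}
```

Called as word_percent you /path/to/file, it prints the percentage with two decimals followed by %.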

edited Nov 11 '15 at 23:32

answered Nov 11 '15 at 10:05

lese


Thanks! But how can I add % next to the result? – mor Nov 11 '15 at 21:25
You're welcome, it was fun to test. I'll update the answer ;) – lese Nov 11 '15 at 23:29

score 0 · Answer 2 · edited Apr 13 '17 at 12:36

0

Use the same logic as shown URL

tr ' ' '\n' < file.txt | awk '{if($0=="her"){nmw+=1}}END{print ((nmw*100)/NR)}'

edited Apr 13 '17 at 12:36

Community


answered Nov 11 '15 at 10:07

jijinp


Assumes all words are separated by spaces. – 123 Nov 11 '15 at 10:31
Thanks, but for some reason it's not working; it gives me 0 as output. What is "her"? – mor Nov 11 '15 at 10:35
Replace `her` with the string you want to search for; in your case it is `you`. – jijinp Nov 11 '15 at 11:29
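The limitation raised in the first comment (words must be separated by single spaces) can be sidestepped by letting awk split each line into fields itself, since its default field splitting treats any run of spaces and tabs as one separator. A minimal sketch, assuming a hypothetical wrapper named word_percent_awk that takes the word and the file as arguments:

```shell
# word_percent_awk WORD FILE  (hypothetical name, for illustration only)
word_percent_awk() {
    awk -v w="$1" '
        {
            # Every whitespace-separated field counts as one word.
            for (i = 1; i <= NF; i++) {
                total++
                if ($i == w) hits++
            }
        }
        END {
            # Guard against an empty file before dividing.
            if (total > 0) print hits * 100 / total
            else print 0
        }
    ' "$2"
}
```

This keeps the exact-match comparison of the one-liner above, so "you," with a trailing comma still does not count as "you".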

chaos · Answer 3 · 2015-11-11T21:45:35.457

With awk:

awk -vw="word" 'BEGIN{RS="[^a-zA-Z]+"} $0==w{c++} END{printf "%.1f%%\n",c*100/NR}' file

-vw="word" passes awk the variable w, which contains "word". That is the word whose percentage you want.
BEGIN{RS="[^a-zA-Z]+"} sets the record separator to any run of non-letter characters, so every word is processed as a separate record (a regular expression as RS is supported by gawk and mawk, but is not guaranteed by POSIX).
$0==w{c++} increases the counter whenever the current record is exactly the word.
END{printf "%.1f%%\n",c*100/NR} prints the calculated percentage after the whole file has been processed.
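For an awk without regular-expression record separators, the same idea can be approximated by letting tr do the splitting. The following is a sketch under a few assumptions: the function name word_percent_letters is invented for illustration, only ASCII letters count as word characters (as in the regex above), and, unlike the original one-liner, the comparison is made case-insensitive by folding everything to lower case.

```shell
# word_percent_letters WORD FILE  (hypothetical helper, ASCII letters only)
word_percent_letters() {
    # Lower-case the search word so that "You" and "you" are treated alike.
    w=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # tr -cs replaces every run of non-letters with a single newline,
    # giving one word per line; the second tr folds the text to lower case.
    tr -cs 'A-Za-z' '\n' < "$2" |
        tr '[:upper:]' '[:lower:]' |
        awk -v w="$w" '
            # NF skips the empty line produced when the file starts with a non-letter.
            NF { total++; if ($0 == w) c++ }
            END {
                if (total > 0) printf "%.1f%%\n", c * 100 / total
                else print "0.0%"
            }
        '
}
```

Dropping the two case-folding tr calls restores the case-sensitive behaviour of the awk command above.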
