How do I get the character count of words in a particular column?

Question

I have a CSV file like this:

abd,123,egypt,78
cde,456,england,45

How can I get the character count of only the words in the 3rd column?

I can't figure out how to get wc to do this.

score 23 · Answer 1 · edited May 07 '14 at 12:27

awk -F, '{sum+=length($3)}; END {print +sum}' file

edited May 07 '14 at 12:27

Stéphane Chazelas

answered May 07 '14 at 11:56

Hauke Laging

Amen; `awk` was designed for processing column based files, line-by-line. The problem is perfectly suited for the tool. – Ray May 07 '14 at 13:34
What is the purpose of + in {print +sum} ? {print sum} works just as well. – spuder May 07 '14 at 16:18
@spuder, that's to print `0` instead of an empty line when the input file is empty. – Stéphane Chazelas May 07 '14 at 17:09
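
To illustrate the point about `+sum`, here is a small sketch run against empty input (`/dev/null` stands in for an empty file; the variable names are only for illustration). An awk variable that was never assigned is both the empty string and zero, and the unary plus forces the numeric reading:

```shell
# sum is never assigned because there are no input lines to process.
without_plus=$(awk -F, '{sum += length($3)} END {print sum}' /dev/null)
with_plus=$(awk -F, '{sum += length($3)} END {print +sum}' /dev/null)

# Brackets make the empty result visible.
printf 'without +: [%s]\n' "$without_plus"
printf 'with +:    [%s]\n' "$with_plus"
```

The first line should show empty brackets and the second `[0]`.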
@Ray, on the other hand, the task can be achieved by having 3 basic utilities (each a fraction of the size of `awk`) cooperate, working concurrently, in typical Unix spirit. You may notice that the cut+tr+wc one is 5 times as fast as this awk one, which is itself 5 times as fast as the `perl` one (at least on my system, in a UTF-8 locale, tried on a 100MB file). – Stéphane Chazelas May 08 '14 at 06:00

Stéphane Chazelas · Answer 2 · 2014-05-07T12:07:14.917

23

cut -d, -f3 < file | tr -d '\n' | wc -m

Remember that wc -c counts bytes, not characters:

$ echo a,1,españa,2 | cut -d, -f3 | tr -d '\n' | wc -c
7
$ echo a,1,españa,2 | cut -d, -f3 | tr -d '\n' | wc -m
6

edited May 07 '14 at 12:07

answered May 07 '14 at 11:58

Stéphane Chazelas

But he specifies 'I am not able to use `wc` command to get output!' – mikeserv May 07 '14 at 13:37
@mikeserv, which I interpret as _I wasn't able to get `wc` to give me the character count_ which is why I show how to use `wc` in this context. – Stéphane Chazelas May 07 '14 at 13:40
Oh.... That is a *very* valid interpretation which never at all occurred to me... – mikeserv May 07 '14 at 13:41

cuonglm · Answer 3 · 2014-05-07T13:43:55.287

5

A perl solution:

perl -Mopen=:locale -F, -anle '$sum += length($F[2]); END{print $sum}' file

or a shorter version:

perl -Mopen=:locale -F, -anle '$sum += length($F[2])}{print $sum' file

edited May 07 '14 at 13:43

answered May 07 '14 at 12:11

cuonglm

Note that it returns a byte count, not necessarily a character count. – Stéphane Chazelas May 07 '14 at 12:36
@StephaneChazelas: According to perldoc, length() returns the number of logical characters, not physical bytes. – cuonglm May 07 '14 at 12:50
But you need `-Mopen=:locale` for `perl` to use the user/system's definition of what a character is; otherwise it assumes characters are bytes. Try it on an `a,1,españa,2` input in a UTF-8 locale (the default on most systems). – Stéphane Chazelas May 07 '14 at 13:05
@StephaneChazelas: Oh, updated my answer. Thanks for the good point! – cuonglm May 07 '14 at 13:45

Joseph R. · Answer 4 · 2014-05-07T13:47:00.683

3

In Perl:

perl -F, -Mopen=:locale -lane 'print length $F[2]' your_file

edited May 07 '14 at 13:47

answered May 07 '14 at 11:44

Joseph R.

score 3 · Answer 5 · answered May 07 '14 at 11:48

cut -d, -f3 <<\DATA | grep -o . | grep -c .
abd,123,egypt,78
cde,456,england,45
DATA

#OUTPUT
12

answered May 07 '14 at 11:48

mikeserv

score 3 · Answer 6 · edited May 07 '14 at 12:25

You could also use

awk -F, '{printf "%s", $3}' file | wc -m

edited May 07 '14 at 12:25

Stéphane Chazelas

answered May 07 '14 at 12:22

terdon

score 1 · Answer 7 · answered May 07 '14 at 12:12

With your sample file like so:

$ cat sample.txt 
abd,123,egypt,78
cde,456,england,45

$ awk -F, '{print $3}' sample.txt | while IFS= read -r i; do printf '%s' "$i" | \
    wc -m; done
5
7

Getting a per-line count out of wc is awkward because wc has to be called once for each string from column 3. You loop over the rows of the CSV, extract column 3 from each, and pass that value on its own to wc to get its character count.
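
If per-line counts are what you need, the per-row wc call can be avoided altogether. A minimal sketch, assuming the same sample.txt as above and an awk whose length() counts characters rather than bytes in your locale (gawk in a UTF-8 locale does; not every awk does). The `per_line` variable is only for illustration:

```shell
# Recreate the sample file shown above.
printf '%s\n' 'abd,123,egypt,78' 'cde,456,england,45' > sample.txt

# Print the length of the third field of every line in a single awk process.
per_line=$(awk -F, '{print length($3)}' sample.txt)
printf '%s\n' "$per_line"
```

For the sample data this prints 5 and 7, one per line, matching the loop above.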

score 0 · Answer 8 · answered May 09 '14 at 10:35

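Using sed and awk (with an empty FS, which gawk and mawk treat as one field per character, so NF is the character count):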

sed 's/.*,.*,\(.*\),.*/\1/g' file | awk -v FS="" '{print NF;}'

Example:

$ (echo abd,123,egypt,78; echo cde,456,england,45;) | sed 's/.*,.*,\(.*\),.*/\1/g' | awk -v FS="" '{print NF;}'
5
7

Two awk's

awk -F, '{print $3}' file | awk -v FS="" '{print NF;}'

Example:

$ (echo abd,123,egypt,78; echo cde,456,england,45;) | awk -F, '{print $3}'| awk -v FS="" '{print NF;}'
5
7
