How to count the number of characters in a line, except a specific character?

Question

This is part of my file:

N W N N N N N N N N N
N C N N N N N N N N N
N A N N N N N N N N N
N N N N N N N N N N N
N G N N N N N N N N N
N C N N N C N N N N N
N C C N N N N N N N N

In each line I want to count the total number of all characters that are not "N".

My desired output:

1
1
1
0
1
2
2

Use ``sed`` to delete the characters you don't care about and ``awk`` to count the remaining length: ``sed 's/N//g ; s/\s//g' file | awk '{ print length($0); }'`` – Rolf, Oct 10 '17 at 07:19

score 13 · Accepted Answer · edited Oct 06 '17 at 21:52

GNU awk solution:

awk -v FPAT='[^N[:space:]]' '{ print NF }' file

FPAT='[^N[:space:]]' - the pattern defining a field value (any single character other than N and whitespace), so NF is the number of such characters on the line

The expected output:

1
1
1
0
1
2
2

edited Oct 06 '17 at 21:52

Jeff Schaller

answered Oct 06 '17 at 20:45

RomanPerekhrest
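
For readers without GNU awk, the same idea (describe what a field looks like instead of what separates fields) can be sketched in Python. This is only an illustration, not part of the original answer: the name count_non_n and the sample lines are assumptions, and Python's \s stands in for [:space:].

```python
import re

# One match per character that is neither "N" nor whitespace,
# loosely mirroring the awk pattern [^N[:space:]].
NON_N = re.compile(r"[^N\s]")


def count_non_n(line: str) -> int:
    """Count characters in `line` that are neither 'N' nor whitespace."""
    return len(NON_N.findall(line))


# Hypothetical sample taken from the question's data.
sample = [
    "N W N N N N N N N N N",
    "N N N N N N N N N N N",
    "N C C N N N N N N N N",
]
for line in sample:
    print(count_non_n(line))
```

Because the pattern excludes all whitespace, a trailing newline on the line does not affect the count.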

score 9 · Answer 2 · answered Oct 06 '17 at 20:48

awk '{ gsub("[ N]",""); print length() }'

answered Oct 06 '17 at 20:48

Hauke Laging

can also use `awk '{print gsub(/[^ N]/,"")}'` – Sundeep Oct 07 '17 at 04:47
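
The one-liner in the comment above works because gsub() returns the number of replacements it made. A rough Python counterpart, offered as a sketch rather than as anyone's original method, is re.subn, which returns the new string together with the replacement count. The function name count_by_substitution is made up for illustration.

```python
import re


def count_by_substitution(line: str) -> int:
    """Delete every character that is not 'N' or a space; return how many were deleted."""
    # awk strips the record separator before gsub() sees the line,
    # so the trailing newline is removed here as well.
    line = line.rstrip("\n")
    # re.subn returns (new_string, number_of_substitutions); only the count is kept.
    _, replaced = re.subn(r"[^ N]", "", line)
    return replaced


print(count_by_substitution("N C N N N C N N N N N\n"))
```

The modified string is thrown away; only the side information about how many substitutions happened is used.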

αғsнιη · Answer 3 · 2017-10-07T05:31:35.000

7

Another awk approach (will return -1 for empty lines).

awk -F'[^N ]' '$0=NF-1""' infile

Or a slightly more complex variant, which returns -1 on empty lines and 0 on lines that contain only whitespace (tabs/spaces):

awk -F'[^N \t]+' '$0=NF-1""' infile
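
The separator trick above treats every wanted character as a field separator, so the count is the number of fields minus one. A minimal Python sketch of the first variant follows; the function name count_by_splitting is an assumption. Note one behavioural difference: this sketch gives 0, not -1, for an empty line, because re.split always returns at least one piece.

```python
import re


def count_by_splitting(line: str) -> int:
    """Count characters other than 'N' and space by using them as separators."""
    line = line.rstrip("\n")
    # Each character that is not "N" or a space splits the line once,
    # so the number of separators equals the number of pieces minus one.
    pieces = re.split(r"[^N ]", line)
    return len(pieces) - 1


print(count_by_splitting("N C C N N N N N N N N"))
print(count_by_splitting(""))
```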

edited Oct 07 '17 at 05:31

answered Oct 06 '17 at 21:30

αғsнιη

will print `-1` for empty lines... but then that might be desirable to distinguish line made up of only N/space vs empty line... – Sundeep Oct 07 '17 at 04:59

@Sundeep Yes, that's correct. Also see my update, where lines that contain only tabs or spaces are reported as 0 – αғsнιη Oct 07 '17 at 05:32

Sundeep · Answer 4 · 2017-12-13T04:38:00.047

Assuming the count needed for each line is of characters other than space and N:

$ perl -lne 'print tr/N //c' ip.txt 
1
1
1
0
1
2
2

The return value of tr is the number of characters it matched (with no replacement list, they are only counted)
c complements the given set of characters
Note the use of the -l option: it strips the newline from each input line, which avoids an off-by-one error, and also adds a newline to each print
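
The complement counting and the off-by-one issue described above can be illustrated in Python. This is a sketch under stated assumptions, not a translation endorsed by the answer: the helper name count_outside and its excluded parameter are hypothetical.

```python
def count_outside(line: str, excluded: str = "N ") -> int:
    """Count characters of `line` that are NOT in `excluded` (the complement set)."""
    return sum(1 for ch in line if ch not in excluded)


raw = "N C N N N C N\n"  # a line as read from a file, newline included

# Without stripping, the newline is outside the excluded set and gets counted.
print(count_outside(raw))

# Stripping the newline first plays the role of perl's -l option.
print(count_outside(raw.rstrip("\n")))
```

The first call reports 3 and the second 2, which is the off-by-one error that -l avoids in the Perl version.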

A more generic solution

perl -lane 'print scalar grep {$_ ne "N"} @F' ip.txt

-a option automatically splits the input line on whitespace and saves the fields in the @F array
grep {$_ ne "N"} @F returns a list of all elements in @F that are not equal to the string N
- the regex equivalent would be grep {!/^N$/} @F
scalar gives the number of elements in that list

score 6 · Answer 5 · answered Oct 06 '17 at 21:05

Alternative awk solution:

awk '{ print gsub(/[^N[:space:]]/,"") }' file

gsub(...) - The gsub() function returns the number of substitutions made.

The output:

1
1
1
0
1
2
2

answered Oct 06 '17 at 21:05

RomanPerekhrest

agc · Answer 6 · 2017-10-07T13:39:59.630

5

tr and POSIX shell script:

tr -d 'N ' < file | while read x ; do echo ${#x} ; done

bash, ksh, and zsh:

while read x ; do x="${x//[ N]}" ; echo ${#x} ; done < file

edited Oct 07 '17 at 13:39

answered Oct 07 '17 at 02:19

agc

can use `awk '{print length()}'` to avoid the slower shell looping.. but then one could do it all with awk itself... – Sundeep Oct 07 '17 at 04:54
@Sundeep, It's true, (*if* both are started at the same time), that `awk` looping *is* faster than shell looping. But the shell is always in memory, and `awk` might not be -- when `awk` is not already loaded, or swapped out, the overhead of loading it, ([the time lost](https://en.wikipedia.org/wiki/Latency_(engineering))), can be greater than the advantage of running `awk` -- particularly on a small loop. In such cases, (*i.e.* this case), `awk` can be *slower*. – agc Oct 07 '17 at 13:08
well, am certainly not worried about time for small stuff... see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Sundeep Oct 07 '17 at 13:15
@Sundeep, I *do* worry. Some time ago I used to use [floppy based Linux distros](https://en.wikipedia.org/wiki/Category:Floppy-based_Linux_distributions), which could run off a floppy, in a few megs of ram. Needlessly using `awk` in a shell script could make such a system crawl on all fours. Generally: the same latency drag applies to systems in limited firmware, or any system under heavy load. – agc Oct 07 '17 at 13:33

score 1 · Answer 7 · answered Oct 08 '17 at 08:30

A short combination of tr and awk:

$ tr -d ' N' <file.in | awk '{ print length }'
1
1
1
0
1
2
2

This deletes all spaces and Ns from the input file; awk then just prints the length of each remaining line.
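
The delete-then-measure approach has a close Python analogue in str.translate, shown here as a hedged sketch. The names DELETE_N_AND_SPACE and remaining_length are invented for this example.

```python
# Translation table that deletes "N" and space, similar in spirit to tr -d ' N'.
DELETE_N_AND_SPACE = str.maketrans("", "", "N ")


def remaining_length(line: str) -> int:
    """Length of `line` after removing every 'N' and every space."""
    # Drop the trailing newline so it is not counted as a remaining character.
    return len(line.rstrip("\n").translate(DELETE_N_AND_SPACE))


print(remaining_length("N G N N N N N N N N N\n"))
```

As with the tr pipeline, the work is split into two steps: remove the unwanted characters, then take the length of whatever is left.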

answered Oct 08 '17 at 08:30

Kusalananda

score 0 · Answer 8 · answered Oct 07 '17 at 11:15

Another easy way is to do it in Python, which comes pre-installed in most Unix environments. Put the following code in a .py file:

with open('geno') as f:
    for line in f:
        count = 0
        for word in line.split():
            if word != 'N':
                count += 1
        print(count)

Then run it from your terminal:

python file.py

What the above does:

for each line in a file named "geno"
set a counter to 0 and increment it each time we find a value != 'N'
when the end of the current line is reached, print the counter and go to the next line
