
This is part of the file:

N W N N N N N N N N N
N C N N N N N N N N N
N A N N N N N N N N N
N N N N N N N N N N N
N G N N N N N N N N N
N C N N N C N N N N N
N C C N N N N N N N N

In each line, I want to count the total number of characters that are not "N" (the spaces are not counted either).

My desired output:

1
1
1
0
1
2
2
Anna1364
  • Use ``sed`` to replace stuff you don't care about and ``awk`` to count the remaining length ``sed 's/N//g ; s/\s//g' file | awk '{ print length($0); }'`` – Rolf Oct 10 '17 at 07:19

8 Answers


GNU awk solution:

awk -v FPAT='[^N[:space:]]' '{ print NF }' file
  • FPAT='[^N[:space:]]' - the regular expression that describes a field: any single character that is neither N nor whitespace, so NF is the number of such characters on the line

The expected output:

1
1
1
0
1
2
2
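To see what FPAT is doing, note that the field count is simply the number of regex matches on the line. A minimal Python sketch of the same idea, assuming the sample rows from the question and using `\s` as a stand-in for POSIX `[:space:]` (`count_fields` is a hypothetical name):

```python
import re

# A "field" is any single character that is neither "N" nor whitespace,
# mirroring FPAT='[^N[:space:]]'.
FIELD = re.compile(r"[^N\s]")

def count_fields(line: str) -> int:
    """Return how many single-character fields the FPAT pattern would find."""
    return len(FIELD.findall(line))

sample = [
    "N W N N N N N N N N N",
    "N C N N N N N N N N N",
    "N A N N N N N N N N N",
    "N N N N N N N N N N N",
    "N G N N N N N N N N N",
    "N C N N N C N N N N N",
    "N C C N N N N N N N N",
]

for line in sample:
    print(count_fields(line))
```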
Jeff Schaller
RomanPerekhrest
awk '{ gsub("[ N]",""); print length() }'
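This deletes every space and N with `gsub()` and then prints the length of what is left. A rough Python equivalent, for comparison only (the helper name `count_after_delete` is hypothetical); Python keeps the trailing newline on lines read from a file, which awk's `$0` does not, so it is stripped first:

```python
import re

def count_after_delete(line: str) -> int:
    # Drop the newline that awk would not have in $0.
    line = line.rstrip("\n")
    # Same job as gsub("[ N]", ""): remove every space and "N".
    stripped = re.sub(r"[ N]", "", line)
    # The length of what remains is the count, like awk's length().
    return len(stripped)

print(count_after_delete("N C N N N C N N N N N\n"))
```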
Hauke Laging

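Another awk approach: every character other than N and space is treated as a field separator, so NF-1 (the number of separators) is the count. Appending "" makes the assigned value a string, so lines whose count is 0 are still printed. This returns -1 for empty lines.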

awk -F'[^N ]' '$0=NF-1""' infile

Or, more completely, the version below returns -1 for empty lines and 0 for lines containing only whitespace (tabs/spaces):

awk -F'[^N \t]+' '$0=NF-1""' infile
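The trick here is that splitting on n separators gives n + 1 fields, so `NF-1` is the number of separators. A small Python sketch of the first command, assuming `re.split` with the same pattern `[^N ]` (the helper `awk_nf_minus_one` is hypothetical). The empty-line case is special-cased because awk reports `NF` as 0 there, while `re.split` would return one empty field:

```python
import re

# Every character other than "N" and space separates fields,
# mirroring awk -F'[^N ]'.
SEPARATOR = re.compile(r"[^N ]")

def awk_nf_minus_one(line: str) -> int:
    """Emulate awk's NF-1 for the field separator [^N ]."""
    if line == "":
        # awk gives an empty record NF == 0, hence -1.
        return -1
    # n separators split the line into n + 1 fields.
    return len(SEPARATOR.split(line)) - 1

for line in ["N W N N", "N N N N", ""]:
    print(awk_nf_minus_one(line))  # prints 1, 0, -1
```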
αғsнιη
  • will print `-1` for empty lines... but then that might be desirable to distinguish line made up of only N/space vs empty line... – Sundeep Oct 07 '17 at 04:59
  • @Sundeep Yes, that's correct. Also see my update, which reports 0 for lines that contain only tabs or spaces. – αғsнιη Oct 07 '17 at 05:32

Assuming the count is needed for each line, excluding space characters and N:

$ perl -lne 'print tr/N //c' ip.txt 
1
1
1
0
1
2
2
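  • the return value of tr is the number of characters it matched; with an empty replacement list the matched characters are replaced by themselves, so the line is left unchanged and tr just counts
  • c complements the given set, so every character other than N and space is counted
  • Note the -l option: it strips the newline from each input line (otherwise the newline would also be counted, an off-by-one error) and adds a newline to each print statement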
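The off-by-one mentioned in the last point is easy to reproduce outside Perl. A sketch in Python, assuming a line as it would arrive from a file, trailing newline included (the function `count_not_in` is hypothetical):

```python
def count_not_in(line: str, excluded: str = "N ") -> int:
    """Count characters not in `excluded`, like the return value of tr/N //c."""
    return sum(1 for ch in line if ch not in excluded)

raw = "N C N N N C N N N N N\n"
print(count_not_in(raw))               # 3: the trailing newline is counted too
print(count_not_in(raw.rstrip("\n")))  # 2: the result once the newline is stripped, as -l does
```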


A more generic solution

perl -lane 'print scalar grep {$_ ne "N"} @F' ip.txt 
  • the -a option automatically splits each input line on whitespace and saves the fields in the @F array
  • grep {$_ ne "N"} @F returns the elements of @F that are not equal to the string N
    • the regex equivalent would be grep {!/^N$/} @F
  • scalar puts grep in scalar context, so it returns the number of matching elements instead of the list itself
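The difference between the two Perl commands only shows when a value is longer than one character. A Python sketch contrasting them, assuming a hypothetical line `N CA N GT` that does not appear in the question's data (both function names are hypothetical too):

```python
def count_fields_not_n(line: str) -> int:
    """Count whitespace-separated fields that are not exactly "N" (like the -a/@F grep)."""
    # str.split() with no argument splits on runs of whitespace, like perl -a.
    return sum(1 for field in line.split() if field != "N")

def count_chars_not_n(line: str) -> int:
    """Count characters that are neither "N" nor space (like tr/N //c)."""
    return sum(1 for ch in line if ch not in "N ")

line = "N CA N GT"  # hypothetical line with multi-character fields
print(count_fields_not_n(line))  # 2 fields: "CA" and "GT"
print(count_chars_not_n(line))   # 4 characters: C, A, G, T
```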
Sundeep

Alternative awk solution:

awk '{ print gsub(/[^N[:space:]]/,"") }' file
  • gsub(...) - The gsub() function returns the number of substitutions made.

The output:

1
1
1
0
1
2
2
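Python's `re.subn` plays the same role as the return value of `gsub()`: it returns the new string together with the number of substitutions made. A minimal sketch, assuming `\s` in place of `[:space:]` (`count_by_substitution` is a hypothetical name):

```python
import re

def count_by_substitution(line: str) -> int:
    # re.subn returns (new_string, number_of_substitutions);
    # only the count is needed, just as only gsub()'s return value is printed.
    _, n = re.subn(r"[^N\s]", "", line)
    return n

print(count_by_substitution("N C C N N N N N N N N"))
```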
RomanPerekhrest
  1. tr and POSIX shell script:

    tr -d 'N ' < file | while read x ; do echo ${#x} ; done
    
  2. bash, ksh, and zsh:

    while read x ; do x="${x//[ N]}" ; echo ${#x} ; done < file
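Both loops read one line at a time, strip the unwanted characters and print the length of what remains. The same steps as a Python sketch, assuming an in-memory `io.StringIO` in place of the real file (`lengths_after_removal` is a hypothetical name):

```python
import io
from typing import Iterable, Iterator

def lengths_after_removal(lines: Iterable[str]) -> Iterator[int]:
    """Yield one count per line, like the while-read loop with ${x//[ N]}."""
    for x in lines:
        x = x.rstrip("\n")                       # read drops the newline
        x = x.replace(" ", "").replace("N", "")  # ${x//[ N]}: remove spaces and N
        yield len(x)                             # ${#x}: length of what remains

# io.StringIO stands in for the file; open("file") could be used the same way.
data = io.StringIO("N W N N\nN N N N\nN C C N\n")
for n in lengths_after_removal(data):
    print(n)
```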
    
agc
  • can use `awk '{print length()}'` to avoid the slower shell looping.. but then one could do it all with awk itself... – Sundeep Oct 07 '17 at 04:54
  • @Sundeep, It's true, (*if* both are started at the same time), that `awk` looping *is* faster than shell looping. But the shell is always in memory, and `awk` might not be -- when `awk` is not already loaded, or swapped out, the overhead of loading it, ([the time lost](https://en.wikipedia.org/wiki/Latency_(engineering))), can be greater than the advantage of running `awk` -- particularly on a small loop. In such cases, (*i.e.* this case), `awk` can be *slower*. – agc Oct 07 '17 at 13:08
  • well, am certainly not worried about time for small stuff... see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice – Sundeep Oct 07 '17 at 13:15
  • @Sundeep, I *do* worry. Some time ago I used to use [floppy based Linux distros](https://en.wikipedia.org/wiki/Category:Floppy-based_Linux_distributions), which could run off a floppy, in a few megs of ram. Needlessly using `awk` in a shell script could make such a system crawl on all fours. Generally: the same latency drag applies to systems in limited firmware, or any system under heavy load. – agc Oct 07 '17 at 13:33

A short combination of tr and awk:

$ tr -d ' N' <file.in | awk '{ print length }'
1
1
1
0
1
2
2

This deletes all spaces and Ns from the input file, and awk just prints the length of each line.
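The `tr -d` step has a direct Python counterpart in `str.maketrans`, whose third argument lists characters to delete. A sketch of the whole pipeline under that assumption (the names `DELETE_SPACE_AND_N` and `count_remaining` are hypothetical):

```python
# str.maketrans("", "", " N") builds a table that deletes spaces and "N",
# the same job as tr -d ' N'.
DELETE_SPACE_AND_N = str.maketrans("", "", " N")

def count_remaining(line: str) -> int:
    # After deletion, the length of the line without its newline is the count,
    # which is what awk '{ print length }' reports.
    return len(line.rstrip("\n").translate(DELETE_SPACE_AND_N))

print(count_remaining("N C N N N C N N N N N\n"))
```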

Kusalananda

Another easy way is to do it in Python, which comes preinstalled in most Unix environments. Save the following code in a .py file:

with open('geno') as f:
    for line in f:
        count = 0
        for word in line.split():
            if word != 'N':
                count += 1
        print(count)

Then run it from your terminal:

python file.py

What the above does:

  • for each line in a file named "geno"
  • set a counter to 0 and increment it each time we find a value != 'N'
  • when the end of the current line is reached, print the counter and go to the next line