4

The POSIX wc command counts how many POSIX lines are in a file. The POSIX standard defines a line as a text string terminated by \n. Without the trailing \n, a text string isn't considered a line.

But to me, it's more natural to count the lines of text in a file, whether or not the last one ends with \n. Is there an easy way to do that?

root:[~]# printf "aa\nbb" | wc -l
1
root:[~]# printf "aa\nbb\n" | wc -l
2
root:[~]#
Stéphane Chazelas
Just a learner
  • Related: [How to add a newline to the end of a file?](https://unix.stackexchange.com/q/31947/86440) (when the file doesn’t already have one). – Stephen Kitt Aug 13 '19 at 10:00

2 Answers

7

With GNU sed, you can use:

sed '$=;d'

GNU sed treats any characters after the last newline as an extra line. Like most GNU utilities, GNU sed also supports NUL characters in its input and places no limit on the length of lines (the two other criteria that make an input non-text as per POSIX).
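As a small sketch of how that one-liner could be wrapped for use in scripts: `sed '$=;d'` prints nothing at all for empty input, so the wrapper below falls back to 0. The function name `count_lines_sed` is made up for illustration, and GNU sed is assumed.

```shell
# Hypothetical helper (not part of the answer above); assumes GNU sed.
count_lines_sed() {
  # '$=' prints the number of the last line; 'd' suppresses all other output.
  n=$(sed '$=;d' "$@")
  # Empty input produces no output, so report 0 in that case.
  printf '%s\n' "${n:-0}"
}

printf 'aa\nbb' | count_lines_sed     # 2
printf 'aa\nbb\n' | count_lines_sed   # 2
printf '' | count_lines_sed           # 0
```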

POSIXly, building on @Inian's answer to also support overly long lines and NUL bytes:

LC_ALL=C tr -cs '\n' '[x*]' | awk 'END {print NR}'

That tr command translates every sequence of one or more characters other than newline (each byte being interpreted as a character in the C locale to avoid decoding issues) to a single x character, so awk's input records will be either 0 or 1 byte long and its input will contain only x and newline characters.
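To make that transformation visible, here is a minimal sketch that dumps what awk would actually receive. The helper name `squeeze_to_x` is only illustrative; `od -c` is used so that newlines show up explicitly.

```shell
# Hypothetical helper: the tr step on its own, reading stdin.
squeeze_to_x() {
  LC_ALL=C tr -cs '\n' '[x*]'
}

# Input has a NUL byte, an empty line and an unterminated last line.
# The stream after tr is: x, newline, x, newline, newline, x.
printf 'aa\nbb\0cc\n\ndd' | squeeze_to_x | od -c
```

Empty lines survive as bare newlines, which is why awk still counts them as records.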

$ printf '%10000s\na\0b\nc\nd' | wc -l
3

$ printf '%10000s\na\0b\nc\nd' | mawk 'END{print NR}'
2
$ printf '%10000s\na\0b\nc\nd' | busybox awk 'END{print NR}'
5
$ printf '%10000s\na\0b\nc\nd' | gawk 'END{print NR}'
4

$ printf '%10000s\na\0b\nc\nd' | LC_ALL=C tr -cs '\n' '[x*]' | mawk 'END{print NR}'
4
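For repeated use, the pipeline could be wrapped in a function along these lines. This is only a sketch: the name `count_text_lines` is an assumption, and it reads the named files, or standard input when none are given.

```shell
# Hypothetical wrapper around the tr + awk pipeline shown above.
count_text_lines() {
  # cat concatenates the files (or passes stdin through when no file is named).
  cat -- "$@" | LC_ALL=C tr -cs '\n' '[x*]' | awk 'END { print NR }'
}

printf '%10000s\na\0b\nc\nd' | count_text_lines   # 4
printf '' | count_text_lines                      # 0
```

Note that with several files, an unterminated last line in one file is joined to the first line of the next by cat, so counting files one at a time is safer in that case.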
Stéphane Chazelas
  • Under which conditions will `tr -cs '\n' 'x'` fail? –  Dec 06 '19 at 23:44
  • @Isaac, while `tr -cs '\n' 'x'` would also work with the `tr` of GNU or some BSDs, it is not POSIX, as POSIX leaves the behaviour unspecified when the second set (here `x`) is shorter than the first (here the complement of `\n`). It won't work in SysV-derived `tr` implementations, for instance. `[x*]` means _as many x as necessary to fill up the set_. – Stéphane Chazelas Dec 07 '19 at 08:08
4

You can use awk for this. It has a special variable, NR, which holds the number of the current record counted from the start of the input. It is incremented each time a record (by default, a line) is read. When printed in the END block, i.e. after all the input lines have been processed, it gives the number of the last record processed.

printf "aa\nbb" | awk 'END { print NR }'
2

printf "aa\nbb\n" | awk 'END { print NR }'
2
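To see NR advancing record by record, a variant like the following prints it alongside each line before the END block runs. The function name `number_records` is just for illustration.

```shell
# Hypothetical helper: show NR for every record, then the final value in END.
number_records() {
  awk '{ print NR ": " $0 } END { print "total: " NR }'
}

printf 'aa\nbb' | number_records
# 1: aa
# 2: bb
# total: 2
```

The unterminated `bb` is still read as record 2, which is why the count matches the number of text lines rather than the number of newline characters.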
Inian
  • 2
    Note that with some `awk` implementations, that still implies the input doesn't contain NUL characters (which would also make that input non-text as per POSIX). – Stéphane Chazelas Aug 13 '19 at 08:15