Count nul delimited items in file

Question

I have a shell script which uses find -print0 to save a list of files to be processed into a temporary file. As part of the logging I'd like to output the number of files found, and so I need a way to get that count. If the -print0 option weren't being used for safety I could use wc -l to get the count.

Related: [How to do `head` and `tail` on null-delimited input in bash?](http://unix.stackexchange.com/q/75186/22565) — Stéphane Chazelas, Oct 30 '14 at 14:06

Stéphane Chazelas · Accepted Answer · 2022-06-16T13:53:05.410

17

Some options:

tr -cd '\0' | wc -c

tr '\n\0' '\0\n' | wc -l       # Generic approach for processing NUL-terminated
                               # records with line-based utilities (that support
                               # NUL characters in their lines like GNU ones).

grep -cz '^'                   # GNU grep

sed -nz '$='                   # recent GNU sed, no output for empty input

awk -v RS='\0' 'END{print NR}' # not all awk implementations

Note that for an input that contains data after the last NUL character (or non-empty input with no NUL characters), the tr approaches will always count the number of NUL characters, but the awk/sed/grep approaches will count an extra record for those extra bytes.
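To make that difference concrete, here is a small sketch that feeds the same input to three of the commands above. The `sample` function and its contents are made up for illustration: three NUL-terminated records (one containing a newline) followed by a few bytes with no terminating NUL. GNU grep is assumed for `-z`.

```shell
# Hypothetical sample input: three complete NUL-terminated records
# (the second one contains a newline), then unterminated trailing bytes.
sample() { printf 'a\0b\nc\0d\0tail'; }

# Counts NUL bytes only; the trailing "tail" is ignored.
nul_count=$(sample | tr -cd '\0' | wc -c | tr -d '[:space:]')

# Swaps NUL and newline, then counts newlines; again ignores "tail".
swap_count=$(sample | tr '\n\0' '\0\n' | wc -l | tr -d '[:space:]')

# Counts records, including the unterminated one (GNU grep assumed).
grep_count=$(sample | grep -cz '^')

echo "tr|wc -c: $nul_count  tr|wc -l: $swap_count  grep -cz: $grep_count"
```

With output from `find -print0` every name is NUL-terminated, so the approaches should agree; the discrepancy only shows up when the input does not end in a NUL. The extra `tr -d '[:space:]'` is there because some `wc` implementations pad their output with blanks.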

edited Jun 16 '22 at 13:53

answered Oct 30 '14 at 13:42

Stéphane Chazelas


I measured these on 5 GB of random data (`head -c 5G /dev/urandom > f`). **Results:** grep 1.7s (same for `grep -Fcz ''`) • tr+wc-c 7.7s • tr+wc-l 7.4s • sed 34.7s • awk 1m11.7s – Socowi Apr 23 '20 at 13:24
@Socowi, YMMV with the implementation and locale. With GNU `awk`, you'll want to set the locale to `C` (or any that doesn't use multibyte characters), `LC_ALL=C awk ... < f` – Stéphane Chazelas Apr 23 '20 at 15:45
Thanks for the hint. I already used `LC_ALL=C` on `sort`, where it didn't speed things up, so I had not tried it for `awk`. Luckily I still have the file from before: `LC_ALL=C awk ...` takes 6.7s. – Socowi Apr 23 '20 at 15:50

score 5 · Answer 2 · answered Oct 30 '14 at 13:35

5

The best method I've been able to think of is using grep -zc '.*'. This works, but it feels wrong to use grep with a pattern which will match anything.
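For context, a minimal sketch of how this could fit into the kind of script described in the question. The directory, file names, and variable names are hypothetical, and GNU grep is assumed for `-z`. One of the test files has a newline in its name to show why a plain line count would be off.

```shell
# Hypothetical scratch directory and list file, for illustration only.
workdir=$(mktemp -d)
list=$(mktemp)

# Three files, one of which has a newline in its name.
touch "$workdir/plain.txt" \
      "$workdir/with space.txt" \
      "$workdir/$(printf 'two\nlines').txt"

# Save the NUL-delimited list, as in the question.
find "$workdir" -type f -print0 > "$list"

# NUL-aware count of the saved list (GNU grep assumed for -z).
found=$(grep -zc '.*' < "$list")

# Naive newline-based count, shown only for contrast.
naive=$(find "$workdir" -type f -print | wc -l | tr -d '[:space:]')

echo "found $found files (a newline-based count would report $naive)"

# Clean up the scratch files.
rm -rf "$workdir" "$list"
```

The pattern matches every record, including empty ones, which is exactly what is wanted for a count, so matching "anything" is arguably not wrong here.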

answered Oct 30 '14 at 13:35

qqx


cuonglm · Answer 3 · 2014-11-04T14:47:53.257

2

With perl:

perl -0ne 'END {print $.}'

or:

perl -nle 'print scalar split "\0"'

or:

perl -nle 'print scalar unpack "(Z*)*", $_'

edited Nov 04 '14 at 14:47

answered Oct 30 '14 at 18:31

cuonglm


The first one will count an extra record if there is data after the last NUL. The other two don't work if the input contains newline characters. – Stéphane Chazelas Nov 04 '14 at 14:33
@StéphaneChazelas: Oh, my bad. Could you give any improvement? – cuonglm Nov 04 '14 at 14:37
I would just keep the first one, and mention the fact that it counts a non-delimited record (contrary to `wc -l`) as a note (as it may be wanted). – Stéphane Chazelas Nov 04 '14 at 14:50
