12

I have a file with blank lines at the end of the file. Can I use grep to count the number of blank lines at the end of the file with the file name being passed as variable in the script?

don_crissti
  • *to count the number of **consecutive** blank lines*? – RomanPerekhrest Nov 30 '17 at 09:47
  • 3
    @RomanPerekhrest I'd say so, otherwise they wouldn't be "at the end of the file"? – Sparhawk Nov 30 '17 at 09:56
  • `grep -cv -P '\S' filename` will count the total number of blank lines in the file. Counting only those at the end is taxing my brain! – MichaelJohn Nov 30 '17 at 10:11
  • OP asked for `grep`, so @MichaelJohn wins for purity in my book. – bu5hman Nov 30 '17 at 10:26
  • 2
    @bu5hman But (as he admits) doesn't answer the question. Nor does yours, really. – Sparhawk Nov 30 '17 at 10:49
  • @Sparhawk I think we both do, conditional on there being no other blank lines in the file. If there are other blank lines, then the answer to the OP's actual question (Can I use `grep`....) is actually no, you can't, at least not on its own. – bu5hman Nov 30 '17 at 12:47
  • @MichaelJohn I couldn't stop thinking of how to use `grep` for counting empty lines at the end only and came up with a solution (see answer below) – Philippos Nov 30 '17 at 13:12
  • @bu5hman But that's a condition that is not mentioned in the question. `echo 3` also answers the question, conditional on there being three blank lines at the end. – Sparhawk Nov 30 '17 at 20:34
  • @Sparhawk I beg to differ. OP said "Can I use `grep` to .....". Anyway, this simple question provoked some creative thought and got a bit of friendly competition going. Whatever else, it was fun to do. – bu5hman Dec 01 '17 at 03:36

8 Answers

11

If the blank lines are only at the end of the file:

grep -c '^$' myFile

or:

grep -cx '' myFile
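
Since the question asks for the file name to be passed as a variable in a script, the command above could be wrapped roughly as follows. This is a minimal sketch: the function name count_empty_lines and the sample file are illustrative assumptions, and the result equals the number of trailing empty lines only when the file has no empty lines elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: counts ALL empty lines in the file named by $1.
# This equals the number of trailing empty lines only if no empty
# lines occur earlier in the file.
count_empty_lines() {
    grep -c '^$' -- "$1"
}

# Build a sample file: one line of text followed by two empty lines.
file=$(mktemp)
printf 'text\n\n\n' > "$file"

count_empty_lines "$file"   # prints 2
rm -f "$file"
```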
Stéphane Chazelas
bu5hman
11

Just for fun, some spooky sed:

#!/bin/sh
sed '/./!H;//h;$!d;//d;x;s/\n//' "$1" | wc -l

Explanation:

  • /./ addresses lines containing at least one character, so /./! addresses empty lines (like /^$/, but I want to reuse the opposite pattern); for those, the H command appends them to the hold space. Since each empty line adds one line to the hold space, the hold space always holds one more line than the number of empty lines collected. We'll take care of that later.
  • //h: the empty pattern matches the last regular expression, which was "any character", so any non-empty line is addressed and copied to the hold space by the h command, which "resets" the collected lines to 1. When the next empty line gets appended, there will be two again, as expected.
  • $!d stops the script without output for every line but the last, so the remaining commands are only executed on the last line. So whatever empty lines we have collected in the hold space are at the end of the file. Good.
  • //d: The d command is again executed for non-empty lines only. So if the last line was not empty, sed will exit without any output. Zero lines. Good.
  • x exchanges hold space and pattern space, so the collected lines are now in the pattern space, ready to be processed.
  • But we remember that there is one line too many, so we remove one newline with s/\n//.
  • Voilà! The number of lines matches the number of empty lines at the end (note that the first line will not be empty, but who cares), so we can count them with wc -l.
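
The walkthrough above can be checked against a small sample. This is a minimal sketch, assuming a hypothetical function name trailing_empty_sed and a throwaway file created with mktemp; the sed program itself is unchanged from the script above.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed program explained above.
trailing_empty_sed() {
    # The arithmetic expansion strips the padding some wc implementations add.
    echo $(( $(sed '/./!H;//h;$!d;//d;x;s/\n//' "$1" | wc -l) ))
}

file=$(mktemp)
printf 'aaa\n\nbbb\nccc\n\n\n\n' > "$file"   # three trailing empty lines
trailing_empty_sed "$file"                    # prints 3
rm -f "$file"
```

When the last line is not empty, sed prints nothing and the function reports 0.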
Philippos
8

Some more options, using GNU tac (or tail -r on BSDs):

tac file | awk 'NF{f=1;exit};END{print NR-f}'

Or:

tac file | sed -n '/[^[:blank:]]/q;p' | wc -l

Note that on the output of:

printf 'x\n '

That is, where an extra space follows the last full line (which some could consider an extra blank line, but which is not valid text by the POSIX definition), both commands would give 0.

POSIXly:

awk 'NF{n=NR};END{print NR-n}' < file

but that means reading the file in full (tail -r/tac would read the file backward from the end on seekable files). That gives 1 on the output of printf 'x\n '.
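
The difference between the two approaches on input with an unterminated final line can be reproduced as follows. This is a sketch with hypothetical function names (trailing_blank_posix, trailing_blank_tac); the second one assumes GNU tac is installed.

```shell
#!/bin/sh
# POSIX awk variant: reads the whole file and remembers the last non-blank line.
trailing_blank_posix() {
    awk 'NF{n=NR};END{print NR-n}' < "$1"
}

# tac + sed variant: stops at the first non-blank line counted from the end.
# Assumes GNU tac is available.
trailing_blank_tac() {
    echo $(( $(tac "$1" | sed -n '/[^[:blank:]]/q;p' | wc -l) ))
}

file=$(mktemp)
printf 'x\n ' > "$file"        # a lone space after the last full line

trailing_blank_posix "$file"   # prints 1
trailing_blank_tac "$file"     # prints 0
rm -f "$file"
```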

Stéphane Chazelas
6

Another awk solution. This variation resets the counter k to -1 each time there is a non-blank line. Then every line increments the counter. (So, after each non-blank line, k==0.) At the end we output the number of lines we have counted.

Prepare the data file

cat <<'X' >input.txt
aaa

bbb
ccc



X

Count the trailing blank lines in the sample

awk 'NF {k=-1}; {k++}; END {print k+0}' input.txt
3

In this definition, a blank line might contain spaces or other blank characters; it's still blank. If you really want to count empty lines rather than blank lines, change NF for $0 != "".
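
To illustrate the difference between blank and empty lines, here is a sketch that runs both conditions on a file whose trailing lines include one containing only a space. The function names and the sample data are illustrative assumptions.

```shell
#!/bin/sh
# Counts trailing blank lines (empty or whitespace-only).
trailing_blank() {
    awk 'NF {k=-1}; {k++}; END {print k+0}' "$1"
}

# Counts trailing empty lines only (zero characters).
trailing_empty() {
    awk '$0 != "" {k=-1}; {k++}; END {print k+0}' "$1"
}

file=$(mktemp)
printf 'a\n\n \n\n' > "$file"   # lines: "a", "", " ", ""

trailing_blank "$file"   # prints 3: the space-only line counts as blank
trailing_empty "$file"   # prints 1: the space-only line resets the counter
rm -f "$file"
```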

roaima
  • Why `$0 > ""`? That uses `strcoll()` which would be less efficient than `$0 != ""` which uses `memcmp()` in many implementations (POSIX used to require it to use `strcoll()` though). – Stéphane Chazelas Nov 30 '17 at 11:23
  • @StéphaneChazelas I've not considered that `$0 > ""` might be different to `$0 != ""`. I tend to treat `awk` as a "slow" operator anyway (such that if I know I've got a large dataset as input and the processing is time critical, I'll see what I can do to reduce the amount `awk` has to process - I have used `grep | awk` constructs in such situations). However, having had a quick look at what I assume is the [POSIX definition](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) I can't see any reference to either `strcoll()` or `memcmp()`. What am I missing? – roaima Nov 30 '17 at 13:14
  • `strcoll()` == _the strings shall be compared using the locale-specific collation sequence_. Compare with the [previous edition](http://pubs.opengroup.org/onlinepubs/9699919799.2013edition/utilities/awk.html). I was the one bringing it up. See also http://austingroupbugs.net/view.php?id=963 – Stéphane Chazelas Nov 30 '17 at 13:40
  • @StéphaneChazelas an implementation where `a <= b && a >= b` is not necessarily the same as `a == b`. Ouch! – roaima Nov 30 '17 at 16:49
  • That's the case of GNU `awk` or `bash` (for its `[[ a < b ]]` operators) in en_US.UTF-8 locales on GNU systems for instance for `①` vs `②` for instance (for `bash`, none of `<`, `>`, `=` return true for those). Arguably it's a bug in the definition of those locales more than in bash/awk – Stéphane Chazelas Nov 30 '17 at 16:59
6

As you are actually asking for a grep solution, I add this one, relying only on GNU grep (okay, also using shell syntax and echo ...):

#!/bin/sh
echo $(( $(grep -c "" "$1") - $(grep -B$(grep -cv . "$1") . "$1" |grep -c "") ))

What am I doing here? $(grep -c "" "$1") counts all lines in the file; from that we subtract the number of lines in the file without its trailing empty lines.

And how to get those? grep -B42 . "$1" would print all non-empty lines and up to 42 lines before each of them, so it would print everything up to the last non-empty line, as long as there are no more than 42 consecutive empty lines before a non-empty line. To avoid that limit, I take $(grep -cv . "$1") as the parameter for the -B option, which is the total number of empty lines, so always big enough. This way I have stripped the trailing empty lines and can use | grep -c "" to count the remaining lines.
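
The same steps can be written out with intermediate variables, which makes the arithmetic easier to follow. This is a sketch only: the function and variable names are illustrative, and like the one-liner it relies on the -B context option of GNU grep.

```shell
#!/bin/sh
# Hypothetical step-by-step version of the one-liner above.
trailing_empty_grep() (
    total=$(grep -c "" "$1")     # all lines in the file
    empty=$(grep -cv . "$1")     # all empty lines, used as the -B width
    # Non-empty lines plus up to $empty lines of context before each one:
    # this prints everything up to the last non-empty line.
    kept=$(grep -B"$empty" . "$1" | grep -c "")
    echo $(( total - kept ))
)

file=$(mktemp)
printf 'aaa\n\nbbb\nccc\n\n\n\n' > "$file"
trailing_empty_grep "$file"   # prints 3
rm -f "$file"
```

For the sample, total is 7, empty is 4, and kept is 4, which gives 3.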

Brilliant, isn't it? (-;

Philippos
  • +1 because although that's horrible code, it technically answers the question as asked and I can't bear to mark you down ;-) – roaima Nov 30 '17 at 13:17
  • Grepmeister. We are not worthy. – bu5hman Nov 30 '17 at 13:18
  • +1 for the perversity. Another (possibly faster?) option would be to `tac | grep` to the first non-blank with `-m -A 42`, then minus one. I'm not sure which is more efficient, but you could also `wc -l | cut -d' ' -f1` instead of grepping the blank lines? – Sparhawk Nov 30 '17 at 21:09
  • Yes, sure, you can do a lot of things with `tac`, `wc` and `cut`, but here I tried to restrict myself to `grep`. You can call it perversity, I call it sports. (-; – Philippos Dec 01 '17 at 07:09
2

to count the number of consecutive blank lines at the end of the file

Solid awk + tac solution:

Sample input.txt:

$ cat input.txt
aaa

bbb
ccc



$  # command line 

The action:

awk '!NF{ if (NR==++c) { cnt++ } else exit }END{ print int(cnt) }' <(tac input.txt)
  • !NF - ensures the current line is blank (has no fields)
  • NR==++c - checks that the blank lines are consecutive from the start of the reversed input (NR is the record number; c is an auxiliary counter incremented on every blank line)
  • cnt++ - counter of blank lines

The output:

3
RomanPerekhrest
1

IIUC, the following script called count-blank-at-the-end.sh would do the job:

#!/usr/bin/env sh

count=$(tail -n +"$(grep -n . "$1" | tail -n 1 | cut -d: -f1)" "$1" | wc -l)
num_of_blank_lines=$((count - 1))

printf "%s\n" "$num_of_blank_lines"

Example usage:

$ ./count-blank-at-the-end.sh FILE
4

I tested it in GNU bash, Android mksh and in ksh.
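
The pipeline in the script can be unpacked into separate steps. This is a sketch with hypothetical names; it also adds a guard, not present in the original, for files that contain no non-empty line, where grep prints nothing and tail would receive an empty line number.

```shell
#!/bin/sh
# Hypothetical step-by-step version of count-blank-at-the-end.sh.
count_blank_at_end() (
    # Line number of the last non-empty line, e.g. "4" from "4:ccc".
    last=$(grep -n . "$1" | tail -n 1 | cut -d: -f1)
    if [ -z "$last" ]; then
        # Added guard (not in the original): every line is empty.
        echo $(( $(wc -l < "$1") ))
    else
        # Lines from the last non-empty line to the end, minus that line itself.
        count=$(tail -n +"$last" "$1" | wc -l)
        echo $(( count - 1 ))
    fi
)

file=$(mktemp)
printf 'aaa\n\nbbb\nccc\n\n\n\n' > "$file"
count_blank_at_end "$file"   # prints 3
rm -f "$file"
```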

Arkadiusz Drabczyk
0

Alternative Python solution:

Sample input.txt:

$ cat input.txt
aaa

bbb
ccc



$  # command line 

The action:

python -c 'import sys, itertools; f=open(sys.argv[1]);
lines=list(itertools.takewhile(str.isspace, f.readlines()[::-1]));
print(len(lines)); f.close()' input.txt

The output:

3

https://docs.python.org/3/library/itertools.html?highlight=itertools#itertools.takewhile

RomanPerekhrest