Bash script to find maximum number of a certain character (".") in any single line of a file

Question

There is a file with an unknown number of lines. Each line contains an unknown number of periods (.).

How can I find the largest number of periods that occurs on any single line? I am not interested in finding which line contains the most periods.

For example: Processing the file content below in bash should give the answer "4".

one.one
two.two.two
three.three.three.three
four..four.
five..five..
six...six

Quite related: [How to count the number of a specific character in each line?](https://unix.stackexchange.com/q/18736). Take an answer there, sort the results and get the last line, you have the answer. — Quasímodo, Jul 23 '20 at 12:25
`tr -dc '\n.' | sort | tail -n1 | wc -m` https://stackoverflow.com/q/8629410 — alecxs, Jul 24 '20 at 07:11

score 3 · Answer 1 · answered Jul 23 '20 at 09:25

You could do it with awk:

awk '{gsub(/[^.]/,""); len=length(); if (len>max) {max=len}} END{printf("Largest count of \".\": %d\n",max)}' file.txt

For every line, this removes all characters that are not a . (by replacing them with the empty string), takes the length of the remaining string, and stores the largest value seen so far in max. At end of file, it prints the result.
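Since the question asks for bash, the same strip-and-measure idea can also be sketched without awk. This is only an illustration: the function name max_dots and the temporary sample file are made up for the example, and it relies on bash features (pattern substitution in parameter expansion, arithmetic evaluation), so it is not a portable sh script.

```shell
#!/usr/bin/env bash
# max_dots FILE: print the largest number of "." found on any single line.
# (max_dots is a hypothetical helper name, not part of any answer above.)
max_dots() {
    local line dots max=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        dots=${line//[^.]/}        # delete every character that is not a dot
        if (( ${#dots} > max )); then
            max=${#dots}           # remember the longest run of kept dots
        fi
    done < "$1"
    printf '%s\n' "$max"
}

# Demo on the sample from the question.
sample=$(mktemp)
printf '%s\n' one.one two.two.two three.three.three.three \
    four..four. five..five.. six...six > "$sample"
max_dots "$sample"
rm -f "$sample"
```

For large files the awk version is likely to be noticeably faster, because a shell read loop processes one line at a time.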

score 3 · Answer 2 · answered Jul 23 '20 at 10:08

Alternatively, you can count the number of a specific character, and leave the text unchanged for further processing, such as printing the line itself, or counting another character. gsub returns the number of replacements.

awk '{ nDot = gsub ("[.]", "."); etc .. }'
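One possible way to fill in the elided part is sketched below. It uses "&" as the replacement (which re-inserts the matched text) so each line stays intact, and prints the per-line count next to the line. The function name count_dots and the output layout are assumptions made for the illustration.

```shell
# count_dots FILE: print "count<TAB>line" for every line, then "max<TAB>N".
# (count_dots is a hypothetical helper name.)
count_dots() {
    awk '{
        n = gsub(/[.]/, "&")       # gsub returns the number of replacements
        if (n > max) max = n       # track the largest count seen so far
        printf "%d\t%s\n", n, $0   # the line itself is still unchanged
    }
    END { printf "max\t%d\n", max }' "$1"
}

# Demo on a small throwaway file.
sample=$(mktemp)
printf '%s\n' this.world. this 1.2.3.4.5 all.is.done > "$sample"
count_dots "$sample"
rm -f "$sample"
```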

score 3 · Answer 3 · answered Jul 23 '20 at 10:08

The awk-less answer:

sed 's/[^.]//g' test.dat | wc -L

In other words, keep only the dots, then use the -L option of wc, which the manual describes as "-L, --max-line-length: print the maximum display width".
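Where wc has no -L option, the "longest line" step can be handed to awk instead. The following is a sketch under that assumption; max_dots_portable is a made-up name, and tr is used here in place of sed only to keep the pipeline short.

```shell
# max_dots_portable FILE: like "sed ... | wc -L", but without relying on wc -L.
# (max_dots_portable is a hypothetical helper name.)
max_dots_portable() {
    # Keep only dots and newlines, then let awk track the longest line.
    tr -cd '.\n' < "$1" |
        awk 'length($0) > max { max = length($0) } END { print max + 0 }'
}

# Demo on the sample from the question.
sample=$(mktemp)
printf '%s\n' one.one two.two.two three.three.three.three \
    four..four. five..five.. six...six > "$sample"
max_dots_portable "$sample"
rm -f "$sample"
```

The "max + 0" in the END block makes the output 0 rather than an empty line when the input contains no dots at all.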

answered Jul 23 '20 at 10:08

xenoid


Note that `wc -L` is a GNU extension. – Stéphane Chazelas Jul 23 '20 at 11:24

roaima · Answer 4 · 2020-07-23T13:12:03.697

2

Let's generate an example,

cat >file <<'X'
this.world.
this
1.2.3.4.5
all.is.done
X

With perl

perl -e 'while (<>) { $x = $n if ($n = ($_ =~ y/.//)) > $x } print "$x\n"' file
4

With awk

awk '{ gsub("[^.]", ""); if ((n = length($0)) > x) { x = n } } END { print x }' file
4

With tr and a non-POSIX extended version of wc

tr -cd '.\n' <file | wc -L
4

edited Jul 23 '20 at 13:12

answered Jul 23 '20 at 10:41

roaima


The stderr output format of `dd` is only specified in the POSIX locale, and even there, all it says is it shall be `"%u+%u records out\n", <number of whole output blocks>, <number of partial output blocks>` (note that leading blanks are also allowed). GNU `dd` doesn't appear to be compliant in that regard. – Stéphane Chazelas Jul 23 '20 at 11:33
And `dd` reports bytes and not characters, so if you want to generalize to any character it won't work. Only the `awk` and the `wc -L` versions will work on characters encoded in more than one byte. – xenoid Jul 23 '20 at 13:07
Ok. Option removed. Thank you both – roaima Jul 23 '20 at 13:11
The version with `tr` and `wc -L` works OK for me (at least with French characters, assuming UTF-8 encoded input file). – xenoid Jul 23 '20 at 13:17
In UTF-8, bytes with a `0` upper bit can only be 1-byte characters, and the bytes of multi-byte characters always have a `1` upper bit, so the ASCII byte for `.` cannot match a byte of a multi-byte character. – xenoid Jul 23 '20 at 13:21
@xenoid, GNU `wc -L` reports the display width, not the number of characters. See [Get the display width of a string of characters](https://unix.stackexchange.com/a/258551) – Stéphane Chazelas Jul 23 '20 at 18:12

Rakesh Sharma · Answer 5 · 2020-07-24T02:22:32.027

One way with awk could be as follows. We need to realize that the following equality holds:

number of fields = number of delimiters + 1

Note that adding 0 to an operand in an arithmetic comparison, even though not always necessary, is a good habit to cultivate. At least it gives me one less thing to think about, because it becomes a reflex when coding. Since Awk does not provide separate operators for arithmetic and string comparisons, the coercion helps disambiguate a string operand from a numeric one, or rather a string context from a numeric one.

$ awk -F '[.]' '
    NF>m+0 {m=NF}
    END {print --m}
' file
4
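The field-count identity can be wrapped in a small function for experimenting. This is a sketch, not the answer's own code: max_dots_by_fields is a made-up name, and unlike the command above it explicitly treats an empty line as having zero dots.

```shell
# max_dots_by_fields FILE: use "dots = fields - 1" with "." as the separator.
# (max_dots_by_fields is a hypothetical helper name.)
max_dots_by_fields() {
    awk -F '[.]' '
        { n = (NF > 0 ? NF - 1 : 0) }   # an empty line has NF == 0, not 1
        n > max { max = n }             # keep the largest per-line count
        END { print max + 0 }           # "+ 0" prints 0 if max was never set
    ' "$1"
}

# Demo: "this.world." splits into "this", "world" and "", i.e. 3 fields, 2 dots.
sample=$(mktemp)
printf '%s\n' this.world. this 1.2.3.4.5 all.is.done > "$sample"
max_dots_by_fields "$sample"
rm -f "$sample"
```

An empty line has NF equal to 0, so the identity does not hold there; the sketch counts such lines as zero dots instead of -1.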

$ awk '
    gsub(/[^.]+/, "") &&
    ! index(t, $0) { t = $0 }
    END { print length(t) }
' file

$ perl -lne '
    my $k = tr/.//;
    $k > $m and $m = $k;
    }{ print $m+0;
' file

The GNU sed editor can also be used in conjunction with the bc calculator utility. The idea is that we strip each line of all non-dots and keep the current longest string of pure dots in the hold space. At EOF, we transform the dots into bc code that evaluates to the number of those dots.

$ sed -Ee '
    s/[^.]+//g;G
    /^(.*)..*\n\1$/!ba
    s/\n.*//;h;:a
    $!d;g;s/./1+/g;s/$/0/
'  file | bc -l
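The last step of that sed script, turning a run of dots into something a calculator can add up, can be tried in isolation. The sketch below applies the same two substitutions to a literal string; dots_to_expr is a made-up name, and shell arithmetic stands in for bc here purely so that the example needs no extra tool.

```shell
# dots_to_expr STRING: turn each character into "1+" and append a final "0".
# (dots_to_expr is a hypothetical helper name.)
dots_to_expr() {
    printf '%s\n' "$1" | sed 's/./1+/g; s/$/0/'
}

expr_str=$(dots_to_expr '....')
printf '%s\n' "$expr_str"            # the kind of text the pipeline sends to bc
printf '%s\n' "$(( $expr_str ))"     # shell arithmetic evaluates the same sum
```

For four dots this produces the expression 1+1+1+1+0, whose value is 4.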

Could you please add an explanation? And is m+0 really needed there? — Quasímodo, Jul 23 '20 at 12:27

score 0 · Answer 6 · answered Jul 24 '20 at 09:53

JAAOV (Just another awk obfuscating variant...)

awk '{ gsub(/[^.]/,""); print | "wc -L" }' file

answered Jul 24 '20 at 09:53

JJoao

