5

I have dozens of values in a file such as

(1608926678.237962) vcan0 123#0000000158
(1608926678.251533) vcan0 456#0000000186

I want to count how many of each there are, based on the number before the hash symbol (the hash itself may be included in the output).

I have tried the following but keep getting zero:

 grep -o '\b\d+#\b' ./file.log | wc -l

Any ideas? For the above example I would want:

123# 1
456# 1
αғsнιη
pee2pee
  • 2
    Neither `\d` nor the `+` quantifier is supported by BRE grep - see for example [Why does my regular expression work in X but not in Y?](https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y) – steeldriver Dec 21 '20 at 19:40
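To illustrate the comment above, here is a minimal sketch of the same extraction written with a POSIX bracket expression and a BRE-safe repetition instead of `\d+`. The function name `extract_ids` is made up for this example, and `-o` is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
# Hypothetical helper: reads the log from stdin or from a file argument.
extract_ids() {
  # [0-9][0-9]* is the BRE spelling of "one or more digits";
  # with grep -E the same pattern could be written as [0-9]+#
  grep -o '[0-9][0-9]*#' "$@"
}

# The two sample lines from the question.
printf '%s\n' \
  '(1608926678.237962) vcan0 123#0000000158' \
  '(1608926678.251533) vcan0 456#0000000186' | extract_ids
```

On the sample this prints `123#` and `456#`, one per line. Piping that into `wc -l` would still only give a single total (2 here), not a count per value, so a grouping step is needed as well.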

5 Answers

4

This isn't exactly the output format you described, but it can be massaged into that format if that is a hard requirement:

awk -F'[ #]' '{print $3}' input | sort -n | uniq -c

The awk command will extract your number before # and then pass it to sort/uniq. uniq -c will provide a count of each value.
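To see why `$3` is the number before the `#`, it may help to print every field that `-F'[ #]'` produces for one sample line. This is only an illustrative sketch; the function name `show_fields` is made up here.

```shell
# Hypothetical helper: print each field awk sees when splitting
# on a single space or a single '#'.
show_fields() {
  awk -F'[ #]' '{ for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i }' "$@"
}

echo '(1608926678.237962) vcan0 123#0000000158' | show_fields
```

For that line the fields are `$1=(1608926678.237962)`, `$2=vcan0`, `$3=123` and `$4=0000000158`, so `$3` holds the value to count.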


To get your output format:

awk -F'[ #]' '{print $3}' input | sort -n | uniq -c | awk '{print $2"#",$1}'
jesse_b
4

grep + Bash:

$ grep -Eo '\b[0-9]+#\b' ./file.log  | sort | uniq -c  | while read -r a b; do echo "$b" "$a"; done
123# 1
456# 1
Arkadiusz Drabczyk
  • 1
    That while loop is just `awk '{print $2, $1}`, and I’m sure there are options with other tools. Why write a loop you don’t need? – D. Ben Knoble Dec 22 '20 at 14:24
  • First, you forgot `'` and second - why not? There are many ways to do what OP requested. This solution uses Bash, other solutions use `awk` which was added to the list of tags after OP asked the question - see https://unix.stackexchange.com/posts/625570/revisions – Arkadiusz Drabczyk Dec 22 '20 at 14:27
  • Well, if we’re nit-picking to that level, your answer isn’t grep + bash either, since you use sort and uniq. I am in favor of not using a while-read loop in bash where possible—they tend to be slower than the equivalent approach using a dedicated tool. And since you already used a few other tools, as mentioned, there’s no harm in throwing another (awk) into the mix for the field re-writing. – D. Ben Knoble Dec 22 '20 at 14:29
  • Yes, I'm the one who's *nit-picking* :) Have a nice day. – Arkadiusz Drabczyk Dec 22 '20 at 14:32
  • 1
    The while loop is much slower than an awk equivalent, but more importantly, I don't understand why you would want it. What does it offer that `uniq -c` doesn't do already? If you just want to change `1 123#` to `123# 1`, then using a shell loop is probably the most inefficient and slow way of doing it, so it seems like an odd choice. – terdon Dec 22 '20 at 15:23
  • As for ["why not?"](https://unix.stackexchange.com/questions/625570/group-and-count-by-a-regex#comment1170807_625572) - see [why-is-using-a-shell-loop-to-process-text-considered-bad-practice](https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) – Ed Morton Dec 22 '20 at 15:25
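As a sketch of what the comments above suggest, the `while read` loop can be swapped for a small awk step that reorders the two columns produced by `uniq -c`. The function name `count_ids` is made up for this example, the `\b` anchors from the answer are left out because they are a GNU extension, and `grep -o` is assumed to be available.

```shell
# Hypothetical helper: count each "digits#" token, printing "token count".
count_ids() {
  grep -Eo '[0-9]+#' "$@" |   # extract the number and the hash
    sort |                    # group identical tokens together
    uniq -c |                 # prefix each distinct token with its count
    awk '{ print $2, $1 }'    # swap columns: token first, count second
}

printf '%s\n' \
  '(1608926678.237962) vcan0 123#0000000158' \
  '(1608926678.251533) vcan0 456#0000000186' \
  '(1608926678.260000) vcan0 123#0000000199' | count_ids
```

With those three lines the result is `123# 2` followed by `456# 1`.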
4

With GNU awk:

awk -v FPAT=' [0-9]+#' '{ c[$1]++; }; END{ for(x in c) print x, c[x]; }' infile
 123# 1
 456# 1

This assumes there is always exactly one match of the pattern " [0-9]+#" per line, as in your sample input.

To strip the whitespace from the result, and to cope with a varying amount of whitespace before the number, for input like:

(1608926678.237962) vcan0        123#0000000158
(1608926678.251533) vcan0 456#0000000186
(1608926678.237962) vcan0    123#0000000158
(1608926678.251533) vcan0 456#0000000186
(1608926678.237962) vcan0      123#0000000158
(1608926678.251533) vcan0                       456#0000000186
(1608926678.237962) vcan0 123#0000000158

awk -v FPAT='[ \t][0-9]+#' '{
    filter=$1; sub(/[ \t]/, "", filter);
    c[filter]++;
};
END{ for(x in c) print x, c[x]; }' infile
456# 3
123# 4

For input with multiple matches of the pattern " [0-9]+#" on some or all lines, you would do:

awk -v FPAT='[ \t][0-9]+#' '{
    for (i=1; i<=NF; i++){ 
        filter=$i; sub(/[ \t]/, "", filter); c[filter]++;
    };
};
END{ for(x in c) print x, c[x]; }' infile
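`FPAT` is specific to GNU awk. For an awk without it, a roughly equivalent sketch can loop over each line with `match()`, which sets `RSTART` and `RLENGTH` in any POSIX awk. This is a swapped-in technique rather than the method above; the function name `count_all_ids` is made up, and the trailing `sort` is only there to make the output order predictable, since `for (x in c)` iterates in an unspecified order.

```shell
# Hypothetical helper: count every whitespace-preceded "digits#" token,
# including several on the same line, without relying on FPAT.
count_all_ids() {
  awk '{
    line = $0
    # Find the next blank-or-tab followed by digits and a hash.
    while (match(line, /[ \t][0-9]+#/)) {
      # Skip the leading whitespace character when building the key.
      c[substr(line, RSTART + 1, RLENGTH - 1)]++
      # Continue searching after the end of this match.
      line = substr(line, RSTART + RLENGTH)
    }
  }
  END { for (x in c) print x, c[x] }' "$@" | sort
}

printf '%s\n' \
  'x 1#00 2#00 1#00' \
  'y    3#11' | count_all_ids
```

On that made-up input this prints `1# 2`, `2# 1` and `3# 1`.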
αғsнιη
2

With any awk in any shell on every Unix box:

$ awk -F'[ #]' '{cnt[$3]++} END{for (val in cnt) print val"#", cnt[val]}' file
123# 1
456# 1
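One thing to be aware of: `-F'[ #]'` treats every single space as a separator, so runs of several spaces before the number would create empty fields and shift the value away from `$3`. A possible variant, sketched here under the assumption that the number is always in the third whitespace-separated column, keeps awk's default field splitting and cuts that column at the `#` with `split()`. The function name `count_third_field` is made up, and the `sort` is only for a stable output order.

```shell
# Hypothetical helper: count the part of column 3 that precedes '#',
# regardless of how many blanks separate the columns.
count_third_field() {
  awk '{ split($3, part, "#"); cnt[part[1]]++ }
       END { for (val in cnt) print val "#", cnt[val] }' "$@" | sort
}

printf '%s\n' \
  '(1608926678.237962) vcan0        123#0000000158' \
  '(1608926678.251533) vcan0 456#0000000186' \
  '(1608926678.237962) vcan0    123#0000000158' | count_third_field
```

For these three lines it prints `123# 2` and `456# 1`.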
Ed Morton
0
awk '{for(i=1;i<=NF;i++){if($i ~ /#/){print $i}}}' filename | awk -F "#" '{c[$1"#"]++} END{for(k in c) print k, c[k]}'

output

123# 1
456# 1
Praveen Kumar BS