Grep fail on multiple matches

Question

I want to grep a search pattern but only succeed (and output the matching line) if there is only one unique match. If two lines match, grep should fail or output nothing.

What have you tried and where are you stuck? – Kamil Maciorowski Jul 15 '22 at 11:37

terdon · Accepted Answer · 2022-07-16T15:49:30.743

7

You can't do this with grep, but you can simply count the matches. I don't know what shell, what grep or what operating system you are using, but here's an example of a bash function that can do that:

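maxOne() (
    # The ( ) body runs in a subshell, so the IFS and noglob changes stay local.
    pattern="$1"
    file="$2"
    IFS=$'\n'   # split grep's output on newlines only
    set -f      # disable globbing so a line such as * is not expanded

    # -m2 makes grep stop after two matches, enough to tell one from many.
    results=( $(grep -m2 -- "$pattern" "$file") )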
    if [ "${#results[@]}" -eq 1 ]; then
        printf -- '%s\n' "${results[@]}"
        return 0
    else
        return 1
    fi
)

Add those lines to your ~/.bashrc or just paste them into a terminal with a running bash session, and you can then do:

maxOne foo file

That searches for foo in file. Note that the -m option (maximum number of matches), used here for efficiency so that grep exits after two matches, isn't supported by all versions of grep, so if it gives you an error, just remove it. It isn't needed; it just speeds things up.

Important: this will not work for multi-line search strings, which you can use with grep -z if your grep supports it. If you need to handle multi-line search patterns, you will need a different approach. Also, this will not work with patterns that match empty lines (e.g. grep '^$' file), because empty lines are dropped when the output is split into the array. Stéphane's solution handles empty lines, so it would be a better option if this is an issue. His also works on multiple files, which mine does not.
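For the empty-line limitation, a minimal sketch of the `readarray` approach suggested in the comments might look like the following. It assumes bash 4 or later (for `readarray`) and a grep that supports `-m`; the function name `maxOneLines` is made up for illustration and is not part of the original answer.

```bash
# Hypothetical variant of maxOne: the name maxOneLines is an assumption.
maxOneLines() {
    local pattern=$1 file=$2
    local -a results

    # readarray -t stores one array element per line, empty lines included,
    # and does no word splitting or globbing, so IFS and noglob stay untouched.
    readarray -t results < <(grep -m2 -- "$pattern" "$file")

    # Succeed only when exactly one line matched.
    if [ "${#results[@]}" -eq 1 ]; then
        printf '%s\n' "${results[0]}"
        return 0
    fi
    return 1
}
```

Because nothing global is modified, this version uses a `{ }` body with `local` variables rather than a subshell. It is still limited to a single file and to single-line patterns.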

edited Jul 16 '22 at 15:49

answered Jul 15 '22 at 11:51

terdon


(The question said "only succeed if there is only one unique match", which I took to mean that zero matches should not succeed. Anyway, `printf -- '%s\n' "${results[@]}"` would still print one empty line if the array was empty. Not because the array expansion would conjure up an empty element, but because `printf` prints the format string at least once.) – ilkkachu Jul 15 '22 at 12:35
aaand `set -f` has a different meaning in zsh (but isn't really necessary). Not sure if it's worth making the function usable in both with that issue... – ilkkachu Jul 15 '22 at 12:43
@ilkkachu `set -o noglob` works the same in zsh and bash (and is more legible IMO) – Stéphane Chazelas Jul 15 '22 at 12:58
Beware `array=( $(grep...) )` would remove empty lines from the output of `grep`, so you can't use that if the pattern may match empty lines. With `bash`, you can use `readarray -t array < <(grep...)` instead which avoids having to mess with `IFS` and `noglob`. See also the `f` parameter expansion flag in `zsh`. – Stéphane Chazelas Jul 15 '22 at 14:07
@ilkkachu fair point about `printf` but the rest of your edits seem to only have made it worse: I want the `function ()` since I don't want this to be run on shells that don't support `function`. As you said, `set -f` doesn't do the same thing in zsh, so why add it? Where do you want to disable globbing? – terdon Jul 15 '22 at 14:30
@StéphaneChazelas I was always thinking that this would not handle patterns with newlines (but I forgot to make that explicit). Is there any reason to mess with IFS if I do _not_ need to handle newlines? – terdon Jul 15 '22 at 14:34
@terdon, well, I expect you'd want to disable word-splitting and globbing when splitting the output of the `$(...)` to the array. Consider a file where the lines have multiple words or consist of e.g. a lone asterisk. `echo hello world > test.txt; maxOne hello test.txt` and it fails since `hello` and `world` produce two elements in the array. Or `echo '*' > test.txt; maxOne . test.txt`, where the glob gets expanded probably giving more than one array element. – ilkkachu Jul 15 '22 at 15:56
@terdon, as for `function maxOne()`, that's not supported in ksh (where `function foo` and `foo()` are both supported but subtly different). The rest of the array stuff required would work in ksh, though (and Bash's arrays are borrowed from ksh anyway). So I'm not sure why you'd want to make _that_ part an arbitrary filter. (It looks to me that arrays are the feature actually needed here, and a shell that doesn't support (ksh) arrays would likely croak at `"${#results[@]}"` or one of the others anyway.) But sure, it's your answer. – ilkkachu Jul 15 '22 at 15:58
@ilkkachu ah! Of course, in the array. Absolutely yes, thanks. I'll add `set -o noglob` as Stéphane suggested. As for `function`, if removing it makes it work in `ksh` as well, then thank you again and I'll do that. I had thought it was the POSIX shells like `sh` and `dash` that would choke on `function` and since I didn't want this to work for them anyway, I saw no point. I learned a few things today, thanks! – terdon Jul 15 '22 at 16:03
@terdon, you need to change `IFS` too, since the default would split on _any_ whitespace, splitting words within a line, not just the lines from each other. And then there's the issue that those changes affect global state, so you'd need to reset `IFS` and the `noglob` flag at the end to avoid messing up other parts of the script... So easiest to wrap the whole function in `( )` instead of `{ }` to run it in a subshell. Or use `local - IFS;` in Bash (the `-` makes `noglob` and other flags local too), but `local` is where ksh is different and I'm not sure it can localize the flags... – ilkkachu Jul 15 '22 at 16:19
@terdon, Um, yeah. I thought about writing a longer comment at first, before (or instead of) editing, but, I guess, I thought it was an obvious enough word-splitting issue anyway and wanted to spare everyone from the verbose explanation... Sorry. – ilkkachu Jul 15 '22 at 16:30
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/137810/discussion-between-terdon-and-ilkkachu). – terdon Jul 15 '22 at 16:46
Adding the `-m2` option to `grep` (GNU-specific though) would avoid it looking for occurrences past the second like in my answer to make it more efficient. – Stéphane Chazelas Jul 16 '22 at 11:59
Of course! Thanks, @StéphaneChazelas! – terdon Jul 16 '22 at 15:49

Stéphane Chazelas · Answer 2 · 2022-07-16T10:04:21.550

6

You could do it with:

unique_egrep() (
  export ERE="$1"; shift
  exec gawk -e '
    BEGIN               {ret = 1}
    BEGINFILE           {n = 0}
    $0 ~ ENVIRON["ERE"] {if (n++) nextfile; found = $0}
    ENDFILE             {if (n == 1) {print FILENAME":"found; ret = 0}}
    END                 {exit ret}' -E /dev/null "$@"
)

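And then run unique_egrep pattern *.txt, for instance.

This uses the -e 'code' -E /dev/null trick (in place of passing 'code' directly) so that arbitrary file paths can be processed, including names containing =, which awk would otherwise treat as variable assignments.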

All of -e, -E, BEGINFILE, ENDFILE and nextfile are GNU extensions (though nextfile is now found in many other implementations as well).
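Where gawk is not available, a reduced sketch that avoids those extensions could read a single file from standard input. This is an assumption-laden simplification, not the answer's method: the name `unique_ere_stdin` is hypothetical, it handles only one input stream, and it relies only on `ENVIRON`, which POSIX awk provides.

```bash
# Hypothetical single-input variant: unique_ere_stdin is an assumed name.
# Reading from stdin sidesteps awk's special handling of file-name arguments.
unique_ere_stdin() {
    ERE=$1 awk '
        # Remember the first match; stop reading as soon as a second one appears.
        $0 ~ ENVIRON["ERE"] { if (n++) exit; found = $0 }
        # Print only when exactly one line matched; exit status is 0 in that case, 1 otherwise.
        END { if (n == 1) print found; exit (n != 1) }'
}
```

It would be called as `unique_ere_stdin pattern < file`. Unlike the gawk version, it does not print the file name and cannot report per-file results across several files.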

edited Jul 16 '22 at 10:04

answered Jul 15 '22 at 12:41

Stéphane Chazelas


This answer looks very good, but could you briefly explain what is the purpose of `-E /dev/null` in the last line of the `gawk` script? – user000001 Jul 16 '22 at 09:45
@user000001, see [Why does awk stop and wait if the filename contains = and how to work around that?](//unix.stackexchange.com/a/490535) – Stéphane Chazelas Jul 16 '22 at 09:47
