How to run grep with multiple AND patterns?

Question

I would like to match multiple patterns with an implicit AND between them, i.e. the equivalent of running several greps in sequence:

grep pattern1 | grep pattern2 | ...

How can I convert that into something like this?

grep pattern1 & pattern2 & pattern3

I would like to use a single grep because I am building the arguments dynamically, so everything has to fit in one string. Piping through a filter is a feature of the system, not of grep, so a pipeline cannot be passed as an argument to it.

Don't confuse this question with:

grep "pattern1\|pattern2\|..."

This is an OR multi pattern match. I am looking for an AND pattern match.

Similar: [Match all patterns from file at once](http://unix.stackexchange.com/q/332160/21471) — kenorb, Dec 22 '16 at 15:26
Similar question on SO: [Check if multiple strings or regexes exist in a file](https://stackoverflow.com/q/49762772/6862601) — codeforester, Apr 13 '18 at 17:47
If you're looking for the grep syntax for "find lines that contain `foo` and lines that contain `bar`" see [using grep for multiple search patterns](https://stackoverflow.com/questions/13610642/using-grep-for-multiple-search-patterns) — enharmonic, Mar 27 '20 at 17:17

Stéphane Chazelas · Accepted Answer · 2023-07-17T09:25:04.993

132

To find the lines that match each and every one of a list of patterns, agrep (the original one, now shipped with glimpse, not the unrelated one in the TRE regexp library) can do it with this syntax:

agrep 'pattern1;pattern2'

With GNU grep, when built with PCRE support, you can do:

grep -P '^(?=.*pattern1)(?=.*pattern2)'
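Since the question is about building the arguments dynamically, here is a minimal sketch of assembling that lookahead expression from a list of patterns. The function name build_and_regex is hypothetical, and wrapping each pattern in (?:...) is an added assumption so that an alternation inside one pattern stays contained. The function only builds the string; using it still requires a grep with PCRE support.

```shell
# build_and_regex PATTERN...
# Hypothetical helper: prints a PCRE that matches any line containing
# every given pattern, in any order, using one lookahead per pattern.
# The body runs in a subshell, so "regex" does not leak to the caller.
build_and_regex() (
    regex='^'
    for p in "$@"; do
        # (?:...) keeps a "|" inside a pattern from escaping its lookahead.
        regex="${regex}(?=.*(?:${p}))"
    done
    printf '%s\n' "$regex"
)

# Possible usage (GNU grep built with PCRE support assumed):
#   grep -P -- "$(build_and_regex pattern1 pattern2)" file.txt
```

The result is a single string, so it fits the "everything in one argument" constraint from the question.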

With ast grep:

grep -X '.*pattern1.*&.*pattern2.*'

(adding .*s as <x>&<y> matches strings that match both <x> and <y> exactly, a&b would never match as there's no such string that can be both a and b at the same time).

If the patterns don't overlap, you may also be able to do:

grep -e 'pattern1.*pattern2' -e 'pattern2.*pattern1'

The best portable way is probably with awk as already mentioned:

awk '/pattern1/ && /pattern2/'

Or with sed:

sed -e '/pattern1/!d' -e '/pattern2/!d'

Or perl:

perl -ne 'print if /pattern1/ && /pattern2/'

Please beware that all those will have different regular expression syntaxes.

The awk/sed/perl ones don't reflect whether any line matched the patterns in their exit status. To do that, you need:

awk '/pattern1/ && /pattern2/ {print; found = 1}
     END {exit !found}'

perl -ne 'if (/pattern1/ && /pattern2/) {print; $found = 1}
          END {exit !$found}'

Or pipe the command to grep '^'.
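The awk variant with a grep-like exit status can also take its patterns as separate arguments, which may help when the list is built dynamically. This is only a sketch: the function name grep_all is hypothetical, the patterns are treated as awk dynamic regexps (ERE-like, so not the same syntax as plain grep), and input is assumed to come from standard input.

```shell
# grep_all PATTERN...
# Hypothetical helper: prints the lines of standard input that match
# every pattern, and exits non-zero when no line matched.
grep_all() {
    awk '
        BEGIN {
            # Move the patterns out of ARGV so awk does not try to
            # open them as input files and reads stdin instead.
            n = ARGC - 1
            for (i = 1; i <= n; i++) {
                pat[i] = ARGV[i]
                ARGV[i] = ""
            }
        }
        {
            # Skip the line as soon as one pattern fails to match.
            for (i = 1; i <= n; i++)
                if ($0 !~ pat[i])
                    next
            print
            found = 1
        }
        END { exit !found }
    ' "$@"
}

# Possible usage:
#   some_command | grep_all pattern1 pattern2 pattern3
```

Passing the patterns as data rather than pasting them into the awk program avoids having to escape "/" in them.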

edited Jul 17 '23 at 09:25

answered Nov 10 '12 at 20:13

Stéphane Chazelas


The `agrep` syntax is not working for me... which version was it introduced in? – Raman Sep 05 '16 at 22:15
@Raman [2.04 from 1992](ftp://ftp.cs.arizona.edu/agrep/) already had it. I've no reason to believe it wasn't there from the start. Newer (after 1992) versions of `agrep` can be found included with [glimpse/webglimpse](http://webglimpse.net/trial/glimpse-latest.tar.gz). Possibly you have a different implementation. I had a mistake for the ast-grep version though, the option for _augmented regexps_ is `-X`, not `-A`. – Stéphane Chazelas Sep 06 '16 at 05:55
@StéphaneChazelas Thanks, I have `agrep` 0.8.0 on Fedora 23. This appears to be a different `agrep` than the one you reference. – Raman Sep 06 '16 at 06:37
@Raman, yours sounds like [TRE `agrep`](https://github.com/laurikari/tre/). – Stéphane Chazelas Sep 06 '16 at 07:01
@StéphaneChazelas Indeed it is. Too bad Fedora doesn't have the `agrep` you are referring to, out of the box. – Raman Sep 13 '16 at 18:15
`awk '/pattern1/ && /pattern2/'` is good, but how do i calculate count for this? – Yogesh D Jun 28 '17 at 17:40
@Techiee, you mean the count of lines that match either pattern or the count of occurrences of each pattern? In any case, it seems it would be a different question. – Stéphane Chazelas Jun 28 '17 at 18:06
@StéphaneChazelas, Right now `awk '/pattern1/ && /pattern2/'` is printing all the lines, I want to print the count of such lines. Can you help? Thanks – Yogesh D Jun 28 '17 at 18:31
@StéphaneChazelas: I got it, `awk '/pattern1/ && /pattern2/' filename | wc -l` will give me desired output. – Yogesh D Jun 28 '17 at 18:35
@Techiee, or just `awk '/p1/ && /p2/ {n++}; END {print 0+n}'` – Stéphane Chazelas Jun 28 '17 at 20:23
The `awk` solution doesn't work for me on Cygwin. Looks like it is but checking the process shows that it's doing nothing, no CPU usage at all. – Hashim Aziz Dec 12 '19 at 20:15
@StéphaneChazelas I am getting `grep: invalid matcher .*pattern1.*&.*pattern2.*` – Chaminda Bandara Feb 17 '21 at 02:04
@ChamindaBandara, you ran that with GNU `grep` instead of ast `grep`. GNU `grep` has no support for ast augmented regexp. It does have an undocumented `-X` option, but that's for something unrelated, it's to specify the regexp flavour (*matcher*) like in `grep -X perl` being the same as `grep -P`. – Stéphane Chazelas Feb 17 '21 at 09:00
Honestly, I've tried at least half of these suggestions in the example and none of them work as described. – Daniel Kaplan Mar 09 '22 at 20:50
@DanielKaplan, from your recent question, I suspect you're looking for something different from what this Q&A is about. Here we're trying to find *lines* that match all patterns, while you may be trying to find *files* for which all patterns are matched by any line (there are several Q&As here covering that). I've edited the answer to maybe make that more obvious. – Stéphane Chazelas Mar 10 '22 at 07:32
Ah! OK, I see. My mistake. – Daniel Kaplan Mar 10 '22 at 07:36
Note that `awk` exits with a 0 status code even when no matches are found. You can fix this by piping to `grep .` – BallpointBen Jul 14 '23 at 13:34
@BallpointBen, see edit. – Stéphane Chazelas Jul 17 '23 at 09:25

score 29 · Answer 2 · answered Nov 10 '12 at 08:34

29

You didn't specify the grep version, which is important. Some regexp engines allow multiple patterns to be grouped by AND using '&', but this is a non-standard and non-portable feature. GNU grep, at least, doesn't support it.

OTOH, you can simply replace grep with sed, awk, perl, etc. (listed in order of increasing weight). With awk, the command would look like

awk '/regexp1/ && /regexp2/ && /regexp3/ { print; }'

and it can easily be constructed on the command line.
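As an illustration of constructing that awk program from a list, here is a rough sketch. The function name build_awk_and is hypothetical, and the only escaping it attempts is for "/" so that a pattern cannot end the regex literal early; a pattern that already contains an escaped "\/" would not survive this, so treat it as a starting point rather than a robust tool.

```shell
# build_awk_and PATTERN...
# Hypothetical helper: prints an awk condition such as
#   /regexp1/ && /regexp2/ && /regexp3/
# The body runs in a subshell, so its variables stay local.
build_awk_and() (
    prog=''
    for p in "$@"; do
        # Escape "/" so it does not terminate the /.../ literal.
        p=$(printf '%s\n' "$p" | sed 's|/|\\/|g')
        # Join the conditions with " && " (nothing before the first).
        prog="${prog:+$prog && }/$p/"
    done
    printf '%s\n' "$prog"
)

# Possible usage:
#   awk "$(build_awk_and regexp1 regexp2 regexp3)" file.txt
```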

answered Nov 10 '12 at 08:34

Netch


Just remember that `awk` uses ERE's, e.g. the equivalent of `grep -E`, as opposed to the BRE's that plain `grep` uses. – jw013 Nov 10 '12 at 09:42
`awk`'s regexes are *called* EREs, but in fact they're a bit idiosyncratic. Here are probably more details than anyone cares for: http://wiki.alpinelinux.org/wiki/Regex – dubiousjim Nov 10 '12 at 15:35
Thank you, grep 2.7.3 (openSUSE). I upvoted you, but I will keep question open for a while, maybe there is some trick for grep (not that I dislike `awk` -- simply knowing more is better). – greenoldman Nov 10 '12 at 15:42
The default action is to print the matching line so the `{ print; }` part isn't really necessary or useful here. – tripleee Apr 20 '17 at 11:58
This still returns a `0` status code if the match fails. – BallpointBen Jul 14 '23 at 04:48
@BallpointBen If you mean return code from `awk`, well, it requires explicit generation because awk program doesn't "know" what is positive result for the programmer. You may add a variable used as boolean and to select exit code in END block. – Netch Jul 15 '23 at 05:26
@Netch Actually I think the easiest way is just pipe `awk` to `grep .`. – BallpointBen Jul 15 '23 at 14:34

olejorgenb · Answer 3 · 2020-10-30T23:20:57.997

16

grep pattern1 | grep pattern2 | ...

I would like to use single grep because I am building arguments dynamically, so everything has to fit in one string

It's actually possible to build the pipeline dynamically (without resorting to eval):

# Executes: grep "$1" | grep "$2" | grep "$3" | ...
function chained-grep {
    local pattern="$1"
    if [[ -z "$pattern" ]]; then
        cat
        return
    fi

    shift
    grep -- "$pattern" | chained-grep "$@"
}

cat something | chained-grep all patterns must match order but matter dont

It's probably not a very efficient solution though.
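For shells without `local` or the `function` keyword, the same recursion can be sketched in plain POSIX sh. The name chained_grep (with an underscore, since hyphens in function names are not portable) is an assumption for illustration. It avoids the local variable by using "$1" directly and shifting inside the right-hand side of the pipe, and it stops on the argument count rather than on an empty pattern.

```shell
# chained_grep PATTERN...
# Hypothetical POSIX sh variant of the function above.
# Behaves like: grep "$1" | grep "$2" | grep "$3" | ...
chained_grep() {
    if [ "$#" -eq 0 ]; then
        # No patterns left: pass the remaining lines through unchanged.
        cat
        return
    fi
    # Filter on the first pattern, then recurse on the remaining ones.
    grep -- "$1" | { shift; chained_grep "$@"; }
}

# Possible usage:
#   cat something | chained_grep pattern1 pattern2 pattern3
```

Because the last stage is always `cat`, the exit status of this sketch does not say whether anything matched.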

edited Oct 30 '20 at 23:20

answered Nov 26 '16 at 22:13

olejorgenb


Use either `chained-grep()` or `function chained-grep` but not `function chained-grep()`: https://unix.stackexchange.com/questions/73750/difference-between-function-foo-and-foo – nisetama Jan 19 '19 at 17:08
Can you describe what the trick is? Can you add it to the answer (***without*** "Edit:", "Update:", or similar ) by [editing it](https://unix.stackexchange.com/posts/326270/edit)? – Peter Mortensen Oct 30 '20 at 20:40
Reformulated the answer to make the trick clearer (ie.: build a shell pipeline dynamically) – olejorgenb Oct 30 '20 at 23:21
The important part here is that shell allows recursion which makes this possible. Note the keyword `local` in front of variable that must be unique for the recursion. Also note that keyword `local` is not POSIX so using shebang `#!/bin/sh` may not be safe, see details here: https://unix.stackexchange.com/a/493743/20336 – Mikko Rantalainen Jul 07 '22 at 07:15

kenorb · Answer 4 · 2019-04-30T18:16:04.087

15

`git grep`

Here is the git grep syntax for combining multiple patterns with Boolean expressions:

git grep --no-index -e pattern1 --and -e pattern2 --and -e pattern3

The above command will print lines matching all the patterns at once.

--no-index: search files in the current directory that is not managed by Git.

Check man git-grep for help.

`ripgrep`

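Here is an example using rg. Note that this regex only matches lines in which the patterns appear in the given order (pattern1, then pattern2, then pattern3):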

rg -N '(?P<p1>.*pattern1.*)(?P<p2>.*pattern2.*)(?P<p3>.*pattern3.*)' file.txt

It's one of the quickest grepping tools, since it's built on top of Rust's regex engine which uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.

See also the related feature request at GH-875.

edited Apr 11 '18 at 09:18

answered Apr 11 '18 at 01:19

kenorb


Noam Manos · Answer 9 · 2020-11-01T15:41:35.620

-2

To find all of the words (or patterns), you can run grep in a for loop. The main advantage here is searching from a list of regular expressions.

A real example:

# File 'search_all_regex_and_error_if_missing.sh'

find_list="\
^a+$ \
^b+$ \
^h+$ \
^d+$ \
"

for item in $find_list; do
   if grep -E "$item" file_to_search_within.txt
   then
       echo "$item found in file."
   else
       echo "Error: $item not found in file. Exiting!"
       exit 1
   fi
done

Now let's run it on this file:

hhhhhhhhhh
aaaaaaa
bbbbbbbbb
ababbabaabbaaa
ccccccc
dsfsdf
bbbb
cccdd
aa
caa

$ ./search_all_regex_and_error_if_missing.sh
aaaaaaa
aa
^a+$ found in file.
bbbbbbbbb
bbbb
^b+$ found in file.
hhhhhhhhhh
^h+$ found in file.
Error: ^d+$ not found in file. Exiting!

edited Nov 01 '20 at 15:41

answered Aug 14 '18 at 06:51

Noam Manos


Your logic is faulty -- I asked for an `ALL` operator; your code works as an `OR` operator, not `AND`. And by the way, for that (`OR`) there is a much easier solution given right in the question. – greenoldman Aug 14 '18 at 22:18
@greenoldman The logic is simple: The for will **loop on ALL of the words/patterns** in the list, and if it is found in file - will print it. So just remove the else if you don't need action in case word was not found. – Noam Manos Aug 16 '18 at 15:07
I understand your logic as well as my question -- I was asking about the `AND` operator, meaning the file is only a positive hit if it matches pattern A and pattern B and pattern C and so on. In your case, the file is a positive hit if it matches pattern A or pattern B or... Do you see the difference now? – greenoldman Aug 17 '18 at 06:19
@greenoldman not sure why you think this loop does not check AND condition for all patterns? So I've edited my answer with a real example: It will search in file for all regex of list, and on the first one which is missing - will exit with error. – Noam Manos Aug 19 '18 at 15:04
You have it right in front of your eyes, you have positive match just after first match is executed. You should have "collect" all outcomes and compute `AND` on them. Then you should rewrite the script to run on multiple files -- then maybe you realize that the question is already answered and your attempt does not bring anything to the table, sorry. – greenoldman Aug 20 '18 at 05:56
@greenoldman sorry I don't get your point. `^a+$` , `^b+$` , `^h+$` are all positive match, but `^d+$` is not a match, so the search then breaks. That's exactly the meaning of **AND condition**! You don't have to print anything during the loop, just after it ends, if that's what you want. – Noam Manos Aug 20 '18 at 09:51
Of course, I could add variable tracking whether `AND` condition is fulfilled, and then I would have an **extra** script instead of short and concise call of grep which was posted and accepted as solution **six** years ago. Take signal to noise into consideration and please delete your entire answer -- it does not add anything really. – greenoldman Aug 20 '18 at 16:38
@greenholdman, why do you think it doesn't add anything? It's a great solution to verify numerous words/regexs. Imagine grep -e 'pattern1.*pattern2' -e 'pattern2.*pattern1' on 10+ regex, not just two... – Noam Manos Aug 23 '18 at 07:58
Note that this answer is about searching for all patterns and reporting if any pattern cannot be found at least once in the file. The original question was about matching ALL the patterns against each line instead of matching files. – Mikko Rantalainen Jul 07 '22 at 07:27
