
Say I have a file:

# file: 'test.txt'
foobar bash 1
bash
foobar happy
foobar

I only want to know what words appear after "foobar", so I can use this regex:

"foobar \(\w\+\)"

The parentheses indicate that I am specifically interested in the word right after foobar. But when I run grep "foobar \(\w\+\)" test.txt, I get the entire lines that match the whole regex, rather than just the word after foobar:

foobar bash 1
foobar happy

I would much prefer that the output of that command looked like this:

bash
happy

Is there a way to tell grep to only output the items that match the grouping (or a specific grouping) in a regular expression?

Gilles 'SO- stop being evil'
Cory Klein

8 Answers


GNU grep has the -P option for perl-style regexes, and the -o option to print only what matches the pattern. These can be combined using look-around assertions (described under Extended Patterns in the perlre manpage) to remove part of the grep pattern from what is determined to have matched for the purposes of -o.

$ grep -oP 'foobar \K\w+' test.txt
bash
happy
$

The \K works like a shorter (and more efficient) form of (?<=pattern), a zero-width look-behind assertion placed before the text you want to output: whatever is matched before \K is left out of what -o prints. (?=pattern) can be used as a zero-width look-ahead assertion after the text you want to output.
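A quick way to see where \K and a look-behind differ, assuming GNU grep built with PCRE support and a made-up line with several spaces after foobar: the \K version accepts a variable-length prefix, while PCRE rejects the equivalent look-behind because its length is unbounded.

```shell
# Made-up input: three spaces between "foobar" and the word we want.
# Requires GNU grep with PCRE support (-P).

# Everything matched before \K ("foobar" plus any amount of whitespace)
# is dropped from what -o prints.
printf 'foobar   bash 1\n' | grep -oP 'foobar\s+\K\w+'
# prints: bash

# The look-behind spelling of the same pattern is rejected when the pattern
# is compiled, because PCRE look-behinds need a bounded length and \s+ has none:
#   printf 'foobar   bash 1\n' | grep -oP '(?<=foobar\s+)\w+'
```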

For instance, if you wanted to match the word between foo and bar, you could use:

$ grep -oP 'foo \K\w+(?= bar)' test.txt

or (for symmetry)

$ grep -oP '(?<=foo )\w+(?= bar)' test.txt
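The OP's test.txt contains no "foo ... bar" line, so both commands above print nothing on it. A self-contained check on made-up input, again assuming GNU grep with -P:

```shell
# Made-up input: only the first line has a word between "foo " and " bar".
# Requires GNU grep with PCRE support (-P).
printf 'foo baz bar\nfoo qux\n' | grep -oP 'foo \K\w+(?= bar)'
# prints: baz

# Look-behind version; accepted here because "foo " has a fixed length.
printf 'foo baz bar\nfoo qux\n' | grep -oP '(?<=foo )\w+(?= bar)'
# prints: baz
```

The second line, "foo qux", is skipped by both because the look-ahead (?= bar) fails there.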
camh
  • How would you do it if your regex has more than one grouping (as the title implies)? – barracel Mar 21 '13 at 07:52
  • @barracel: I don't believe you can. Time for `sed(1)` – camh Mar 22 '13 at 22:51
  • @camh I have just tested that `grep -oP 'foobar \K\w+' test.txt` outputs nothing with the OP's `test.txt`. The grep version is 2.5.1. What could be wrong? O_O – SOUser Jul 24 '14 at 14:19
  • @XichenLi: I can't say. I just built v2.5.1 of grep (it's pretty old - from 2006) and it worked for me. – camh Jul 25 '14 at 10:18
  • @SOUser: I experienced the same - outputs nothing to file. I submitted the edit request to include '>' before the filename to send output as this worked for me. – rjchicago Dec 15 '16 at 21:40
  • Great answer for mentioning the `\K`! When I used `(?<=)` grep complained about my look-behind not being of fixed length, but using `\K` solved the problem. – Hai Zhang Mar 31 '17 at 11:30
  • It seems the -P flag doesn't work on Mac El Capitan, at least. – OZZIE Jan 25 '18 at 10:39
  • @OZZIE Does Mac El Capitan have GNU grep? I guess not (at least by default - perhaps you can install it with homebrew?) – camh Jan 26 '18 at 10:32
  • This answer is OP--as in Over Powered :-D – Jonathan Benn Aug 13 '20 at 15:47
  • @OZZIE `brew install grep` (it installs GNU grep as `ggrep`) – JP Zhang Jun 08 '21 at 11:27
    sed -n "s/^.*foobar\s*\(\S*\).*$/\1/p"

-n     suppress automatic printing of lines
s      substitute
^.*    anything before foobar
foobar initial search match
\s*    zero or more whitespace characters
\(     start capture group
\S*    zero or more non-whitespace characters (the word)
\)     end capture group
.*$    anything after the capture group
\1     replace the whole matched line with the 1st capture group
p      print the result
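Run against the OP's sample file, this command also prints an empty line for the bare foobar line, because \s* and \S* may both match nothing. A sketch of that corner case and one way around it, assuming GNU sed (\s, \S and \+ are GNU extensions, which may explain the macOS results reported in the comments):

```shell
# The OP's sample file; the last line is just "foobar".
printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n' > test.txt

# As written: bash, happy, then an empty line for the bare "foobar".
sed -n 's/^.*foobar\s*\(\S*\).*$/\1/p' test.txt

# Requiring at least one whitespace character and at least one word
# character (\+ instead of *) skips that line.
sed -n 's/^.*foobar\s\+\(\S\+\).*$/\1/p' test.txt
# prints: bash
#         happy
```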
jgshawkey
  • +1 for the sed example, seems like a better tool for the job than grep. One comment: the `^` and `$` are extraneous since `.*` is a greedy match. However, including them might help clarify the intent of the regex. – Tony May 30 '18 at 21:22
  • For me it was essential to add `.*` at the beginning; otherwise the output also included what comes before foobar. – aerijman Feb 19 '20 at 18:37
  • For some reason this does not seem to work with macOS sed: `echo "foobar bash 1" | sed -n "s/^.*foobar\s*\(\S*\).*$/\1/p"` outputs nothing. – Frederik Nov 27 '20 at 15:57
  • How do you do that when the search contains parentheses? – Tofandel Apr 29 '21 at 15:54
  • I had to add "-r" as a sed option in order for it to work. – Roemer Jun 08 '21 at 12:37
  • @Frederik macOS is based on BSD, so many of the utilities are the BSD versions instead of the GNU ones. – Kenny Evitt Dec 15 '21 at 00:11
  • With `sed -nr` and `( )` instead of `\( \)` it worked for me (Ubuntu 20.4) – Martin T. Feb 07 '22 at 09:13

Standard grep can't do this, but recent versions of GNU grep can. You can turn to sed, awk or perl. Here are a few examples that do what you want on your sample input; they behave slightly differently in corner cases.

Replace "foobar word other stuff" with "word", and print only if a replacement was made.

sed -n -e 's/^foobar \([[:alnum:]]\+\).*/\1/p'

If the first word is foobar, print the second word.

awk '$1 == "foobar" {print $2}'

Strip foobar if it's the first word, and skip the line otherwise; then strip everything after the first whitespace and print.

perl -lne 's/^foobar\s+// or next; s/\s.*//; print'
Gilles 'SO- stop being evil'
  • Awesome! I thought I might be able to do this with sed, but I haven't used it before and was hoping I could use my familiar `grep`. But the syntax for these commands actually looks very familiar now that I know vim-style search & replace + regexes. Thanks a ton. – Cory Klein May 19 '11 at 23:51
  • Not true, Gilles. See my answer for a GNU grep solution. – camh May 20 '11 at 01:33
  • @camh: Ah, I didn't know GNU grep now had full PCRE support. I've corrected my answer, thanks. – Gilles 'SO- stop being evil' May 20 '11 at 07:14
  • This answer is especially useful for embedded Linux since Busybox `grep` doesn't have PCRE support. – Craig McQueen Mar 17 '16 at 00:12
  • Obviously there are multiple ways to accomplish the task, but if the OP asks about grep, why answer with something else? Also, your first paragraph is incorrect: yes, grep can do it. – fcm Mar 11 '19 at 13:31

pcregrep has a smarter -o option that lets you choose which capturing groups to output. So, using your example file:

$ pcregrep -o1 "foobar (\w+)" test.txt
bash
happy
  • Wow, this was magical for me, thank you so much. I'm on macOS and was trying to use match groups somehow. I had been trying `zegrep` because I was grepping a large zip file, but also found that pcregrep will (from the `pcregrep --help` page): `Files whose names end in .gz are read using zlib.` So I could use it straight away on my zip file. Thanks again! – samjewell Apr 06 '20 at 15:11
  • This worked perfectly for me, thanks. – Bishal Paudel Jul 19 '22 at 23:17

Well, if you know that foobar is always the first word of the line, then you can use cut, like so:

grep "foobar" test.txt | cut -d" " -f2
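On the OP's file the bare foobar line contains no space, and cut passes lines without the delimiter through unchanged, so foobar shows up as an extra output line. The POSIX -s option suppresses such lines; a sketch:

```shell
# The OP's sample file; the last line has no space in it.
printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n' > test.txt

# Without -s: bash, happy, foobar
grep "foobar" test.txt | cut -d" " -f2

# With -s, lines that contain no delimiter are dropped.
grep "foobar" test.txt | cut -s -d" " -f2
# prints: bash
#         happy
```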
Dave

Using grep -P is not portable across platforms, since -P/--perl-regexp is only available in GNU grep, not BSD grep.

Here is the solution using ripgrep:

$ rg -o "foobar (\w+)" -r '$1' <test.txt
bash
happy

As per man rg:

-r/--replace REPLACEMENT_TEXT Replace every match with the text given.

Capture group indices (e.g., $5) and names (e.g., $foo) are supported in the replacement string.

Related: GH-462.

kenorb

If PCRE is not supported, you can achieve the same result with two invocations of grep. For example, to grab the word after foobar:

<test.txt grep -o 'foobar  *[^ ]*' | grep -o '[^ ]*$'

This can be expanded to an arbitrary word after foobar like this (with EREs for readability):

i=1
<test.txt grep -Eo 'foobar +([^ ]+ +){'"$i"'}[^ ]+' | grep -o '[^ ]*$'

Output:

1

Note the index i is zero-based.
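A sketch running both indices against the OP's file (grep -E is the standard spelling of egrep):

```shell
# The OP's sample file.
printf 'foobar bash 1\nbash\nfoobar happy\nfoobar\n' > test.txt

# i=0: the first word after foobar -> bash, happy
i=0
<test.txt grep -Eo 'foobar +([^ ]+ +){'"$i"'}[^ ]+' | grep -o '[^ ]*$'

# i=1: the second word after foobar -> 1 (only the first line has one)
i=1
<test.txt grep -Eo 'foobar +([^ ]+ +){'"$i"'}[^ ]+' | grep -o '[^ ]*$'
```

The first grep keeps foobar plus exactly i skipped words and one more word; the second grep then keeps only that last word.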

Thor

I found @jgshawkey's answer very helpful. grep is not a great tool for this, but sed is, although the example below still uses grep to grab the relevant line.

sed's regex syntax can look idiosyncratic if you are not used to it.

Here is another example: this one parses the output of xinput to get an integer ID from a line like

⎜   ↳ SynPS/2 Synaptics TouchPad                id=19   [slave  pointer  (2)]

and I want 19

export TouchPadID=$(xinput | grep 'TouchPad' | sed  -n "s/^.*id=\([[:digit:]]\+\).*$/\1/p")

Note the class syntax:

[[:digit:]]

and the need to escape the following +

I assume only one line matches.
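A self-contained version that substitutes a fixed string for the xinput output (the box-drawing prefix is dropped, and the exact shape of real xinput output is an assumption here). It also shows the -E form, where neither the parentheses nor the + need a backslash; GNU and BSD sed both accept -E:

```shell
# Stand-in for one line of `xinput` output.
line='SynPS/2 Synaptics TouchPad                id=19   [slave  pointer  (2)]'

# Basic regex as in the answer: \( \) group, and \+ (a GNU sed extension).
TouchPadID=$(printf '%s\n' "$line" | grep 'TouchPad' | sed -n 's/^.*id=\([[:digit:]]\+\).*$/\1/p')
echo "$TouchPadID"    # 19

# Extended regex (-E): plain ( ) and +, no separate grep needed.
printf '%s\n' "$line" | sed -nE 's/.*TouchPad.*id=([0-9]+).*/\1/p'    # 19
```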

Tim Richardson
  • This is exactly what I was trying to do. Thanks! – James May 12 '19 at 00:07
  • Slightly simpler version without the extra `grep`, assuming 'TouchPad' is to the left of 'id': `echo "SynPS/2 Synaptics TouchPad id=19 [slave pointer (2)]" | sed -nE "s/.*TouchPad.+id=([0-9]+).*/\1/p"` – Amit Naidu May 19 '19 at 05:10