14

So I have a line:

ID: 54376

Can you help me make a regex that would only return numbers without "ID:"?

NOTE: This string is in a file.

slm
Blake Gibbs

7 Answers

20

Try this:

grep -oP '(?<=ID: )[0-9]+' file

or:

perl -nle 'print $1 if /ID:.*?(\d+)/' file
terdon
cuonglm
7

There are many ways of doing this. For example:

  1. Use GNU grep built with a recent PCRE (for -P and \K) to match the number that follows ID:

    grep -oP 'ID:\s*\K\d+' file
    
  2. Use awk and simply print the last field of all lines that start with ID:

    awk '/^ID:/{print $NF}' file
    

    That will also print fields that are not numbers, though. To get numbers only, and only from the second field, use:

    awk '($1=="ID:" && $2~/^[0-9]+$/){print $2}' file
    
  3. Use GNU grep with Extended Regular Expressions and parse it twice:

    grep -Eo '^ID: *[0-9]+' file | grep -o '[0-9]*'
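
To see how the awk and two-pass grep variants above differ, here is a minimal sketch run against a small sample. Only the first input line comes from the question; the other lines and the variable names (sample, loose, strict, twopass) are hypothetical, chosen to exercise the edge cases. The second grep uses `[0-9]+` with -E, a small deviation from the command above made only for this sketch.

```shell
# Hypothetical sample input; only "ID: 54376" comes from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID: 54376
ID: not-a-number
Name: 12345
ID: 42 trailing words
EOF

# Last field of every line that starts with "ID:", numeric or not.
loose=$(awk '/^ID:/{print $NF}' "$sample")

# Second field only, and only when it consists entirely of digits.
strict=$(awk '($1=="ID:" && $2~/^[0-9]+$/){print $2}' "$sample")

# Two passes: isolate "ID: <digits>" first, then keep just the digits.
twopass=$(grep -Eo '^ID: *[0-9]+' "$sample" | grep -Eo '[0-9]+')

printf 'loose:\n%s\n\nstrict:\n%s\n\ntwo-pass grep:\n%s\n' \
    "$loose" "$strict" "$twopass"
rm -f "$sample"
```

On this input the loose awk form also returns `not-a-number` and `words`, while the stricter awk form and the two-pass grep return only `54376` and `42`.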
    
Stéphane Chazelas
terdon
  • Thanks! What `\K` is doing in first example? – rnd_d May 14 '15 at 14:26
  • 2
    @rnd_d it's a Perl Compatible Regular Expressions (PCRE) construct which means "ignore anything matched up to this point". It is used like a lookbehind: it lets me use `-o` to print only the matched portion but also discard things I'm not interested in. Compare `echo "foobar" | grep -oP "foobar"` and `echo "foobar" | grep -oP 'foo\Kbar'` – terdon May 14 '15 at 15:27
4

Use egrep with -o, or grep with the -Eo options, to print only the matched segment. Use [0-9]+ as the regex to get just the numbers:

grep -Eo '[0-9]+' filename
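
One thing to keep in mind is that this pattern is not tied to ID:, so it returns every run of digits in the file. A quick sketch, assuming a hypothetical second line of input (only the first line comes from the question):

```shell
# Hypothetical sample input; only "ID: 54376" comes from the question.
input=$(printf 'ID: 54376\nport 8080 opened 3 times\n')

# Every run of digits is printed on its own line, wherever it occurs.
all_numbers=$(printf '%s\n' "$input" | grep -Eo '[0-9]+')
printf '%s\n' "$all_numbers"
```

Here the result contains 54376, 8080 and 3, so this form is only suitable when the file holds no other numbers.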
Rohit Jain
4
sed -n '/ID: 54376/,${s/[^ 0-9]*//g;/./p}'

That prints only the digits and spaces found from the first line matching ID: 54376 through the end of the input.

I've updated the above slightly: the * makes it a little faster, and blank lines left over after removing the non-digit, non-space characters are no longer printed.

It addresses the lines from the first match of the regex /ID: 54376/ through the last line ($). On those lines, s///g removes every run of characters that are neither a space nor a digit ([^ 0-9]*), and /./p then prints any line that still has at least one character remaining.

DEMO:

{
echo line 
printf 'ID: 54376\nno_nums_or_spaces\n'
printf '%s @nd 0th3r char@cter$ %s\n' $(seq 10)
echo 'ID: 54376'
} | sed -n '/ID: 54376/,${s/[^ 0-9]*//g;/./p}'

OUTPUT:

 54376
1  03  2
3  03  4
5  03  6
7  03  8
9  03  10
 54376
mikeserv
1

Using sed:

{
    echo "ID: 1"
    echo "Line doesn't start with ID: "
    echo "ID: Non-numbers"
    echo "ID: 4"
} | sed -n '/^ID: [0-9][0-9]*$/s/ID: //p'

The -n is "don't print anything by default", the /^ID: [0-9][0-9]*$/ is "for lines matching this regex" (starts with "ID: ", then 1 or more digits, then end of line), and the s/ID: //p is of the form s/pattern/repl/flags - s means we're doing a substitute, to replace the pattern "ID: " with replacement text "" (empty string) using the p flag, which means "print this line after doing the substitution".

Output:

1
4
godlygeek
0

Another GNU sed command,

sed -nr '/ID: [0-9]+/ s/.*ID: +([0-9]+).*/\1/p' file

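It prints the number that follows ID: on each matching line. If a line contains several such occurrences, only the last one is printed.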
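
A small sketch of how this command behaves on a line with more than one ID: occurrence. The input lines other than the first are hypothetical, and -E is used in place of -r on the assumption that your sed accepts it (both select extended regular expressions in GNU sed).

```shell
# Hypothetical input: the second line has two "ID: <number>" occurrences.
input=$(printf 'ID: 54376\nID: 1 and ID: 2\nID: none\n')

# The leading .* is greedy, so the capture group ends up on the LAST
# "ID: <digits>" of each line; lines without digits after ID: are skipped.
result=$(printf '%s\n' "$input" |
    sed -nE '/ID: [0-9]+/ s/.*ID: +([0-9]+).*/\1/p')
printf '%s\n' "$result"
```

The second line yields 2 rather than 1, because the greedy `.*` consumes everything up to the last match.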

Avinash Raj
  • You really don't need the `+`. If saving a couple of characters means your script may not work in all `sed`s, you should probably do: `sed -n '/ID: \([0-9][0-9]*\).*/{s//\1/;s/.*[^0-9]//;/./p}'`. Your answer also misses the first `ID: [0-9]` on a line containing two occurrences of `ID: [0-9]`. – mikeserv May 25 '14 at 04:02
0

Use grep + awk:

  grep "^ID" your_file | awk '{print $2}'

Bonus: easy to read :)

lily
  • 1
    You don't need `grep` if you're using `awk`. `awk '/^ID/ { print $2 }'` does the same thing, and avoids [grep line-buffering issues](http://unix.stackexchange.com/a/46720/7696). It's also pretty much the same as one of the solutions in @terdon's answer. – cas May 12 '16 at 13:02