Here is the beginning of a file:

# hexdump -n 550 myFile
0000000 f0f2 f5f0 f7f9 f1f1 f1f0 f0f0 e3f1 f3c8
0000010 f3f5 0000 0000 000c 0000 0000 0000 000c
0000020 0000 0c00 0000 0000 0000 0c00 0000 0000
0000030 000c 0000 0000 0000 000c 0000 0c00 0000
0000040 0000 0000 0c00 0000 0000 000c 0000 0000
0000050 0000 000c 0000 0c00 0000 0000 0000 0c00
0000060 0000 0000 000c 0000 0000 0000 000c 0000
*
00000b0 0000 0000 000c 0000 0000 0000 0000 0000
00000c0 0000 0000 0000 0c00 0000 0000 0000 0c00
00000d0 0000 0000 000c 0000 0000 0000 000c 0000
00000e0 0c00 0000 0000 0000 0c00 0000 0000 000c
00000f0 0000 0000 0000 000c 0000 0c00 0000 0000
0000100 0000 0c00 0000 0000 000c 0000 0000 0000
0000110 000c 0000 0c00 0000 0000 0000 0c00 0000
0000120 0000 0000 0c00 0000 0000 0000 0c00 0000
*
0000160 0000 0000 0c00 0000 0000 0000 0000 0000
0000170 0000 0000 0000 0000 000c 0000 0000 0000
0000180 000c 0000 0c00 0000 0000 0000 0c00 0000
0000190 0000 000c 0000 0000 0000 000c 0000 0c00
00001a0 0000 0000 0000 0c00 0000 0000 000c 0000
00001b0 0000 0000 000c 0000 0c00 0000 0000 0000
00001c0 0c00 0000 0000 000c 0000 0000 0000 000c
00001d0 0000 0000 0000 000c 0000 0000 0000 000c
*
0000210 0000 0000 0000 000c 0000 0000 0000 0000
0000220 0000 0000 0a00
0000226

In this dump we can see the byte values 0c and 0a.

I don't understand why grep finds 0c but not 0a:

# grep -P '\x0c' myFile
Binary file myFile matches
# grep -P '\x0a' myFile
<nothing in the output>

I am using CentOS.

terdon
mcoulont
  • What `grep` is this? Are you on Linux? Is this GNU `grep`? Can you give us an example of the file itself so we can try to reproduce it? – terdon Aug 18 '21 at 15:48
  • I *suspect* you need `-zP` otherwise the newlines are stripped out before the pattern matching takes place – steeldriver Aug 18 '21 at 15:51
  • @steeldriver I think you're absolutely right, you may as well post an answer. Test with `grep -zP '\x0a' /bin/ln` vs `grep -P '\x0a' /bin/ln`. – terdon Aug 18 '21 at 15:53
  • I'm on CentOS. The file is binary. You can see its first 550 bytes in the dump. The same problem remains with a file containing only its last 16 bytes (hexdump: 000c 0000 0000 0000 0000 0000 0000 0a00). – mcoulont Aug 18 '21 at 15:54
  • Can you confirm that it works as expected if you use `grep -zP '\x0a' `? – terdon Aug 18 '21 at 15:58
  • As the output to the screen is just ASCII, have you tried just `grep "0a"`? – Romeo Ninov Aug 18 '21 at 16:01
  • Thanks @steeldriver: the output is now kind of black, but at least I can distinguish when the byte is found (one or several black lines returned) from when it's not (nothing returned). If I could, I would vote up. – mcoulont Aug 18 '21 at 16:01
  • `grep "0a"`? But my file is binary. – mcoulont Aug 18 '21 at 16:03
  • @mcoulont, you show a `hexdump` of the file in the question, so why don't you use it? – Romeo Ninov Aug 18 '21 at 16:04
  • @RomeoNinov: I want to scan many files, and working on a (computed) hexdump is not great from a performance perspective. – mcoulont Aug 18 '21 at 18:32
  • @mcoulont, just compare the times, you may be surprised :) – Romeo Ninov Aug 18 '21 at 18:57
  • @mcoulont Both hexdump and grep are line-oriented. Piping the output of hexdump to grep will therefore use 2 CPU cores, which usually results in about the same processing time as just running grep. In a world where multi-core CPUs are common, you might as well use them. – slebetman Aug 20 '21 at 08:08

1 Answer

\x0a isn't just any hex value - it's the byte value of the ASCII linefeed (newline) character.

Since grep is (by default) line-based, it splits its input at linefeed characters and matches the pattern against each line without its terminating linefeed, so the pattern never gets to see a \x0a. At least with GNU grep, you can change this behavior with the -z option:

   -z, --null-data
          Treat  input  and  output  data  as  sequences  of  lines,  each
          terminated by a zero byte (the ASCII NUL character) instead of a
          newline. 

Note, however, that with -z the ASCII NUL bytes become the separators and are stripped out in the same way, so you will no longer be able to grep for those.
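
To see the difference without the original file, here is a minimal sketch. It assumes GNU grep built with PCRE support (the `-P` option); the temporary file and the `line_mode` / `null_mode` variable names are made up for the illustration.

```shell
# Build a small binary test file: NUL, 0x0c, NUL, NUL, 0x0a.
tmp=$(mktemp)
printf '\000\014\000\000\012' > "$tmp"

# Default (line) mode: 0x0a is the line terminator, so the pattern never sees it.
if grep -qP '\x0a' "$tmp"; then line_mode=match; else line_mode=nomatch; fi

# NUL-separated mode: 0x0a is now ordinary data inside a record.
if grep -qzP '\x0a' "$tmp"; then null_mode=match; else null_mode=nomatch; fi

echo "line mode: $line_mode"
echo "null mode: $null_mode"
rm -f "$tmp"
```

With GNU grep I would expect the first test to report no match and the second to report a match, which is the behavior seen in the question. The `-q` option only sets the exit status, so no binary data is written to the terminal.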

steeldriver
  • I suppose you can still at least count the NULLs with something like `grep -zP '.' file | wc -l`. – terdon Aug 18 '21 at 16:18
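
Another way to count the NUL bytes that does not depend on how grep splits its input is to delete every other byte and count what is left. This is only a sketch, assuming a POSIX `tr` and `wc`; the sample file and the `nul_count` variable are hypothetical.

```shell
# Sample file with three NUL bytes, one 0x0c and one 0x0a.
tmp=$(mktemp)
printf '\000\014\000\000\012' > "$tmp"

# tr -cd '\000' deletes everything except NUL bytes; wc -c counts the remainder.
nul_count=$(tr -cd '\000' < "$tmp" | wc -c)
nul_count=$((nul_count))   # normalize any padding some wc implementations add

echo "NUL bytes: $nul_count"
rm -f "$tmp"
```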