
In the output of `less my.pdf`, the string `image not available` appears multiple times, for example:

... Lastly, what remains to
^L image
   not
available
^L     Implementations and Systems      

I would like to search for this string in the PDF file, for example with `pdfgrep`. But `pdfgrep -ni "image not available" my.pdf` doesn't find anything. What should I do?

Related question: since `pdfgrep` has a user interface similar to `grep`, how does `grep` handle a pattern that spans more than one line?

Thanks.

Tim
  • AFAIK `grep` doesn't - unless your `grep` implementation provides a null terminated input option (`-z`) allowing you to fake it by slurping the whole file. OTOH `pcregrep` does, using the `-M` switch. I don't have a suitable PDF to test it on, but I'd try something like `pdfgrep -Pn '(?s)image\s+?not\s+?available' my.pdf` – steeldriver Jul 22 '18 at 22:27
  • Thanks. `pdfgrep -Pn '(?s)image\s+?not\s+?available' my.pdf` works. – Tim Jul 22 '18 at 22:30
  • Thanks. https://unix.stackexchange.com/questions/457844/how-can-i-get-the-page-numbers-only-of-a-pattern-in-a-pdf-file-regardless-if-th – Tim Jul 22 '18 at 23:28
  • ... FWIW you probably don't *need* to use PCRE, you just need a regex that matches the newlines - either implicitly (like PCRE with the `(?s)` modifier) or explicitly (using `\n` where required) – steeldriver Jul 23 '18 at 01:33
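
The `-z` trick mentioned in the comments can be tried on plain text without a PDF. The following is a minimal sketch, assuming GNU `grep` built with PCRE support (`-P`); the file name `sample.txt` and its contents are made up to mimic the excerpt in the question.

```shell
# Build a small sample that mimics the page text shown in the question
# (sample.txt is a hypothetical file name used only for this illustration).
printf 'Lastly, what remains to\n\f image\n   not\navailable\n\f     Implementations and Systems\n' > sample.txt

# Plain grep works line by line, so a phrase split across lines is never found.
# grep exits non-zero when nothing matches, hence the "|| true".
plain_count=$(grep -ci 'image not available' sample.txt || true)

# With -z, GNU grep splits input on NUL bytes instead of newlines, so the whole
# file becomes a single record and \s+ can match across the line breaks.
# -P selects PCRE, -o prints only the matched text (terminated by a NUL byte).
multi_count=$(grep -Pzoi 'image\s+not\s+available' sample.txt | tr -d '\0' | grep -c 'available')

echo "line-by-line matches: $plain_count"
echo "multi-line matches:   $multi_count"
```

With GNU `grep` this should report 0 line-by-line matches and 1 multi-line match. Note that with `-z` the usual `-n` line numbers are no longer meaningful, since the file is treated as one record.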
