How shall I grep a multi-line pattern in a pdf file and in a text file?

Asked Jul 22 '18 at 21:58

Active Jul 22 '18 at 21:58

Viewed 152 times

In the output of `less my.pdf`, the string `image not available` appears multiple times, split across several lines, for example:

... Lastly, what remains to
^L image
   not
available
^L     Implementations and Systems

I would like to grep for the string in the pdf file, for example with `pdfgrep`. But `pdfgrep -ni "image not available" my.pdf` doesn't find anything. What shall I do then?

Related question: Since `pdfgrep` has a user interface similar to `grep`, how does `grep` handle a pattern which spans more than one line?

Thanks.

Tim

AFAIK `grep` doesn't, unless your `grep` implementation provides a null-terminated input option (`-z`) that lets you fake it by slurping the whole file. OTOH `pcregrep` does, using the `-M` switch. I don't have a suitable PDF to test it on, but I'd try something like `pdfgrep -Pn '(?s)image\s+?not\s+?available' my.pdf` – steeldriver Jul 22 '18 at 22:27
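
For the plain-text half of the question, here is a minimal sketch of the `-z` trick mentioned in this comment. It assumes GNU `grep`; the sample file is made up to mimic the `less` output in the question and is not the real document.

```shell
# Hypothetical sample that mimics the text extracted from the PDF.
sample=$(mktemp)
printf 'Lastly, what remains to\n\f image\n   not\navailable\n\f     Implementations and Systems\n' > "$sample"

# Line-by-line matching: no single line holds the whole phrase, so the count is 0.
# (grep exits non-zero when nothing matches, hence the "|| true".)
line_hits=$(grep -c 'image not available' "$sample" || true)

# With -z, GNU grep reads NUL-terminated records, so a file without NUL bytes
# becomes one record and [[:space:]]+ can match across the newlines.
# -o prints only the matched text; tr turns the trailing NUL into a newline.
multi_hits=$(grep -zoE 'image[[:space:]]+not[[:space:]]+available' "$sample" \
  | tr '\0' '\n' | grep -c 'available' || true)

echo "line-by-line matches: $line_hits, whole-file matches: $multi_hits"
rm -f "$sample"
```

Note that `-z` loads the whole file as a single record, so this is only practical for files that fit comfortably in memory.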
Thanks. `pdfgrep -Pn '(?s)image\s+?not\s+?available' my.pdf` works. – Tim Jul 22 '18 at 22:30
Thanks. https://unix.stackexchange.com/questions/457844/how-can-i-get-the-page-numbers-only-of-a-pattern-in-a-pdf-file-regardless-if-th – Tim Jul 22 '18 at 23:28
... FWIW you probably don't *need* to use PCRE, you just need a regex that matches the newlines - either implicitly (like PCRE with the `(?s)` modifier) or explicitly (using `\n` where required) – steeldriver Jul 23 '18 at 01:33
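
The implicit/explicit distinction in this comment can be tried on a text file. This is only a sketch: it assumes GNU `grep` built with PCRE support (`-P`), uses a made-up sample file, and does not show how `pdfgrep` itself behaves.

```shell
# Hypothetical sample with the phrase split over three lines.
sample=$(mktemp)
printf 'Lastly, what remains to\n\f image\n   not\navailable\n' > "$sample"

# Implicit: the (?s) modifier lets "." match newlines, so ".+?" can span line breaks.
implicit=$(grep -Pzo '(?s)image.+?not.+?available' "$sample" \
  | tr '\0' '\n' | grep -c 'available' || true)

# Explicit: spell out each newline (plus any indentation) where it occurs.
explicit=$(grep -Pzo 'image\n\s*not\navailable' "$sample" \
  | tr '\0' '\n' | grep -c 'available' || true)

echo "implicit: $implicit, explicit: $explicit"
rm -f "$sample"
```

The explicit form only matches this exact line layout, while the implicit form tolerates any text between the words, which may or may not be what you want.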

0 Answers
