I can find the page numbers on which a multiline pattern occurs in a PDF file, following "How shall I grep a multi-line pattern in a pdf file and in a text file?" and "How can I search a string in a pdf file, and find the physical page number of each page where the string appears?":
$ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf
49: image
not
available
51: image
not
available
53: image
not
available
54: image
not
available
55: image
not
available
I would like to extract the page numbers only, but because each match spans several lines, splitting on ":" with awk prints the continuation lines as well:
$ pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | awk -F":" '{print $1}'
49
not
available
51
not
available
53
not
available
54
not
available
55
not
available
instead of
49
51
53
54
55
How can I extract only the page numbers, regardless of whether the pattern matches across multiple lines? Thanks.
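One direction that might work, given the output shown above, is to keep only the lines that start with digits followed by a colon, since those are the first lines of each match. This is a minimal sketch, not a tested solution against the real PDF: the function name page_numbers is made up for illustration, and it assumes that no continuation line of a match itself begins with digits and a colon.

```shell
# Hypothetical helper: reads pdfgrep -n output on stdin and prints page numbers.
# Only lines of the form "<digits>:..." are treated as the start of a match;
# continuation lines such as "not" and "available" are skipped.
# Assumption: no continuation line starts with digits followed by a colon.
page_numbers() {
    awk -F: '/^[0-9]+:/ { print $1 }'
}

# Intended use with the command from the question (not run here):
#   pdfgrep -Pn '(?s)image\s+?not\s+?available' main_text.pdf | page_numbers
```

If the pattern occurs more than once on a page, the same number would be printed once per match; piping the result through sort -un should collapse the repeats.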