Deep search of several pdf files with pdfgrep, ignoring counts less than

Question

I am doing a "deep search" within several pdf files with "pdfgrep", trying to find a word and get a count on the documents like this:

# pdfgrep -ric PATTERN

./Example1.pdf:0
./Example2.pdf:10

Any idea how I can suppress the output for files whose count is below a given number? Like 0, or less than some threshold?

Stéphane Chazelas · Accepted Answer · 2022-05-27T08:07:55.913


Assuming file paths don't contain newline characters, you can just pipe that output to:

grep -v ':0$'

To filter out the lines ending in :0.

Or

awk -F: '$NF >= 10'

To only list the files with at least 10 matches.
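
As a minimal sketch of both filters, the snippet below runs them on a few sample lines instead of real pdfgrep output, so it can be tried without any PDF files. The helper name min_count and the sample paths (including one with a colon in its name) are made up for illustration. Since awk compares the last colon-separated field, a colon inside the path does not confuse it.

```shell
# Sample lines standing in for the output of: pdfgrep -ric PATTERN
sample='./Example1.pdf:0
./Example2.pdf:10
./Notes:draft.pdf:3'

# Hypothetical helper: keep only lines whose last ":"-separated field
# (the count) is at least the threshold given as $1.
min_count() {
  awk -F: -v min="$1" '$NF + 0 >= min + 0'
}

# Drop files with zero matches; ":10" survives because the pattern
# requires the colon to sit directly before the final 0.
nonzero=$(printf '%s\n' "$sample" | grep -v ':0$')

# Keep only files with 5 or more matches.
at_least_5=$(printf '%s\n' "$sample" | min_count 5)

printf 'non-zero:\n%s\n' "$nonzero"
printf 'at least 5:\n%s\n' "$at_least_5"
```

With real data the same helper would be used as: pdfgrep -ric PATTERN | min_count 5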

To handle arbitrary file paths including those with newline characters, use NUL delimiters:

pdfgrep -ricZ pattern | gawk -v RS='\0' '
  {RS="\n"; getline count; RS="\0"}
  count > 0 {print $0":"count}'
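
If gawk is not available, the same NUL-delimited idea can probably be done with a bash read loop instead. This is only a sketch and not the approach from the answer itself: it assumes bash (read -d '' is not POSIX sh) and assumes the input has the form path, NUL, count, newline. The function name filter_nul_counts and the sample paths are invented for the example, and printf simulates the search output.

```shell
# Bash-specific sketch: read -d '' reads up to the next NUL byte.
# Prints "path:count" for every entry whose count is at least $1.
filter_nul_counts() {
  local min=$1 file count
  # First read takes the path (up to NUL), second takes the count (up to newline).
  while IFS= read -r -d '' file && IFS= read -r count; do
    if [ "$count" -ge "$min" ]; then
      printf '%s:%s\n' "$file" "$count"
    fi
  done
}

# A made-up path that deliberately contains a newline character.
odd='./odd
name.pdf'

# Simulated input standing in for NUL-delimited count output;
# the format string is reused for each path/count pair.
result=$(printf '%s\0%s\n' './Example1.pdf' 0 "$odd" 7 | filter_nul_counts 1)
printf '%s\n' "$result"
```

The entry with count 0 is dropped, and the path containing a newline comes through intact because only the NUL byte ends a path.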

edited May 27 '22 at 08:07

answered May 27 '22 at 07:54
