I am doing a "deep search" across several PDF files with pdfgrep, trying to find a word and get a per-document count of matches, like this:

# pdfgrep -ric PATTERN

./Example1.pdf:0
./Example2.pdf:10

Any idea how I can suppress the output for files with a given number of matches? Like 0 or less than...?

Nils

1 Answer

Assuming file paths don't contain newline characters, you can just pipe that output to:

grep -v ':0$'

To filter out the lines ending in :0.

Or

awk -F: '$NF >= 10'

To only list the files with at least 10 matches.
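
As a rough sketch, the threshold can be made a parameter by wrapping the awk filter in a small shell function. The name filter_counts and its default of 1 are my own assumptions for illustration, not anything pdfgrep provides. It compares the last colon-separated field, so paths that themselves contain colons should still work:

```shell
# filter_counts is a hypothetical helper name, not part of pdfgrep.
# It reads "path:count" lines on stdin and prints those whose count
# (the last colon-separated field) is at least MIN (default 1).
filter_counts() {
  awk -F: -v min="${1:-1}" '$NF >= min'
}

# Sample lines shaped like the output in the question:
printf '%s\n' './Example1.pdf:0' './Example2.pdf:10' | filter_counts
# With pdfgrep installed, the same filter would be used as:
#   pdfgrep -ric PATTERN | filter_counts 10
```

Like the one-liners above, this still assumes file paths without newline characters.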

To handle arbitrary file paths including those with newline characters, use NUL delimiters:

pdfgrep -ricZ pattern | gawk -v RS='\0' '
  {RS="\n"; getline count; RS="\0"}
  count > 0 {print $0":"count}'
Stéphane Chazelas