Questions tagged [pdfgrep]

12 questions

3 votes, 2 answers

Regex search in PDF reader

I am using zathura, as I enjoy its minimalist approach, but I would also switch to mupdf or anything else if this would solve my problem. I need to highlight every word (in PDF and epub documents) one by one from start to finish because I can…
luca

3 votes, 1 answer

Is there a tool for searching keywords super fast in many PDF files?

I have a bunch of technical books, and I have been using pdfgrep for a while, but it takes a substantial amount of time to search them all. Can somebody recommend a CLI tool for searching PDF files super fast? It should have an underline…
JammingThebBits

3 votes, 2 answers

How can I get the page numbers only of a pattern in a pdf file, regardless if the pattern is multiline?

I find the page numbers of a multiline pattern in a PDF file by following "How shall I grep a multi-line pattern in a pdf file and in a text file?" and "How can I search a string in a pdf file, and find the physical page number of each page where the string…"
Tim
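A minimal Python sketch of the page-number part. It assumes the PDF's text has already been extracted with a tool such as pdftotext, which ends each page with a form-feed character (`\f`, shown as `^L` by less). Runs of whitespace between the pattern's words are matched loosely, so a phrase broken across lines inside a page still counts. The function name `pages_matching` is hypothetical.

```python
import re


def pages_matching(text: str, phrase: str) -> list[int]:
    """Return 1-based page numbers whose text contains `phrase`,
    even if the phrase is broken across lines inside the page."""
    # Any run of whitespace (spaces, newlines) between the phrase's words
    # may match any other run of whitespace in the page text.
    words = phrase.split()
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words))
    # Assumption: pages are separated by form feeds, as in pdftotext output.
    pages = text.split("\f")
    return [n for n, page in enumerate(pages, start=1) if pattern.search(page)]


sample = "first page\fimage not\navailable here\fnothing\fimage not available\f"
print(pages_matching(sample, "image not available"))  # [2, 4]
```

To use it on a real file, the extracted text could come from `subprocess.run(["pdftotext", "my.pdf", "-"], capture_output=True, text=True).stdout`, because pdftotext writes to standard output when the output name is `-`. Splitting into pages before matching is deliberate, so a phrase that straddles a page break is not reported.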

2 votes, 1 answer

Is there any ligature-aware alternative for "pdfgrep" in command line?

I always use "pdfgrep" to search inside multiple PDF files from the command line. But I ran into a problem with the ligature character "ﬁ" (U+FB01, see https://www.compart.com/en/unicode/U+FB01). This ligature appears in the word "fixed", so I could not search for the term "fixed…
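One way to make a search ligature-aware, sketched here in Python rather than as a pdfgrep feature, is to apply Unicode NFKC normalization to both the extracted text and the search term. NFKC maps the compatibility ligature U+FB01 to the two letters "f" and "i". The helper names `normalize` and `contains` are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    # NFKC folds compatibility characters such as the ligature U+FB01
    # into their plain equivalents ("f" followed by "i").
    return unicodedata.normalize("NFKC", text)


def contains(text: str, term: str) -> bool:
    # Normalize and case-fold both sides so the search ignores
    # ligatures and letter case.
    return normalize(term).casefold() in normalize(text).casefold()


page = "The size is \ufb01xed at compile time."
print("fixed" in page)          # False: the extracted text holds the ligature
print(contains(page, "fixed"))  # True
```

The same normalization could be applied to pdftotext output before handing it to an ordinary grep. This sketch does not assume that pdfgrep itself offers such an option.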

2 votes, 2 answers

Is there a way to search (grep/find) a specific word within multiple pdf files located on a specific drive?

I am trying to locate a client's PDF file that was saved on an external backup drive, which contains a little over 8000 PDF files and hundreds of folders. For example, if I want to search all PDF files on drive X: that contain my client's name…
DiFrag
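Here is a Python sketch of the first half of this task: collecting every PDF on the drive, whatever the case of its extension, so that each path can then be handed to pdfgrep or another text extractor. The function name `find_pdfs` and the example root path are illustrative assumptions.

```python
import os
from pathlib import Path


def find_pdfs(root: str) -> list[Path]:
    """Recursively collect every file under `root` whose extension is .pdf,
    ignoring case, so both report.pdf and REPORT.PDF are found."""
    found = []
    # os.walk visits every folder below root; a missing root yields nothing.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pdf"):
                found.append(Path(dirpath) / name)
    return sorted(found)
```

On Windows this might be called as `find_pdfs("X:\\")`. The resulting list can then be searched file by file for the client's name.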

1 vote, 1 answer

pdfgrep doesn't work with Arabic language strings

I want to use pdfgrep, but it doesn't work when I search for an Arabic text or string: it shows nothing. However, it works properly when I search for an English string. Does anyone have a solution or even an alternative? Thank you. This is the code I…
VANMEN

1 vote, 1 answer

Deep search of several pdf files with pdfgrep, ignoring counts less than

I am doing a "deep search" within several PDF files with "pdfgrep", trying to find a word and get a count for each document, like this: # pdfgrep -ric PATTERN ./Example1.pdf:0 ./Example2.pdf:10 Any idea how I can ignore the printout for files with…
Nils
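Here is a small Python filter for output of the form `path:count`, as printed by `pdfgrep -ric PATTERN` above. The threshold is a hypothetical parameter `minimum`, since the exact cut-off is not given. Each line is split on its last colon because file paths can themselves contain colons.

```python
from typing import Iterable, Iterator


def filter_counts(lines: Iterable[str], minimum: int) -> Iterator[tuple[str, int]]:
    """Yield (path, count) for 'path:count' lines whose count >= minimum."""
    for line in lines:
        line = line.rstrip("\n")
        # Split on the LAST colon, since a path may itself contain colons.
        path, sep, count = line.rpartition(":")
        if not sep or not count.isdigit():
            continue  # skip anything that is not path:count
        if int(count) >= minimum:
            yield path, int(count)


output = ["./Example1.pdf:0\n", "./Example2.pdf:10\n"]
print(list(filter_counts(output, 1)))  # [('./Example2.pdf', 10)]
```

As a script, it could read from a pipe such as `pdfgrep -ric PATTERN | python3 filter_counts.py` by iterating over `sys.stdin`. The script name here is hypothetical.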

1 vote, 1 answer

How do I use pdfgrep with a specific pattern (syntax?)

I'm trying to use pdfgrep to search for each occurrence of a specific pattern (it MUST start with E or S, followed by exactly 5 digits), then execute a command afterwards (likely an mv command). So far, I have the following command: pdfgrep…
ATragicEnding
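The pattern can be pinned down and tested on its own. This Python sketch assumes that "5 digits (Only)" means exactly five digits with no further letters or digits attached, which gives `\b[ES][0-9]{5}\b`. The name `find_codes` is hypothetical, and running mv on each match is left out.

```python
import re

# E or S, then exactly five digits, not glued to other letters or digits.
CODE = re.compile(r"\b[ES][0-9]{5}\b")


def find_codes(text: str) -> list[str]:
    """Return every E/S code with exactly five digits found in `text`."""
    return CODE.findall(text)


print(find_codes("E12345 and S54321, but not X12345, E123456 or S1234"))
```

The `\b` word boundaries are what reject `E123456`: after five digits the next character is still a digit, so there is no boundary there.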

1 vote, 0 answers

How shall I grep a multi-line pattern in a pdf file and in a text file?

In the output of less my.pdf, the string "image not available" appears multiple times, for example: ... Lastly, what remains to ^L image not available ^L Implementations and Systems I would like to grep the string in the pdf file, for…
Tim
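Here is a Python sketch of a common approach to multi-line patterns: join the words of the phrase with `\s+`, which matches any run of spaces, newlines or form feeds, and search the whole text at once instead of line by line. The same code works on a plain text file and on text extracted from a PDF, for example by pdftotext. The function name `multiline_matches` is hypothetical. It reports the line on which each match starts.

```python
import re


def multiline_matches(text: str, phrase: str) -> list[tuple[int, str]]:
    """Find `phrase` even when its words are split across line breaks or
    form feeds; return (starting line number, matched text) pairs."""
    # \s+ matches spaces, tabs, newlines and form feeds (^L) alike.
    pattern = re.compile(r"\s+".join(map(re.escape, phrase.split())))
    results = []
    for m in pattern.finditer(text):
        # 1-based line number = newlines before the match start + 1.
        line_no = text.count("\n", 0, m.start()) + 1
        results.append((line_no, m.group(0)))
    return results


text = "Lastly, what remains to\n\fimage not\navailable\n\fImplementations and Systems\n"
print(multiline_matches(text, "image not available"))
```

For a text file, `text` would simply be the file's contents read in full, for example with `open(path, encoding="utf-8").read()`.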

0 votes, 1 answer

Is it possible to integrate pdfgrep into nemo search?

I often find myself looking for PDF documents. Luckily, I found pdfgrep, which does a great job of finding PDF documents by content. The following command lets me search for documents that have my search word on the first page: pdfgrep -irl…

0 votes, 0 answers

Split pdf based on keyword

Is there a utility that would split a PDF file based on a keyword? I can only find tools that split by pages (e.g. QPDF). I can also see pdfgrep, but I don't know whether it has already been combined with some other utility or not. I can write the bash script but…
Tomas Greif
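This sketch assumes the pages that contain the keyword are already known, for instance collected by running pdfgrep with its `-n` option, which prefixes each match with its page number. The remaining work is turning those pages into page ranges. This Python sketch covers that step. The function `split_ranges` is hypothetical, and pages before the first keyword hit are kept as a separate first part.

```python
def split_ranges(hit_pages: list[int], total_pages: int) -> list[tuple[int, int]]:
    """Given the 1-based pages on which the keyword appears, return
    (first, last) page ranges, each new part starting at a keyword page."""
    # Deduplicate, sort and drop out-of-range page numbers.
    starts = sorted(set(p for p in hit_pages if 1 <= p <= total_pages))
    if not starts or starts[0] != 1:
        starts.insert(0, 1)  # keep any pages before the first keyword
    # Each part ends just before the next part starts; the last ends the file.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


print(split_ranges([3, 7], 10))  # [(1, 2), (3, 6), (7, 10)]
```

Each `(first, last)` pair could then be passed to a page splitter. With QPDF that would look roughly like `qpdf in.pdf --pages . 3-6 -- part2.pdf`, though the exact syntax should be checked against the QPDF manual for your version.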

-4 votes, 2 answers

Can we search in a pdf file for pages containing several words in no particular order?

I would like to search in a pdf file for all the pages, each containing several given words in no particular order. For example, I want to find all the pages which contain both "hello" and "world" in no particular order. I am not sure if pdfgrep …
Tim
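Here is a Python sketch that assumes the text was extracted with pdftotext, so pages are separated by form feeds. Each page is kept only if every word occurs somewhere in it, case-insensitively and in any order. The function name `pages_with_all_words` is hypothetical.

```python
def pages_with_all_words(text: str, words: list[str]) -> list[int]:
    """Return 1-based numbers of pages containing every word, in any order.
    Assumes pages are separated by form feeds, as in pdftotext output."""
    wanted = [w.casefold() for w in words]
    hits = []
    for number, page in enumerate(text.split("\f"), start=1):
        page_cf = page.casefold()
        # all() succeeds only when each wanted word occurs on this page.
        if all(w in page_cf for w in wanted):
            hits.append(number)
    return hits


sample = "hello there\fworld, hello\fworld only\f"
print(pages_with_all_words(sample, ["hello", "world"]))  # [2]
```

Matching is by substring, so "world" would also match "worldwide". A word-boundary regex per word would be stricter.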