
Is there a utility that can split a PDF file based on a keyword? I can only find tools that split by page (e.g. QPDF). I have also seen pdfgrep, but I don't know whether the two have already been combined in some other utility. I can write a bash script, but how do I get the pages to split at from pdfgrep?

Tomas Greif
  • Do you want to extract the pages that contain a given keyword into a new PDF? Or do you have a list of keywords and want to create a new PDF for each keyword, containing the pages that match it? – finswimmer Feb 27 '19 at 05:19
  • Also note that the PDF format is a description of which graphical elements (glyph, lines, ...) to put where on a given page, in whatever order the producing application output this description. Mapping this description back to text sort of works (but not always), and "splitting" a page (whatever it is supposed to mean) isn't easy. – dirkt Feb 27 '19 at 06:32
  • @dirkt My pdf works well with `pdftotext` and `pdfgrep` so I hope this is possible. – Tomas Greif Feb 28 '19 at 04:22
  • @finswimmer I want to split the PDF. Let's say there are 5 pages and the keyword is on pages 2 and 4. I would like to get three PDFs: pages 1, 2-3, and 4-5. – Tomas Greif Feb 28 '19 at 04:23
  • So you need three steps: Find the pages where the keywords are (`pdfgrep -n`), build page ranges according to this (a bit of shell scripting), split the PDF (any of a number of PDF utilities). Should be doable. – dirkt Feb 28 '19 at 06:16
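
The three steps from the last comment could be sketched roughly as follows, here in Python rather than bash so the range logic is easy to check. This is only a sketch under some assumptions: `pdfgrep -n` is assumed to prefix each match with `<page>:`, the page count is read with `qpdf --show-npages`, and the helper names and the `part_<start>-<end>.pdf` output naming are made up for illustration.

```python
import re
import subprocess


def parse_hit_pages(pdfgrep_output):
    """Collect distinct page numbers from `pdfgrep -n` output.

    Assumes each match line starts with '<page>:', e.g. '2: some text'.
    """
    pages = set()
    for line in pdfgrep_output.splitlines():
        match = re.match(r"\s*(\d+):", line)
        if match:
            pages.add(int(match.group(1)))
    return sorted(pages)


def build_ranges(hit_pages, total_pages):
    """Start a new chunk at every hit page.

    Pages before the first hit form their own chunk, so a 5-page file
    with hits on pages 2 and 4 becomes 1, 2-3 and 4-5.
    """
    starts = sorted({p for p in hit_pages if 1 <= p <= total_pages} | {1})
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


def split_pdf(pdf_path, keyword):
    """Hypothetical driver: find hit pages, build ranges, split with qpdf."""
    # pdfgrep exits non-zero when nothing matches, so the exit code is not checked.
    hits = subprocess.run(
        ["pdfgrep", "-n", keyword, pdf_path],
        capture_output=True, text=True,
    ).stdout
    total = int(subprocess.run(
        ["qpdf", "--show-npages", pdf_path],
        capture_output=True, text=True, check=True,
    ).stdout)
    for start, end in build_ranges(parse_hit_pages(hits), total):
        out_path = f"part_{start}-{end}.pdf"  # illustrative naming only
        subprocess.run(
            ["qpdf", "--empty", "--pages", pdf_path, f"{start}-{end}", "--", out_path],
            check=True,
        )
```

Calling `split_pdf("input.pdf", "keyword")` would then write one file per range into the current directory. The same range-building idea should translate to a short bash loop if a pure shell script is preferred.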

0 Answers