28

I know I have done this before, so I'm sure it's possible; I just forget how. There's a way to tell `convert` to grab a specific page of a PDF, and I'd like to keep that page in PDF format.

Gilles 'SO- stop being evil'
ixtmixilix

3 Answers

33

You can use subscript notation with convert(1) to "index" into a PDF:

$ convert source.pdf[1] dest.pdf 

The index value depends on how the PDF exporter numbered the pages. In tests on files here, the numbers seem to be zero-based, so the above example gets you the second page in the document. I've seen examples online where they show letter indexes instead, since apparently the PDF creator "numbered" the pages in that document that way instead.
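If you script this, the off-by-one between human page numbers and the subscript is easy to get wrong. Here is a minimal Python sketch of the conversion. `convert_page_spec` is a made-up helper name, and it assumes the zero-based behaviour seen in the tests above holds for your file.

```python
def convert_page_spec(pdf_path, page_number):
    """Return the 'file.pdf[N]' argument for convert, given a 1-based page number."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    # Assumption: the subscript is zero-based, as observed in the tests above.
    return f"{pdf_path}[{page_number - 1}]"


print(convert_page_spec("source.pdf", 2))  # source.pdf[1]
```

The returned string is what you would pass to convert as the input file argument.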

Unfortunately, this doesn't give very good results, because ImageMagick assumes everything is pixel-based, and therefore rasterizes vector imagery, such as the typography in a typical PDF.

A better tool for the job is Ghostscript, which you probably already have installed:

$ gs -dNOPAUSE -dBATCH -dFirstPage=2 -dLastPage=2 -sDEVICE=pdfwrite \
    -sOutputFile=dest.pdf -f src.pdf

This keeps the page content as vector data rather than rasterizing it, since Ghostscript understands PDF (a PostScript derivative) at a much deeper level than ImageMagick does.
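If you need this from a script, the same Ghostscript invocation can be assembled as an argument list. This is only a sketch in Python: `gs_extract_args` is a hypothetical helper, it uses exactly the flags from the command above, and it only builds and prints the command rather than running it.

```python
import shlex


def gs_extract_args(src, dest, first_page, last_page=None):
    """Build the gs argument list for extracting a 1-based page range."""
    if last_page is None:
        last_page = first_page  # single page
    if first_page < 1 or last_page < first_page:
        raise ValueError("invalid page range")
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-dFirstPage={first_page}", f"-dLastPage={last_page}",
        "-sDEVICE=pdfwrite", f"-sOutputFile={dest}",
        "-f", src,
    ]


# Print the command for inspection; nothing is executed here.
print(shlex.join(gs_extract_args("src.pdf", "dest.pdf", 2)))
```

To actually run it, pass the list to `subprocess.run(args, check=True)`; that needs `gs` on your PATH and an existing input file.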

Warren Young
  • actually that's not true about imagemagick, if you set the -density parameter to something around 300-400 then the outputted text from the pdf in the png will look just fine. – buggedcom Aug 22 '12 at 23:19
  • It'll look fine on screen, sure, but if you then go to print, you'll want to set the density even higher. And then, you're likely to run into trouble with how your printer's RIP copes with the gray antialiasing pixels output by ImageMagick. So you can then choose instead to output to 1-bit B&W at your printer's native resolution, which might be 1,200 dpi, or 1,440 dpi or something else, and you have to know that in advance to get sharp output. No, I'll stand by my statement: best to keep PDF data in vector form as long as possible. – Warren Young Aug 23 '12 at 02:21
  • @buggedcom I've found `-density 300` is the sweet spot. Anything larger and you're creating huge temp files - which you're probably going to resize down to thumbnails anyway – Mike Causer Dec 16 '13 at 03:26
  • You can also select a range of pages (e.g. for making a gif) like so `source.pdf[3-6]` – texasflood May 19 '16 at 20:09
  • With `convert` you can also select a list of pages: `source.pdf[1,3,5]` – Westy92 Aug 12 '21 at 06:24
  • `convert` extracts the pages but the resulting pdf files are blurry. If you set `-density 300` then the resulting pdf files are huge. As @WarrenYoung pointed out it is best to use Ghostscript. Very fast and resulting file is as good as the original one. – tbaskan Dec 06 '22 at 06:59
27

ImageMagick is a tool for bitmap images, which most PDFs aren't. If you use it, it will rasterize the data, which is often not desirable.

Pdftk can extract one or more pages from a PDF file.

pdftk A=input.pdf cat A42 A43 output pages_42_43.pdf

If you have a LaTeX installation with PDFLaTeX, you can use pdfpages. There's a shell wrapper for pdfpages, pdfjam.

pdfjam -o pages_42_43.pdf input.pdf 42,43

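Another possibility (overkill here, but useful for requirements more complex than one page) is Python with the pypdf library (the maintained successor to pyPdf).

#!/usr/bin/env python3
# Extract pages 42 and 43: reads a PDF on stdin, writes a PDF to stdout.
import io
import sys
from pypdf import PdfReader, PdfWriter

# PdfReader needs a seekable stream, so buffer stdin in memory first.
reader = PdfReader(io.BytesIO(sys.stdin.buffer.read()))
writer = PdfWriter()
for page_number in [42, 43]:
    # reader.pages is zero-based; subtract 1 to match the 1-based examples above.
    writer.add_page(reader.pages[page_number - 1])
writer.write(sys.stdout.buffer)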
Gilles 'SO- stop being evil'
  • I was about to recommend `pdftk` as well. You will want to use it. – Sebastian Jun 10 '11 at 08:12
  • `pdfjam` works like a charm, and was already installed with my LaTeX distribution. It is very easy to use. – hdl Sep 09 '16 at 14:12
  • Thanks a lot. The extracted page was larger than the complete pdf with `pdftk` so it doesn't seem to simply extract a page. The result was fine otherwise. – Eric Duminil Jul 05 '18 at 10:11
3

This Q&A is from 2011. As of 2021, I think the most stable and well-maintained option for this purpose is qpdf:

qpdf input.pdf --pages . 12 -- output.pdf

Page numbering seems to start from 1, but I haven't checked how this works when the pdf file has page numbering metadata.
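For scripted use, the same command can be built from a list of page numbers. A rough Python sketch follows; `qpdf_extract_args` is a hypothetical helper, and it assumes 1-based page numbers joined with commas, which matches qpdf's page range syntax as far as I know.

```python
def qpdf_extract_args(src, dest, pages):
    """Build the qpdf argument list for extracting the given 1-based pages."""
    pages = list(pages)
    if not pages or any(p < 1 for p in pages):
        raise ValueError("pages must be a non-empty list of numbers >= 1")
    page_range = ",".join(str(p) for p in pages)
    # "." tells qpdf to take the pages from the primary input file.
    return ["qpdf", src, "--pages", ".", page_range, "--", dest]


print(qpdf_extract_args("input.pdf", "output.pdf", [12]))
```

As with any such wrapper, hand the list to `subprocess.run(args, check=True)` once qpdf is installed.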

I did this using pdftk for many years, but pdftk is poorly engineered and depends on an obsolete version of a library.