I need to merge a few dozen PDFs, and I want every input PDF to start on an odd page in the output PDF.

Example: A.pdf has 3 pages, B.pdf has 4 pages. I don't want my output to have 7 pages. What I want is an 8-page PDF in which pages 1-3 are from A.pdf, page 4 is empty, and pages 5-8 are from B.pdf. How can I do this?
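
The layout being asked for can be stated precisely with a small sketch. The hypothetical helper `plan_output` below (plain Python, no PDF library involved) takes the page count of each input file and returns one entry per output page: `(file_index, page_number)`, or `None` for an inserted blank page. Like the scripts in the answers, it also pads after the last file.

```python
def plan_output(page_counts):
    """Return one entry per output page: (file_index, page_number), or None for a blank page."""
    layout = []
    for file_index, count in enumerate(page_counts):
        for page in range(1, count + 1):
            layout.append((file_index, page))
        # If this file ended on an odd output page, add one blank page
        # so that the next file starts on an odd page.
        if len(layout) % 2 == 1:
            layout.append(None)
    return layout


# A.pdf has 3 pages, B.pdf has 4 pages (the example above)
layout = plan_output([3, 4])
print(len(layout))  # 8
print(layout[3])    # None: output page 4 is the blank page
```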

I know about pdftk, but I didn't find such an option in the man page.

Asked by Jan Warchoł (edited by terdon)

5 Answers

The pyPdf library makes this sort of thing easy if you're willing to write a bit of Python. Save the code below in a script called `pdf-cat-even` (or whatever you like), make it executable (`chmod +x pdf-cat-even`), and run it as a filter (`./pdf-cat-even a.pdf b.pdf >concatenated.pdf`). You need pyPdf ≥ 1.13 for the `addBlankPage` method.

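```python
#!/usr/bin/env python
# pyPdf is a Python 2 library; this script needs pyPdf >= 1.13 (for addBlankPage)
import sys
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
output_page_number = 0
alignment = 2  # pad each file to a multiple of 2 pages, so every file starts on an odd page

for filename in sys.argv[1:]:
    # This code is executed for every file in turn.
    # PDFs are binary, so open in "rb" mode; the file must stay open until output.write().
    reader = PdfFileReader(open(filename, "rb"))
    for i in range(reader.getNumPages()):
        # This code is executed for every input page in turn
        output.addPage(reader.getPage(i))
        output_page_number += 1
    while output_page_number % alignment != 0:
        # With no size given, addBlankPage uses the size of the last page added
        output.addBlankPage()
        output_page_number += 1

output.write(sys.stdout)  # redirect stdout to a file, as in the usage line above
```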
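
The `while` loop adds exactly `-n % alignment` blank pages, where `n` is the number of output pages so far (Python's `%` returns a non-negative result for a positive divisor). The sketch below, with hypothetical helper names, checks that closed form against a direct transcription of the loop and shows what another alignment would do: with `alignment = 4`, every file would start on page 1, 5, 9, and so on.

```python
def blank_pages_needed(pages_so_far, alignment=2):
    """Blank pages to append so the next file starts on page k*alignment + 1."""
    return -pages_so_far % alignment


def blank_pages_by_loop(pages_so_far, alignment=2):
    # Direct transcription of the while loop in the script, for comparison
    added = 0
    while (pages_so_far + added) % alignment != 0:
        added += 1
    return added


# The closed form and the loop agree for a range of inputs
for n in range(10):
    for a in (1, 2, 4):
        assert blank_pages_needed(n, a) == blank_pages_by_loop(n, a)

print(blank_pages_needed(3))     # 1: after A.pdf's 3 pages, one blank page
print(blank_pages_needed(4))     # 0: already aligned
print(blank_pages_needed(5, 4))  # 3: the next file starts on page 9
```
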
Answered by Gilles 'SO- stop being evil' (edited by muru)

The first step is to produce a PDF file containing a single empty page. Many programs can do this easily (LibreOffice/OpenOffice, Inkscape, (La)TeX, Scribus, etc.).
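
If none of those programs is at hand, a one-page blank PDF can also be written from Python without any library. The sketch below builds a minimal hand-rolled PDF (catalog, page tree, one page with no content stream, and the cross-reference table recording each object's byte offset). It is not the output of any of the programs above; the function name is hypothetical, and the default size of 595 x 842 points (A4, rounded) is an assumption, so pass the size of your own documents if they differ.

```python
def blank_pdf(width=595, height=842):
    """Return the bytes of a minimal single-page PDF whose only page is empty.

    width/height are in PostScript points; the A4 default is an assumption.
    """
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /Resources << >> /MediaBox [0 0 %d %d] >>" % (width, height),
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for object 0, which is always free
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # every xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


if __name__ == "__main__":
    with open("empty_page.pdf", "wb") as f:
        f.write(blank_pdf())
```

Run as a script, it writes `empty_page.pdf` to the current directory.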

Then just include this empty page where needed:

```sh
pdftk A.pdf empty_page.pdf B.pdf cat output result.pdf
```

If you want to do this automatically with a script, you can use, e.g., `pdftk file.pdf dump_data | grep NumberOfPages | grep -Eo '[0-9]+'` to extract the page count.
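
The same page count can be obtained from a Python script that calls pdftk directly. A minimal sketch, assuming pdftk is on the `PATH` and that its `dump_data` output contains a line of the form `NumberOfPages: 3` (the line the `grep` pipeline above relies on); the function names are hypothetical:

```python
import subprocess


def parse_number_of_pages(dump_data_text):
    """Extract N from the 'NumberOfPages: N' line of `pdftk FILE dump_data` output."""
    for line in dump_data_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "NumberOfPages":
            return int(value)  # int() tolerates the leading space
    raise ValueError("no NumberOfPages line in pdftk output")


def page_count(pdf_path):
    # Arguments are passed as a list, without a shell, so spaces in pdf_path are safe
    result = subprocess.run(
        ["pdftk", pdf_path, "dump_data"],
        check=True, capture_output=True, text=True, errors="replace",
    )
    return parse_number_of_pages(result.stdout)
```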

Answered by jofel
  • This feels like a bit of a hack. Though if it works, it works I suppose. – Sam Whited Feb 28 '13 at 16:18
  • This approach almost worked for me: I wrote a script that produced a list of PDFs with emptyPage.pdf added where necessary, but I couldn't get pdftk to parse this list correctly if the filenames contained spaces. I tried changing the IFS value and using quotation marks, but to no avail; maybe it's pdftk's fault. Anyway, [the answer using pypdf](http://unix.stackexchange.com/a/66455/32950) worked for me. – Jan Warchoł Mar 01 '13 at 12:18
  • @JanekWarchol Which version of pdftk did you use? pdftk 1.44 and newer, at least, seem to support whitespace in filenames. – jofel Mar 09 '13 at 01:11
  • @jofel `pdftk --version` returns pdftk 1.44. I remember that my more bash-savvy friends spent at least 15 minutes trying different things to get this to work and gave up. – Jan Warchoł Mar 09 '13 at 08:44
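
The space problem in these comments typically comes from building the file list as one string and letting the shell split it into words. Building the argument list in Python and running pdftk without a shell sidesteps quoting and `IFS` entirely. A minimal sketch, assuming the page count of each file is already known (for example from `dump_data`) and that a one-page `empty_page.pdf` exists; `pdftk_command` is a hypothetical helper:

```python
def pdftk_command(files_with_counts, blank="empty_page.pdf", output="result.pdf"):
    """Build a pdftk argument list that inserts `blank` after every input file
    with an odd number of pages, so that the next file starts on an odd page.

    Because every file is padded to an even length, checking each file's own
    page count is enough; the running total before each file is always even.
    """
    args = ["pdftk"]
    for path, count in files_with_counts:
        args.append(path)
        if count % 2 == 1:
            args.append(blank)
    args += ["cat", "output", output]
    return args


# Each filename is a single list element, so spaces need no quoting or IFS tricks
cmd = pdftk_command([("My File A.pdf", 3), ("B.pdf", 4)])
print(cmd)
# ['pdftk', 'My File A.pdf', 'empty_page.pdf', 'B.pdf', 'cat', 'output', 'result.pdf']
```

To actually merge, pass the list to `subprocess.run(cmd, check=True)`; each filename then reaches pdftk as one argument.
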

You could also use LaTeX to do this (though I'm aware it's probably not what you want). Something like the following should work:

```latex
\documentclass{book}

\usepackage{pdfpages}
\pagestyle{empty} % No headers or page numbers on the blank pages inserted by \cleardoublepage

\begin{document}

\includepdf[pages=-]{A}
\cleardoublepage % Make sure we clear to an odd page
\includepdf[pages=-]{B} % This inserts all pages. Or you can specify specific pages, a range, or `{}` for a blank page

\end{document}
```

Note that `\cleardoublepage` only inserts a blank page with classes made for two-sided printing (e.g. `book`, or `article` with the `twoside` option).

More options and info on pdfpages can be found on CTAN.

Answered by Sam Whited
  • To include all pages automatically, you can use `\includepdf[pages=-]{...}`. – jofel Feb 28 '13 at 16:41
  • @jofel Thanks, fixed the question. I think it defaults to all pages too; I just put it in there to show that it was possible to select certain pages. – Sam Whited Feb 28 '13 at 17:51
  • @jofel Also, `\cleardoublepage` only inserts a blank page if you're using a class made for two-sided printing. I was using article, which doesn't work; I fixed it and updated the question to reflect that. – Sam Whited Feb 28 '13 at 17:56
  • `\includepdf` includes only the first page by default (not all pages). `\documentclass[twoside]{article}` also works. – jofel Mar 01 '13 at 00:57
  • From what I see, I'd have to explicitly list every file to be included, so that's not good enough for me. But thanks anyway. – Jan Warchoł Mar 01 '13 at 12:19
  • Ah, I see, I was under the impression that you were doing that anyways (listing them all in command line args). While you could automate this with LaTeX easily enough, the python example is a better way of doing it anyhow, so I'll leave this as is. – Sam Whited Mar 01 '13 at 19:00
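
Listing every file by hand, the objection raised above, can be avoided by generating the wrapper document. A minimal sketch that emits the same structure as the answer (`book` class, `\includepdf[pages=-]` per file, `\cleardoublepage` after each); the function and script names are hypothetical, `\pagestyle{empty}` is included so that the blank pages from `\cleardoublepage` carry no page number, and filenames containing spaces or TeX special characters may need extra care that this sketch does not attempt:

```python
def pdfpages_wrapper(pdf_files):
    """Return LaTeX source that includes every PDF in order, each starting on an odd page."""
    lines = [
        r"\documentclass{book}",
        r"\usepackage{pdfpages}",
        r"\pagestyle{empty}  % no page numbers on the inserted blank pages",
        r"\begin{document}",
    ]
    for path in pdf_files:
        lines.append(r"\includepdf[pages=-]{%s}" % path)
        lines.append(r"\cleardoublepage")  # pad to an odd page before the next file
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python make_wrapper.py A.pdf B.pdf > merged.tex
    sys.stdout.write(pdfpages_wrapper(sys.argv[1:]))
```

For example, `python make_wrapper.py A.pdf B.pdf > merged.tex` followed by `pdflatex merged.tex` would produce `merged.pdf`.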

Gilles' answer worked for me, but since I have to merge many files, it's more convenient to read their names from a text file. I've slightly modified Gilles' code to do just that; maybe it will help someone else:

```python
#!/usr/bin/env python

# requires PyPdf library, version 1.13 or above (Python 2 only) -
# its homepage is http://pybrary.net/pyPdf/
# running: ./this-script-name file-with-pdf-list > output.pdf
# (file-with-pdf-list contains one PDF filename per line)

import sys
from pyPdf import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
output_page_number = 0

# every new file should start on (n*alignment + 1)th page
# (with value 2 this means starting always on an odd page)
alignment = 2

# one filename per line; empty lines are skipped
listoffiles = [line for line in open(sys.argv[1]).read().splitlines() if line]
for filename in listoffiles:
    # This code is executed for every file in turn
    reader = PdfFileReader(open(filename, "rb"))  # PDFs are binary
    for i in range(reader.getNumPages()):
        # This code is executed for every input page in turn
        output.addPage(reader.getPage(i))
        output_page_number += 1
    while output_page_number % alignment != 0:
        output.addBlankPage()
        output_page_number += 1
output.write(sys.stdout)
```
Answered by Jan Warchoł

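Here's the code ported to PyPDF2 and Python 3. Unlike the versions above, it takes the output filename first, followed by the input PDFs, as command-line arguments: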

```python
#!/usr/bin/env python3

# requires the PyPDF2 library, version 1.26 or above but below 3.0
# (PyPDF2 3.0 removed the PdfFileReader/PdfFileWriter names) -
# its homepage is https://pythonhosted.org/PyPDF2/index.html
# running: ./this-script-name output.pdf input1.pdf input2.pdf ...

import sys
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
output_page_number = 0

# every new file should start on (n*alignment + 1)th page
# (with value 2 this means starting always on an odd page)
alignment = 2

for filename in sys.argv[2:]:
    # This code is executed for every file in turn.
    # The input file must stay open until output.write() below,
    # because PyPDF2 reads page content lazily.
    reader = PdfFileReader(open(filename, "rb"))
    output.appendPagesFromReader(reader)
    output_page_number += reader.getNumPages()

    while output_page_number % alignment != 0:
        output.addBlankPage()
        output_page_number += 1

with open(sys.argv[1], "wb") as outfile:
    output.write(outfile)
```
Answered by Loren