19

I have several PDF files (chapter1.pdf, chapter2.pdf, etc.), each one being a chapter of a book. I know how to merge them into a single PDF (I use the command pdfunite from poppler), but since the output file is big, it's difficult to find a chapter without having them indexed in a table of contents. How can I create an embedded table of contents in which each merged chapter is an entry?

Note that I do not want to create a page in the output file that lists the chapters and their respective page numbers. I want the index/table of contents metadata of a PDF file, which can be browsed in any PDF reader (or ebook device) that supports this feature.

Seninha
    You can use pdftk, as described [here](https://unix.stackexchange.com/questions/17065/add-and-edit-bookmarks-to-pdf). – Hölderlin May 31 '17 at 23:54

2 Answers

16

A non-destructive version of @bu5hman's answer (the input files are left untouched):

#!/bin/bash

out_file="combined.pdf"
bookmarks_file="/tmp/bookmarks.txt"
bookmarks_fmt="BookmarkBegin
BookmarkTitle: %s
BookmarkLevel: 1
BookmarkPageNumber: %d
"

rm -f "$bookmarks_file" "$out_file"

declare -a files=(*.pdf)
page_counter=1

# Generate bookmarks file.
for f in "${files[@]}"; do
    title="${f%.*}"
    printf "$bookmarks_fmt" "$title" "$page_counter" >> "$bookmarks_file"
    num_pages="$(pdftk "$f" dump_data | grep NumberOfPages | awk '{print $2}')"
    page_counter=$((page_counter + num_pages))
done

# Combine PDFs and embed the generated bookmarks file.
pdftk "${files[@]}" cat output - | \
    pdftk - update_info "$bookmarks_file" output "$out_file"

It works by:

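  1. Generating bookmarks.txt, with one entry per input file that points at the page where that file starts in the merged output.
  2. Merging the PDFs into a single stream.
  3. Writing that stream to combined.pdf with the bookmarks from bookmarks.txt embedded.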
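
To make step 1 concrete, here is a minimal sketch of how the running page counter turns into bookmark entries. The `make_bookmarks` function and the page counts are hypothetical, chosen only for illustration. In the script above the counts come from `pdftk "$f" dump_data`.

```shell
#!/bin/bash
# Hypothetical helper: emit one level-1 pdftk bookmark record per
# "title:pages" argument. Each chapter starts on the page right after
# the last page of all earlier chapters.
make_bookmarks() {
    local page=1 spec title pages
    for spec in "$@"; do
        title="${spec%:*}"     # text before the last colon
        pages="${spec##*:}"    # text after the last colon
        printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: %d\n' \
            "$title" "$page"
        page=$((page + pages))
    done
}

# Made-up page counts: 12, 30 and 8 pages.
make_bookmarks chapter1:12 chapter2:30 chapter3:8
```

With these made-up counts, the three entries point at pages 1, 13 and 43 of the merged file.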
Mateen Ulhaq
  • How would one go about doing this recursively through several folders, and building the bookmarks from folder structure, i.e. sections/subsections depending on folder depth? – Cpt Reynolds Dec 27 '22 at 17:47
    @CptReynolds For that, I would probably just write the "Generate bookmarks file" section in Python. – Mateen Ulhaq Jan 02 '23 at 22:41
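
One possible starting point for the folder-depth idea, sketched in shell to match the scripts on this page rather than in Python: pdftk nests a bookmark under the preceding one when its `BookmarkLevel` is higher, so the level can be derived from how deep a file sits in the tree. The `bookmark_level` helper and the example paths are hypothetical. Emitting extra entries for the folders themselves and tracking page offsets would still have to be added.

```shell
#!/bin/bash
# Hypothetical helper: map a relative path to a bookmark level.
# A file in the top directory gets level 1, and each parent folder
# adds one level.
bookmark_level() {
    local path="${1#./}"    # drop a leading "./" as printed by find
    local slashes
    slashes="$(printf '%s' "$path" | tr -cd '/')"   # keep only the slashes
    echo $(( ${#slashes} + 1 ))
}

# Made-up example paths.
for p in intro.pdf ./part1/chapter1.pdf part1/appendix/tables.pdf; do
    printf '%s -> BookmarkLevel: %d\n' "$p" "$(bookmark_level "$p")"
done
```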
5

A function I use all the time to do exactly this. Just make sure the PDFs sort into the right sequence when `*.pdf` is expanded. Note that it overwrites each input file with a bookmarked copy.

tp="/tmp/tmp.pdf"
td="/tmp/data"
for i in *.pdf; do
    echo "Bookmarking $i"
    printf "BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: 1\n" "${i%.*}"> "$td"
    pdftk "$i" update_info "$td" output "$tp"
    mv "$tp" "$i"
done
pdftk *.pdf cat output myBook.pdf
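
On the sorting caveat: a plain `*.pdf` expansion sorts names character by character, so chapter10.pdf comes before chapter2.pdf. A small sketch of one workaround follows, assuming GNU `sort` (its `-V` option is not part of POSIX) and file names without newlines. The `natural_sort` name is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print the given names one per line, ordered so
# that embedded numbers compare numerically (GNU sort -V).
natural_sort() {
    printf '%s\n' "$@" | sort -V
}

natural_sort chapter10.pdf chapter2.pdf chapter1.pdf
```

In bash 4 or later the result can be read into an array with `mapfile -t files < <(natural_sort *.pdf)` and then passed to pdftk as `"${files[@]}"`. Zero-padding the file names (chapter01.pdf, chapter02.pdf, ...) avoids the problem without any extra tooling.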
roaima
bu5hman