
Which commands should I run to irreversibly remove all metadata from foo.pdf? Assume embedded images are already clean.

I got the impression from

https://gist.github.com/hubgit/6078384

that

```shell
exiftool -all:all= foo.pdf
qpdf --linearize foo.pdf bar.pdf
```

might suffice, but it wasn't clear to me whether it was an entirely complete method. There was some talk of pdftk and an "info dictionary" that I didn't understand.
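A minimal sketch of the two-step approach from the question, assuming exiftool and qpdf are installed; `scrub_pdf` is a hypothetical helper name. The second step matters because, according to ExifTool's PDF documentation, ExifTool edits PDFs by appending an incremental update. After `exiftool -all:all=` the original metadata is still physically present in the file and the edit is reversible. A PDF keeps this metadata in two places: the document Info dictionary (Title, Author, Creator, Producer, dates) and an XMP stream. Rewriting the file with qpdf writes only the objects reachable from the current revision, which should drop the superseded ones.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): blank all metadata with exiftool, then
# rewrite the PDF with qpdf so the old revision holding the original
# Info dictionary and XMP stream is not copied into the output.
# Assumes exiftool and qpdf are installed and on the PATH.

scrub_pdf() {
    if [ "$#" -ne 2 ]; then
        echo "usage: scrub_pdf INPUT.pdf OUTPUT.pdf" >&2
        return 2
    fi
    local in="$1" out="$2" tool workdir rc

    for tool in exiftool qpdf; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "scrub_pdf: $tool not found" >&2
            return 127
        fi
    done

    # Work on a copy so the input file is left untouched.
    workdir="$(mktemp -d)" || return 1
    cp -- "$in" "$workdir/work.pdf" || { rm -rf -- "$workdir"; return 1; }

    # Step 1: delete all writable tags. For PDFs, exiftool appends this as
    # an incremental update; the original metadata is still in the file.
    exiftool -q -all:all= -overwrite_original "$workdir/work.pdf" \
        || { rm -rf -- "$workdir"; return 1; }

    # Step 2: rewrite the whole file. qpdf writes only the objects reachable
    # from the current trailer, so the superseded metadata objects are dropped.
    qpdf --linearize "$workdir/work.pdf" "$out"
    rc=$?
    rm -rf -- "$workdir"

    # qpdf exits 0 on success and 3 when it succeeded with warnings.
    if [ "$rc" -ne 0 ] && [ "$rc" -ne 3 ]; then
        return 1
    fi
    return 0
}

# Usage:
#   scrub_pdf foo.pdf bar.pdf
#   exiftool -a -G1 bar.pdf   # inspect what is left in the result
```

Working on a temporary copy keeps foo.pdf intact until you have checked bar.pdf.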

Toothrot
  • Does this answer your question? [Where is metadata for PDF files? Can I insert metadata into any PDF file?](https://unix.stackexchange.com/questions/489230/where-is-metadata-for-pdf-files-can-i-insert-metadata-into-any-pdf-file) – AdminBee Sep 09 '20 at 10:50
  • My answer using `pdftk` or `qpdf`: https://askubuntu.com/a/1300265/632237 – Freifrau von Bleifrei Dec 15 '20 at 11:16

3 Answers


There is a tool called MAT (Metadata Anonymisation Toolkit) that can remove metadata from a number of different formats. In Ubuntu, this is how I use it:

```shell
sudo apt install mat2

mat2 filename.pdf            # you will now end up with a file called filename.cleaned.pdf

mat2 --inplace filename.pdf  # overwrites the original file with the cleaned one, effectively removing the original
```

Note that the cleaned file may be noticeably smaller or larger than the original.

If the commands above produce a much larger file (as often happens with PDFs, because mat2 renders the pages as images by default), try the lightweight mode by adding `--lightweight` (or `-L`). This mode may leave some metadata behind, so use exiftool to check whether the cleaned file still leaks any.
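A minimal sketch of that workflow, assuming mat2 and exiftool are installed; `clean_and_check` and `mat2_output_name` are hypothetical helper names. It cleans the file in lightweight mode, prints the original and cleaned sizes, and lists the Info dictionary (`-PDF:all`) and XMP (`-XMP:all`) tags that exiftool can still find in the result. The output name follows the `filename.cleaned.pdf` convention described above.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): clean a PDF with mat2's lightweight mode,
# compare sizes, and list the Info dictionary and XMP tags exiftool still finds.
# Assumes mat2 and exiftool are installed and on the PATH.

# Name mat2 gives its output: NAME.EXT -> NAME.cleaned.EXT
mat2_output_name() {
    printf '%s.cleaned.%s\n' "${1%.*}" "${1##*.}"
}

clean_and_check() {
    if [ "$#" -ne 1 ]; then
        echo "usage: clean_and_check FILE.pdf" >&2
        return 2
    fi
    local in="$1" cleaned tool
    for tool in mat2 exiftool; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "clean_and_check: $tool not found" >&2
            return 127
        fi
    done

    # Lightweight mode avoids rendering pages as images, but may leave metadata.
    mat2 --lightweight "$in" || return 1
    cleaned="$(mat2_output_name "$in")"

    # Compare sizes in bytes.
    printf 'original: %s bytes\n' "$(wc -c < "$in")"
    printf 'cleaned:  %s bytes\n' "$(wc -c < "$cleaned")"

    # Tags printed here are still stored in the cleaned file.
    # The PDF group also lists structural fields such as the page count.
    exiftool -a -G1 -PDF:all -XMP:all "$cleaned"
}

# Usage:
#   clean_and_check filename.pdf
```

If Author, Creator, or similar entries still show up, fall back to the default (non-lightweight) mode and accept the larger file.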

GMaster
  • Thanks. Do you happen to know why this can multiply the original size by 9? – Toothrot Sep 09 '20 at 13:16
  • I had never tried the tool on PDFs, mostly on image files, where it works well. I just tried it on a PDF, and the size did increase considerably. It looks like the tool turns the PDF pages into images and then binds them together. – GMaster Sep 09 '20 at 13:26
  • To whom it may concern: `mat2` will (at least sometimes) change your PDF version (e.g. from 1.4 to 1.5). – twigmac May 20 '21 at 11:51
  • For my use case, `mat2` increased the file from 1.1M to 4.4M. I do not recommend it. – Valerio Bozz May 29 '21 at 17:48
  • According to [the mat2 README](https://0xacab.org/jvoisin/mat2#notes-about-the-lightweight-mode), there is a "lightweight" mode (with `mat2 -L` or `--lightweight`) that prevents rendering the text as images, but *might* leave some metadata intact. – Fritz May 08 '23 at 12:44
  • @Fritz Thanks! I will update my answer. – GMaster May 08 '23 at 13:59
```shell
cpdf -remove-metadata input.pdf -o output.pdf
```

I haven't verified it myself yet, but cpdf is a very reliable and useful tool. It is free for non-commercial use and available as a precompiled binary for the Linux command line.

And the resulting file is smaller ;)
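A PDF can carry metadata in two places: the document Info dictionary (Title, Author, Creator, Producer, dates) and an XMP metadata stream. Pro Backup's comment reports that Author and Creator survived `-remove-metadata`, which would fit that option targeting only the XMP stream. That is an assumption, so here is a minimal sketch for checking it, assuming cpdf and exiftool are installed. `cpdf_clean_and_report` is a hypothetical helper name. It runs the command from this answer, then prints the Info dictionary tags and the XMP tags of the result separately.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): run cpdf -remove-metadata, then report the
# Info dictionary and the XMP metadata of the result separately, to see
# which of the two (if any) still holds data.
# Assumes cpdf and exiftool are installed and on the PATH.

cpdf_clean_and_report() {
    if [ "$#" -ne 2 ]; then
        echo "usage: cpdf_clean_and_report INPUT.pdf OUTPUT.pdf" >&2
        return 2
    fi
    local in="$1" out="$2" tool
    for tool in cpdf exiftool; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "cpdf_clean_and_report: $tool not found" >&2
            return 127
        fi
    done

    # The command from the answer above.
    cpdf -remove-metadata "$in" -o "$out" || return 1

    # Info dictionary entries (the PDF group also shows structural fields
    # such as the PDF version and page count).
    echo "== Info dictionary =="
    exiftool -a -G1 -PDF:all "$out"

    # XMP metadata stream, if any is left.
    echo "== XMP =="
    exiftool -a -G1 -XMP:all "$out"
}

# Usage:
#   cpdf_clean_and_report input.pdf output.pdf
```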

AdminBee
Berry Tsakala
  • It prints "This demo is for evaluation only. http://www.coherentpdf.com/", reduces the file size by 2K, and doesn't seem to change the metadata: the author and creator software are still there. – Pro Backup Jul 07 '22 at 07:25

Try Metadata Cleaner, a graphical front end for mat2, available on Flathub.

robertspierre