
I would like to install a command line tool within a Docker image in order to quickly convert `*.html` files into `*.pdf` files.

I am surprised there is not a Unix tool to do something like this.

terdon
EB2127
  • @muru It's arguably a duplicate, though (A) I'm looking for a command line tool to put in a Docker image and (B) the answers below are quite useful and more helpful than the posting above from 2015. I've edited the question to clarify this somewhat, and I'm happy to edit again. – EB2127 Aug 05 '19 at 04:26
  • Yes, this question is focused on command line tools while the other isn't, and the other requires a more complex solution since it's about converting multiple, linked HTML documents. I don't think it's a dupe. – terdon Aug 05 '19 at 16:21
  • [html2pdf](https://github.com/spipu/html2pdf) – Barmar Aug 05 '19 at 21:06
  • This should probably be on Software Recommendations Stack Exchange. – phuclv Aug 06 '19 at 16:47
  • @phuclv This is a good point. I didn't know this existed. – EB2127 Aug 06 '19 at 22:20
  • `libreoffice --headless --norestore --convert-to pdf:writer_pdf_Export MY_HTML_FILE.html` – Francisco Luz Mar 20 '22 at 15:37
  • https://stackoverflow.com/questions/48602393/how-to-convert-modern-html-to-pdf – Tim Abell Jul 19 '23 at 20:52

6 Answers


pandoc is a great command-line tool for file format conversion.

The disadvantage is that, for PDF output, you'll need LaTeX by default. The usage is:

```shell
pandoc test.html -t latex -o test.pdf
```

If you don't have LaTeX installed, then I recommend htmldoc.

Quoted from the "Creating a PDF" section of the pandoc manual:

> By default, pandoc will use LaTeX to create the PDF, which requires that a LaTeX engine be installed.
>
> Alternatively, pandoc can use ConTeXt, pdfroff, or any of the following HTML/CSS-to-PDF engines to create a PDF: wkhtmltopdf, weasyprint or prince. To do this, specify an output file with a `.pdf` extension, as before, but add the `--pdf-engine` option or `-t context`, `-t html`, or `-t ms` to the command line (`-t html` defaults to `--pdf-engine=wkhtmltopdf`).
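For example, to skip LaTeX entirely, pandoc can hand the HTML to wkhtmltopdf or WeasyPrint. A minimal sketch, assuming pandoc 2.0 or later (where the option is called `--pdf-engine`; pandoc 1.x used `--latex-engine`, as seen in one of the comments) and that the chosen engine is on the `PATH`. `html2pdf_pandoc` is a hypothetical helper name, not part of pandoc:

```shell
# Hypothetical helper: HTML -> PDF through pandoc, without LaTeX.
# Usage: html2pdf_pandoc INPUT.html OUTPUT.pdf [ENGINE]
# ENGINE is assumed to be an HTML/CSS engine: wkhtmltopdf, weasyprint or prince.
html2pdf_pandoc() {
    local input="$1" output="$2" engine="${3:-wkhtmltopdf}"
    # -t html: render to HTML first, then let the engine produce the PDF
    pandoc "$input" -t html --pdf-engine="$engine" -o "$output"
}
```

For instance, `html2pdf_pandoc test.html test.pdf weasyprint` uses WeasyPrint instead of the wkhtmltopdf default.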

  • +1. pandoc can also use `wkhtmltopdf` to convert directly from HTML to PDF, without needing LaTeX. See `man pandoc` and search for `wkhtmltopdf` or `--pdf-engine`. – cas Aug 05 '19 at 04:15
  • @cas This is really useful. Could you answer the question with that command? I would like to keep this answer. – EB2127 Aug 05 '19 at 04:28
  • @EB2127 Stack Exchange answers can easily contain more than one solution to a problem; collaborative editing can/should make any answer better. – Jeff Schaller Aug 06 '19 at 11:05
  • @cas Unfortunately, `wkhtmltopdf` complains about `QXcbConnection: Could not connect to display localhost:12.0` and dumps core. I suspect that if I figure out the display issue it will work, but I'm not sure why it cares about the display. – steveb Mar 12 '20 at 23:19
  • What advantage is there to using pandoc with the WeasyPrint engine versus just using WeasyPrint without the dependency on pandoc? – Hashim Aziz Jun 04 '20 at 23:12
  • `Try running pandoc with --latex-engine=xelatex. pandoc: Error producing PDF` (the document also contains Bangla text) – alhelal May 04 '22 at 17:19

You can also try wkhtmltopdf; installation and usage are pretty straightforward.
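Basic usage is just `wkhtmltopdf INPUT OUTPUT`. A minimal sketch of a wrapper for a headless container, assuming your wkhtmltopdf build needs an X server (the display error reported in a comment on the pandoc answer points that way; this is typical of builds linked against an unpatched Qt) and that `xvfb-run` from the Xvfb package is installed. `html2pdf_wk` is a hypothetical helper name, and it only falls back to Xvfb when `DISPLAY` is unset:

```shell
# Hypothetical helper: wkhtmltopdf INPUT.html OUTPUT.pdf, with a virtual-X fallback.
html2pdf_wk() {
    if [ -z "${DISPLAY:-}" ] && command -v xvfb-run >/dev/null 2>&1; then
        # No display configured: run inside a throwaway virtual X server
        # (-a picks a free server number automatically)
        xvfb-run -a wkhtmltopdf "$1" "$2"
    else
        wkhtmltopdf "$1" "$2"
    fi
}
```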

guitarman

WeasyPrint is another option. A possible drawback is that you'll need Python on your machine.

Install:

```shell
pip install weasyprint
```

Convert:

```shell
weasyprint in.html out.pdf
```
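To convert a whole directory of `*.html` files, as the question asks, a small loop is enough. A minimal sketch, assuming `weasyprint` is on the `PATH`; `html_dir_to_pdf` is a hypothetical helper name, and each `foo.html` produces `foo.pdf` next to it:

```shell
# Hypothetical helper: convert every *.html in a directory with weasyprint.
# Usage: html_dir_to_pdf [DIR]   (defaults to the current directory)
html_dir_to_pdf() {
    local dir="${1:-.}" f
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        weasyprint "$f" "${f%.html}.pdf" || return 1
    done
}
```

Stopping on the first failure keeps a broken input from going unnoticed, which is usually what you want in a Docker build step.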
steveb
shiftas

I've been successfully using the 1.8 branch of HTMLDOC for years. I put it in a commercial system that has generated hundreds of thousands of reports since 2003.

It's not super-versatile, but it is very efficient and reliable. It's limited to a basic set of PostScript fonts.

It does not support CSS; instead, it uses a special set of HTML comment directives to control PDF-specific aspects.

If you're comfortable with C, the source code is not too difficult to read and modify if you need to add custom features. It compiles with GCC or Visual Studio, depending on your target platform.

Note that the HTML does not need to be in a file. You can generate it dynamically from a URL, a PHP or ASPX page, etc. You can also hook it into your web server to generate PDF files on the fly.

In my use case, it generates a PDF file from an ASP page, which is then attached to an email instead of being sent to the printer and the letter-stuffing machine; it's a kind of print spooler.
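A minimal command-line sketch: `--webpage` treats the input as a plain web page rather than a book with chapters and a table of contents, `-t pdf` selects the output format, and `-f` names the output file. `html2pdf_htmldoc` is a hypothetical wrapper name; check `htmldoc --help` for the options your version supports:

```shell
# Hypothetical helper: htmldoc INPUT.html OUTPUT.pdf
html2pdf_htmldoc() {
    # --webpage: no chapters/TOC, just paginate the page as-is
    # -t pdf:    output format;  -f: output file name
    htmldoc --webpage -t pdf -f "$2" "$1"
}
```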

birdwes
  • Indeed, a small and useful tool, with lots of features. Thank you for sharing! – Andrei B Apr 29 '20 at 11:54
  • HTMLDOC is an absolutely useful tool; we have made some custom changes and have been using it for almost two decades. The main issue with HTMLDOC is that it doesn't support CSS or SVG. I tried many open-source options without success. At one point WKHTMLTOPDF was promising, but the project was archived in 2020 and has more than 1.3K open issues. I think good developer support would be required to revive it. I still believe WK could be a very good substitute for HTMLDOC. Thanks! – ppant Apr 05 '23 at 11:38

There is also an html2ps program, and you can then easily convert the resulting PostScript file to PDF. I used this several years ago, and IIRC it did a pretty good job on a large manual.
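A minimal sketch of that two-step conversion, assuming html2ps writes PostScript to standard output (its default when no output file is given) and that `ps2pdf` from Ghostscript is installed; passing `-` as the input tells `ps2pdf` to read standard input. `html2pdf_ps` is a hypothetical helper name:

```shell
# Hypothetical helper: HTML -> PostScript -> PDF without a temporary .ps file.
# Usage: html2pdf_ps INPUT.html OUTPUT.pdf
html2pdf_ps() {
    html2ps "$1" | ps2pdf - "$2"
}
```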

Jeff Schaller
jamesqf

PhantomJS can do the job for you. It has command line functionality and works out of the box. You'll need to write a simple JavaScript script to tell it what to do. The site has a quick start guide, and there are plenty of articles online to help you. Usage is generally as follows:

```shell
phantomjs configFile.js htmlFile.html output.pdf
```

Here is a sample script, taken from an existing example, that generates an A4 portrait PDF. Save it as configFile.js:

```javascript
// Usage: phantomjs configFile.js input.html output.pdf
var page = require('webpage').create(),
    system = require('system'),
    fs = require('fs');

// A4 portrait pages with a footer showing "page / total"
page.paperSize = {
    format: 'A4',
    orientation: 'portrait',
    margin: {
        top: "1.5cm",
        bottom: "1cm"
    },
    footer: {
        height: "1cm",
        contents: phantom.callback(function (pageNum, numPages) {
            return '' +
                '<div style="margin: 0 1cm 0 1cm; font-size: 0.65em">' +
                '   <div style="color: #888; padding:20px 20px 0 10px; border-top: 1px solid #ccc;">' +
                '       <span>REPORT FOOTER</span> ' +
                '       <span style="float:right">' + pageNum + ' / ' + numPages + '</span>' +
                '   </div>' +
                '</div>';
        })
    }
};

page.settings.dpi = "96";

// system.args[0] is this script; [1] is the HTML input, [2] the PDF output
page.content = fs.read(system.args[1]);

var output = system.args[2];

// Wait 2 seconds so the page can finish rendering, then write the PDF and quit
window.setTimeout(function () {
    page.render(output, {format: 'pdf'});
    phantom.exit(0);
}, 2000);
```
Kusalananda
The Betpet