
What is the simplest tool to convert Markdown to PDF on the command line?

I have found how-tos where people suggest using pandoc, but the required packages pull in gigabytes of dependencies:

apt-get install pandoc texlive-latex-base texlive-fonts-recommended texlive-extra-utils texlive-latex-extra

Is there a minimal tool that can convert simple Markdown to PDF on the command line without requiring tons of dependencies?

I am using Debian 10.

400 the Cat
  • I'd probably try to pipe the output of markdown to html2ps ( http://web.mit.edu/outland/share/lib/html2ps/html2ps.html ) and then pipe its output to cups-pdf (some character-encoding conversion may be needed in between, which iconv can do). – MC68020 Jul 24 '22 at 07:51

1 Answer


All of the methods presented here still use pandoc in some way, because the gigabytes of downloads mentioned in the question come from the LaTeX (texlive-*) packages in the given apt-get command, and none of those are required. The pandoc package itself has a download size of ~17MB, which may or may not be acceptable for your use case.

If you really, really don't want to use pandoc, you could use lunamark in its place. It's similar to pandoc (both tools share the same author), but lunamark is written in Lua, a very small and lean language with a small footprint. There is no Debian package available, though, so you'd have to build it yourself. But, as stated before, the main issue is PDF creation: any good PDF library must handle fonts, which usually requires heavy libraries to be available.

I'm not aware of any tool that goes directly from Markdown to PDF; the usual way is to convert to an intermediary format first. The choice of that format determines your options.
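
To make the "intermediary format" idea concrete, here is a minimal sketch that runs the two steps by hand, using ODT as the intermediary. This is the LibreOffice route from option 4 below, so it assumes both pandoc and LibreOffice's `lowriter` are installed. The helper names `run` and `md2pdf_via_odt` and the `DRY_RUN` switch are hypothetical, added only for illustration; pandoc infers the ODT writer from the `.odt` extension, and `--outdir` is LibreOffice's option for choosing where the PDF is written.

```shell
#!/bin/sh
# Sketch: Markdown -> ODT (intermediary) -> PDF, as two separate commands.
# Assumes pandoc and LibreOffice (lowriter) are installed.

# Hypothetical wrapper: with DRY_RUN=1 it only prints each command.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: md2pdf_via_odt INPUT.md [OUTDIR]
md2pdf_via_odt() {
    in=$1                 # e.g. notes.md
    outdir=${2:-.}        # directory that receives the PDF
    odt="${in%.md}.odt"   # intermediary file next to the input
    # Step 1: pandoc picks the ODT writer from the output file extension.
    run pandoc --output="$odt" "$in" || return
    # Step 2: LibreOffice names the result after its input (notes.odt -> notes.pdf).
    run lowriter --headless --convert-to pdf --outdir "$outdir" "$odt"
}
```

For example, `md2pdf_via_odt notes.md` leaves `notes.odt` and `./notes.pdf` behind, while `DRY_RUN=1 md2pdf_via_odt notes.md` just prints the two commands. Swapping the intermediary (groff ms, HTML) changes both commands, which is why the options below differ so much in their dependencies.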

  1. groff: GNU troff is an implementation of the troff text formatter. It underlies tools like man, is very fast, and can also produce nice-looking PDF output. You'll need the groff and ghostscript packages; then call pandoc with

    pandoc --pdf-engine=pdfroff --output=out.pdf ...
    

    This is probably the solution that requires the fewest and smallest additional packages. Make sure though that apt-get won't install any unnecessary packages:

    apt-get install pandoc groff ghostscript --no-install-recommends
    

    On a freshly set-up system, this gives you:

    Need to get 38.3 MB of archives.
    After this operation, 194 MB of additional disk space will be used.
    
  2. HTML: There are multiple HTML-to-PDF converters, and pandoc can use two of them (three with current versions) as engines to handle the conversion from Markdown to PDF. You can choose between weasyprint, which is written in Python, and wkhtmltopdf, which is built on top of the WebKit HTML engine that Chromium used to use. Install either one and then use

     pandoc --pdf-engine=weasyprint
    

    or

     pandoc --pdf-engine=wkhtmltopdf
    

    wkhtmltopdf in particular may be a good choice if you already have many graphics and font packages installed anyway. For example, on a system that has the LXDE desktop environment installed, you'd see:

    % apt-get install pandoc wkhtmltopdf --no-install-recommends
    ... [omitted] ...
    Need to get 16.4 MB of archives.
    After this operation, 122 MB of additional disk space will be used.
    

    However, the impact would be much more substantial on a completely fresh system:

    Need to get 91.1 MB of archives.
    After this operation, 530 MB of additional disk space will be used.
    
  3. LaTeX with Docker: This method really uses LaTeX again, but instead of installing it on your system, you use a Docker image that contains pandoc and only the bare minimum of LaTeX packages, which makes it comparatively small. You'll need the docker.io package; then run this lengthy command:

    docker run --rm -v "$(pwd)":/data -u $(id -u):$(id -g) pandoc/latex --output=out.pdf ...
    

    The advantage of this is that you'll be using the latest pandoc and LaTeX versions, and it gives the nicest-looking PDF (IMHO). However, the Docker image is still ~200MB in size, and Docker itself is also large (>90MB download size).

  4. LibreOffice: This method only makes sense if you already have LibreOffice installed, as it is a very large dependency. In that case, pandoc can be used to convert to odt or docx, which then can be converted to PDF with

     lowriter --headless --convert-to pdf intermediary.odt
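
If a script has to run on machines that have different subsets of these tools, a small helper can probe for the engines from options 1 and 2 and hand the first one it finds to pandoc. This is a sketch, not part of pandoc: the function name `pick_pdf_engine` is hypothetical, and the search order is an assumption (pdfroff first, since option 1 is described above as needing the fewest and smallest packages, then wkhtmltopdf, then weasyprint). pdfroff is only chosen when Ghostscript's `gs` is also on PATH, because option 1 needs both the groff and ghostscript packages.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first usable pandoc PDF engine
# among the lighter options above, or fail if none is installed.
pick_pdf_engine() {
    # pdfroff ships with groff but needs Ghostscript (gs) to produce the PDF.
    if command -v pdfroff >/dev/null 2>&1 && command -v gs >/dev/null 2>&1; then
        echo pdfroff
        return 0
    fi
    # HTML-based engines; checking wkhtmltopdf before weasyprint is an arbitrary choice.
    for engine in wkhtmltopdf weasyprint; do
        if command -v "$engine" >/dev/null 2>&1; then
            echo "$engine"
            return 0
        fi
    done
    echo "no PDF engine found: install groff and ghostscript, wkhtmltopdf or weasyprint" >&2
    return 1
}

# Usage:
#   engine=$(pick_pdf_engine) && pandoc --pdf-engine="$engine" --output=out.pdf input.md
```

The helper only decides which engine to use; the Docker and LibreOffice options are left out because they are not `--pdf-engine` values in the commands shown above.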
    

I hope one of these suits your needs.

tarleb
  • I think the idea of the original user was to _get rid of_ `pandoc`. – Kusalananda Jul 24 '22 at 07:50
  • @Kusalananda My understanding is that the many large `texlive-*` packages are a problem. The `pandoc` package is a mere `17MB` in download size. – tarleb Jul 24 '22 at 07:51
  • So, is there a way to install `pandoc` on Debian 10 without the unnecessary dependencies? Your alternatives seem to be various ways of invoking `pandoc`, but you don't mention how to install `pandoc` (or an alternative to it) in a way that does not pull in "gigabytes of dependencies", which is the main point of the question. In fact, some alternatives seem to require not only `pandoc` but also other very heavy pieces of software. – Kusalananda Jul 24 '22 at 07:59
  • The reason why I include so many heavy pieces of software is that many of those are often already installed on a typical desktop machine. I agree that these are unsuitable when coming from a bare-bones system: the main packages are quite small, but the dependencies are heavy; hence the wording "good choice if you have the dependencies installed anyway". – tarleb Jul 24 '22 at 08:13