How can I convert the text file produced by pdftotext back to PDF?

Question

Poppler includes the excellent tool pdftotext for converting a PDF file to a plain-text file:

pdftotext input.pdf output.txt

Is there a way to convert this text file back to PDF?

By conversion, I mean obtaining a PDF file whose page content is similar to that of the original PDF.

If possible, it should have the same page numbering as the original, but this is not mandatory. A PDF without page numbering would also be fine.

An exact visual match is not important.

Some potential use cases:

You have accidentally deleted your PDF file but still have the text file from pdftotext.
You would like to edit the text file in a text editor and produce an updated version of your PDF file.
You want to produce a smaller PDF file.

score 1 · Answer 1 · answered Aug 15 '22 at 15:32

There are many options. In principle, any program that can open plain text and print can send it to a virtual printer that produces a PDF.

But if I were doing it programmatically, I'd probably use pandoc:

pandoc filename.txt -o output.pdf

By default, pandoc uses pdflatex to create the PDF, but if you don't want to install something as heavy as a TeX distribution, you can use other PDF engines such as weasyprint or wkhtmltopdf:

pandoc --pdf-engine weasyprint filename.txt -o output.pdf
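If you are not sure which of these engines is installed, a small shell helper can pick one for you. This is only a sketch: pick_pdf_engine is a hypothetical name, and the order in which the three engines are tried is an assumption, not a pandoc rule.

```shell
# Hypothetical helper: print the name of the first PDF engine found on PATH.
# The candidate list and its order are assumptions; adjust to taste.
pick_pdf_engine() {
    for engine in pdflatex weasyprint wkhtmltopdf; do
        if command -v "$engine" >/dev/null 2>&1; then
            printf '%s\n' "$engine"
            return 0
        fi
    done
    # None of the candidates is installed.
    return 1
}
```

It could then be used as `pandoc --pdf-engine="$(pick_pdf_engine)" filename.txt -o output.pdf`.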

Of course, the result will never preserve the formatting, fonts, etc., of the original, as already pointed out.
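Page boundaries, at least, can be carried over. Unless it is run with -nopgbrk, pdftotext separates pages with a form-feed character, so one possible approach is to turn each form feed into a raw LaTeX \newpage before handing the text to pandoc's default LaTeX route. A minimal sketch, assuming a POSIX awk; ff_to_newpage is a hypothetical name:

```shell
# Hypothetical helper: replace pdftotext's form-feed page separators
# with LaTeX page breaks and write the result to standard output.
ff_to_newpage() {
    awk 'BEGIN { RS = "\f"; ORS = "" }        # one record per page
         NR > 1 { print "\n\n\\newpage\n\n" } # page break before every page but the first
         { print }' "$1"
}
```

Possible usage: `ff_to_newpage output.txt | pandoc -f markdown -o output.pdf`. Note that pandoc then reads the text as Markdown, so characters such as * or # in the text may be interpreted as markup.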

score 0 · Answer 2 · answered Aug 15 '22 at 14:45


By analogy with the program a2ps, I use a Bash function called a2pdf:

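# Convert a text file to PDF using LibreOffice Writer in headless mode.
# The PDF is written to the current directory.
a2pdf() {
    lowriter --headless --convert-to pdf "$1"
}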

As you surely know, pdftotext discards all properties of the PDF, such as fonts, formatting, and links.


Erich
