How do you save the text in the terminal to various text formats?

Question

I'm playing around a bit with OCR software; in particular, I'm spending some time with tesseract. I've got it to the point where I can load an image and have tesseract extract the text from it in a Linux terminal. I'm now trying to figure out how to automatically save that extracted text to pdf, odf, txt and Word formats from the terminal.

score 1 · Accepted Answer · answered Mar 09 '21 at 10:50

Looking at man 1 tesseract, it seems you can make it save its output in one or more specific formats using a command in the form:

tesseract image_file output_file pdf txt

where the four arguments play the roles of FILE, OUTPUTBASE and CONFIGFILE (given twice, once per format), respectively, in the general command synopsis. This command creates two files, output_file.pdf and output_file.txt.
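If you run this often, the command form above can be wrapped in a small shell function. This is only a sketch: the function name ocr_to_formats and the TESSERACT override variable are made up for illustration and are not part of tesseract. The only format names used are pdf and txt, as in the command above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): run tesseract once and
# request one or more output formats, following the FILE OUTPUTBASE
# CONFIGFILE... form of the command.
# TESSERACT is an assumed override variable; setting it to "echo"
# prints the arguments instead of running OCR.
ocr_to_formats() {
    image=$1    # input image, e.g. scan.png
    base=$2     # output base name, without an extension
    shift 2     # the remaining arguments are format names such as pdf or txt
    "${TESSERACT:-tesseract}" "$image" "$base" "$@"
}

# Example call (scan.png and scan_text are placeholder names):
#   ocr_to_formats scan.png scan_text pdf txt
# This should leave scan_text.pdf and scan_text.txt next to each other.
```

Setting TESSERACT=echo before calling the function is a quick way to check which command would be run without actually processing an image.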
