I'm experimenting with OCR software, in particular tesseract. I have got to the point where I can load an image and have tesseract extract its text from the Linux terminal. Now I'm trying to figure out how to automatically save that extracted text in PDF, ODF, TXT and Word formats, also from the terminal.

Neil Meyer

1 Answer

Looking at man 1 tesseract, it seems you can make it save its output in one or more specific formats using a command in the form:

tesseract image_file output_file pdf txt

Here the four arguments play the roles of FILE, OUTPUTBASE and CONFIGFILE (given twice), respectively, in the general command synopsis. The command creates two files, output_file.pdf and output_file.txt.
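
As a rough sketch of how that synopsis maps onto a script, the helper below only prints the command it would run, so it can be tried without tesseract installed. The function name tesseract_cmd is made up for this illustration and is not part of tesseract; the file names are the placeholders used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): print the tesseract
# command line for an input image, an output base name, and one or
# more config names such as pdf or txt.
tesseract_cmd() {
    image=$1      # FILE in the synopsis
    outbase=$2    # OUTPUTBASE in the synopsis
    shift 2       # the remaining arguments are the CONFIGFILE names
    printf 'tesseract %s %s' "$image" "$outbase"
    for fmt in "$@"; do
        printf ' %s' "$fmt"
    done
    printf '\n'
}

# Dry run: show the command for the example above.
tesseract_cmd image_file output_file pdf txt

# To perform the OCR for real (requires tesseract), run the printed
# command directly:
# tesseract image_file output_file pdf txt
```

The printed line is not quoted, so it is meant for inspection rather than for feeding back into the shell when file names contain spaces.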

fra-san