I'm experimenting with OCR software, in particular tesseract. I've got it to the point where I can load an image and have tesseract extract the text from it in a Linux terminal. Now I'm trying to figure out how to automatically save that extracted text in PDF, ODF, TXT and Word formats from the terminal.
1 Answer
Looking at man 1 tesseract, it seems you can make it save its output in one or more specific formats using a command of the form:
tesseract image_file output_file pdf txt
where the four arguments play the roles of FILE, OUTPUTBASE and CONFIGFILE (given twice), respectively, in the general command synopsis. This command creates two files, output_file.pdf and output_file.txt.
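If you want to do this for many images, the command above could be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name ocr_to_formats and the TESSERACT_BIN override variable are made up for illustration (they are not part of tesseract), and the output base is derived by stripping the image's last extension, so it assumes file names like scan.png with no other dots in the path.

```shell
#!/bin/sh
# Hypothetical helper (not part of tesseract): run tesseract on one image
# and request one output file per format config named after the image.
# TESSERACT_BIN is an assumed override, handy for dry runs and testing.
ocr_to_formats() {
    image=$1
    shift
    # Output base: the image path with its last extension removed,
    # e.g. scan.png -> scan (assumes the only dot is the extension dot)
    base=${image%.*}
    # Remaining arguments are passed through as CONFIGFILE names (pdf, txt)
    "${TESSERACT_BIN:-tesseract}" "$image" "$base" "$@"
}

# Example usage (requires tesseract to be installed):
#   ocr_to_formats scan.png pdf txt
#   for f in *.png; do ocr_to_formats "$f" pdf txt; done
```

Called as ocr_to_formats scan.png pdf txt, this should run the same command as above with scan as the output base, producing scan.pdf and scan.txt next to the image.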
fra-san