Standard workflow to digitize magazines or books using OCR while minimizing file size?

Asked Jun 07 '14 at 10:01

Active Aug 02 '14 at 07:42

Viewed 181 times

To scan books that contain only text, black-and-white images, and clear borders, I have been using this workflow:

1. Digitize the source with a camera or a scanner.
2. Process the page images with scantailor.
3. Finally, use djvubind to produce a small (1-7 MB) DjVu file with an OCR text layer.
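
The steps above could be scripted roughly as follows. This is only a sketch under some assumptions: it assumes the command-line front end of scantailor is installed as `scantailor-cli` and accepts an input directory followed by an output directory, and that `djvubind` takes the directory of processed images as its argument. The function names `build_commands` and `run_workflow` and the directory names are made up for illustration. Any extra options (color mode, DPI, OCR engine) are deliberately left out; check `scantailor-cli --help` and `djvubind --help` for what your versions support.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(scan_dir, work_dir):
    """Return the argv lists for the two command-line steps, without running them."""
    scan_dir = Path(scan_dir)
    work_dir = Path(work_dir)
    return [
        # Step 2: clean up the raw scans (assumed invocation: input dir, output dir).
        ["scantailor-cli", str(scan_dir), str(work_dir)],
        # Step 3: bind the processed pages into a DjVu file with OCR
        # (assumed invocation: directory of page images).
        ["djvubind", str(work_dir)],
    ]


def run_workflow(scan_dir, work_dir):
    """Run both steps in order, stopping at the first failure."""
    commands = build_commands(scan_dir, work_dir)
    # Fail early with a readable message if a tool is not on PATH.
    missing = [cmd[0] for cmd in commands if shutil.which(cmd[0]) is None]
    if missing:
        raise RuntimeError("missing tools: " + ", ".join(missing))
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_workflow("scans", "out")` would process everything in `scans` and bind the result; `build_commands` can be used on its own to inspect the commands before running anything.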

This works fine. However, for magazines or books with many colors in the images, structural elements, backgrounds, or images that overlap the page margins, using scantailor (in mixed mode) becomes very difficult, and you have to process every single page manually.

So, what would be a good workflow on Linux to digitize such sources and get a small DjVu or PDF file with an OCR text layer?

edited Jun 09 '14 at 01:58

student


0 Answers