To scan books containing only text, black-and-white images, and clear borders, the workflow I've been using is:
- digitize the source using a camera or just a scanner
- use scantailor
- finally use djvubind to make a small (1-7 MB) djvu file with OCR background
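
For reference, the two post-scan steps can be chained roughly as in the sketch below. It is only an illustration under assumptions: the directory names are hypothetical, it assumes the command-line build of scantailor (`scantailor-cli`) is installed alongside `djvubind`, and the `--color-mode` option should be checked against your installed version's help output.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(scan_dir, out_dir, color_mode="black_and_white"):
    """Return the command lines for the scantailor and djvubind steps.

    Nothing is executed here; the commands are only assembled.
    The --color-mode option is an assumption about scantailor-cli.
    """
    scan_dir, out_dir = Path(scan_dir), Path(out_dir)
    return [
        # Step 2: clean up the raw scans (deskew, crop, binarize).
        ["scantailor-cli", f"--color-mode={color_mode}", str(scan_dir), str(out_dir)],
        # Step 3: bind the processed pages into a djvu file with OCR.
        ["djvubind", str(out_dir)],
    ]


def run_workflow(scan_dir, out_dir):
    """Run both steps in order, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(scan_dir, out_dir):
        if shutil.which(cmd[0]) is None:
            raise RuntimeError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

Calling `run_workflow("scans", "out")` would process the images in the hypothetical `scans` directory and bind whatever scantailor writes to `out`.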
This works fine. However, if you have magazines or books with lots of color in the images, structural elements, backgrounds, or images that overlap the page margins, using scantailor (in mixed mode) becomes very difficult, and you have to process every single page manually.
So, what would be a good workflow on Linux to digitize such sources and get a small djvu or PDF file with OCR background?