
I'm trying to parse text from a hosted image, but it looks like I've misconfigured Tesseract. I'm using Debian Buster with tesseract-ocr, libtesseract-dev and a Ruby wrapper installed.

#  $ tesseract -v
tesseract 4.0.0
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE
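Since the release notes mentioned in this thread date URL input to 4.1.1, a first check is whether the installed version is at least that. Below is a minimal Ruby sketch, assuming the first line of `tesseract -v` output has the form shown above. `tesseract_version` and `url_input_possible?` are hypothetical helper names, not part of any gem. As the comments point out, a 4.1.1 build can still report no URL support, so this check can only rule a version out, not confirm support.

```ruby
# Hypothetical helper: extract [major, minor, patch] from `tesseract -v`
# output, whose first line looks like "tesseract 4.0.0". Returns nil if the
# text does not contain a recognizable version.
def tesseract_version(version_output)
  match = version_output.match(/tesseract\s+v?(\d+)\.(\d+)\.(\d+)/)
  match && match.captures.map(&:to_i)
end

# URL input is listed as new in 4.1.1, so older versions cannot accept a URL.
# Being >= 4.1.1 is necessary but not sufficient (the build matters too).
def url_input_possible?(version_output)
  version = tesseract_version(version_output)
  !version.nil? && (version <=> [4, 1, 1]) >= 0
end
```

To feed it real output, `Open3.capture2e("tesseract", "-v").first` (from Ruby's standard `open3` library) captures stdout and stderr together, which avoids depending on which stream a given build prints its version to.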

In a terminal, `tesseract <URL.png> output` returns `Error, cannot read input <URL.png>: No such file or directory`. The Ruby gem raises the same error message.

Did I miss something after installing the packages? The docs mention manually placing the traineddata directory on Ubuntu; does that also need to be done on Debian?

The traineddata is currently not shipped with the snap package and must be placed manually to ~/snap/tesseract/current.
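To see which language data a given tessdata directory actually contains, a short Ruby sketch can list the `*.traineddata` files in it. `installed_languages` is a hypothetical helper, and the directory is passed in explicitly because its location depends on how Tesseract was installed (snap, distribution package or source build). The CLI flag `tesseract --list-langs` reports the same information from Tesseract's own point of view.

```ruby
# Hypothetical helper: return the sorted language codes (e.g. "eng", "osd")
# for every *.traineddata file directly inside +tessdata_dir+.
# A missing directory simply yields an empty list.
def installed_languages(tessdata_dir)
  Dir.glob(File.join(tessdata_dir, "*.traineddata"))
     .map { |path| File.basename(path, ".traineddata") }
     .sort
end
```

For example, `installed_languages("/usr/share/tesseract-ocr/4.00/tessdata")`; that path is an assumption about where Debian's 4.0 packages put the data, not something confirmed in this thread.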

I can get it working by downloading the image with curl and passing the local path as the argument, but it should support a URL as the argument.
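The curl workaround can also be done from Ruby: download the image into a temporary file, then pass that local path to the `tesseract` CLI. This is a minimal sketch, assuming the `tesseract` binary is on `PATH`. `ocr_remote_image` is a hypothetical helper, and the `fetch:`/`run:` keyword arguments exist only so the download and the subprocess can be swapped out (for example in tests). Passing `stdout` as the output base makes tesseract print the recognized text instead of writing a `.txt` file.

```ruby
require "net/http"
require "open3"
require "tempfile"
require "uri"

# Hypothetical helper mirroring `curl -o file URL && tesseract file stdout`.
# Note: Net::HTTP.get does not follow redirects; swap +fetch+ if you need that.
def ocr_remote_image(url,
                     fetch: ->(u) { Net::HTTP.get(URI(u)) },
                     run: ->(*cmd) { Open3.capture3(*cmd) })
  # Keep the URL's extension (if any) on the temporary file name.
  extension = File.extname(URI(url).path)
  Tempfile.create(["tesseract-input", extension]) do |file|
    file.binmode
    file.write(fetch.call(url))
    file.flush # make sure tesseract sees the full image on disk

    stdout, stderr, status = run.call("tesseract", file.path, "stdout")
    raise "tesseract failed: #{stderr.strip}" unless status.success?

    stdout # Tempfile.create deletes the file when the block returns
  end
end
```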

Thanks


EDIT

I've tested both v4.1.1 and v5.0.0 by following these instructions and setting up the tessdata directory. Both explicitly report that they don't support URLs:

Tesseract Open Source OCR Engine v5.0.0-alpha-647-g4a00 with Leptonica
Error, this tesseract has no URL support
Error during processing.

I'm obviously missing something, because the release notes say URL input has been supported since 4.1.1.
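The two error messages quoted in this question suggest a defensive pattern: try the URL first, and fall back to a local download only when tesseract reports that it cannot handle it. Below is a Ruby sketch under that assumption. `ocr_url_with_fallback` and `url_unsupported?` are hypothetical names, the patterns are copied from the messages above, and the caller supplies both the command runner (for example `Open3.capture3`) and a `download` step that returns a local file path.

```ruby
# Messages observed in this thread when tesseract is handed a URL.
URL_FAILURE_PATTERNS = [
  /no URL support/,   # 4.1.1 / 5.0.0-alpha builds reporting no URL support
  /cannot read input/ # 4.0.0 treating the URL as a missing local file
].freeze

def url_unsupported?(stderr)
  URL_FAILURE_PATTERNS.any? { |pattern| pattern.match?(stderr) }
end

# Hypothetical wrapper: stderr is checked before the exit status, so the
# fallback does not depend on how a given build reports these failures.
def ocr_url_with_fallback(url, run:, download:)
  stdout, stderr, status = run.call("tesseract", url, "stdout")
  if url_unsupported?(stderr)
    local_path = download.call(url)
    stdout, stderr, status = run.call("tesseract", local_path, "stdout")
  end
  raise "tesseract failed: #{stderr.strip}" unless status.success?

  stdout
end
```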

Sumak
  • It looks like URL support was added in 4.1.1. I'm about to test it and edit my question: https://tesseract-ocr.github.io/tessdoc/ReleaseNotes.html – Sumak May 06 '20 at 18:14
  • I have tesseract 4.1.1 installed via `apt`, and it still says "Error, this tesseract has no URL support". Which is odd, as the error message suggests the program knows that some tesseracts do... – jrochkind Mar 01 '23 at 20:32
