Questions tagged [tesseract]

Tesseract is an OCR (optical character recognition) engine

Tesseract is an OCR (optical character recognition) engine. It is open-source and available on most Unix variants. It supports many scripts, including Latin, Greek, Cyrillic, Hindi, Tamil, Chinese, Japanese, Korean, Thai, Arabic and Hebrew.
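As a rough illustration of basic usage, here is a minimal Python sketch that wraps the command-line form `tesseract imagename outputbase` with the `-l` language option. The helper names `build_tesseract_command` and `run_ocr` are hypothetical, chosen only for this example. The sketch assumes the `tesseract` binary is on `PATH` and that the requested language data is installed.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract IMAGE OUTPUTBASE -l LANG.

    Hypothetical helper for illustration; it only builds the command.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run tesseract and return the recognized text.

    Assumes tesseract is installed. For plain-text output, tesseract
    appends ".txt" to the output base, so that file is read back here.
    """
    subprocess.run(
        build_tesseract_command(image_path, output_base, lang),
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `run_ocr("scan.png", "out")` would write `out.txt` next to the working directory and return its contents; the file names are placeholders.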

14 questions
10
votes
2 answers

Tesseract: High CPU Usage and slow speed, only when running multiple processes in parallel

Problem pytesseract.image_to_string() takes too much time when I run the script through supervisord, but executes almost instantaneously when run directly in shell (on the same server and simultaneously with supervisor scripts). Apart from taking…
Ashish
  • 270
  • 1
  • 2
  • 10
5
votes
1 answer

tesseract: is it possible to change font output in OCRed pdf?

Following up on how to OCR a pdf file and get the text stored within pdf? I have successfully produced OCRed pdf pages. In Evince, however, the letters are not shown; by this I mean that I cannot see the characters, but I can select them, copy them…
ingli
  • 1,665
  • 1
  • 15
  • 33
3
votes
0 answers

Debian Buster: Tesseract not supporting URL as argument

I'm trying to parse text from a hosted image, but it looks like I've misconfigured Tesseract. I'm using Debian Buster, tesseract-ocr, libtesseract-dev and a Ruby wrapper are installed. # $ tesseract -v tesseract 4.0.0 leptonica-1.76.0 libgif…
Sumak
  • 253
  • 1
  • 7
2
votes
0 answers

OCR high res images & combine OCR data later, after image compression?

I have a large number of .tif's coming out of ScanTailor. Is there a way that I might OCR those .tif's with tesseract, holding the OCR data separate from the images; then compress the images, and finally combine the OCR data with the compressed…
Diagon
  • 600
  • 4
  • 13
2
votes
1 answer

Where I can get Tesseract binaries for Debian 6 64bit?

I used apt-get to install Tesseract but it's not really working. Maybe I could just download binaries somewhere, put in a dir and use this way? What's wrong with my Tesseract now: tesseract --help tesseract:Error:Usage:tesseract imagename outputbase…
buikoto
  • 21
  • 2
1
vote
0 answers

Is there software to manually OCR / teach OCR for handwriting (non-english) texts?

I have a problem that Tesseract, ABBYY FineReader, etc. can't solve: they can't recognize handwritten Russian, for example. So I am looking for OCR software for such things, or a way to manually OCR my PDFs (create layers, draw squares, fill them with text by…
PDD
  • 11
  • 2
1
vote
0 answers

script run via keyboard binding does not write to file

The following bash script interprets text in an image file and writes it to a .txt file. #!/usr/bin/env bash LD_LIBRARY_PATH="/usr/local/lib" export LD_LIBRARY_PATH /usr/local/bin/tesseract /home/martin/work/textpic.png…
MyrionSC2
  • 111
  • 3
0
votes
1 answer

Best command-line OCR software for recognizing typed text over colorful background

I need to extract text from images like the one below: As you can see, the text is typed, not handwritten. Moreover, the background is colorful. I've tried Tesseract OCR, and while it works some of the time, it fails miserably on some inputs. For…
user549392
0
votes
1 answer

Tesseract doesn't accept process substitution

I'm making a quick script that is supposed to use an OCR tool (tesseract) on an image in the clipboard to convert it to text and output it. It looks like this: #!/bin/sh temp="$(mktemp tmpXXX.png)" xclip -selection clipboard -t image/png -o > $temp tesseract…
Fedja M.
  • 105
  • 8
0
votes
1 answer

Scripting tesseract for file manager context menu

File manager context menu scripts sometimes do the job far quicker than using a GUI utility. So I've been using dozens of simple and more complex scripts for a long time in file managers Dolphin, Nautilus and Nemo, although I have elementary level…
Sadi
  • 441
  • 6
  • 19
0
votes
0 answers

Using tesseract for character recognition, result is not as expected (much worse). How to get better?

I wanted to add output of Linux boot to my question and decided to try to use optical character recognition thinking now in 2022 surely there should be decent open source options (have not tried OCR for a long time). Links found via Web search…
Martian2020
  • 1,039
  • 7
  • 20
0
votes
1 answer

How do you save the text in the terminal to various text formats?

I'm playing around a bit with OCR software, in particular I'm spending a bit of time with tesseract. I got it to where I can load an image and get tesseract to rip the text from the image, in Linux terminal. I'm now trying to figure out how I can…
Neil Meyer
  • 129
  • 2
  • 7
0
votes
1 answer

Install tesseract offline in RHEL

I have an RHEL-based server that does not connect to the internet. I need to install Tesseract >4.0 on this server. Therefore, my option was to download RPM packages from another machine, move them to the server, and install them using the rpm command. I have used…
Sathindu
  • 101
  • 2
0
votes
1 answer

Leptonica compilation error

Trying to install leptonica v1.78 on Ubuntu 16, but it's not working for some reason. After running ./configure and make, I keep getting this error: make[2]: Entering directory '/home/user/Documents/leptonica/leptonica-1.78.0/prog' CC …
Gyakenji
  • 101
  • 1