Tesseract OCR
Download & Install Tesseract
Visit the Tesseract at UB Mannheim page
Select the tesseract-ocr-w64-setup-v5.2.x.x.exe (64 bit) file to download the Tesseract executable installer
Once downloaded, open the executable file and follow the installation prompts
Make sure the 64-bit version of Tesseract is installed in C:\Program Files\Tesseract-OCR
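To confirm the install location from a script, a minimal Python sketch is shown below. It assumes the default folder named above and an executable called tesseract.exe inside it. The helper name find_tesseract is hypothetical and not part of Tesseract.

```python
from pathlib import Path
import shutil

# Assumption: the default install folder from the step above.
DEFAULT_INSTALL_DIR = r"C:\Program Files\Tesseract-OCR"


def find_tesseract(install_dir=DEFAULT_INSTALL_DIR):
    """Return the path to the Tesseract executable, or None if it is not found."""
    candidate = Path(install_dir) / "tesseract.exe"
    if candidate.is_file():
        return str(candidate)
    # Fall back to any tesseract executable that is reachable through PATH.
    return shutil.which("tesseract")
```

Calling find_tesseract() with no arguments checks the default folder first. A result of None suggests the installer used a different folder or the installation did not complete.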
Trained Data Files (Languages)
Download the .traineddata file for the language you need and place it in the tessdata folder of your Tesseract OCR installation directory, C:\Program Files\Tesseract-OCR\tessdata (this must be the same tessdata directory that the installer created)
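To check which languages are already in place, the sketch below lists the .traineddata files in a tessdata folder. The function name installed_languages is hypothetical, and the folder path is passed in rather than assumed.

```python
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes that have a .traineddata file in tessdata_dir."""
    # "eng.traineddata" -> "eng"
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))
```

For example, installed_languages(r"C:\Program Files\Tesseract-OCR\tessdata") returns the codes found in the default folder. Running tesseract --list-langs in a terminal gives a similar list from Tesseract itself.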
tessdata
https://github.com/tesseract-ocr/tessdata
Speed: Faster than tessdata-best
Accuracy: Slightly less accurate than tessdata-best

tessdata-best (Recommended for video games)
https://github.com/tesseract-ocr/tessdata_best
Speed: Slowest
Accuracy: Most accurate

tessdata-fast
https://github.com/tesseract-ocr/tessdata_fast
Speed: Fastest
Accuracy: Least accurate
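The download step can also be scripted. The sketch below builds a direct file URL for one of the three repositories and saves the file into a tessdata folder. It assumes each repository keeps its files at the top level of a branch named main. The helper names and the variant labels are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Repository names taken from the three links above.
REPOS = {
    "standard": "tessdata",
    "best": "tessdata_best",
    "fast": "tessdata_fast",
}


def traineddata_url(lang, variant="best", branch="main"):
    """Build the download URL for one language file (the branch name is an assumption)."""
    repo = REPOS[variant]
    return f"https://github.com/tesseract-ocr/{repo}/raw/{branch}/{lang}.traineddata"


def download_traineddata(lang, tessdata_dir, variant="best"):
    """Download <lang>.traineddata into tessdata_dir and return the saved path."""
    target = Path(tessdata_dir) / f"{lang}.traineddata"
    urlretrieve(traineddata_url(lang, variant), str(target))
    return target
```

Writing into C:\Program Files usually requires an elevated (administrator) process, so it may be simpler to download to a user folder and copy the file over by hand.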
Page Segmentation Modes
The page segmentation mode (PSM) tells Tesseract how to split an image into text regions. Choose it based on your particular image and the environment in which it was captured.
The number one reason I see budding OCR practitioners fail to obtain the correct OCR result is that they are using the incorrect page segmentation mode. To quote the Tesseract documentation: “By default Tesseract expects a page of text when it segments an input image” (Improving the quality of the output).
That “page of text” assumption is so incredibly important. If you’re OCR’ing a scanned chapter from a book, the default Tesseract PSM may work well for you. But if you’re trying to OCR only a single line, a single word, or maybe even a single character, then this default mode will result in either an empty string or nonsensical results.
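As a rough illustration, the sketch below passes an explicit PSM to the tesseract command-line tool through the --psm option. It assumes tesseract is reachable on PATH. The function names and constants are hypothetical, and the mode numbers follow the list printed by tesseract --help-psm.

```python
import subprocess

# A few commonly used modes; run "tesseract --help-psm" for the full list.
PSM_AUTO = 3          # default: fully automatic page segmentation
PSM_SINGLE_BLOCK = 6  # a single uniform block of text
PSM_SINGLE_LINE = 7   # a single text line
PSM_SINGLE_WORD = 8   # a single word
PSM_SINGLE_CHAR = 10  # a single character


def build_command(image_path, psm=PSM_AUTO, lang="eng", exe="tesseract"):
    """Build the tesseract command line for one image and one PSM."""
    if not 0 <= psm <= 13:
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    # "stdout" as the output base makes tesseract print the text instead of writing a file.
    return [exe, str(image_path), "stdout", "--psm", str(psm), "-l", lang]


def ocr(image_path, psm=PSM_AUTO, lang="eng"):
    """Run tesseract on the image and return the recognized text."""
    result = subprocess.run(
        build_command(image_path, psm, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For a cropped image that holds one line of text, such as a subtitle or a dialogue box, ocr("line.png", psm=PSM_SINGLE_LINE) is a reasonable first attempt. If the result is empty or garbled, trying PSM_SINGLE_BLOCK or PSM_SINGLE_WORD is usually the next step.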
Troubleshooting
TESSDATA_PREFIX is not set to your tessdata directory
Run Command Prompt as administrator
Type setx TESSDATA_PREFIX "C:\Program Files\Tesseract-OCR\tessdata" and then press Enter
Restart the OS
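To verify that the variable took effect, a small check like the one below can help. It is only a sketch: the function name check_tessdata_prefix and the wording of the messages are hypothetical.

```python
import os
from pathlib import Path


def check_tessdata_prefix(env=None):
    """Return a list of problems with TESSDATA_PREFIX (an empty list means it looks fine)."""
    env = os.environ if env is None else env
    value = env.get("TESSDATA_PREFIX")
    if not value:
        return ["TESSDATA_PREFIX is not set"]
    folder = Path(value)
    if not folder.is_dir():
        return [f"TESSDATA_PREFIX points to a missing directory: {value}"]
    if not any(folder.glob("*.traineddata")):
        return [f"No .traineddata files found in {value}"]
    return []
```

Note that setx only affects processes started after it runs, so run the check from a new terminal or after the restart rather than from the window where setx was typed.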