Understanding OCR and Improving Accuracy

This guide explains how OCR works in VNTranslator and provides practical tips to improve text recognition accuracy.

Note: This guide focuses primarily on traditional OCR engines (Tesseract OCR and Windows OCR). If you use a modern engine such as Fast OCR, an LLM-based engine (Qwen 2.5 VL, GPT-4 Vision, Claude Vision), or a cloud-based engine (Google Cloud Vision, Azure Cloud Vision), you can skip most pre-processing adjustments, because these engines handle complex backgrounds and colored text on their own.

How OCR Works in VNTranslator

1. Screen Capture

The first step in the OCR process is capturing an image from the screen. The quality of the captured image significantly impacts the OCR engine's ability to recognize text accurately.

2. Pre-processing (Image Processing)

For Traditional OCR Engines Only.

Pre-processing is mainly needed with Tesseract OCR or Windows OCR. Modern engines such as Fast OCR, LLM-based engines, and cloud-based engines handle varied text conditions without it.

During pre-processing, the captured image is converted so that the text appears black on a white background. This high contrast makes it easier for traditional OCR engines to recognize the text.
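The conversion described above can be illustrated with a minimal sketch. It is not VNTranslator's implementation: the function names, the fixed threshold of 128, and the tiny sample image are assumptions chosen for illustration, and the image is represented as plain Python lists so the example runs without any image library.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray, threshold=128, invert=False):
    """Map every pixel to pure black (0) or pure white (255).

    Set invert=True when the source has light text on a dark background,
    so the result still ends up as black text on a white background.
    """
    out = []
    for row in gray:
        new_row = []
        for value in row:
            is_light = value >= threshold
            if invert:
                is_light = not is_light
            new_row.append(255 if is_light else 0)
        out.append(new_row)
    return out


# Hypothetical capture: white text pixels on a dark blue dialogue box.
capture = [
    [(20, 30, 90), (255, 255, 255), (20, 30, 90)],
    [(20, 30, 90), (20, 30, 90), (255, 255, 255)],
]
gray = to_grayscale(capture)
result = binarize(gray, threshold=128, invert=True)
print(result)  # text pixels become 0 (black), the box becomes 255 (white)
```

Inversion matters because many games draw light text on a dark box; without it, the same threshold would produce white text on black, which traditional engines tend to read less reliably.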

When to use pre-processing:

  • You are using Tesseract OCR or Windows OCR

  • The game text sits on a colored background

  • Contrast between the text and the background is low

  • You need better recognition accuracy from a traditional engine

When pre-processing is NOT needed:

  • Using Fast OCR or another modern OCR engine

  • Using LLM-based engines (Qwen 2.5 VL, GPT-4 Vision, Claude Vision)

  • Using cloud-based engines (Google Cloud Vision, Azure Cloud Vision)

3. Selecting the OCR Engine

Text recognition accuracy depends heavily on the OCR engine you choose. VNTranslator supports three categories of OCR engines:

Traditional OCR Engines

  • Examples: Tesseract OCR, Windows OCR

  • Best for: Simple black text on a white background

  • Limitations: May struggle with colored text or complex backgrounds

  • Requires: Pre-processing adjustments for better accuracy

Modern OCR Engines ⭐⭐⭐

  • Examples: Fast OCR, EasyOCR

  • Best for: Moderate background noise and multi-colored text

  • Advantages: Better handling of various text conditions without pre-processing

  • Requires: Minimal to no pre-processing

AI-based OCR Engines ⭐⭐⭐⭐⭐

  • Examples: Google Cloud Vision, Azure Cloud Vision, Qwen 2.5 VL, GPT-4 Vision, Claude Vision

  • Best for: Complex backgrounds, rotated text, and colored text

  • Advantages: High accuracy without pre-processing, handles various text conditions automatically

  • Requires: No pre-processing

For a complete comparison of OCR engines, see OCR Engines.
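The three categories above can be summarized as a small lookup. This is only a sketch: the dictionary, the category labels, and the needs_preprocessing helper are hypothetical names for illustration and do not correspond to VNTranslator settings.

```python
# Engine names are taken from the lists above; the category labels are
# illustrative and not identifiers used by VNTranslator itself.
ENGINE_CATEGORIES = {
    "Tesseract OCR": "traditional",
    "Windows OCR": "traditional",
    "Fast OCR": "modern",
    "EasyOCR": "modern",
    "Google Cloud Vision": "ai",
    "Azure Cloud Vision": "ai",
    "Qwen 2.5 VL": "ai",
    "GPT-4 Vision": "ai",
    "Claude Vision": "ai",
}


def needs_preprocessing(engine_name):
    """Return True only for traditional engines, per the guidance above."""
    category = ENGINE_CATEGORIES.get(engine_name)
    if category is None:
        raise ValueError(f"Unknown OCR engine: {engine_name!r}")
    return category == "traditional"


for name in ("Tesseract OCR", "Fast OCR", "Claude Vision"):
    print(name, "->", "pre-process" if needs_preprocessing(name) else "use original image")
```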

4. Post-processing

After the OCR engine processes the image, the recognized text is displayed. If the recognition is inaccurate, you can correct it during post-processing with regular expressions (RegExp).

Post-processing is useful with every OCR engine type. Use it to:

  • Remove unwanted characters

  • Fix common recognition errors

  • Format the output text
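As an illustration of the three uses listed above, the following sketch applies a few regular-expression rules in order. The patterns and the sample text are assumptions made up for this example; the rules you need depend on the errors your engine actually produces, and the RegExp syntax accepted by VNTranslator may differ slightly from Python's re module.

```python
import re

# Hypothetical rules: (pattern, replacement) pairs applied in order.
RULES = [
    # Remove stray symbols that often come from dialogue box borders.
    (re.compile(r"[|_~^]"), ""),
    # Fix a common misread: the letter O between two digits is a zero.
    (re.compile(r"(?<=\d)[Oo](?=\d)"), "0"),
    # Format the output: collapse line breaks and repeated spaces.
    (re.compile(r"\s+"), " "),
]


def post_process(text):
    """Apply every rule in order, then trim leading/trailing whitespace."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.strip()


raw = "| I have 1O0 gold_\ncoins   left. ~"
cleaned = post_process(raw)
print(cleaned)  # I have 100 gold coins left.
```

Rule order matters here: whitespace is collapsed last so that gaps left behind by removed symbols are cleaned up too.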


Tips for Improving OCR Accuracy

For Traditional OCR Engines (Tesseract, Windows OCR)

  1. Ensure high-quality image captures: The better the quality of the screen capture, the higher the accuracy of OCR. Avoid blurry or low-resolution images.

  2. Use effective pre-processing: Adjust the image to have high contrast (black text on white background) to make text recognition easier for the OCR engine.

  3. Select appropriate threshold settings: Experiment with threshold values in the pre-processing options to find the best setting for your game.
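When experimenting with threshold values, it can help to start from a data-driven guess instead of an arbitrary number. The sketch below uses Otsu's method, a standard technique that picks the threshold best separating dark pixels from light ones. This is a swapped-in illustration, not a description of how VNTranslator's threshold option works; the function name and the sample pixel values are hypothetical.

```python
def otsu_threshold(gray):
    """Return a threshold t such that pixels <= t form the dark class.

    gray is a list of rows of 0-255 grayscale values. The threshold is the
    one that maximizes the between-class variance (Otsu's method).
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_variance = -1.0
    best_t = 0
    weight_dark = 0
    sum_dark = 0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance = variance
            best_t = t
    return best_t


# Hypothetical capture: dark text pixels (~30) on a light box (~220).
sample = [
    [30, 32, 34],
    [220, 222, 224],
]
t = otsu_threshold(sample)
print("suggested starting threshold:", t)
```

Treat the result as a starting point only: try values slightly above and below it and compare the recognized text for your game.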

For Modern and AI-based OCR Engines

  1. Ensure high-quality image captures: Good capture quality still helps, but these engines are more forgiving with image quality.

  2. Skip pre-processing: Modern and AI-based OCR engines work best with the original image without pre-processing adjustments.

  3. Choose the right engine for your needs:

    • Use Fast OCR for offline, fast recognition with moderate accuracy

    • Use cloud-based engines for the highest accuracy with complex text

    • Use LLM-based engines for maximum flexibility and accuracy

For All OCR Engine Types

  1. Utilize post-processing: If text recognition is incorrect or you want to remove specific characters, use RegExp during post-processing to refine the output.

  2. Position the capture area correctly: Make sure the capture area covers only the dialogue text box, so that unrelated screen elements are not captured.

  3. Test different engines: Try different OCR engines to find which works best for your specific game or visual novel.