Post-processing

Post-processing refines the OCR output after the text has been recognized. This step helps correct common OCR errors, remove unwanted characters, and format the text properly before translation.

Note: Post-processing is useful for all OCR engine types. Even modern and AI-based OCR engines may produce text that needs formatting or correction.

When to Use Post-processing

Use post-processing when:

OCR consistently misreads certain characters (for example, "l" as "|" or "0" as "O")
You need to remove specific characters or symbols
Text formatting needs adjustment (line breaks, quotation marks)
You want to standardize character patterns
OCR output contains unwanted characters

Regular Expression (RegExp)

Regular Expressions (RegExp) are patterns used to search and manipulate text. VNTranslator supports two types of RegExp operations:

1. RegExp Matching

Identifies and extracts specific text patterns from the OCR output. Only text that matches the pattern will be kept.

Use cases:

Extract only Japanese characters and ignore other symbols
Keep only specific language characters
Remove everything except the main dialogue text

Example:

This pattern matches and extracts only Japanese characters (Kanji, Hiragana, Katakana, and Japanese symbols).

["[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+", "gmu"]
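
To make the "only matching text is kept" behavior concrete, here is a minimal TypeScript sketch of how a ["pattern", "flags"] rule could be applied. The names MatchRule and keepMatches are hypothetical, and joining the matches with no separator is an assumption; VNTranslator's internal implementation may differ.

```typescript
// Hypothetical rule shape mirroring the ["pattern", "flags"] array above.
type MatchRule = [string, string];

// Keep only the fragments that match the rule and drop everything else.
// Assumptions: the flags include "g" (so every match is returned) and
// fragments are concatenated in order with no separator.
function keepMatches(text: string, rule: MatchRule): string {
  const [pattern, flags] = rule;
  const found = text.match(new RegExp(pattern, flags));
  return found === null ? "" : found.join("");
}

const japaneseOnly: MatchRule = [
  "[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+",
  "gmu",
];

// Sample OCR output with interface text mixed into the dialogue.
const ocrOutput = "HP 120 こんにちは、世界! [LOG]";
const cleaned = keepMatches(ocrOutput, japaneseOnly);
console.log(cleaned); // こんにちは、世界
```

The Latin letters, digits, brackets, and spaces are dropped because none of the character ranges in the pattern cover them.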

For more details, see RegExp Matching.

2. RegExp Replacement (Search & Replace)

Searches for specific text patterns and replaces them with other text. This is the most commonly used post-processing technique.

Use cases:

Fix common OCR recognition errors
Replace wrong quotation marks with correct ones
Remove unwanted characters or symbols
Normalize text formatting
Fix line breaks and spacing issues

Common Examples:

Replace quotation marks:

["『", "g", "「"]
["』", "g", "」"]

Remove music symbols:

["♪", "g", ""]

Fix ellipsis:

["。。。", "g", "..."]

Replace line breaks with a space:

["(\r\n|\n|\r)", "gm", " "]

Fix a common OCR error ("|" recognized in place of "I"):

["\\|", "g", "I"]
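
The example rules above can be chained. The following TypeScript sketch shows one way ["pattern", "flags", "replacement"] rules could be applied in sequence. The names ReplaceRule and applyRules are hypothetical, and applying the rules top to bottom is an assumption rather than confirmed VNTranslator behavior.

```typescript
// Hypothetical rule shape: [pattern, flags, replacement].
type ReplaceRule = [string, string, string];

// Apply each rule in order; every rule sees the output of the previous one.
function applyRules(text: string, rules: ReplaceRule[]): string {
  let result = text;
  for (const [pattern, flags, replacement] of rules) {
    result = result.replace(new RegExp(pattern, flags), replacement);
  }
  return result;
}

// The example rules from this section, in the same order.
const exampleRules: ReplaceRule[] = [
  ["『", "g", "「"],
  ["』", "g", "」"],
  ["♪", "g", ""],
  ["。。。", "g", "..."],
  ["(\r\n|\n|\r)", "gm", " "],
  ["\\|", "g", "I"],
];

// Sample OCR output containing each of the problems the rules target.
const rawLine = "『今日は♪』\n| think so。。。";
const fixedLine = applyRules(rawLine, exampleRules);
console.log(fixedLine); // 「今日は」 I think so...
```

Order can matter: a rule that deletes or rewrites a character changes what later rules can match. Also note that "$" has a special meaning in JavaScript replacement strings, so a literal dollar sign in the replacement would need to be written as "$$".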

For more details, see RegExp Replacement.
