Post-processing

Post-processing in OCR (Optical Character Recognition) refers to the additional steps or techniques applied to the output of an OCR system in order to improve the accuracy and quality of the recognized text.

1. Regular Expression (RegExp)

Regular Expressions are patterns used to match character combinations in strings

RegExp Matching

Recognizing specific sequences or patterns of characters in the text and ensuring that the recognized text conforms to a specific format or structure

Example:

["[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆 ]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+", "gmu"]
pageMatching
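
As a rough illustration of how a matching rule of the form [pattern, flags] could be applied, here is a minimal sketch. The helper name `applyMatching` and the behaviour of keeping only the matched runs are assumptions made for this example, not the tool's confirmed implementation. The pattern is a shortened variant that covers only kanji, hiragana and katakana, written with Unicode escapes.

```javascript
// Hypothetical helper (not part of the tool): keeps only the parts of the
// OCR text that match the rule and drops everything else.
// Assumes the flags include "g", so match() returns every match.
function applyMatching(text, [pattern, flags]) {
  const re = new RegExp(pattern, flags);
  const matches = text.match(re); // array of all matches, or null if none
  return matches ? matches.join("") : "";
}

// Shortened rule: kanji, hiragana, katakana (plus the long-vowel mark).
const japaneseOnly = [
  "[\\u4E00-\\u9FA0]+|[\\u3041-\\u3094]+|[\\u30A1-\\u30F4\\u30FC]+",
  "gmu",
];

const ocrOutput = "abc 日本語 123 テスト!";
console.log(applyMatching(ocrOutput, japaneseOnly)); // prints "日本語テスト"
```

Joining the matches without a separator is a design choice for this sketch; a real pipeline might prefer to keep whitespace or line breaks between matched runs.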

RegExp Replacement (Search & Replace)

Searching for specific patterns of text and replacing them with other text or another pattern

Example:

["『", "g", "「"]
["』", "g", "」"]
["♪", "g", ""]
["。。。", "g", "..."]
["(\r\n|\n|\r)", "gm", " "]
["\\|", "g", "I"]
pageReplacement
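
The replacement rules can be read as [pattern, flags, replacement] triples. The sketch below shows one way such a list might be applied, assuming the rules run in order from top to bottom. The helper name `applyReplacements` is hypothetical, and the rule order is an assumption rather than documented behaviour.

```javascript
// Hypothetical helper (not part of the tool): applies each
// [pattern, flags, replacement] rule in order to the OCR text.
function applyReplacements(text, rules) {
  return rules.reduce(
    (current, [pattern, flags, replacement]) =>
      current.replace(new RegExp(pattern, flags), replacement),
    text
  );
}

// A subset of the rules from the example above.
const rules = [
  ["』", "g", "」"],               // normalize closing double corner bracket
  ["。。。", "g", "..."],           // three ideographic full stops -> ellipsis
  ["(\r\n|\n|\r)", "gm", " "],    // line breaks -> single space
  ["\\|", "g", "I"],              // a pipe misread by OCR -> capital I
];

const ocrOutput = "それは。。。\nだめ』 | am";
console.log(applyReplacements(ocrOutput, rules)); // prints "それは... だめ」 I am"
```

Because each rule sees the output of the previous one, order matters: a later rule can match text that an earlier rule produced.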

2. Spell Checker 🚧
