# Post-processing

Post-processing refines the OCR output after the text has been recognized. This step helps correct common OCR errors, remove unwanted characters, and format the text properly before translation.

**Note:** Post-processing is useful for **all OCR engine types**. Even modern and AI-based OCR engines may produce text that needs formatting or correction.

### When to Use Post-processing

Use post-processing when:

* OCR consistently misreads certain characters (for example, "l" or "I" as "|", or "0" as "O")
* You need to remove specific characters or symbols
* Text formatting needs adjustment (line breaks, quotation marks)
* You want to standardize character variants (for example, converting 『』 to 「」)
* OCR output contains unwanted characters

### Regular Expression (RegExp)

Regular Expressions (RegExp) are patterns used to search and manipulate text. VNTranslator supports two types of RegExp operations:

#### 1. RegExp Matching

Identifies and extracts specific text patterns from the OCR output. Only text that matches the pattern will be kept.

**Use cases:**

* Extract only Japanese characters and ignore other symbols
* Keep only specific language characters
* Remove everything except the main dialogue text

**Example:**

This pattern matches and extracts only Japanese characters (Kanji, Hiragana, Katakana, and Japanese symbols).

{% code overflow="wrap" %}

```
["[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+", "gmu"]
```

{% endcode %}
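The TypeScript sketch below shows roughly what applying this rule to a line of OCR output looks like. It assumes the rule is a JSON array `[pattern, flags]` compiled with JavaScript's `RegExp`, and that the matches are joined with no separator. `applyMatchRule` is a hypothetical helper, not part of VNTranslator, and the app's actual joining behavior may differ.

```typescript
// Sketch only: assumes a rule is a JSON array [pattern, flags] and that
// matches are joined with no separator. Not VNTranslator's actual code.
type MatchRule = [string, string];

// Hypothetical helper: keep only the parts of `text` that match the rule.
function applyMatchRule(text: string, ruleJson: string): string {
  const [pattern, flags] = JSON.parse(ruleJson) as MatchRule;
  // With the "g" flag, match() returns every match, or null if there are none.
  const matches = text.match(new RegExp(pattern, flags));
  return matches ? matches.join("") : "";
}

const japaneseOnly =
  '["[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+", "gmu"]';

// The music note, "!", space, and digits are dropped; Japanese text and brackets stay.
console.log(applyMatchRule("♪「今日は、いい天気!」 123", japaneseOnly));
// → 「今日は、いい天気」
```

Because the pattern is an alternation of separate character classes, a mixed phrase such as 今日は comes back as several separate matches (今日 and は), so the way matches are joined affects the final text.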

For more details, see [RegExp Matching](/advanced/regexp/matching.md).

#### 2. RegExp Replacement (Search & Replace)

Searches for specific text patterns and replaces them with other text. This is the most commonly used post-processing technique.

**Use cases:**

* Fix common OCR recognition errors
* Replace wrong quotation marks with correct ones
* Remove unwanted characters or symbols
* Normalize text formatting
* Fix line breaks and spacing issues

**Common Examples:**

Replace white corner brackets (『』) with standard corner brackets (「」):

```
["『", "g", "「"]
["』", "g", "」"]
```
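A minimal TypeScript sketch of search and replace, assuming each rule is a JSON array `[pattern, flags, replacement]` passed to JavaScript's `String.prototype.replace`, and that rules run in the order they are listed. `applyReplaceRules` is a hypothetical helper for illustration only.

```typescript
// Sketch only: assumes rules are JSON arrays [pattern, flags, replacement]
// applied top to bottom. Not VNTranslator's actual code.
type ReplaceRule = [string, string, string];

// Hypothetical helper: feed the output of each rule into the next one.
function applyReplaceRules(text: string, rulesJson: string[]): string {
  return rulesJson.reduce((current, ruleJson) => {
    const [pattern, flags, replacement] = JSON.parse(ruleJson) as ReplaceRule;
    return current.replace(new RegExp(pattern, flags), replacement);
  }, text);
}

const quoteRules = ['["『", "g", "「"]', '["』", "g", "」"]'];

console.log(applyReplaceRules("『おはよう』『またね』", quoteRules));
// → 「おはよう」「またね」
```

The `g` flag makes each rule replace every occurrence rather than only the first. In JavaScript, `$` sequences such as `$1` or `$&` in a replacement string refer to matched text; whether VNTranslator's replacement field behaves the same way is an assumption worth checking.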

Remove music note symbols (♪):

```
["♪", "g", ""]
```

Convert three ideographic full stops (。。。) into an ellipsis (...):

```
["。。。", "g", "..."]
```
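This rule matches exactly three full stops at a time, which matters when the OCR output contains a longer run. The TypeScript sketch below uses the same assumed `[pattern, flags, replacement]` format and a hypothetical `applyReplaceRule` helper. It shows the leftover character, plus a hypothetical `。{3,}` variant (not taken from the VNTranslator docs) that collapses any run of three or more into one ellipsis.

```typescript
// Sketch only: assumes a rule is a JSON array [pattern, flags, replacement].
// Hypothetical helper, not VNTranslator's actual code.
function applyReplaceRule(text: string, ruleJson: string): string {
  const [pattern, flags, replacement] = JSON.parse(ruleJson) as [string, string, string];
  return text.replace(new RegExp(pattern, flags), replacement);
}

// The documented rule: exactly three ideographic full stops per match.
const ellipsisRule = '["。。。", "g", "..."]';
console.log(applyReplaceRule("えっ。。。", ellipsisRule)); // → えっ...
console.log(applyReplaceRule("えっ。。。。", ellipsisRule)); // → えっ...。

// Hypothetical variant (not from the docs): any run of three or more.
const ellipsisRunRule = '["。{3,}", "g", "..."]';
console.log(applyReplaceRule("えっ。。。。", ellipsisRunRule)); // → えっ...
```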

Replace line breaks with spaces:

```
["(\r\n|\n|\r)", "gm", " "]
```
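The order of the alternatives matters: `\r\n` is listed first so that a Windows-style CRLF pair is consumed as a single break. The TypeScript sketch below assumes the same JSON rule format and a hypothetical `applyReplaceRule` helper, and compares the documented rule with a simpler `(\r|\n)` pattern.

```typescript
// Sketch only: assumes a rule is a JSON array [pattern, flags, replacement].
// Hypothetical helper, not VNTranslator's actual code.
function applyReplaceRule(text: string, ruleJson: string): string {
  const [pattern, flags, replacement] = JSON.parse(ruleJson) as [string, string, string];
  return text.replace(new RegExp(pattern, flags), replacement);
}

// JSON decodes \r and \n into real carriage-return / line-feed characters,
// which the regular expression then matches literally.
const lineBreakRule = String.raw`["(\r\n|\n|\r)", "gm", " "]`;
console.log(JSON.stringify(applyReplaceRule("Hello\r\nworld\nagain", lineBreakRule)));
// → "Hello world again"

// For comparison: without the \r\n alternative, a CRLF pair becomes two spaces.
const naiveRule = String.raw`["(\r|\n)", "g", " "]`;
console.log(JSON.stringify(applyReplaceRule("Hello\r\nworld", naiveRule)));
// → "Hello  world"
```

The `m` flag only changes how `^` and `$` behave, so it has no effect on this particular pattern.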

Fix a common OCR error (a vertical bar `|` recognized in place of a capital `I`):

```
["\\|", "g", "I"]
```
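The backslashes are needed because `|` is the alternation operator in regular expressions. In JSON, `"\\|"` decodes to `\|`, which matches a literal vertical bar; a single backslash would not work because `\|` is not a valid JSON escape. The TypeScript sketch below (hypothetical `applyReplaceRule` helper, assumed JSON rule format) shows what happens with and without the escape.

```typescript
// Sketch only: assumes a rule is a JSON array [pattern, flags, replacement].
// Hypothetical helper, not VNTranslator's actual code.
function applyReplaceRule(text: string, ruleJson: string): string {
  const [pattern, flags, replacement] = JSON.parse(ruleJson) as [string, string, string];
  return text.replace(new RegExp(pattern, flags), replacement);
}

// JSON "\\|" decodes to the regex \| , which matches a literal vertical bar.
const barRule = String.raw`["\\|", "g", "I"]`;
console.log(applyReplaceRule("|t's |mportant.", barRule)); // → It's Important.

// Unescaped, "|" is an empty alternation that matches the empty string at
// every position, so "I" is inserted between every character instead.
const unescapedRule = String.raw`["|", "g", "I"]`;
console.log(applyReplaceRule("ab", unescapedRule)); // → IaIbI
```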

For more details, see [RegExp Replacement](/advanced/regexp/replacement.md).

