# Post-processing

Post-processing refines the OCR output after the text has been recognized. This step helps correct common OCR errors, remove unwanted characters, and format the text properly before translation.

**Note:** Post-processing is useful for **all OCR engine types**. Even modern and AI-based OCR engines may produce text that needs formatting or correction.

### When to Use Post-processing

Use post-processing when:

* OCR consistently misreads certain characters (for example, "l" as "|" or "0" as "O")
* You need to remove specific characters or symbols
* Text formatting needs adjustment (line breaks, quotation marks)
* You want to standardize character patterns
* OCR output contains unwanted characters

### Regular Expression (RegExp)

Regular Expressions (RegExp) are patterns used to search and manipulate text. VNTranslator supports two types of RegExp operations:

#### 1. RegExp Matching

Identifies and extracts specific text patterns from the OCR output. Only text that matches the pattern will be kept.

**Use cases:**

* Extract only Japanese characters and ignore other symbols
* Keep only specific language characters
* Remove everything except the main dialogue text

**Example:**

This pattern matches and extracts only Japanese characters (Kanji, Hiragana, Katakana, and Japanese symbols). The rule in this example has the form `[pattern, flags]`.

{% code overflow="wrap" %}

```
["[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+", "gmu"]
```

{% endcode %}
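
The flags in this rule (`g`, `m`, `u`) look like JavaScript-style RegExp flags, so the effect of a matching rule can be sketched in TypeScript. This is only an illustration under stated assumptions: `MatchRule` and `keepMatches` are hypothetical names, not part of VNTranslator, and joining the matches without a separator is an assumption about how the kept text is reassembled.

```typescript
// Hypothetical sketch: not VNTranslator's actual implementation.
// A matching rule is written as [pattern, flags], as in the example above.
type MatchRule = [string, string];

const japaneseOnly: MatchRule = [
  "[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[々〆〤]+|[⺀-⿕]+|[、-〿]+|[ㇰ-ㇿ㈠-㉃㊀-㍿]+",
  "gmu",
];

// Keep only the parts of `text` that match the rule.
// Assumption: the rule uses the "g" flag and matches are joined with no separator.
function keepMatches(text: string, rule: MatchRule): string {
  const [pattern, flags] = rule;
  const matches = text.match(new RegExp(pattern, flags)) ?? [];
  return matches.join("");
}

const cleaned = keepMatches("OCR>> こんにちは、世界! [12]", japaneseOnly);
console.log(cleaned); // こんにちは、世界
```

In this sketch the Latin letters, digits, brackets, and the ASCII exclamation mark are dropped because none of them fall inside the listed character ranges.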

For more details, see [RegExp Matching](https://docs.vntranslator.com/advanced/regexp/matching).

#### 2. RegExp Replacement (Search & Replace)

Searches for specific text patterns and replaces them with other text. This is the most commonly used post-processing technique.

**Use cases:**

* Fix common OCR recognition errors
* Replace wrong quotation marks with correct ones
* Remove unwanted characters or symbols
* Normalize text formatting
* Fix line breaks and spacing issues

**Common Examples:**

Each rule in these examples has the form `[pattern, flags, replacement]`.

Replace quotation marks:

```
["『", "g", "「"]
["』", "g", "」"]
```

Remove music symbols:

```
["♪", "g", ""]
```

Normalize ellipses (three ideographic full stops become three periods):

```
["。。。", "g", "..."]
```

Replace line breaks with a space:

```
["(\r\n|\n|\r)", "gm", " "]
```

Fix common OCR errors (here, a vertical bar `|` is replaced with the letter `I`):

```
["\\|", "g", "I"]
```
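
To show how several replacement rules combine, the examples above can be sketched as an ordered list applied one after another. This is a minimal illustration, not VNTranslator's implementation: `ReplaceRule` and `applyRules` are hypothetical names, and applying the rules from top to bottom is an assumption. The input string is contrived so that every rule fires once.

```typescript
// Hypothetical sketch: not VNTranslator's actual implementation.
// A replacement rule is written as [pattern, flags, replacement].
type ReplaceRule = [string, string, string];

const rules: ReplaceRule[] = [
  ["『", "g", "「"],              // opening quotation mark
  ["』", "g", "」"],              // closing quotation mark
  ["♪", "g", ""],                // remove music symbols
  ["。。。", "g", "..."],         // normalize ellipses
  ["(\r\n|\n|\r)", "gm", " "],   // line breaks become a space
  ["\\|", "g", "I"],             // vertical bar misread for the letter I
];

// Assumption: rules run in order, and each rule sees the previous rule's output.
function applyRules(text: string, ruleList: ReplaceRule[]): string {
  let current = text;
  for (const [pattern, flags, replacement] of ruleList) {
    current = current.replace(new RegExp(pattern, flags), replacement);
  }
  return current;
}

const fixed = applyRules("『|t is late。。。』♪\nGood night", rules);
console.log(fixed); // 「It is late...」 Good night
```

Because each rule works on the result of the previous one, order can matter: a rule that removes a character must come after any rule that still needs to match it.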

For more details, see [RegExp Replacement](https://docs.vntranslator.com/advanced/regexp/replacement).
