Module normalize

Text normalization for Japanese text processing

Provides utilities for normalizing text before morphological analysis: Unicode normalization, character width conversion, punctuation and quote normalization, whitespace removal, and script checks (hiragana, katakana, kanji).

Structs

CharTypeCounts
Counts of each character type in text
Normalizer
Text normalizer with configurable options

Enums

CharWidth
Character width for conversion
NormForm
Unicode normalization form
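
To illustrate what character width conversion involves, here is a minimal standard-library sketch. It is not this module's implementation: the function name `fullwidth_to_halfwidth` is hypothetical, and it covers only fullwidth ASCII variants and the ideographic space, not halfwidth katakana or the other cases a `CharWidth` option may handle.

```rust
/// Hypothetical helper (an assumption, not part of this module's API).
/// Maps fullwidth ASCII variants (U+FF01..=U+FF5E) and the ideographic
/// space (U+3000) to their halfwidth equivalents; all other characters
/// pass through unchanged.
fn fullwidth_to_halfwidth(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            // Ideographic space becomes a plain ASCII space.
            '\u{3000}' => ' ',
            // Fullwidth forms sit at a fixed offset of 0xFEE0 above ASCII.
            '\u{FF01}'..='\u{FF5E}' => char::from_u32(c as u32 - 0xFEE0).unwrap_or(c),
            _ => c,
        })
        .collect()
}
```

Under these assumptions, `fullwidth_to_halfwidth("ＡＢＣ１２３")` yields `"ABC123"`, while kana and kanji are left as they are.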

Functions

contains_kanji
Check if text contains kanji
is_hiragana_only
Check if text contains only hiragana
is_katakana_only
Check if text contains only katakana
normalize_punctuation
Convert all Japanese periods and commas to standard forms
normalize_quotes
Normalize Japanese quotes and brackets
remove_whitespace
Remove all whitespace
to_nfkc
Normalize to NFKC (compatibility composition)
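
The script checks listed above can be approximated with Unicode block ranges. The following is a rough standard-library sketch, not the module's actual code: the names `has_kanji`, `all_hiragana`, and `all_katakana` are hypothetical stand-ins, the ranges are limited to the main blocks, and treating an empty string as "not hiragana-only" is an assumption.

```rust
/// Hypothetical: true for CJK Unified Ideographs and Extension A only.
/// Rarer extension blocks are deliberately ignored in this sketch.
fn is_kanji(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}')
}

/// Hypothetical counterpart to a "contains kanji" check.
fn has_kanji(text: &str) -> bool {
    text.chars().any(is_kanji)
}

/// Hypothetical counterpart to a "hiragana only" check.
/// Assumption: empty input returns false.
fn all_hiragana(text: &str) -> bool {
    // Hiragana block: U+3040..=U+309F.
    !text.is_empty() && text.chars().all(|c| matches!(c, '\u{3040}'..='\u{309F}'))
}

/// Hypothetical counterpart to a "katakana only" check.
/// Assumption: empty input returns false.
fn all_katakana(text: &str) -> bool {
    // Katakana block: U+30A0..=U+30FF (includes the prolonged sound mark).
    !text.is_empty() && text.chars().all(|c| matches!(c, '\u{30A0}'..='\u{30FF}'))
}
```

In this sketch, `has_kanji("漢字とかな")` is true, `all_hiragana("ひらガナ")` is false because of the two katakana characters, and `all_katakana("カタカナ")` is true. The real functions may draw these boundaries differently, for example around punctuation or halfwidth katakana.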