Text normalization for Japanese text processing
Provides utilities for normalizing text before morphological analysis, including Unicode normalization, character width conversion, and more.
Structs§
- CharTypeCounts - Count character types in text
- Normalizer - Text normalizer with configurable options
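
To illustrate what counting character types in text can look like, here is a minimal sketch using only the standard library. It is not the crate's `CharTypeCounts`: the `ScriptCounts` struct, its fields, and the `count_scripts` function are hypothetical names, and the Unicode ranges cover only the basic hiragana, katakana, and CJK Unified Ideographs blocks.

```rust
/// Hypothetical tally of script classes (an assumption for illustration,
/// not the fields of the crate's `CharTypeCounts`).
#[derive(Debug, Default, PartialEq)]
struct ScriptCounts {
    hiragana: usize,
    katakana: usize,
    kanji: usize,
    other: usize,
}

/// Classify every character by its Unicode block and count each class.
fn count_scripts(text: &str) -> ScriptCounts {
    let mut counts = ScriptCounts::default();
    for c in text.chars() {
        match c {
            // Hiragana letters
            '\u{3041}'..='\u{3096}' => counts.hiragana += 1,
            // Katakana letters (the prolonged sound mark U+30FC is not included)
            '\u{30A1}'..='\u{30FA}' => counts.katakana += 1,
            // CJK Unified Ideographs (basic block only)
            '\u{4E00}'..='\u{9FFF}' => counts.kanji += 1,
            _ => counts.other += 1,
        }
    }
    counts
}

fn main() {
    let counts = count_scripts("漢字とカナ");
    println!("{:?}", counts);
}
```

Counts of this kind are a cheap way to guess the dominant script of a string before choosing how to normalize it.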
Enums§
Functions§
- contains_kanji - Check if text contains kanji
- is_hiragana_only - Check if text contains only hiragana
- is_katakana_only - Check if text contains only katakana
- normalize_punctuation - Convert all Japanese periods and commas to standard forms
- normalize_quotes - Normalize Japanese quotes and brackets
- remove_whitespace - Remove all whitespace
- to_nfkc - Normalize to NFKC (compatibility composition)
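
The script checks and whitespace removal listed above can be approximated with the standard library alone. The sketch below is an assumption about the behavior, not the crate's implementation: the `_sketch` function names and their signatures are hypothetical, the ranges cover only the basic Unicode blocks, and treating the empty string as "not hiragana only" is a choice made here for illustration.

```rust
// Per-character predicates over the basic Unicode blocks.
fn is_hiragana(c: char) -> bool {
    matches!(c, '\u{3041}'..='\u{3096}')
}

fn is_katakana(c: char) -> bool {
    // Includes the prolonged sound mark U+30FC so that words like コーヒー pass.
    matches!(c, '\u{30A1}'..='\u{30FA}' | '\u{30FC}')
}

fn is_kanji(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}')
}

/// True if at least one character is a CJK Unified Ideograph.
fn contains_kanji_sketch(text: &str) -> bool {
    text.chars().any(is_kanji)
}

/// True if the text is non-empty and every character is hiragana.
fn is_hiragana_only_sketch(text: &str) -> bool {
    !text.is_empty() && text.chars().all(is_hiragana)
}

/// True if the text is non-empty and every character is katakana.
fn is_katakana_only_sketch(text: &str) -> bool {
    !text.is_empty() && text.chars().all(is_katakana)
}

/// Drop every Unicode whitespace character, including the
/// ideographic space U+3000.
fn remove_whitespace_sketch(text: &str) -> String {
    text.chars().filter(|c| !c.is_whitespace()).collect()
}

fn main() {
    println!("{}", contains_kanji_sketch("日本語のテキスト"));
    println!("{}", is_hiragana_only_sketch("ひらがな"));
    println!("{}", is_katakana_only_sketch("コーヒー"));
    println!("{}", remove_whitespace_sketch("東京　都 内"));
}
```

Relying on `char::is_whitespace` means full-width spaces are removed along with ASCII spaces, tabs, and newlines, which matters for Japanese text where U+3000 is common.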