Unicode normalization and language-agnostic tokenization utilities.
Provides:
- Script: Unicode script detection for individual characters.
- UnicodeNormalizer: NFC/NFD normalization, accent stripping, case folding.
- Language-agnostic tokenization that handles CJK character segmentation.
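To make "CJK character segmentation" concrete, here is a minimal sketch of one common approach: runs of alphanumeric characters become word tokens, while each CJK character becomes its own token, since CJK text is not space-delimited. The names `is_cjk_sketch` and `tokenize_sketch` are hypothetical and are not part of this module's API. The code point ranges are a simplification (Han ideographs plus kana), and the module's actual rules may differ.

```rust
/// Hypothetical helper (not this module's API): true for a simplified set of
/// CJK code point ranges. Real script data is more fine-grained than this.
fn is_cjk_sketch(c: char) -> bool {
    matches!(
        c as u32,
        0x4E00..=0x9FFF       // CJK Unified Ideographs
            | 0x3400..=0x4DBF // CJK Unified Ideographs Extension A
            | 0x3040..=0x309F // Hiragana (assumption: segmented like Han)
            | 0x30A0..=0x30FF // Katakana (assumption: segmented like Han)
    )
}

/// Hypothetical tokenizer: alphanumeric runs form one token each, every CJK
/// character forms its own token, and anything else acts as a separator.
fn tokenize_sketch(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    for c in text.chars() {
        if is_cjk_sketch(c) {
            // Flush any pending word, then emit the CJK character alone.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            tokens.push(c.to_string());
        } else if c.is_alphanumeric() {
            current.push(c);
        } else if !current.is_empty() {
            // Whitespace or punctuation ends the current word.
            tokens.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}
```

Under these assumptions, `tokenize_sketch("Hello 世界 world")` yields the four tokens `Hello`, `世`, `界`, and `world`. Splitting per character avoids needing a dictionary-based word segmenter, at the cost of ignoring multi-character CJK words.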
Structs§
- UnicodeNormalizer - Unicode-aware text normalizer.
- UnicodeNormalizerConfig - Configuration for UnicodeNormalizer.
Enums§
Functions§
- detect_script - Detect the Script for a single Unicode character.
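As an illustration of what per-character script detection can look like, the sketch below classifies a `char` by Unicode block range. Both `SketchScript` and `detect_script_sketch` are hypothetical names. The real `Script` enum's variants and the real `detect_script` signature are not shown on this page, so nothing here should be read as this module's API. Block ranges only approximate the Unicode Script property.

```rust
/// Hypothetical script enum; the variants of the module's real `Script`
/// type are not listed on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchScript {
    Latin,
    Greek,
    Cyrillic,
    Hebrew,
    Arabic,
    Hiragana,
    Katakana,
    Han,
    Hangul,
    Other,
}

/// Hypothetical detector: maps a character to a script using block ranges.
/// This is an approximation; some blocks contain characters whose Script
/// property is Common or Inherited.
fn detect_script_sketch(c: char) -> SketchScript {
    match c as u32 {
        // Basic Latin letters, then Latin-1 and Latin Extended-A/B letters,
        // skipping U+00D7 (multiplication sign) and U+00F7 (division sign).
        0x0041..=0x005A
        | 0x0061..=0x007A
        | 0x00C0..=0x00D6
        | 0x00D8..=0x00F6
        | 0x00F8..=0x024F => SketchScript::Latin,
        0x0370..=0x03FF => SketchScript::Greek,
        0x0400..=0x04FF => SketchScript::Cyrillic,
        0x0590..=0x05FF => SketchScript::Hebrew,
        0x0600..=0x06FF => SketchScript::Arabic,
        0x3040..=0x309F => SketchScript::Hiragana,
        0x30A0..=0x30FF => SketchScript::Katakana,
        0x3400..=0x4DBF | 0x4E00..=0x9FFF => SketchScript::Han,
        0xAC00..=0xD7AF => SketchScript::Hangul,
        // Digits, punctuation, whitespace, and everything not covered above.
        _ => SketchScript::Other,
    }
}
```

With this sketch, `detect_script_sketch('中')` returns `SketchScript::Han` and `detect_script_sketch('1')` returns `SketchScript::Other`. A production implementation would typically consult the full Unicode Script property tables rather than hand-written ranges.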