AUTO-GENERATED — do not edit by hand.
Generated from the Unicode Character Database (UCD) version 14.0.0:
- UnicodeData.txt: decomposition mapping + CCC
- DerivedNormalizationProps.txt: Full_Composition_Exclusion
- CompositionExclusions.txt: explicit exclusion list
Regenerate with:
python3 crates/relon-unicode/tools/gen_normalization_tables.py \
    > crates/relon-unicode/src/normalization_data.rs

Last regenerated: 2026-05-18 (UCD 14.0.0).
Bump procedure when a new UCD ships:
- Drop the new *.txt UCD files where the script expects them (see gen_normalization_tables.py for the exact paths).
- Re-run the script. The script bakes multi-level decomposition, filters out Full_Composition_Exclusion, and excludes Hangul syllables; none of that needs manual fix-up.
- Run cargo test -p relon-ir unicode to confirm round-trip conformance.

Hangul syllables (U+AC00..=U+D7A3) are decomposed and composed algorithmically per UAX #15 §16 — keeping them out of the tables saves ~88 KB.
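
The algorithmic Hangul handling mentioned above can be sketched as follows. This is a minimal illustration of the standard Hangul syllable arithmetic, not the crate's actual helper: the function names `hangul_decompose` and `hangul_compose` and the constant names are assumptions made for this example.

```rust
// Constants of the standard Hangul syllable arithmetic.
const S_BASE: u32 = 0xAC00; // first precomposed syllable
const L_BASE: u32 = 0x1100; // first leading consonant
const V_BASE: u32 = 0x1161; // first vowel
const T_BASE: u32 = 0x11A7; // one before the first trailing consonant
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172 syllables, U+AC00..=U+D7A3

/// Hypothetical helper: full canonical decomposition of a precomposed
/// syllable into (L, V, optional T). Returns None outside U+AC00..=U+D7A3.
pub fn hangul_decompose(cp: u32) -> Option<(u32, u32, Option<u32>)> {
    let s_index = cp.checked_sub(S_BASE).filter(|&i| i < S_COUNT)?;
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t_index = s_index % T_COUNT;
    let t = if t_index == 0 { None } else { Some(T_BASE + t_index) };
    Some((l, v, t))
}

/// Hypothetical helper: pairwise composition, L + V -> LV and LV + T -> LVT.
/// Returns None when the pair is not a composable Hangul pair.
pub fn hangul_compose(first: u32, second: u32) -> Option<u32> {
    // Leading consonant followed by a vowel.
    if (L_BASE..L_BASE + L_COUNT).contains(&first)
        && (V_BASE..V_BASE + V_COUNT).contains(&second)
    {
        let l_index = first - L_BASE;
        let v_index = second - V_BASE;
        return Some(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT);
    }
    // LV syllable (no trailing consonant yet) followed by a trailing consonant.
    if (S_BASE..S_BASE + S_COUNT).contains(&first)
        && (first - S_BASE) % T_COUNT == 0
        && (T_BASE + 1..T_BASE + T_COUNT).contains(&second)
    {
        return Some(first + (second - T_BASE));
    }
    None
}
```

Because every syllable in the range is covered by this arithmetic, no per-syllable table entries are needed, which is where the size saving comes from.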
Statics
- CCC_TABLE: Canonical_Combining_Class, sparse (only non-zero entries). Sorted by code point. Lookup falls back to 0 when absent.
- COMPOSITION_PAIRS: Canonical composition pairs, sorted by (first, second). Excludes any pair whose composite has Full_Composition_Exclusion = True or appears in CompositionExclusions.txt. Hangul composition runs through its own algorithmic helper.
- NFD_INDEX: Sorted by code point. Each entry is (cp, payload_offset, payload_len). payload_offset indexes into NFD_POOL. Hangul syllables are excluded; callers must run the algorithmic decompose first.
- NFD_POOL
- NFKD_INDEX
- NFKD_POOL
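
Since the tables are sorted, lookups can use binary search. The sketch below shows one way that might look. The element types, the `SAMPLE_*` stand-in tables, and the function names `ccc`, `nfd_lookup`, and `compose_pair` are all assumptions made for illustration; the generated statics may use different types and layouts.

```rust
// Tiny stand-in tables with a few real UCD values. The actual generated
// statics are far larger and their element types are assumed here.
const SAMPLE_CCC: &[(u32, u8)] = &[
    (0x0300, 230), // COMBINING GRAVE ACCENT
    (0x0301, 230), // COMBINING ACUTE ACCENT
    (0x0323, 220), // COMBINING DOT BELOW
    (0x0327, 202), // COMBINING CEDILLA
];

// (first, second, composite), sorted by (first, second).
const SAMPLE_PAIRS: &[(u32, u32, u32)] = &[
    (0x0041, 0x0300, 0x00C0),
    (0x0041, 0x0301, 0x00C1),
    (0x0065, 0x0301, 0x00E9),
];

// (cp, payload_offset, payload_len), sorted by cp.
const SAMPLE_NFD_INDEX: &[(u32, u16, u8)] = &[
    (0x00C0, 0, 2),
    (0x00C1, 2, 2),
    (0x00E9, 4, 2),
    (0x1EA4, 6, 3), // multi-level decomposition already flattened
];

const SAMPLE_NFD_POOL: &[u32] = &[
    0x0041, 0x0300, // U+00C0
    0x0041, 0x0301, // U+00C1
    0x0065, 0x0301, // U+00E9
    0x0041, 0x0302, 0x0301, // U+1EA4
];

/// Sparse combining-class lookup: absent code points have class 0.
pub fn ccc(table: &[(u32, u8)], cp: u32) -> u8 {
    table
        .binary_search_by_key(&cp, |&(c, _)| c)
        .map(|i| table[i].1)
        .unwrap_or(0)
}

/// Returns the decomposition slice from the pool, or None when the code
/// point has no table entry (Hangul must be handled before calling this).
pub fn nfd_lookup<'a>(
    index: &[(u32, u16, u8)],
    pool: &'a [u32],
    cp: u32,
) -> Option<&'a [u32]> {
    let i = index.binary_search_by_key(&cp, |&(c, _, _)| c).ok()?;
    let (_, offset, len) = index[i];
    let start = offset as usize;
    pool.get(start..start + len as usize)
}

/// Looks up the composite for a (first, second) pair, if one is listed.
pub fn compose_pair(pairs: &[(u32, u32, u32)], first: u32, second: u32) -> Option<u32> {
    pairs
        .binary_search_by_key(&(first, second), |&(a, b, _)| (a, b))
        .ok()
        .map(|i| pairs[i].2)
}
```

Storing offsets into a shared pool keeps the index entries fixed-size, which is what makes the binary search over the index straightforward.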