Module normalization_data

AUTO-GENERATED — do not edit by hand.

Generated from the Unicode Character Database (UCD) version 14.0.0:

Regenerate with:

python3 crates/relon-unicode/tools/gen_normalization_tables.py \
    > crates/relon-unicode/src/normalization_data.rs

Last regenerated: 2026-05-18 (UCD 14.0.0).

Bump procedure when a new UCD ships:

1. Drop the new *.txt UCD files where the script expects them (see gen_normalization_tables.py for the exact paths).
2. Re-run the script. It bakes multi-level decompositions into the tables, filters out composites flagged Full_Composition_Exclusion, and excludes Hangul syllables; none of that needs manual fix-up.
3. Run cargo test -p relon-ir unicode to confirm round-trip conformance.
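
To illustrate what "baking multi-level decomposition" means, here is a minimal Rust sketch. It is not the generator's actual code (the generator is a Python script). The function names and the HashMap input are hypothetical. The idea is that single-level canonical mappings are expanded recursively at generation time, so a table entry already holds the fully decomposed sequence.

```rust
use std::collections::HashMap;

/// Recursively expand `cp` through single-level canonical mappings until
/// only code points without a mapping remain. Hypothetical helper, shown
/// only to illustrate the "baking" step.
fn expand_into(cp: u32, single_level: &HashMap<u32, Vec<u32>>, out: &mut Vec<u32>) {
    match single_level.get(&cp) {
        Some(parts) => {
            for &part in parts {
                expand_into(part, single_level, out);
            }
        }
        // No mapping: the code point is already fully decomposed.
        None => out.push(cp),
    }
}

/// Convenience wrapper returning the fully expanded sequence.
fn fully_decompose(cp: u32, single_level: &HashMap<u32, Vec<u32>>) -> Vec<u32> {
    let mut out = Vec::new();
    expand_into(cp, single_level, &mut out);
    out
}
```

For example, UnicodeData.txt maps U+1EA5 to U+00E2 U+0301 and U+00E2 to U+0061 U+0302, so the baked entry for U+1EA5 would be U+0061 U+0302 U+0301.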

Hangul syllables (U+AC00..=U+D7A3) are decomposed and composed algorithmically per UAX #15 §16 — keeping them out of the tables saves ~88 KB.
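
The algorithmic path can be sketched as follows. This is an illustration of the standard Hangul syllable arithmetic, not the crate's actual helper. The function names decompose_hangul and compose_hangul are hypothetical, while the constants are the ones the Unicode Standard defines.

```rust
// Constants from the Unicode Standard's Hangul syllable arithmetic.
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Decompose a precomposed syllable into (L, V, optional T).
/// Returns None for anything outside U+AC00..=U+D7A3.
fn decompose_hangul(s: u32) -> Option<(u32, u32, Option<u32>)> {
    let s_index = s.checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t_index = s_index % T_COUNT;
    let t = if t_index == 0 { None } else { Some(T_BASE + t_index) };
    Some((l, v, t))
}

/// Compose one adjacent pair: L + V -> LV, or LV + T -> LVT.
fn compose_hangul(first: u32, second: u32) -> Option<u32> {
    if (L_BASE..L_BASE + L_COUNT).contains(&first)
        && (V_BASE..V_BASE + V_COUNT).contains(&second)
    {
        let l_index = first - L_BASE;
        let v_index = second - V_BASE;
        return Some(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT);
    }
    // Only an LV syllable (no trailing consonant yet) can absorb a T.
    if (S_BASE..S_BASE + S_COUNT).contains(&first)
        && (first - S_BASE) % T_COUNT == 0
        && (T_BASE + 1..T_BASE + T_COUNT).contains(&second)
    {
        return Some(first + (second - T_BASE));
    }
    None
}
```

With these definitions, U+D55C decomposes to U+1112 U+1161 U+11AB, and composing those three code points pairwise yields U+D55C again.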

Statics

CCC_TABLE: Canonical_Combining_Class, sparse (only non-zero entries). Sorted by code point. Lookup falls back to 0 when absent.
COMPOSITION_PAIRS: Canonical composition pairs, sorted by (first, second). Excludes any pair whose composite has Full_Composition_Exclusion = True or appears in CompositionExclusions.txt. Hangul composition runs through its own algorithmic helper.
NFD_INDEX: Sorted by code point. Each entry is (cp, payload_offset, payload_len). payload_offset indexes into NFD_POOL. Hangul syllables are excluded; callers must run the algorithmic decompose first.
NFD_POOL
NFKD_INDEX
NFKD_POOL
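
A minimal sketch of how sorted tables like these might be queried with binary search. The DEMO_* tables, their element types, and the function names are hypothetical stand-ins. The real statics in normalization_data.rs may use different types, and the pool is assumed here to hold code points.

```rust
// Hypothetical miniature tables mirroring the layouts described above.
static DEMO_CCC: &[(u32, u8)] = &[(0x0301, 230), (0x0302, 230), (0x0323, 220)];
static DEMO_NFD_INDEX: &[(u32, u16, u8)] = &[(0x00E2, 0, 2), (0x1EA5, 2, 3)];
static DEMO_NFD_POOL: &[u32] = &[0x0061, 0x0302, 0x0061, 0x0302, 0x0301];
static DEMO_PAIRS: &[(u32, u32, u32)] =
    &[(0x0061, 0x0302, 0x00E2), (0x00E2, 0x0301, 0x1EA5)];

/// Sparse CCC lookup: absent entries fall back to 0.
fn ccc(cp: u32) -> u8 {
    match DEMO_CCC.binary_search_by_key(&cp, |&(c, _)| c) {
        Ok(i) => DEMO_CCC[i].1,
        Err(_) => 0,
    }
}

/// Index lookup: find (cp, payload_offset, payload_len), then slice the pool.
/// Hangul syllables are not in the index, so a caller would try the
/// algorithmic decomposition before reaching this function.
fn nfd(cp: u32) -> Option<&'static [u32]> {
    let i = DEMO_NFD_INDEX
        .binary_search_by_key(&cp, |&(c, _, _)| c)
        .ok()?;
    let (_, offset, len) = DEMO_NFD_INDEX[i];
    let start = offset as usize;
    Some(&DEMO_NFD_POOL[start..start + len as usize])
}

/// Pair lookup keyed on (first, second), matching the table's sort order.
fn compose_pair(first: u32, second: u32) -> Option<u32> {
    let i = DEMO_PAIRS
        .binary_search_by_key(&(first, second), |&(a, b, _)| (a, b))
        .ok()?;
    Some(DEMO_PAIRS[i].2)
}
```

Splitting each decomposition table into an index plus a shared pool keeps index entries fixed-size, which is what allows binary search over variable-length payloads.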