Unicode-aware tables, algorithms, and the glob matcher shared by the tree-walk evaluator and the wasm-AOT / native codegen backends.
This crate is a leaf: it depends on no other relon-* crate
(matching relon-util / relon-cap), so it sits at the very
bottom of the workspace dep graph. It consolidates every Unicode
dataset, the SIMD ASCII fast path, and the linear-time glob
matcher that previously lived under relon-ir/src/unicode/ and
relon-ir/src/glob.rs. Pulling them into a standalone crate lets
relon-evaluator consume the shared tables without an edge to
relon-ir (the evaluator is a tree-walk engine and never touches
the IR surface), keeping the dep graph honest.
relon-ir keeps same-named re-exports, so the codegen backends
that reach for relon_ir::ascii_fold_simd, relon_ir::glob, and the
other moved paths compile unchanged.
Module map
- [`case_folding`]: UCD simple (1:1) upper / lower folding tables, generated at build time from `char::to_uppercase` / `char::to_lowercase`. Drives the wasm-AOT `__casefold_lookup` helper.
- [`full_case_folding`]: UAX #21 full case folding (multi-codepoint mappings, Greek final sigma, Turkish / Azerbaijani locale overrides). Generated from `data/SpecialCasing.txt` via `tools/gen_full_case_folding.py`.
  - `full_case_folding_data`: raw generated tables for [`full_case_folding`]. Pulled in via `include!()` from `full_case_folding.rs` rather than declared as a sibling module, matching the pre-split layout so the generated symbols stay in a single namespace.
- [`combining_marks`]: Mn + Mc + Me range table used by every case-fold body to decide whether a codepoint resets the word boundary.
- [`whitespace`]: non-ASCII `White_Space` ranges (the ASCII subset is special-cased on the wasm fast path).
- [`normalization`]: UAX #15 NFD / NFKD / NFC / NFKC algorithms on top of the [`normalization_data`] tables. UCD version pinned at 14.0.0; regenerate via `tools/gen_normalization_tables.py`.
- [`normalization_data`]: generated UCD 14.0.0 decomposition, canonical-combining-class, and composition-pair tables.
- [`ascii_fold_simd`]: v3++ item 4 SIMD ASCII fast path for the tree-walk `upper` / `lower` / `title` bodies. Only the wasm32 arm uses `unsafe` v128 intrinsics; other targets stay on the chunked scalar fallback.
- [`glob`]: linear-time, Unicode-aware glob matcher backing the `glob_match(s, pattern) -> Bool` stdlib function.
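The [`whitespace`] and [`combining_marks`] entries both describe codepoint range tables. The sketch below illustrates that lookup pattern. It assumes a layout of sorted, non-overlapping, inclusive `(lo, hi)` pairs searched with `binary_search_by`, which may not match the crate's generated tables. The names `NON_ASCII_WHITE_SPACE`, `in_range_table`, and `is_white_space` are hypothetical. The table holds the non-ASCII part of the Unicode `White_Space` property, and ASCII is handled inline, mirroring the split described above.

```rust
use std::cmp::Ordering;

/// Hypothetical table: inclusive, sorted, non-overlapping codepoint ranges
/// covering the non-ASCII members of the Unicode `White_Space` property.
const NON_ASCII_WHITE_SPACE: &[(u32, u32)] = &[
    (0x0085, 0x0085),
    (0x00A0, 0x00A0),
    (0x1680, 0x1680),
    (0x2000, 0x200A),
    (0x2028, 0x2029),
    (0x202F, 0x202F),
    (0x205F, 0x205F),
    (0x3000, 0x3000),
];

/// Binary-search a sorted range table: O(log n) in the number of ranges.
fn in_range_table(table: &[(u32, u32)], c: char) -> bool {
    let cp = c as u32;
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // whole range lies before `cp`
            } else if lo > cp {
                Ordering::Greater // whole range lies after `cp`
            } else {
                Ordering::Equal // `cp` falls inside this range
            }
        })
        .is_ok()
}

/// ASCII is special-cased (U+0009..=U+000D and U+0020); only non-ASCII
/// codepoints consult the table.
fn is_white_space(c: char) -> bool {
    if c.is_ascii() {
        matches!(c, '\t' | '\n' | '\u{0B}' | '\u{0C}' | '\r' | ' ')
    } else {
        in_range_table(NON_ASCII_WHITE_SPACE, c)
    }
}
```

The same `in_range_table` helper would serve a Mn + Mc + Me table. Only the data changes.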
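The [`case_folding`] entry says the simple (1:1) tables are derived from `char::to_uppercase` / `char::to_lowercase`. A minimal sketch of that derivation keeps every `char` whose mapping yields exactly one, different, `char`. It then looks codepoints up by binary search, in the spirit of a `__casefold_lookup` helper. It runs at ordinary runtime rather than in a build script, and the names `build_simple_table` and `simple_fold` are hypothetical.

```rust
/// Hypothetical generator: collect (from, to) pairs where `map(c)` yields
/// exactly one char that differs from `c`. Multi-char results (e.g. 'ß' -> "SS")
/// are dropped because they are not 1:1. Output is sorted by `from`.
fn build_simple_table<I, F>(map: F) -> Vec<(char, char)>
where
    I: Iterator<Item = char>,
    F: Fn(char) -> I,
{
    let mut table = Vec::new();
    for cp in 0u32..=0x10FFFF {
        // Surrogates (U+D800..=U+DFFF) are not valid `char`s; skip them.
        let Some(c) = char::from_u32(cp) else { continue };
        let mut mapped = map(c);
        if let (Some(m), None) = (mapped.next(), mapped.next()) {
            if m != c {
                table.push((c, m));
            }
        }
    }
    table
}

/// Look `c` up in a table from `build_simple_table`; unmapped chars map to themselves.
fn simple_fold(table: &[(char, char)], c: char) -> char {
    match table.binary_search_by_key(&c, |&(from, _)| from) {
        Ok(i) => table[i].1,
        Err(_) => c,
    }
}
```

Usage: `let lower = build_simple_table(char::to_lowercase);` followed by `simple_fold(&lower, 'A')` gives `'a'`. Characters whose full mapping expands, such as `'İ'` (which lowercases to `"i\u{307}"`), are left to the full-folding path.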
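For the non-wasm32 "chunked scalar fallback" of [`ascii_fold_simd`], one common approach is SWAR (SIMD within a register): load 8 bytes as a `u64` and classify every byte with a few additions. The sketch below uppercases ASCII this way. It assumes this technique for illustration only, since the crate's actual fallback is not shown here, and it covers `upper` only (not `lower` / `title`). The helper names are hypothetical. Because every byte is checked to be below 0x80 first, each per-byte addition stays below 0x100, so no carry crosses into a neighbouring byte.

```rust
/// Broadcast `b` into all 8 bytes of a u64.
const fn splat(b: u8) -> u64 {
    (b as u64) * 0x0101_0101_0101_0101
}

const HIGH_BITS: u64 = splat(0x80);

/// Hypothetical ASCII fast path: uppercase `buf` in place and return true,
/// or return false without touching `buf` if any byte is non-ASCII (the
/// caller would then take the full Unicode path).
fn ascii_upper_in_place(buf: &mut [u8]) -> bool {
    if !buf.is_ascii() {
        return false;
    }
    let mut chunks = buf.chunks_exact_mut(8);
    for chunk in &mut chunks {
        let x = u64::from_ne_bytes(<[u8; 8]>::try_from(&chunk[..]).unwrap());
        // High bit of each byte set iff byte >= b'a' (b + 0x1F >= 0x80).
        let ge_a = x + splat(0x80 - b'a');
        // High bit of each byte set iff byte > b'z' (b + 0x05 >= 0x80).
        let gt_z = x + splat(0x80 - b'z' - 1);
        // High bit set exactly for lowercase ASCII letters.
        let is_lower = ge_a & !gt_z & HIGH_BITS;
        // Shift 0x80 down to 0x20 and clear the case bit on those bytes.
        let y = x ^ (is_lower >> 2);
        chunk.copy_from_slice(&y.to_ne_bytes());
    }
    // Fewer than 8 trailing bytes: plain scalar loop.
    for b in chunks.into_remainder() {
        b.make_ascii_uppercase();
    }
    true
}
```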
UCD version: Unicode 14.0.0 across every regeneration script. When a future Unicode bump lands, regenerate the four data-bearing siblings in one commit so the wasm-AOT data section and the tree-walk algorithm stay consistent.
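Not every step of UAX #15 normalization needs a generated table. Precomposed Hangul syllables decompose and recompose arithmetically using the fixed constants from the Unicode standard. The sketch below shows that arithmetic. Whether [`normalization`] implements Hangul this way, rather than through [`normalization_data`], is an assumption, and the function names are hypothetical.

```rust
// Hangul syllable constants from the Unicode standard (Chapter 3 / UAX #15).
const S_BASE: u32 = 0xAC00;
const L_BASE: u32 = 0x1100;
const V_BASE: u32 = 0x1161;
const T_BASE: u32 = 0x11A7;
const L_COUNT: u32 = 19;
const V_COUNT: u32 = 21;
const T_COUNT: u32 = 28;
const N_COUNT: u32 = V_COUNT * T_COUNT; // 588
const S_COUNT: u32 = L_COUNT * N_COUNT; // 11172

/// Canonical decomposition of a precomposed Hangul syllable into L, V and
/// (optionally) T jamo. Returns None for anything outside U+AC00..U+D7A3.
fn decompose_hangul(c: char) -> Option<Vec<char>> {
    let s_index = (c as u32).checked_sub(S_BASE).filter(|&i| i < S_COUNT)?;
    let l = L_BASE + s_index / N_COUNT;
    let v = V_BASE + (s_index % N_COUNT) / T_COUNT;
    let t = T_BASE + s_index % T_COUNT;
    let mut out = vec![char::from_u32(l)?, char::from_u32(v)?];
    if t != T_BASE {
        out.push(char::from_u32(t)?); // T_BASE itself means "no trailing consonant"
    }
    Some(out)
}

/// Canonical composition of one pair: L + V -> LV, or LV + T -> LVT.
fn compose_hangul(a: char, b: char) -> Option<char> {
    let (a, b) = (a as u32, b as u32);
    if (L_BASE..L_BASE + L_COUNT).contains(&a) && (V_BASE..V_BASE + V_COUNT).contains(&b) {
        let lv = S_BASE + ((a - L_BASE) * V_COUNT + (b - V_BASE)) * T_COUNT;
        return char::from_u32(lv);
    }
    // `a` must be an LV syllable (no T yet) and `b` a real trailing consonant.
    let s_index = a.wrapping_sub(S_BASE);
    if s_index < S_COUNT && s_index % T_COUNT == 0 && (T_BASE + 1..T_BASE + T_COUNT).contains(&b) {
        return char::from_u32(a + (b - T_BASE));
    }
    None
}
```

For example, `decompose_hangul('한')` yields U+1112 U+1161 U+11AB, and composing those pairwise rebuilds `'한'`.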
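The [`glob`] matcher is also table-free. The sketch below is not the crate's linear-time algorithm. It shows the widely used single-backtrack-point technique: on a mismatch, resume just after the most recent `*` and let it absorb one more character. This avoids exponential backtracking and is bounded by O(len(s) × len(pattern)). Matching runs over `char`s, so `?` consumes one Unicode scalar value rather than one byte. The supported syntax (`*` and `?` only, with no character classes or escapes) is an assumption, and `glob_match_sketch` is a hypothetical name.

```rust
/// Hypothetical matcher: `*` matches any run of chars (including none),
/// `?` matches exactly one char, everything else matches itself.
fn glob_match_sketch(s: &str, pattern: &str) -> bool {
    let s: Vec<char> = s.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    let (mut si, mut pi) = (0usize, 0usize);
    // Backtrack point: (index of the last `*` in `p`, text index to retry from).
    let mut restart: Option<(usize, usize)> = None;

    while si < s.len() || pi < p.len() {
        if pi < p.len() {
            match p[pi] {
                '*' => {
                    // Try letting `*` match nothing; on failure, retry one char later.
                    restart = Some((pi, si + 1));
                    pi += 1;
                    continue;
                }
                '?' if si < s.len() => {
                    si += 1;
                    pi += 1;
                    continue;
                }
                c if si < s.len() && s[si] == c => {
                    si += 1;
                    pi += 1;
                    continue;
                }
                _ => {}
            }
        }
        // Mismatch: rewind to the last `*` and consume one more text char.
        match restart {
            Some((star_pi, next_si)) if next_si <= s.len() => {
                pi = star_pi;
                si = next_si;
            }
            _ => return false,
        }
    }
    true
}
```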