relon-unicode 0.1.0-rc2

Leaf Unicode tables, case-folding / normalization algorithms, and the glob matcher shared across Relon crates.
Repository: kookyleo/relon (owner: kookyleo)

Unicode-aware tables, algorithms, and the glob matcher shared by the tree-walk evaluator and the wasm-AOT / native codegen backends.

This crate is a leaf: like relon-util and relon-cap, it depends on no other relon-* crate, so it sits at the very bottom of the workspace dependency graph. It consolidates every Unicode dataset, the SIMD ASCII fast path, and the linear-time glob matcher that previously lived under relon-ir/src/unicode/ and relon-ir/src/glob.rs. Pulling them into a standalone crate lets relon-evaluator consume the shared tables without an edge to relon-ir. The evaluator is a tree-walk engine and never touches the IR surface, so dropping that edge keeps the dependency graph honest.

relon-ir keeps same-named re-exports, so the codegen backends that reach for relon_ir::ascii_fold_simd, relon_ir::glob, and the other moved modules compile unchanged.
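
As a rough illustration of the re-export arrangement, the sketch below models both crates as modules in a single file so that it compiles on its own. The `glob_match` name, its signature, and its placeholder body are assumptions made for the example; the real item names and crate boundaries may differ.

```rust
// Stand-in for the leaf crate. In the real workspace this would be a
// separate crate, not a module.
pub mod relon_unicode {
    pub mod glob {
        /// Hypothetical entry point with a placeholder body: it only
        /// compares literally and is NOT the real glob matcher.
        pub fn glob_match(s: &str, pattern: &str) -> bool {
            s == pattern
        }
    }
}

// Stand-in for relon-ir: a same-named re-export keeps the old path alive.
pub mod relon_ir {
    pub use super::relon_unicode::glob;
}

/// A caller written against the old `relon_ir::glob` path still compiles.
pub fn backend_call() -> bool {
    relon_ir::glob::glob_match("main.rs", "main.rs")
}
```

The point of the pattern is that only the definition moves: code that names `relon_ir::glob` resolves through the `pub use` to the same items, so no call site needs editing.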

Module map

  • [case_folding] — UCD simple (1:1) upper / lower folding tables, generated at build time from char::to_uppercase / char::to_lowercase. Drives the wasm-AOT __casefold_lookup helper.
  • [full_case_folding] — UAX #21 full case folding (multi-codepoint mappings, Greek final sigma, Turkish / Azerbaijani locale overrides). Generated from data/SpecialCasing.txt via tools/gen_full_case_folding.py.
  • full_case_folding_data — raw generated tables for full_case_folding. Pulled in via include!() from full_case_folding.rs rather than declared as a sibling module, matching the pre-split layout so the generated symbols stay in a single namespace.
  • [combining_marks] — Mn + Mc + Me range table used by every case-fold body to decide whether a codepoint resets the word boundary.
  • [whitespace] — non-ASCII White_Space ranges (the ASCII subset is special-cased on the wasm fast path).
  • [normalization] — UAX #15 NFD / NFKD / NFC / NFKC algorithms on top of the [normalization_data] tables. UCD version pinned at 14.0.0; regenerate via tools/gen_normalization_tables.py.
  • [normalization_data] — generated UCD 14.0.0 decomposition, canonical-combining-class, and composition-pair tables.
  • [ascii_fold_simd] — v3++ item 4 SIMD ASCII fast path for the tree-walk upper / lower / title bodies. Only the wasm32 arm uses unsafe v128 intrinsics; other targets stay on the chunked scalar fallback.
  • [glob] — linear-time Unicode-aware glob matcher backing the glob_match(s, pattern) -> Bool stdlib function.
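
Several of the modules above are described as range tables. A minimal sketch of how such a table might be queried is shown here, assuming sorted, non-overlapping inclusive ranges and a binary search. The names `MARK_RANGES` and `is_combining_mark` are hypothetical, and the four ranges are a small hand-picked subset of Mn / Mc / Me codepoints, not the generated data.

```rust
use std::cmp::Ordering;

/// Tiny illustrative subset of mark ranges (inclusive), sorted by start.
/// A generated table would be far larger.
const MARK_RANGES: &[(u32, u32)] = &[
    (0x0300, 0x036F), // Combining Diacritical Marks (Mn)
    (0x0483, 0x0489), // Cyrillic combining marks (Mn and Me)
    (0x0591, 0x05BD), // Hebrew accents and points (Mn)
    (0x0900, 0x0903), // Devanagari signs (Mn and Mc)
];

/// Returns true if `c` falls inside any range of the table.
pub fn is_combining_mark(c: char) -> bool {
    let cp = c as u32;
    MARK_RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the codepoint
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the codepoint
            } else {
                Ordering::Equal // codepoint is inside this range
            }
        })
        .is_ok()
}
```

With this subset, `is_combining_mark('\u{0301}')` is true (combining acute accent) and `is_combining_mark('a')` is false. Storing ranges rather than individual codepoints keeps the table small, and the lookup costs a logarithmic number of comparisons in the number of ranges.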

UCD version: Unicode 14.0.0 across every regeneration script. When a future Unicode bump lands, regenerate the four data-bearing siblings in one commit so the wasm-AOT data section and the tree-walk algorithm stay consistent.
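
One way to guard the "regenerate together" rule is a check that every generated table reports the same UCD version. The sketch below assumes each table module could expose the version it was generated from; nothing in this document says the crate does so, and `UcdVersion`, `PINNED_UCD`, and `check_ucd_versions` are hypothetical names.

```rust
/// A UCD version as (major, minor, patch).
pub type UcdVersion = (u8, u8, u8);

/// The version this document says every table is pinned to.
pub const PINNED_UCD: UcdVersion = (14, 0, 0);

/// Returns Ok(()) if every listed table reports `expected`; otherwise
/// returns an error naming the first table that does not.
pub fn check_ucd_versions(
    tables: &[(&str, UcdVersion)],
    expected: UcdVersion,
) -> Result<(), String> {
    for &(name, version) in tables {
        if version != expected {
            return Err(format!(
                "{} was generated from UCD {}.{}.{}, expected {}.{}.{}",
                name, version.0, version.1, version.2,
                expected.0, expected.1, expected.2
            ));
        }
    }
    Ok(())
}
```

Called from a unit test with one entry per data-bearing module, a check like this would fail as soon as one table is regenerated against a newer UCD while the others are left behind.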