# oxitext-shape TODO
## Status
Swash-based text shaper with a swappable backend trait. The default `SwashShaper` wraps swash's `ShapeContext`; the `ShapeBackend` trait enables alternative backends. An optional `RustybuzzShaper` behind the `rustybuzz-backend` feature provides HarfBuzz-compatible shaping via rustybuzz. Initially ~100 SLOC (lib.rs) + ~100 SLOC (backend.rs) and functional only for simple LTR Latin text; complex script support, OpenType feature control, and script itemization are tracked in the checked-off items below.
## Core Implementation
- [x] Add script-aware itemization: split input text into runs by Unicode script (Latin, Arabic, Devanagari, Han, etc.) before shaping each run separately (~80 SLOC)
- [x] Add OpenType feature control: `ShapeOptions { features: Vec<([u8;4], bool)> }` for enabling/disabling liga, kern, smcp, etc. (~40 SLOC)
- **Goal:** `ShapeFeature{tag:[u8;4], value:u32}` for OpenType feature tags. `ShapeBackend::shape_with_features(font, text, size, features:&[ShapeFeature])->Result<Vec<ShapedGlyph>,_>`. Swash: `shaper.features()`; Rustybuzz: `buffer.add_feature()`.
- **Files:** `crates/oxitext-shape/src/backend.rs`, `crates/oxitext-shape/src/lib.rs`
- **Tests:** liga=0 suppresses ligature shaping (more output glyphs than liga=1); smcp feature produces small caps GIDs
- [x] Add language tagging: pass BCP-47 language tag to the shaper for language-specific GSUB/GPOS rules (~20 SLOC)
- **Goal:** `ShapeRequest{text, font, size, direction, script:Option<Script>, language:Option<Language>, features}` builder. `SwashShaper::shape_request(req:&ShapeRequest)->Result<Vec<ShapedGlyph>,_>`. Script/language hints fed to swash/rustybuzz for correct GSUB/GPOS selection.
- **Files:** `crates/oxitext-shape/src/lib.rs`, `crates/oxitext-shape/src/backend.rs`
- **Tests:** same text with `script=Latn` vs `script=Arab` produces different glyph IDs where scripts differ
- [x] Add font fallback integration: when a glyph is missing (.notdef), try next font in fallback chain (~60 SLOC)
- **Implemented:** `SwashShaper::shape_with_fallback(&mut self, fonts: &[&[u8]], text, px_size)` detects .notdef runs, re-shapes each with successive fallback fonts, adjusts cluster offsets, and merges results.
- [x] Implement Arabic shaping: ensure proper initial/medial/final/isolated form selection via GSUB (~30 SLOC, mostly backend delegation)
- **Implemented:** `requires_arabic_shaping(text)` in `script_detect.rs`; `shape_request` auto-upgrades Ltr → Rtl when Arabic text is detected so swash applies the correct GSUB lookups. Debug-mode `eprintln!` warns on implicit upgrade.
- [x] Implement Devanagari/Bengali/Tamil/Telugu conjunct handling: reorder marks, apply GSUB lookups (~30 SLOC, backend delegation)
- **Implemented:** `requires_indic_shaping(text)` in `script_detect.rs` covers Devanagari, Bengali, Tamil, Telugu, Kannada blocks. Swash applies GSUB lookups transparently when shaping these scripts.
- [x] Implement Thai/Khmer/Myanmar mark positioning via GPOS (~20 SLOC, backend delegation)
- **Implemented:** `requires_mark_positioning(text)` in `script_detect.rs` covers Thai, Khmer, Myanmar blocks. Swash applies GPOS mark positioning transparently.
- [x] Add vertical text shaping: use `vert`/`vrt2` OpenType features for CJK vertical glyph substitution (~30 SLOC)
- **Goal:** When `direction == Vertical`, automatically enable OpenType `vert` and `vrt2` features in both Swash and Rustybuzz backends to select vertical glyph variants.
- **Files:** `crates/oxitext-shape/src/backend.rs`, `crates/oxitext-shape/src/lib.rs`
- **Tests:** CJK glyph shaped vertically has different GID than horizontally (if font has vert table); advances reflect vmtx
- [x] Add grapheme cluster tracking: annotate each ShapedGlyph with its cluster boundary status (~25 SLOC)
- **Implemented:** `ShapeResult::cluster_boundaries: Vec<usize>` populated in `shape_full()` via `unicode_segmentation::grapheme_indices`.
- [x] Add `unsafe_to_break` flag propagation from swash for correct line-breaking within shaped runs (~15 SLOC)
- **Implemented:** `unsafe_to_break` is now set to `true` when `glyph.info.is_mark()` (mark attachment) or when the glyph is a non-first element of a multi-glyph cluster (ligature / complex script).
- [x] Implement AAT (Apple Advanced Typography) shaping support via swash for macOS system fonts (~50 SLOC)
- **Implemented:** `SwashShaper::font_has_aat(font_data)` checks for `morx`/`kerx`/`ankr` tables via `ttf_parser`. `SwashShaper::shape_with_aat_fallback()` shapes with swash (which already applies AAT transparently) and returns a `ShapeResult` with cluster boundaries.
- [x] Add kashida insertion points for Arabic justification (~30 SLOC)
- [x] Add emoji shaping: ZWJ sequence handling, skin tone modifier support (~25 SLOC)
- [x] Split `lib.rs` to comply with <2000 lines policy — `splitrs` extraction of logical submodules (styled-text path, script-itemisation helpers, cache integration block); all existing tests pass; public API unchanged (planned 2026-05-27)
- **Goal:** `crates/oxitext-shape/src/lib.rs` falls from 2061 to <2000 lines via `splitrs`-driven extraction; public `pub use` re-exports preserved; all 41 shape-crate tests pass; facade tests pass.
- **Design:** Run `splitrs crates/oxitext-shape/src/lib.rs`; accept or tune the extraction plan; extracted modules live in `crates/oxitext-shape/src/` as private `mod` declarations re-exported through `lib.rs`.
- **Files:** `crates/oxitext-shape/src/lib.rs`, `crates/oxitext-shape/src/<new-modules>.rs`
- **Tests:** all existing tests must pass; no new tests required for a pure refactor
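
The script-aware itemization item at the top of this list can be sketched as follows. This is a minimal std-only illustration, not the crate's code: `Script`, `ScriptRun` (as laid out here), `script_of`, and `itemize` are hypothetical names, and classification uses coarse Unicode block ranges instead of the real Script property (the Latin range, for example, also catches `×` and `÷`).

```rust
/// Coarse script classes for this sketch (hypothetical; not the crate's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Script {
    Latin,
    Arabic,
    Devanagari,
    Han,
    /// Spaces, digits, punctuation: attaches to the surrounding run.
    Common,
}

/// A single-script run, expressed as byte offsets into the source string.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScriptRun {
    pub start: usize,
    pub end: usize,
    pub script: Script,
}

/// Classify a character by block range (an assumption made for brevity).
fn script_of(c: char) -> Script {
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A | 0x00C0..=0x024F => Script::Latin,
        0x0600..=0x06FF => Script::Arabic,
        0x0900..=0x097F => Script::Devanagari,
        0x4E00..=0x9FFF => Script::Han,
        _ => Script::Common,
    }
}

/// Split `text` into runs so each run can be shaped separately.
pub fn itemize(text: &str) -> Vec<ScriptRun> {
    let mut runs: Vec<ScriptRun> = Vec::new();
    for (i, c) in text.char_indices() {
        let end = i + c.len_utf8();
        let s = script_of(c);
        let mut extended = false;
        if let Some(run) = runs.last_mut() {
            if s == Script::Common || s == run.script || run.script == Script::Common {
                // A run that so far holds only Common characters adopts
                // the first real script it meets.
                if run.script == Script::Common {
                    run.script = s;
                }
                run.end = end;
                extended = true;
            }
        }
        if !extended {
            runs.push(ScriptRun { start: i, end, script: s });
        }
    }
    runs
}
```

Under these assumptions, `itemize("abc مرحبا")` yields a Latin run covering `"abc "` and an Arabic run covering the rest; each run's `start..end` slice would then be handed to the shaper with the matching script hint.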
## API Improvements
- [x] Add `ShapeRequest` builder: `ShapeRequest::new(text).font(data).size(px).features(&[("liga", true)]).language("ar").direction(Rtl).build()` (~40 SLOC)
- **Goal:** `ShapeRequest::builder().text(t).font(f).size(s).direction(d).feature(b"liga",1).build()` fluent construction of `ShapeRequest`.
- **Files:** `crates/oxitext-shape/src/lib.rs`
- **Tests:** builder with all fields set produces correct ShapeRequest; missing required fields caught at compile time
- [x] Return `ShapeResult` with both glyphs and metadata (script detected, direction resolved, missing codepoints) (~20 SLOC)
- **Implemented:** `ShapeResult { glyphs, script_detected, direction, missing_codepoints }` + `ShapeResult::from_glyphs()` + `SwashShaper::shape_full()`.
- [x] Add `ShapeBackend::supports_script(script: [u8;4]) -> bool` for capability queries
- **Implemented:** Default trait method using `ttf_parser::Face::glyph_index` on sentinel characters per known script (latn/arab/hani/cyrl/grek/hebr/deva/thai). Returns `true` for unknown scripts (permissive default).
- [x] Add `ShapeBackend::shape_with_options(data, text, size, options) -> Vec<ShapedGlyph>` with extended options
- **Implemented:** Default trait method in `backend.rs`; delegates to `shape_with_direction` or `shape_with_features` based on the feature list.
- [x] Make `SwashShaper` accept `&[u8]` in addition to `Arc<Vec<u8>>` to avoid unnecessary allocation
- **Implemented:** `SwashShaper::shape_slice(&[u8], text, px_size)` and `shape_slice_rtl` convenience methods added.
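
The `ShapeRequest` builder test above expects missing required fields to be caught at compile time. One way to get that property is a typestate builder, sketched below. The field set, the `Direction` enum, the 16 px default size, and the choice of `text` and `font` as the required fields are all assumptions for illustration; the crate's actual builder may differ.

```rust
/// Hypothetical direction enum for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Ltr,
    Rtl,
}

/// OpenType feature tag plus value, mirroring the `ShapeFeature` goal above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShapeFeature {
    pub tag: [u8; 4],
    pub value: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ShapeRequest<'a> {
    pub text: &'a str,
    pub font: &'a [u8],
    pub size: f32,
    pub direction: Direction,
    pub features: Vec<ShapeFeature>,
}

/// `T` and `F` are `()` until `text` / `font` are supplied, so `build`
/// only exists once both required fields have been set.
pub struct ShapeRequestBuilder<T, F> {
    text: T,
    font: F,
    size: f32,
    direction: Direction,
    features: Vec<ShapeFeature>,
}

impl ShapeRequest<'_> {
    pub fn builder() -> ShapeRequestBuilder<(), ()> {
        ShapeRequestBuilder {
            text: (),
            font: (),
            size: 16.0, // assumed default, not taken from the crate
            direction: Direction::Ltr,
            features: Vec::new(),
        }
    }
}

impl<T, F> ShapeRequestBuilder<T, F> {
    pub fn text<'a>(self, text: &'a str) -> ShapeRequestBuilder<&'a str, F> {
        ShapeRequestBuilder {
            text,
            font: self.font,
            size: self.size,
            direction: self.direction,
            features: self.features,
        }
    }

    pub fn font<'a>(self, font: &'a [u8]) -> ShapeRequestBuilder<T, &'a [u8]> {
        ShapeRequestBuilder {
            text: self.text,
            font,
            size: self.size,
            direction: self.direction,
            features: self.features,
        }
    }

    pub fn size(mut self, px: f32) -> Self {
        self.size = px;
        self
    }

    pub fn direction(mut self, direction: Direction) -> Self {
        self.direction = direction;
        self
    }

    pub fn feature(mut self, tag: &[u8; 4], value: u32) -> Self {
        self.features.push(ShapeFeature { tag: *tag, value });
        self
    }
}

impl<'a> ShapeRequestBuilder<&'a str, &'a [u8]> {
    pub fn build(self) -> ShapeRequest<'a> {
        ShapeRequest {
            text: self.text,
            font: self.font,
            size: self.size,
            direction: self.direction,
            features: self.features,
        }
    }
}
```

With this shape, `ShapeRequest::builder().text("fi").build()` fails to compile because `build` is not defined for `ShapeRequestBuilder<&str, ()>`, while the optional setters can be called in any order.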
## Testing
- [x] Test Arabic text shaping produces correct glyph forms (initial/medial/final) with rustybuzz backend
- **Goal:** Test Arabic text "مرحبا" shaped with direction=RTL produces glyphs with correct joining forms (initial/medial/final/isolated) — glyph IDs differ from isolated shaping.
- **Files:** `crates/oxitext-shape/src/lib.rs` (inline test)
- [x] Test Devanagari conjuncts (e.g. "ksha" -> single conjunct glyph) with rustybuzz backend
- **Implemented:** `variational::tests::test_devanagari_conjunct_rustybuzz_no_panic` (rustybuzz-backend feature), `test_devanagari_conjunct_swash_no_panic`, `test_devanagari_requires_indic_shaping`, `test_devanagari_virama_text_requires_indic_shaping` in `src/variational.rs`.
- [x] Test CJK text produces correct glyph IDs and advances
- **Goal:** Test that shaping a CJK string produces glyphs with non-zero advance_x and correct cluster assignments.
- **Files:** `crates/oxitext-shape/src/lib.rs` (inline test)
- [x] Test Latin ligatures ("fi", "fl") are applied when `liga` feature is enabled
- [x] Test kerning is applied for common pairs ("AV", "To", "WA")
- **Goal:** Test that shaping "AV" with kern feature enabled produces different total advance than kern disabled (if test font has a kern pair).
- **Files:** `crates/oxitext-shape/src/lib.rs` (inline test)
- [x] Test bidirectional text: Arabic-Latin mixed string produces correctly ordered glyphs
- [x] Test emoji ZWJ sequences: family emoji should produce single glyph (with appropriate font)
- [x] Benchmark swash vs. rustybuzz shaping performance on 10K-character text
- **Implemented:** `bench_tests.rs` contains four timing tests: 10K-char swash shaping, cached vs. uncached comparison, batch vs. individual, and swash vs. rustybuzz (feature-gated). All pass; see the `eprintln!` output with `--nocapture`.
- [x] Test vertical shaping with CJK text and `vert` feature
- **Implemented:** `variational::tests::test_vertical_shaping_applies_vert_feature` uses `ShapeDirection::Ttb` with a real font; `test_vertical_shaping_shape_with_features_vert` exercises the lower-level path; `test_vertical_shaping_vert_and_vrt2_injected` verifies auto-injection logic.
- [x] Test shaping with variable font at different variation coordinates
- **Implemented:** `SwashShaper::shape_with_variations(font_data, text, px_size, variations)` in `src/variational.rs` accepts `&[([u8;4], f32)]` variation axes and delegates to `shape_full`. Tests in `variational::tests` cover empty axes, single axis, and multiple axes.
## Performance
- [x] Cache font parsing per `Arc` pointer identity to avoid re-parsing in `SwashShaperBackend` (already done via Mutex, verify correctness)
- **Verified:** `variational::tests::test_shape_cache_correctness_repeated_font` shapes the same text 5× and asserts glyph-count and glyph-ID stability; `test_shape_cache_correctness_different_texts` verifies interleaved different-text calls do not corrupt subsequent cache lookups.
- [x] Use `RwLock` instead of `Mutex` for `SwashShaperBackend::inner` to allow concurrent reads
- [x] Pre-compute script runs to avoid redundant Unicode property lookups
- **Implemented:** `SwashShaper` now caches `(script_cache_text: String, script_cache_runs: Vec<ScriptRun>)` behind `#[cfg(feature = "icu")]`. `shape_by_script` skips re-calling `CharProperties::itemize` when the text is unchanged. Test: `tests::test_script_cache_reuses_on_same_text`.
- [x] Batch multiple text segments into a single shaping context for amortized setup cost
- **Implemented:** `SwashShaper::shape_batch(font_data, segments, px_size)` and `shape_batch_directed(font_data, segments_with_dir, px_size)` in `src/batch.rs`. Also added `shape_batch_with_features`. All share the same `ShapeContext` across calls. Tests in `batch::tests`.
- [x] Fix `SwashShaperBackend::shape` to preserve `Arc<[u8]>` pointer identity for ShapeCache hits — change `ShapeBackend` trait to take `&Arc<[u8]>` instead of `&[u8]`; update all impls; eliminate the `Arc::from(face_data)` per-call allocation that defeats cache (done 2026-05-27)
- **Goal:** `ShapeCache` keyed on `Arc::as_ptr(&font_data) as usize` actually hits on repeated calls with the same font; the `SwashShaperBackend` no longer silently defeats it.
- **Design:** Changed `ShapeBackend::shape` (and all other trait methods) signature to `&Arc<[u8]>`; `SwashShaperBackend::shape` now calls `Arc::clone(face_data)` directly; all callers in `oxitext` already held `Arc<[u8]>` so no new allocations. Updated `RustybuzzShaper`, `NoopShaper` test impls, and `alt_backend.rs` integration tests.
- **Files:** `crates/oxitext-shape/src/backend.rs`, `crates/oxitext-shape/tests/alt_backend.rs`, `crates/oxitext/tests/layout_api.rs`
- **Tests:** `swash_backend_arc_pointer_preserved`, `swash_backend_different_arcs_distinct_keys`
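
The pointer-identity fix above relies on `Arc::clone` keeping the same allocation while `Arc::from(slice)` creates a new one. The std-only sketch below illustrates why the per-call `Arc::from(face_data)` defeated a pointer-keyed cache. `font_key`, the `ShapeCache` layout, and the stand-in "shaping" are hypothetical and only demonstrate the keying behaviour.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Cache key derived from the Arc's data pointer. `Arc::as_ptr` on an
/// `Arc<[u8]>` returns a fat `*const [u8]`, so it is narrowed to a thin
/// pointer before the integer cast.
pub fn font_key(font: &Arc<[u8]>) -> usize {
    Arc::as_ptr(font) as *const u8 as usize
}

/// Toy cache keyed on (font pointer, text); values are fake glyph IDs.
#[derive(Default)]
pub struct ShapeCache {
    entries: HashMap<(usize, String), Vec<u16>>,
    pub hits: usize,
    pub misses: usize,
}

impl ShapeCache {
    pub fn get_or_shape(&mut self, font: &Arc<[u8]>, text: &str) -> Vec<u16> {
        let key = (font_key(font), text.to_owned());
        if let Some(glyphs) = self.entries.get(&key) {
            self.hits += 1;
            return glyphs.clone();
        }
        self.misses += 1;
        // Stand-in for real shaping: one fake glyph ID per character.
        let glyphs: Vec<u16> = text.chars().map(|c| c as u32 as u16).collect();
        self.entries.insert(key, glyphs.clone());
        glyphs
    }
}
```

A clone of the same `Arc` hits the cache, whereas an `Arc` rebuilt from the same bytes misses because it lives at a different address. Note that a pointer key is only meaningful while the original `Arc` is alive, since a freed address can be reused by a later allocation.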
## Integration
- [x] Receive font data from oxifont-db/oxifont-adapter-pure for automatic font selection
- **Implemented:** `system_fonts` module (feature `system-fonts`) provides `load_font_for_family`, `load_best_font_for_text`, `build_system_db`, and `*_from` variants. `SwashShaper::shape_with_system_font` and `SwashShaper::shape_with_family` use these to discover and load fonts automatically.
- [x] Feed shaped runs to oxitext-layout for paragraph layout with correct bidi reordering
- [x] Propagate `unsafe_to_break` to oxitext-layout's line breaker for cluster-aware wrapping
- **Done (partial):** `unsafe_to_break` is now correctly set from swash's mark-attachment and multi-glyph-cluster signals; the field flows through `ShapedGlyph` to whoever consumes it.
- [x] Coordinate with oxitext-icu for CLDR-based script detection and locale-aware shaping
- **Implemented:** `icu` feature enables CLDR line-breaking and script itemization in the facade; `SwashShaper` script cache uses `CharProperties::itemize` from `oxitext-icu`
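
For the `unsafe_to_break` propagation item above, a consumer could use the flag roughly as sketched below. This is an assumption-laden illustration, not oxitext-layout's algorithm: the `ShapedGlyph` here carries only two fields, `line_starts` is a hypothetical helper, every safe glyph boundary is treated as a break candidate (a real line breaker would also respect word boundaries), and the flag is read as "do not break before this glyph".

```rust
/// Reduced glyph record for this sketch; the real `ShapedGlyph` has more fields.
#[derive(Debug, Clone, Copy)]
pub struct ShapedGlyph {
    pub advance_x: f32,
    /// True when a line break directly before this glyph would split a cluster.
    pub unsafe_to_break: bool,
}

/// Greedy wrapping: returns the glyph index at which each line starts.
/// A run with no safe boundary is allowed to overflow `max_width`.
pub fn line_starts(glyphs: &[ShapedGlyph], max_width: f32) -> Vec<usize> {
    let mut starts = Vec::new();
    if glyphs.is_empty() {
        return starts;
    }
    starts.push(0);
    let mut line_start = 0usize;
    let mut last_safe = 0usize; // latest index after line_start where a break is allowed
    let mut width = 0.0f32;
    for (i, g) in glyphs.iter().enumerate() {
        if i > line_start && !g.unsafe_to_break {
            last_safe = i;
        }
        if width + g.advance_x > max_width && last_safe > line_start {
            // Break at the latest safe boundary and carry the glyphs
            // after it over to the new line.
            starts.push(last_safe);
            width = glyphs[last_safe..i].iter().map(|g| g.advance_x).sum();
            line_start = last_safe;
        }
        width += g.advance_x;
    }
    starts
}
```

With five glyphs of advance 10 and a width limit of 25, flagging the third glyph moves the first break one glyph earlier so that the second and third glyphs stay on the same line.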