Module script_segmentation

Script segmentation and bidi-safe text run partitioning.

This module deterministically segments text into runs by Unicode script, bidi direction, and style. The resulting runs serve as shaping inputs and as consistent cache keys for the downstream HarfBuzz shaping pipeline.

§Design

Text shaping engines (HarfBuzz, CoreText, DirectWrite) require input to be split into runs that share the same script, direction, and style. Mixing scripts in a single shaping call produces incorrect glyph selection and positioning.

This module implements a three-phase algorithm:

  1. Raw classification — assign each character its Unicode script via block-range lookup (char_script).
  2. Common/Inherited resolution — resolve Common and Inherited characters by propagating adjacent specific scripts (UAX#24-inspired).
  3. Run grouping — collect contiguous characters sharing the same resolved script into ScriptRun spans.
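The three phases above can be sketched in a self-contained form. This is a minimal illustration, not the module's implementation: `SketchScript`, `SketchRun`, `sketch_char_script`, `sketch_resolve`, and `sketch_partition` are hypothetical names, the block ranges are deliberately coarse, and the resolution rule (inherit the preceding specific script, falling back to the following one for leading characters) is an assumed simplification of the UAX#24-inspired behavior.

```rust
/// Reduced script set for illustration; the real `Script` enum is assumed to be larger.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchScript { Common, Inherited, Latin, Cyrillic, Hebrew, Arabic }

/// A run as a byte range into the source text plus its resolved script (assumed layout).
#[derive(Debug, Clone, PartialEq, Eq)]
struct SketchRun { start: usize, end: usize, script: SketchScript }

/// Phase 1: raw classification by coarse block ranges (not the exact Script property).
fn sketch_char_script(c: char) -> SketchScript {
    match c as u32 {
        0x0041..=0x005A | 0x0061..=0x007A => SketchScript::Latin,
        0x00C0..=0x00D6 | 0x00D8..=0x00F6 | 0x00F8..=0x024F => SketchScript::Latin,
        0x0300..=0x036F => SketchScript::Inherited, // combining diacritical marks
        0x0400..=0x04FF => SketchScript::Cyrillic,
        0x0590..=0x05FF => SketchScript::Hebrew,
        0x0600..=0x06FF => SketchScript::Arabic,
        _ => SketchScript::Common,
    }
}

/// Phase 2: give Common/Inherited characters the script of a specific neighbor.
fn sketch_resolve(scripts: &mut [SketchScript]) {
    let is_specific =
        |s: SketchScript| !matches!(s, SketchScript::Common | SketchScript::Inherited);
    // Forward pass: inherit from the nearest preceding specific script.
    let mut last: Option<SketchScript> = None;
    for s in scripts.iter_mut() {
        if is_specific(*s) {
            last = Some(*s);
        } else if let Some(prev) = last {
            *s = prev;
        }
    }
    // Backward pass: leading characters inherit from the first specific script after them.
    let mut next: Option<SketchScript> = None;
    for s in scripts.iter_mut().rev() {
        if is_specific(*s) {
            next = Some(*s);
        } else if let Some(following) = next {
            *s = following;
        }
    }
    // Text with no specific script at all collapses to Common.
    for s in scripts.iter_mut() {
        if *s == SketchScript::Inherited {
            *s = SketchScript::Common;
        }
    }
}

/// Phase 3: group contiguous characters with the same resolved script.
fn sketch_partition(text: &str) -> Vec<SketchRun> {
    let indexed: Vec<(usize, char)> = text.char_indices().collect();
    let mut scripts: Vec<SketchScript> =
        indexed.iter().map(|&(_, c)| sketch_char_script(c)).collect();
    sketch_resolve(&mut scripts);

    let mut runs: Vec<SketchRun> = Vec::new();
    for (&(offset, c), &script) in indexed.iter().zip(scripts.iter()) {
        let end = offset + c.len_utf8();
        let extends_last = matches!(runs.last(), Some(run) if run.script == script);
        if extends_last {
            // Same script as the previous character: grow the current run.
            runs.last_mut().unwrap().end = end;
        } else {
            runs.push(SketchRun { start: offset, end, script });
        }
    }
    runs
}
```

Under these assumptions, `sketch_partition("Hello مرحبا World")` yields three runs (Latin, Arabic, Latin), with each space absorbed into the run that precedes it. Storing byte ranges rather than copied strings keeps the runs cheap to produce and lets callers slice the original text directly.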

The TextRun type further subdivides script runs by direction and style, producing the atomic units passed to the shaper. RunCacheKey provides a deterministic, hashable identifier for caching shaped glyph output.
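To illustrate why a hashable key matters for caching shaped output, here is a minimal sketch of a shape cache. Everything in it is assumed for illustration: `SketchCacheKey`, its fields, `SketchDirection`, and `SketchShapeCache` are hypothetical and do not reflect the actual layout of `RunCacheKey` or `RunDirection`, and the glyph output is modeled as a plain `Vec<u32>`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical direction type standing in for `RunDirection`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SketchDirection { Ltr, Rtl }

/// Hypothetical cache key: every input that can change shaping output is a field,
/// so equal keys imply the cached glyphs can be reused.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct SketchCacheKey {
    text: String,
    script_tag: [u8; 4], // e.g. *b"Latn"; an assumed representation
    direction: SketchDirection,
    style_id: u64,       // assumed opaque identifier for font and style
}

/// Cache mapping keys to shaped glyph ids, counting how often shaping actually ran.
struct SketchShapeCache {
    entries: HashMap<SketchCacheKey, Vec<u32>>,
    misses: usize,
}

impl SketchShapeCache {
    fn new() -> Self {
        SketchShapeCache { entries: HashMap::new(), misses: 0 }
    }

    /// Return cached glyphs for `key`, calling `shape` only on a cache miss.
    fn get_or_shape(
        &mut self,
        key: SketchCacheKey,
        shape: impl FnOnce(&SketchCacheKey) -> Vec<u32>,
    ) -> &[u32] {
        let slot = match self.entries.entry(key) {
            Entry::Occupied(entry) => entry.into_mut(),
            Entry::Vacant(entry) => {
                self.misses += 1;
                let glyphs = shape(entry.key());
                entry.insert(glyphs)
            }
        };
        slot.as_slice()
    }
}
```

In this sketch, two lookups with equal keys run the shaping closure once, while changing any single field (for example the direction) produces a distinct entry. Deriving `Hash` and `Eq` from the same fields keeps the two consistent, which `HashMap` requires.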

§Example

use ftui_text::script_segmentation::{Script, ScriptRun, partition_by_script};

let runs = partition_by_script("Hello مرحبا World");
assert!(runs.len() >= 2); // At least Latin and Arabic runs
assert_eq!(runs[0].script, Script::Latin);

Structs§

RunCacheKey
Deterministic, hashable cache key for shaped glyph output.
ScriptRun
A contiguous run of characters sharing the same resolved script.
TextRun
A fully partitioned text run suitable for shaping.

Enums§

RunDirection
A text direction for run partitioning.
Script
Unicode script classification for shaping.

Functions§

char_script
Classify a character’s Unicode script via block-range lookup.
partition_by_script
Partition text into contiguous runs of the same Unicode script.
partition_text_runs
Partition text into fully-resolved text runs by script and direction.