Module string_width

Port of string-width@8 to Rust.

Computes the visual column width of a string as rendered by a terminal, matching the semantics of string-width@8.2.1 exactly. This is the width measure function fed to Taffy’s layout engine. The spec is the JS source (string-width/index.js, ansi-regex/index.js, get-east-asian-width/); where this comment and the source disagree, the source wins.

§Ported algorithm

ANSI escapes are stripped first (unless count_ansi_escape_codes is set) via a faithful port of [ansi-regex@6.2.2], the regex that strip-ansi@7 (string-width’s dependency) delegates to. Then each [Intl.Segmenter] grapheme cluster (here: unicode-segmentation, empirically identical on the suspect classes) is measured by the first matching rule, in order:

  1. Zero-width cluster — every char is Default_Ignorable | Control | Format | Mark | Surrogate (Surrogate is unreachable inside a Rust &str, which holds only scalar values). Tabs are Control → width 0.
  2. Emoji, width 2: the cluster matches ^\p{RGI_Emoji}$ (regex v-flag) OR isDoubleWidthNonRgiEmojiSequence. \p{RGI_Emoji} has no Rust crate equivalent; it is approximated by [is_double_width_emoji]’s rule-set (keycap, valid RGI flag pair, ZWJ with ≥2 Extended_Pictographic, VS16-on-pictographic, modifier-on-base). See that function for each rule’s JS anchor.
  3. Hangul jamo — modern L+V(+T) syllable blocks collapse to width 2; unmatched jamo stay additive (hangul_cluster_width, ported exactly).
  4. East Asian Width: eastAsianWidth of the first visible scalar, plus each trailing Halfwidth/Fullwidth Forms char (U+FF00–U+FFEF) by its own EAW (trailing_halfwidth_width).
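As a rough illustration of steps 1 and 4 only, the sketch below measures a single, already-segmented cluster. It is not the module's code: is_zero_width, is_wide, scalar_width and cluster_width are hypothetical names, the tables are a tiny hand-picked subset of the real Node-derived ranges, and the emoji and Hangul steps are omitted entirely.

```rust
/// Hypothetical, heavily reduced zero-width test: Cc controls plus a few
/// well-known Mark/Format code points. The real tables are far larger.
fn is_zero_width(c: char) -> bool {
    c.is_control()
        || ('\u{0300}'..='\u{036F}').contains(&c) // Combining Diacritical Marks
        || c == '\u{200B}' // ZERO WIDTH SPACE
        || c == '\u{200D}' // ZERO WIDTH JOINER
        || c == '\u{FE0F}' // VARIATION SELECTOR-16
}

/// Hypothetical, partial Wide/Fullwidth test covering only a few ranges.
fn is_wide(c: char) -> bool {
    matches!(
        c as u32,
        0x3041..=0x3096     // Hiragana letters
            | 0x30A1..=0x30FA // Katakana letters
            | 0x4E00..=0x9FFF // CJK Unified Ideographs
            | 0xAC00..=0xD7A3 // Hangul syllables
            | 0xFF01..=0xFF60 // Fullwidth ASCII variants
    )
}

fn scalar_width(c: char) -> usize {
    if is_wide(c) { 2 } else { 1 }
}

/// Steps 1 and 4 for one grapheme cluster (steps 2 and 3 are not modelled).
fn cluster_width(cluster: &str) -> usize {
    // Step 1: skip leading zero-width scalars; if nothing is left, width is 0.
    let mut rest = cluster.chars().skip_while(|&c| is_zero_width(c));
    let Some(first) = rest.next() else { return 0 };
    // Step 4: width of the first visible scalar, plus each trailing
    // Halfwidth/Fullwidth Forms scalar measured on its own.
    let trailing: usize = rest
        .filter(|c| ('\u{FF00}'..='\u{FFEF}').contains(c))
        .map(scalar_width)
        .sum();
    scalar_width(first) + trailing
}
```

Under these reduced tables a tab measures 0, "e" followed by U+0301 measures 1, and halfwidth katakana U+FF76 followed by the halfwidth voiced mark U+FF9E measures 2 because the trailing mark is counted on its own.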

§Approximation boundary

The only approximation is \p{RGI_Emoji} (replaced by [is_double_width_emoji]). Any RGI sequence the rule-set fails to classify as width 2 would diverge from Node; a ≥3000-case differential fuzz against Node string-width@8.2.1 (every RGI class, Indic, Hangul, prepend, HW/FW, combining, tabs/controls, ANSI, and random multi-class concatenations) found zero divergences. Every property and EAW range table is Node-derived (Node 24 / Unicode 16) with a provenance comment and regen recipe, mirroring slice_ansi/tokenize_ansi.rs.

§Options

ambiguous_is_narrow (default true): East Asian Ambiguous chars are narrow (width 1); set it to false to treat them as wide (width 2), as in a CJK context.

count_ansi_escape_codes (default false): count escape bytes instead of stripping them.
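The following sketch shows one way the two options could steer measurement. The field names and defaults come from the description above; the derives, the Default impl, and the helpers ambiguous_width, strip_csi and text_to_measure are assumptions made for illustration. In particular, strip_csi handles only simple CSI sequences (ESC [ ... final byte) and is not the ansi-regex port.

```rust
use std::borrow::Cow;

/// Stand-in mirroring the documented fields; the real struct may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Options {
    pub ambiguous_is_narrow: bool,
    pub count_ansi_escape_codes: bool,
}

impl Default for Options {
    fn default() -> Self {
        Self { ambiguous_is_narrow: true, count_ansi_escape_codes: false }
    }
}

/// Width given to an East Asian Ambiguous scalar such as U+00B1.
pub fn ambiguous_width(options: &Options) -> usize {
    if options.ambiguous_is_narrow { 1 } else { 2 }
}

/// Hypothetical, CSI-only stripper used here in place of the real regex port.
fn strip_csi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Drop everything up to and including the final byte (0x40..=0x7E).
            for p in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&p) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Text handed to the per-cluster rules: stripped by default, untouched
/// when escape codes are meant to be counted.
pub fn text_to_measure<'a>(input: &'a str, options: &Options) -> Cow<'a, str> {
    if options.count_ansi_escape_codes {
        Cow::Borrowed(input)
    } else {
        Cow::Owned(strip_csi(input))
    }
}
```

With the defaults, a coloured "red" wrapped in ESC[31m and ESC[0m reduces to the three letters before measurement; with count_ansi_escape_codes set, the escape sequences stay in the text and are measured along with it.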

Structs§

Options
Options for string_width.

Functions§

string_width
Returns the visual column width of input as rendered by a monospace terminal.
string_width_with
Returns the visual column width of input using the given options.
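A typical use of a column-width function is padding text to a fixed number of terminal columns. The sketch below assumes a measure callback of shape Fn(&str) -> usize; whether string_width has exactly that signature is not stated here, so the callback is passed in rather than called directly, and pad_to_width is a hypothetical helper.

```rust
/// Pads `text` with trailing spaces until it occupies `target` columns
/// according to `measure`. Text already at or beyond `target` is returned
/// unchanged. `measure` stands in for a width function such as string_width.
pub fn pad_to_width(text: &str, target: usize, measure: impl Fn(&str) -> usize) -> String {
    let width = measure(text);
    let padding = target.saturating_sub(width);
    let mut out = String::with_capacity(text.len() + padding);
    out.push_str(text);
    out.extend(std::iter::repeat(' ').take(padding));
    out
}
```

Padding by visual width rather than by byte or char count is what keeps columns aligned when a cell contains wide CJK characters, emoji, or ANSI colour codes.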