Port of string-width@8 to Rust.
Computes the visual column width of a string as rendered by a terminal,
matching the semantics of string-width@8.2.1 exactly. This is the width
measure function fed to Taffy’s layout engine. The spec is the JS source
(string-width/index.js, ansi-regex/index.js, get-east-asian-width/);
where this comment and the source disagree, the source wins.
§Ported algorithm
ANSI escapes are stripped first (unless count_ansi_escape_codes) via a
faithful port of [ansi-regex@6.2.2] — the regex strip-ansi@7
(string-width’s dependency) delegates to. Then each [Intl.Segmenter]
grapheme cluster (here: unicode-segmentation, empirically identical on the
suspect classes) is measured by, in order:
- Zero-width cluster: every char is Default_Ignorable | Control | Format | Mark | Surrogate (Surrogate is unreachable inside a Rust &str, which holds only scalar values). Tabs are Control → width 0.
- Emoji width 2: ^\p{RGI_Emoji}$ (regex v-flag) OR isDoubleWidthNonRgiEmojiSequence. \p{RGI_Emoji} has no Rust crate; it is approximated by [is_double_width_emoji]’s rule-set (keycap, valid RGI flag pair, ZWJ with ≥2 Extended_Pictographic, VS16-on-pictographic, modifier-on-base). See that function for each rule’s JS anchor.
- Hangul jamo: modern L+V(+T) syllable blocks collapse to width 2; unmatched jamo stay additive (hangul_cluster_width, ported exactly).
- East Asian Width: eastAsianWidth of the first visible scalar, plus each trailing Halfwidth/Fullwidth Forms char (U+FF00–U+FFEF) by its own EAW (trailing_halfwidth_width).
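The order of the four rules matters, so a minimal sketch of the dispatch may help. It assumes the input is already a single grapheme cluster, and every table in it is a toy stand-in covering only a handful of ranges: the names is_zero_width, is_toy_emoji, is_modern_hangul_block, toy_east_asian_width and cluster_width are hypothetical and are not the crate's functions. The jamo ranges used for "modern" are an assumption, and the additive handling of unmatched jamo is left out.

```rust
/// Toy stand-in for Default_Ignorable | Control | Format | Mark.
/// Only a few well-known ranges; the real property tables are far larger.
fn is_zero_width(c: char) -> bool {
    c.is_control() // general category Cc, includes '\t'
        || ('\u{0300}'..='\u{036F}').contains(&c) // combining diacritical marks
        || ('\u{200B}'..='\u{200F}').contains(&c) // ZWSP, ZWNJ, ZWJ, LRM, RLM
        || ('\u{FE00}'..='\u{FE0F}').contains(&c) // variation selectors
}

/// Toy stand-in for the emoji rule-set: keycap sequences and the Emoticons block only.
fn is_toy_emoji(chars: &[char]) -> bool {
    match chars {
        // Keycap: base, VS16, combining enclosing keycap.
        [base, '\u{FE0F}', '\u{20E3}'] => base.is_ascii_digit() || *base == '#' || *base == '*',
        // A single scalar from the Emoticons block.
        [c] => ('\u{1F600}'..='\u{1F64F}').contains(c),
        _ => false,
    }
}

/// L+V or L+V+T using the modern conjoining jamo ranges (assumed here).
fn is_modern_hangul_block(chars: &[char]) -> bool {
    let l = |c: char| ('\u{1100}'..='\u{1112}').contains(&c);
    let v = |c: char| ('\u{1161}'..='\u{1175}').contains(&c);
    let t = |c: char| ('\u{11A8}'..='\u{11C2}').contains(&c);
    match chars {
        [a, b] => l(*a) && v(*b),
        [a, b, c] => l(*a) && v(*b) && t(*c),
        _ => false,
    }
}

/// Toy East Asian Width lookup: 2 for a few Wide/Fullwidth ranges, else 1.
fn toy_east_asian_width(c: char) -> usize {
    match c as u32 {
        0x1100..=0x115F     // Hangul leading jamo
        | 0x4E00..=0x9FFF   // CJK Unified Ideographs
        | 0xAC00..=0xD7A3   // precomposed Hangul syllables
        | 0xFF01..=0xFF60   // fullwidth forms
        => 2,
        _ => 1,
    }
}

/// Applies the four rules in the documented order to one grapheme cluster.
fn cluster_width(cluster: &str) -> usize {
    let chars: Vec<char> = cluster.chars().collect();
    // 1. Zero-width cluster: every char is invisible.
    if chars.iter().all(|&c| is_zero_width(c)) {
        return 0;
    }
    // 2. Emoji: width 2.
    if is_toy_emoji(&chars) {
        return 2;
    }
    // 3. Hangul: a matched syllable block collapses to width 2.
    if is_modern_hangul_block(&chars) {
        return 2;
    }
    // 4. East Asian Width of the first visible scalar, plus each trailing
    //    Halfwidth/Fullwidth Forms char measured by its own width.
    let pos = chars.iter().position(|&c| !is_zero_width(c)).unwrap_or(0);
    let trailing: usize = chars[pos + 1..]
        .iter()
        .copied()
        .filter(|c| ('\u{FF00}'..='\u{FFEF}').contains(c))
        .map(toy_east_asian_width)
        .sum();
    toy_east_asian_width(chars[pos]) + trailing
}
```

Under these toy tables, cluster_width("\t") is 0 by rule 1, a decomposed Hangul syllable such as "\u{1112}\u{1161}\u{11AB}" is 2 by rule 3, and halfwidth katakana with a trailing voiced mark ("\u{FF76}\u{FF9E}") is 1 + 1 = 2 by rule 4.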
§Approximation boundary
The only approximation is \p{RGI_Emoji} (replaced by [is_double_width_emoji]).
Any RGI sequence the rule-set fails to classify as width 2 would diverge from
Node; a ≥3000-case differential fuzz against Node string-width@8.2.1 (every
RGI class, Indic, Hangul, prepend, HW/FW, combining, tabs/controls, ANSI,
and random multi-class concatenations) found zero divergences. Every
property and EAW range table is Node-derived (Node 24 / Unicode 16) with a
provenance comment and regen recipe, mirroring slice_ansi/tokenize_ansi.rs.
§Options
ambiguous_is_narrow (default true): East Asian Ambiguous chars are narrow
(1) unless set to false (CJK context → 2). count_ansi_escape_codes
(default false): count escape bytes instead of stripping them.
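To make the two switches concrete, here is a small sketch of how they could gate the work. SketchOptions is a hypothetical mirror of the documented fields and defaults, not the crate's Options type, and strip_simple_csi handles only plain CSI sequences (ESC [ ... final byte); it is a deliberate simplification and not the ansi-regex port described above.

```rust
/// Hypothetical mirror of the documented options and their defaults.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SketchOptions {
    pub ambiguous_is_narrow: bool,
    pub count_ansi_escape_codes: bool,
}

impl Default for SketchOptions {
    fn default() -> Self {
        Self {
            ambiguous_is_narrow: true,
            count_ansi_escape_codes: false,
        }
    }
}

/// Width given to an East Asian Ambiguous scalar under the given options.
fn ambiguous_width(opts: &SketchOptions) -> usize {
    if opts.ambiguous_is_narrow { 1 } else { 2 }
}

/// Removes CSI sequences of the form ESC '[' ... final byte (0x40..=0x7E).
/// Simplified: other escape families (OSC, etc.) are left untouched.
fn strip_simple_csi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1B}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            // Skip parameter and intermediate bytes through the final byte.
            for p in chars.by_ref() {
                if ('\u{40}'..='\u{7E}').contains(&p) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// Text that would go on to be measured: stripped unless escapes are counted.
fn text_to_measure(input: &str, opts: &SketchOptions) -> String {
    if opts.count_ansi_escape_codes {
        input.to_owned()
    } else {
        strip_simple_csi(input)
    }
}
```

With the defaults, "\u{1B}[31mred\u{1B}[0m" reduces to "red" before measurement; with count_ansi_escape_codes set, the escape sequences stay in the measured text.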
Structs§
- Options: Options for string_width.
Functions§
- string_width: Returns the visual column width of input as rendered by a monospace terminal.
- string_width_with: Returns the visual column width of input using the given options.