Crate unicode_width

Determine displayed width of char and str types according to Unicode Standard Annex #11 and other portions of the Unicode standard. See the Rules for determining width section for the exact rules.

This crate is #![no_std].

use unicode_width::UnicodeWidthStr;

let teststr = "Hello, world!";
let width = UnicodeWidthStr::width(teststr);
println!("{}", teststr);
println!("The above string is {} columns wide.", width);
let width = teststr.width_cjk();
println!("The above string is {} columns wide (CJK).", width);

§Rules for determining width

This crate currently uses the following rules to determine the width of a character or string, in order of decreasing precedence. These may be tweaked in the future.

  1. Emoji presentation sequences have width 2.
  2. Outside of an East Asian context, text presentation sequences have width 1 if their base character:
  3. The sequence "\r\n" has width 1.
  4. Lisu tone letter combinations consisting of a character in the range '\u{A4F8}'..='\u{A4FB}' followed by a character in the range '\u{A4FC}'..='\u{A4FD}' have width 1.
  5. In an East Asian context only, <, =, or > have width 2 when followed by '\u{0338}' COMBINING LONG SOLIDUS OVERLAY.
  6. '\u{115F}' HANGUL CHOSEONG FILLER has width 2.
  7. The following have width 0:
  8. Characters with an East_Asian_Width of Fullwidth or Wide have width 2.
  9. Characters fulfilling all of the following conditions have width 2 in an East Asian context, and width 1 otherwise:
  10. All other characters have width 1.
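
The precedence ordering above can be pictured as a single pass over the string in which the first matching rule wins. The following is a minimal sketch only, not this crate's implementation: `sketch_width` is a hypothetical helper written for illustration, and it assumes that only rules 3, 4, 6 and 10 exist. Emoji and text presentation sequences, zero-width characters, East_Asian_Width lookups and the CJK-context rules are all omitted, so its results will differ from `UnicodeWidthStr::width` for many inputs.

```rust
/// Hypothetical illustration, NOT part of unicode_width.
/// Applies only rules 3, 4, 6 and 10 from the list above, in precedence order.
fn sketch_width(s: &str) -> usize {
    let mut width = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        // Look one character ahead so two-character sequences can be matched.
        match (c, chars.peek().copied()) {
            // Rule 3: the sequence "\r\n" has width 1; consume the '\n' as well.
            ('\r', Some('\n')) => {
                chars.next();
                width += 1;
            }
            // Rule 4: a Lisu tone letter combination has width 1 as a pair.
            ('\u{A4F8}'..='\u{A4FB}', Some('\u{A4FC}'..='\u{A4FD}')) => {
                chars.next();
                width += 1;
            }
            // Rule 6: HANGUL CHOSEONG FILLER has width 2.
            ('\u{115F}', _) => width += 2,
            // Rule 10 (simplified): everything else is treated as width 1,
            // because rules 1, 2, 5, 7, 8 and 9 are not modelled here.
            _ => width += 1,
        }
    }
    width
}

fn main() {
    println!("{}", sketch_width("Hello, world!"));
    println!("{}", sketch_width("a\r\nb"));
    println!("{}", sketch_width("\u{A4F8}\u{A4FC}"));
    println!("{}", sketch_width("\u{115F}"));
}
```

Putting the sequence rules before the single-character fallback is what makes the ordering matter: if rule 10 were tried first, "\r\n" would be counted as two columns instead of one.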

§Canonical equivalence

Canonically equivalent strings are assigned the same width (CJK and non-CJK).
