Module unicode

Constants

BEL
ASCII BEL.
BOM
ZERO WIDTH NO-BREAK SPACE, also known as the byte-order mark, or BOM.
CAN
ASCII CAN.
CGJ
COMBINING GRAPHEME JOINER
DEL
ASCII DEL, which is not what’s generated by the “delete” key on the keyboard.
ESC
ASCII ESC, known as ‘\e’ in some contexts.
FF
ASCII FF, known as ‘\f’ in some contexts.
LS
LINE SEPARATOR
MAX_UTF8_SIZE
The size in bytes of the longest UTF-8 encoding of a Unicode scalar value. RFC 2279 allowed longer encodings, but it was obsoleted by RFC 3629, which does not. This limit is also documented in the relevant section of Rust’s documentation.
NEL
EBCDIC NEXT LINE, which is treated like generic whitespace.
NORMALIZATION_BUFFER_LEN
NORMALIZATION_BUFFER_SIZE
The minimum size of a buffer needed to perform NFC normalization, and thus the minimum size of a buffer that can be passed to TextReader’s read.
ORC
OBJECT REPLACEMENT CHARACTER
PS
PARAGRAPH SEPARATOR
REPL
REPLACEMENT CHARACTER
SUB
ASCII SUB.
WJ
WORD JOINER
ZWJ
ZERO WIDTH JOINER
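
The following is a minimal sketch of how constants like these might be used, not this module's actual definitions. The code points come from the ASCII and Unicode standards; the local `const` items only mirror the names listed above, and the value 4 for `MAX_UTF8_SIZE` is an assumption based on the RFC 3629 limit. The helper `utf8_len_via_buffer` is hypothetical and exists only for illustration.

```rust
// Local stand-ins that mirror some of the names listed above.
// The code points are the standard ASCII/Unicode assignments; these
// definitions are assumptions, not copied from the module itself.
const BEL: char = '\u{7}';
const BOM: char = '\u{FEFF}';
const CGJ: char = '\u{34F}';
const DEL: char = '\u{7F}';
const ESC: char = '\u{1B}';
const FF: char = '\u{C}';
const LS: char = '\u{2028}';
const NEL: char = '\u{85}';
const REPL: char = '\u{FFFD}';
const ZWJ: char = '\u{200D}';

// Assumed value: RFC 3629 caps the UTF-8 encoding of a scalar value at 4 bytes.
const MAX_UTF8_SIZE: usize = 4;

// Gathered into one array so the constants can be iterated over.
const LISTED: [char; 10] = [BEL, BOM, CGJ, DEL, ESC, FF, LS, NEL, REPL, ZWJ];

/// Hypothetical helper: encodes `c` into a stack buffer of `MAX_UTF8_SIZE`
/// bytes and returns how many bytes the encoding used.
/// `char::encode_utf8` panics if the buffer is too small, so a buffer of
/// this size is sufficient for every `char`.
fn utf8_len_via_buffer(c: char) -> usize {
    let mut buf = [0u8; MAX_UTF8_SIZE];
    c.encode_utf8(&mut buf).len()
}

/// Returns the largest UTF-8 encoded length among the listed constants.
fn longest_listed_encoding() -> usize {
    LISTED
        .iter()
        .map(|&c| utf8_len_via_buffer(c))
        .max()
        .unwrap_or(0)
}
```

Sizing the buffer from a single constant keeps the encoding step allocation-free, and the standard library's `char::len_utf8` can be used to cross-check the lengths the helper returns.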

Functions

is_normalization_form_starter