Module ende::ucs2


UCS-2 encoding and decoding.

§Encoding

In UCS-2, every Unicode code point is represented using exactly two bytes; the size is fixed.
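As a rough illustration of the fixed two-byte size, the sketch below encodes a single code point. The helper names `encode_unit` and `unit_to_bytes` are hypothetical and are not part of this module, and the big-endian byte order is an assumption, since the text above does not specify one.

```rust
/// Hypothetical helper (not part of this module): encode one Unicode code
/// point as a single 16-bit UCS-2 unit, or return `None` when the value does
/// not fit in 16 bits.
fn encode_unit(code_point: u32) -> Option<u16> {
    if code_point <= 0xFFFF {
        Some(code_point as u16)
    } else {
        None
    }
}

/// Hypothetical helper: the two bytes of a UCS-2 unit.
/// Big-endian order is an assumption made for this sketch.
fn unit_to_bytes(unit: u16) -> [u8; 2] {
    unit.to_be_bytes()
}
```

With these definitions, `encode_unit(0x20AC)` yields `Some(0x20AC)`, whose bytes are `[0x20, 0xAC]`, while `encode_unit(0x1F600)` yields `None` because the value needs more than 16 bits.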

§Decoding

A UCS-2 code point is decoded into a Unicode code point using the first two bytes.
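A minimal sketch of that decoding step follows. The function name `decode_unit` is hypothetical, and the big-endian byte order is an assumption rather than documented behavior of this module.

```rust
/// Hypothetical helper (not part of this module): read the first two bytes of
/// `bytes` as one UCS-2 unit and widen it to a Unicode code point.
/// Returns `None` when fewer than two bytes are available.
/// Big-endian order is an assumption made for this sketch.
fn decode_unit(bytes: &[u8]) -> Option<u32> {
    match bytes {
        // Only the first two bytes are consumed; any remainder is ignored.
        [hi, lo, ..] => Some(u32::from(u16::from_be_bytes([*hi, *lo]))),
        _ => None,
    }
}
```

For example, `decode_unit(&[0x20, 0xAC])` gives `Some(0x20AC)`, and a one-byte input gives `None`.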

§Representation

Note:

  • UCS-2 is a subset of UTF-16.
  • UCS-2 is capable of encoding 65,536 code points. These are the same as the first 65,536 code points of UTF-16.
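The subset relationship can be observed with the standard library alone. The sketch below uses `char::encode_utf16` from `std`; the wrapper name `utf16_units` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: return the UTF-16 units that the standard library
/// produces for a single `char`.
fn utf16_units(c: char) -> Vec<u16> {
    // A `char` needs at most two UTF-16 units.
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}
```

A character such as `'\u{20AC}'` comes back as the single unit `0x20AC`, which is also its UCS-2 form. A character such as `'\u{1F600}'` comes back as two units (a surrogate pair), so it has no UCS-2 representation.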

§Two bytes

Encoding: If the Unicode code point is less than 0xFFFF, it is represented in UTF-16 using only its 16 least significant bits.

Decoding: If the UTF-16 code point is less than 0xD800, or is greater than 0xDBFF and less than 0xFFFF, the Unicode code point is represented using only the 16 least significant bits.

  • Unicode code point: nnnnnnnn|nnnnnnnn|xxxxxxxx|xxxxxxxx
  • UTF-16 code point: xxxxxxxx|xxxxxxxx
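The bit layout above can be sketched by splitting a 32-bit code point into its upper half (the `n` bits) and lower half (the `x` bits). The function name `split_code_point` is hypothetical, and the sketch shows only the layout, not the range checks this module performs.

```rust
/// Hypothetical helper (not part of this module): split a 32-bit code point
/// into its 16 most significant bits (`n` in the layout above) and its
/// 16 least significant bits (`x` in the layout above).
fn split_code_point(code_point: u32) -> (u16, u16) {
    let upper = (code_point >> 16) as u16;
    let lower = (code_point & 0xFFFF) as u16;
    (upper, lower)
}
```

When the upper half is zero, as in `split_code_point(0x20AC) == (0x0000, 0x20AC)`, the lower half alone carries the whole value. A value such as `0x1F600` splits into `(0x0001, 0xF600)`, so dropping the upper half would lose information.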

Functions§

  • Decode a vector of UCS-2 code points into a vector of unicode code points.
  • Encode a vector of unicode code points into a vector of UCS-2 code points.
  • Pretty print the UCS-2 encoding in hexadecimal and decimal of a vector of UCS-2 code points.
  • Pretty print the UCS-2 encoding in hexadecimal and decimal of a vector of UCS-2 code points.