UCS-2 encoding and decoding.
§Encoding
UCS-2 is a fixed-width encoding: a Unicode code point is always represented using exactly two bytes.
§Decoding
A UCS-2 code point is decoded into a Unicode code point using its two bytes.
§Representation
Note:
- UCS-2 is a subset of UTF-16.
- UCS-2 is capable of encoding 65,536 code points. These are the same as the first 65,536 code points of UTF-16.
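The subset relationship noted above can be checked with the standard library alone. The following is a minimal sketch, not part of this module: `fits_in_ucs2` and `ucs2_unit` are hypothetical helper names, and the only library calls used are `char::len_utf16` and `char::encode_utf16` from `std`.

```rust
/// Hypothetical helper: true when `c` needs only one 16-bit UTF-16 unit,
/// which is the case for every code point in the first 65,536.
fn fits_in_ucs2(c: char) -> bool {
    c.len_utf16() == 1
}

/// Hypothetical helper: encodes `c` as UTF-16 and returns the unit
/// only if a single unit was enough (no surrogate pair was needed).
fn ucs2_unit(c: char) -> Option<u16> {
    let mut buf = [0u16; 2];
    let units = c.encode_utf16(&mut buf);
    if units.len() == 1 {
        Some(units[0])
    } else {
        None
    }
}
```

For example, `ucs2_unit('€')` yields `Some(0x20AC)`, while a character outside the first 65,536 code points such as `'😀'` yields `None` because UTF-16 needs a surrogate pair for it.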
§Two bytes
Encoding: If the Unicode code point is less than 0xFFFF, it is represented in UTF-16 using only its 16 least significant bits.
Decoding: If the UTF-16 code point is less than 0xD800, or greater than 0xDBFF and less than 0xFFFF, the Unicode code point is obtained from the 16 least significant bits alone.
- Unicode code point:
nnnnnnnn|nnnnnnnn|xxxxxxxx|xxxxxxxx
- UTF-16 code point:
xxxxxxxx|xxxxxxxx
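The bit layout above amounts to keeping the low 16 bits when encoding and widening them again when decoding. The sketch below illustrates this under stated assumptions: the function names `encode_ucs2_sketch` and `decode_ucs2_sketch` are hypothetical and are not this module's API, code points are modelled as `u32` and UCS-2 values as `u16`, and the sketch accepts everything up to and including 0xFFFF while rejecting the whole surrogate range 0xD800..=0xDFFF, which may differ from the exact bounds this module applies.

```rust
/// Hypothetical sketch: keep the 16 least significant bits of each code point.
/// Returns the offending code point if it cannot be stored in two bytes.
fn encode_ucs2_sketch(code_points: &[u32]) -> Result<Vec<u16>, u32> {
    code_points
        .iter()
        .map(|&cp| {
            // Assumption: surrogate values are not valid code points to encode.
            if cp <= 0xFFFF && !(0xD800..=0xDFFF).contains(&cp) {
                Ok(cp as u16)
            } else {
                Err(cp)
            }
        })
        .collect()
}

/// Hypothetical sketch: widen each 16-bit value back to a code point.
/// Returns the offending value if it falls in the surrogate range.
fn decode_ucs2_sketch(units: &[u16]) -> Result<Vec<u32>, u16> {
    units
        .iter()
        .map(|&u| {
            if (0xD800..=0xDFFF).contains(&u) {
                Err(u)
            } else {
                Ok(u32::from(u))
            }
        })
        .collect()
}
```

With these definitions, `encode_ucs2_sketch(&[0x48, 0x20AC])` returns `Ok(vec![0x0048, 0x20AC])`, and passing that result to `decode_ucs2_sketch` returns the original code points. Returning a `Result` rather than truncating silently is a design choice of this sketch, so that a code point above 0xFFFF is reported instead of being mapped to an unrelated character.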
Functions§
- Decode a vector of UCS-2 code points into a vector of Unicode code points.
- Encode a vector of Unicode code points into a vector of UCS-2 code points.
- Pretty print the UCS-2 encoding in hexadecimal and decimal of a vector of UCS-2 code points.
- Pretty print the UCS-2 encoding in hexadecimal and decimal of a vector of UCS-2 code points.
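As an illustration of what a hexadecimal and decimal rendering of UCS-2 code points might look like, here is a minimal sketch. The name `format_ucs2_sketch` and the output layout are assumptions made for this example only; the actual output format of the functions listed above is not specified here.

```rust
/// Hypothetical sketch: render UCS-2 values once in hexadecimal and once in decimal.
fn format_ucs2_sketch(units: &[u16]) -> String {
    // Four hex digits per value, since every UCS-2 value is two bytes wide.
    let hex: Vec<String> = units.iter().map(|u| format!("0x{:04X}", u)).collect();
    let dec: Vec<String> = units.iter().map(|u| u.to_string()).collect();
    format!("hex: [{}]\ndec: [{}]", hex.join(", "), dec.join(", "))
}
```

Calling `format_ucs2_sketch(&[0x0048, 0x20AC])` produces two lines: `hex: [0x0048, 0x20AC]` and `dec: [72, 8364]`.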