Module collation

Collation encoding support for SQL Server VARCHAR decoding.

This module provides mappings from SQL Server collation LCIDs (Locale IDs) to their corresponding character encodings, enabling proper decoding of non-UTF-8 VARCHAR data.

Supported Encodings

The following encoding families are supported based on the collation’s LCID:

Code page  Encoding               Languages
874        Windows-874 (TIS-620)  Thai
932        Shift_JIS              Japanese
936        GBK/GB18030            Simplified Chinese
949        EUC-KR                 Korean
950        Big5                   Traditional Chinese
1250       Windows-1250           Central/Eastern European
1251       Windows-1251           Cyrillic
1252       Windows-1252           Western European (default)
1253       Windows-1253           Greek
1254       Windows-1254           Turkish
1255       Windows-1255           Hebrew
1256       Windows-1256           Arabic
1257       Windows-1257           Baltic
1258       Windows-1258           Vietnamese
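A minimal sketch of how a table like this might be applied, assuming the lookup keys on the 16-bit language ID of the LCID and falls back to 1252 as the default. The function names `lookup_code_page`, `code_page_label`, and `is_double_byte` are hypothetical stand-ins, not this module's API, and only a representative subset of LCIDs is listed.

```rust
/// Hypothetical LCID -> Windows code page lookup mirroring the table.
/// Only a representative subset of LCIDs is listed; anything else falls
/// back to 1252, the table's default.
fn lookup_code_page(lcid: u32) -> u16 {
    // Match on the 16-bit language ID so sort-order bits (16-19) are ignored.
    match lcid & 0xFFFF {
        0x041E => 874,                    // Thai
        0x0411 => 932,                    // Japanese
        0x0804 | 0x1004 => 936,           // Chinese (PRC, Singapore)
        0x0412 => 949,                    // Korean
        0x0404 | 0x0C04 | 0x1404 => 950,  // Chinese (Taiwan, Hong Kong SAR, Macao SAR)
        0x0405 | 0x040E | 0x0415 => 1250, // Czech, Hungarian, Polish
        0x0419 | 0x0422 => 1251,          // Russian, Ukrainian
        0x0408 => 1253,                   // Greek
        0x041F => 1254,                   // Turkish
        0x040D => 1255,                   // Hebrew
        0x0401 => 1256,                   // Arabic (Saudi Arabia)
        0x0425 | 0x0426 | 0x0427 => 1257, // Estonian, Latvian, Lithuanian
        0x042A => 1258,                   // Vietnamese
        _ => 1252,                        // Western European default
    }
}

/// Hypothetical display name for each code page in the table.
fn code_page_label(code_page: u16) -> Option<&'static str> {
    Some(match code_page {
        874 => "Windows-874",
        932 => "Shift_JIS",
        936 => "GBK",
        949 => "EUC-KR",
        950 => "Big5",
        1250 => "Windows-1250",
        1251 => "Windows-1251",
        1252 => "Windows-1252",
        1253 => "Windows-1253",
        1254 => "Windows-1254",
        1255 => "Windows-1255",
        1256 => "Windows-1256",
        1257 => "Windows-1257",
        1258 => "Windows-1258",
        _ => return None,
    })
}

/// The four East Asian code pages are double-byte character sets: a
/// character may take one or two bytes, so byte length != character count.
fn is_double_byte(code_page: u16) -> bool {
    matches!(code_page, 932 | 936 | 949 | 950)
}

fn main() {
    // Japanese, Russian, German (phone-book sort), Portuguese (Brazil).
    for lcid in [0x0411, 0x0419, 0x0001_0407, 0x0416] {
        let cp = lookup_code_page(lcid);
        println!(
            "LCID {lcid:#x} -> {cp} ({}, double-byte: {})",
            code_page_label(cp).unwrap_or("unknown"),
            is_double_byte(cp)
        );
    }
}
```

Matching on full language IDs rather than only the primary language matters for Chinese, where Simplified (936) and Traditional (950) share the same primary language.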

UTF-8 Collations

SQL Server 2019 and later support UTF-8 collations (names ending in _UTF8). These are detected by testing the COLLATION_FLAG_UTF8 bit in the collation info field. When a UTF-8 collation is in use, the data is already UTF-8 and needs no encoding conversion.
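A minimal sketch of the UTF-8 fast path, assuming the flag value documented for `COLLATION_FLAG_UTF8` below (bit 27, `0x0800_0000`). The helpers `has_utf8_flag` and `decode_utf8_varchar` are hypothetical; the point is that UTF-8 bytes only need validation, not transcoding.

```rust
/// Flag value as documented for COLLATION_FLAG_UTF8 (bit 27).
const UTF8_FLAG: u32 = 0x0800_0000;

/// Hypothetical helper: true when the UTF-8 flag bit is set in the collation info.
fn has_utf8_flag(collation_info: u32) -> bool {
    collation_info & UTF8_FLAG != 0
}

/// Hypothetical helper: for UTF-8 collations the VARCHAR bytes are already
/// UTF-8, so they are only validated. Returns `None` for non-UTF-8
/// collations, which need a code-page decoder instead.
fn decode_utf8_varchar(
    collation_info: u32,
    bytes: &[u8],
) -> Option<Result<String, std::string::FromUtf8Error>> {
    if has_utf8_flag(collation_info) {
        Some(String::from_utf8(bytes.to_vec()))
    } else {
        None
    }
}

fn main() {
    // LCID 0x0409 (English, United States) with the UTF-8 flag set.
    let utf8_info = 0x0800_0409;
    let bytes = "Grüße".as_bytes();
    let decoded = decode_utf8_varchar(utf8_info, bytes).unwrap().unwrap();
    println!("{decoded}");

    // Same LCID without the flag: this path declines to decode.
    assert!(decode_utf8_varchar(0x0000_0409, bytes).is_none());
}
```

Returning an error on invalid bytes, rather than decoding lossily, keeps corrupted data visible to the caller; a lossy variant could use `String::from_utf8_lossy` instead.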

Constants

COLLATION_FLAG_UTF8
Flag bit indicating UTF-8 collation (SQL Server 2019+). This is bit 27 (0x0800_0000) in the collation info field.
LCID_MASK
Mask to extract the primary LCID from the collation info. The LCID is stored in the lower 20 bits.
PRIMARY_LANGUAGE_MASK
Mask to extract the primary language ID (lower 16 bits of LCID).
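A short sketch of why both masks exist, assuming the constant values described above (the crate's actual definitions may differ in type). The LCID mask keeps the sort-ID bits 16-19, while the language mask drops them. The `CollationParts` struct and `split_collation_info` function are hypothetical.

```rust
// Values as described in the constant docs; not copied from the crate's source.
const COLLATION_FLAG_UTF8: u32 = 0x0800_0000; // bit 27
const LCID_MASK: u32 = 0x000F_FFFF;           // lower 20 bits
const PRIMARY_LANGUAGE_MASK: u32 = 0xFFFF;    // lower 16 bits of the LCID

/// Hypothetical view of the fields the masks pull out of the collation info.
#[derive(Debug, PartialEq)]
struct CollationParts {
    lcid: u32,
    language_id: u32,
    utf8: bool,
}

fn split_collation_info(info: u32) -> CollationParts {
    let lcid = info & LCID_MASK;
    CollationParts {
        lcid,
        // Dropping bits 16-19 removes the sort ID, leaving the language ID.
        language_id: lcid & PRIMARY_LANGUAGE_MASK,
        utf8: info & COLLATION_FLAG_UTF8 != 0,
    }
}

fn main() {
    // 0x0001_0407 is German with the phone-book sort ID in bits 16-19;
    // the UTF-8 flag is OR'd in on top.
    let parts = split_collation_info(0x0801_0407);
    println!("{parts:?}");
}
```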

Functions

code_page_for_lcid
Returns the Windows code page number for a given LCID.
encoding_for_lcid
Returns the encoding for a given LCID, if known.
encoding_name_for_lcid
Returns the encoding name for a given LCID, for display and logging.
is_utf8_collation
Returns whether the collation uses UTF-8 encoding.
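A sketch of how these functions could fit together when decoding a VARCHAR value, under stated assumptions: the flag and mask values are the documented ones, only Windows-1252 is decoded (hand-rolled here so the example needs no external crate), and every other LCID is reported as unsupported. A real implementation would more likely hand the bytes to a full encoding library. `decode_varchar`, `decode_cp1252`, and `DecodeError` are hypothetical names.

```rust
const COLLATION_FLAG_UTF8: u32 = 0x0800_0000;
const LCID_MASK: u32 = 0x000F_FFFF;
const PRIMARY_LANGUAGE_MASK: u32 = 0xFFFF;

/// Windows-1252 bytes 0x80..=0x9F differ from Latin-1. A 0 entry marks a byte
/// the code page leaves undefined; it is passed through as the C1 control
/// with the same value.
const CP1252_HIGH: [u16; 32] = [
    0x20AC, 0, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0, 0x017D, 0,
    0, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0, 0x017E, 0x0178,
];

fn decode_cp1252(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x80..=0x9F => {
                let mapped = CP1252_HIGH[(b - 0x80) as usize];
                let code = if mapped == 0 { b as u32 } else { mapped as u32 };
                char::from_u32(code).unwrap_or('\u{FFFD}')
            }
            // 0x00..=0x7F is ASCII and 0xA0..=0xFF matches Latin-1,
            // so the byte value is the Unicode scalar value.
            _ => b as char,
        })
        .collect()
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidUtf8,
    UnsupportedLcid(u32),
}

/// Hypothetical dispatch: UTF-8 collations pass through after validation,
/// a few Western European LCIDs use Windows-1252, everything else is
/// rejected by this sketch.
fn decode_varchar(collation_info: u32, bytes: &[u8]) -> Result<String, DecodeError> {
    if collation_info & COLLATION_FLAG_UTF8 != 0 {
        return String::from_utf8(bytes.to_vec()).map_err(|_| DecodeError::InvalidUtf8);
    }
    let lcid = collation_info & LCID_MASK;
    match lcid & PRIMARY_LANGUAGE_MASK {
        // en-US, en-GB, de-DE, fr-FR, es-ES (modern sort): all code page 1252.
        0x0409 | 0x0809 | 0x0407 | 0x040C | 0x0C0A => Ok(decode_cp1252(bytes)),
        _ => Err(DecodeError::UnsupportedLcid(lcid)),
    }
}

fn main() {
    // LCID 0x0409: curly quotes and the euro sign live in the 0x80..=0x9F range.
    let text = decode_varchar(0x0409, &[0x93, b'h', b'i', 0x94, b' ', 0x80, b'5']).unwrap();
    println!("{text}");

    // Japanese (0x0411) would need a Shift_JIS decoder, which this sketch omits.
    assert_eq!(
        decode_varchar(0x0411, b"abc"),
        Err(DecodeError::UnsupportedLcid(0x0411))
    );
}
```

Checking the UTF-8 flag before the LCID mirrors the order described in the module docs: a UTF-8 collation still carries an LCID, but its bytes never go through a code page.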