Collation encoding support for SQL Server VARCHAR decoding.
This module provides mappings from SQL Server collation LCIDs (Locale IDs) to their corresponding character encodings, enabling proper decoding of non-UTF-8 VARCHAR data.
§Supported Encodings
The following encoding families are supported based on the collation’s LCID:
| Code Page | Encoding | Languages |
|---|---|---|
| 874 | Windows-874 (TIS-620) | Thai |
| 932 | Shift_JIS | Japanese |
| 936 | GBK/GB18030 | Simplified Chinese |
| 949 | EUC-KR | Korean |
| 950 | Big5 | Traditional Chinese |
| 1250 | Windows-1250 | Central/Eastern European |
| 1251 | Windows-1251 | Cyrillic |
| 1252 | Windows-1252 | Western European (default) |
| 1253 | Windows-1253 | Greek |
| 1254 | Windows-1254 | Turkish |
| 1255 | Windows-1255 | Hebrew |
| 1256 | Windows-1256 | Arabic |
| 1257 | Windows-1257 | Baltic |
| 1258 | Windows-1258 | Vietnamese |
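As a rough illustration of the table above, the sketch below maps a code page number to a display label. The helper `encoding_label_for_code_page` is hypothetical and not part of this module, whose own lookups are keyed by LCID rather than by code page.

```rust
/// Hypothetical helper (not part of this module): maps a Windows code page
/// number from the table above to a display label.
fn encoding_label_for_code_page(code_page: u16) -> Option<&'static str> {
    match code_page {
        // Thai plus the multi-byte East Asian code pages.
        874 => Some("Windows-874"),
        932 => Some("Shift_JIS"),
        936 => Some("GBK/GB18030"),
        949 => Some("EUC-KR"),
        950 => Some("Big5"),
        // Single-byte Windows-125x family.
        1250 => Some("Windows-1250"),
        1251 => Some("Windows-1251"),
        1252 => Some("Windows-1252"),
        1253 => Some("Windows-1253"),
        1254 => Some("Windows-1254"),
        1255 => Some("Windows-1255"),
        1256 => Some("Windows-1256"),
        1257 => Some("Windows-1257"),
        1258 => Some("Windows-1258"),
        // Anything outside the table is reported as unknown.
        _ => None,
    }
}
```

Returning `Option` here is a design choice of the sketch: it lets the caller decide whether an unlisted code page should fall back to Windows-1252 or be treated as an error.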
§UTF-8 Collations
SQL Server 2019+ supports UTF-8 collations (suffix _UTF8). These are
detected by checking the collation flags. When a UTF-8 collation is used,
no encoding conversion is needed as the data is already UTF-8.
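The flag check described above might look roughly like the following. This is a minimal sketch: `UTF8_FLAG` is a local copy of the value documented for `COLLATION_FLAG_UTF8`, and `has_utf8_flag` and `decode_if_utf8` are hypothetical stand-ins, since the real `is_utf8_collation` may take a different argument type.

```rust
/// Local copy of the value documented for `COLLATION_FLAG_UTF8` (bit 27).
const UTF8_FLAG: u32 = 0x0800_0000;

/// Hypothetical stand-in: tests the UTF-8 flag bit in a raw 32-bit
/// collation info value.
fn has_utf8_flag(collation_info: u32) -> bool {
    (collation_info & UTF8_FLAG) != 0
}

/// Hypothetical helper: borrows the bytes as a `&str` when the collation is
/// UTF-8. Returns `None` for non-UTF-8 collations (which need a code page
/// decoder instead) and for byte sequences that are not valid UTF-8.
fn decode_if_utf8(collation_info: u32, bytes: &[u8]) -> Option<&str> {
    if has_utf8_flag(collation_info) {
        std::str::from_utf8(bytes).ok()
    } else {
        None
    }
}
```

Because the data is already UTF-8, the sketch only validates and borrows the bytes; no copy or transcoding step is involved.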
Constants§
- COLLATION_FLAG_UTF8 - Flag bit indicating UTF-8 collation (SQL Server 2019+). This is bit 27 (0x0800_0000) in the collation info field.
- LCID_MASK - Mask to extract the primary LCID from the collation info. The LCID is stored in the lower 20 bits.
- PRIMARY_LANGUAGE_MASK - Mask to extract the primary language ID (lower 16 bits of LCID).
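To show how the two masks combine, here is a small sketch. The numeric values are assumptions derived from the descriptions above (lower 20 bits and lower 16 bits); the constant names carry an `_ASSUMED` suffix to make clear they are not the module's own definitions, and `split_collation_info` is a hypothetical helper.

```rust
/// Assumed value: the lower 20 bits of the collation info hold the LCID.
const LCID_MASK_ASSUMED: u32 = 0x000F_FFFF;
/// Assumed value: the lower 16 bits of the LCID hold the primary language ID.
const PRIMARY_LANGUAGE_MASK_ASSUMED: u32 = 0x0000_FFFF;

/// Hypothetical helper: splits a raw 32-bit collation info value into
/// `(lcid, primary_language_id)`.
fn split_collation_info(collation_info: u32) -> (u32, u32) {
    // Strip the flag and version bits above bit 19.
    let lcid = collation_info & LCID_MASK_ASSUMED;
    // Strip the bits above bit 15 to leave the language ID.
    let primary_language = lcid & PRIMARY_LANGUAGE_MASK_ASSUMED;
    (lcid, primary_language)
}
```

For a collation info value of `0x00D0_0409`, both results are `0x0409`. For `0x0801_0411`, the LCID is `0x1_0411` and the primary language ID is `0x0411`.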
Functions§
- code_page_for_lcid - Returns the Windows code page number for a given LCID.
- encoding_for_lcid - Returns the encoding for a given LCID, if known.
- encoding_name_for_lcid - Returns the encoding name for display/logging purposes.
- is_utf8_collation - Returns whether the collation uses UTF-8 encoding.
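The signatures of these functions are not shown here, so the following is only a sketch of what an LCID-to-code-page lookup could look like, not the module's implementation. The names `code_page_for_lcid_sketch` and `code_page_or_default` are hypothetical, the list covers one well-known Windows LCID per code page rather than every locale, and the fallback to 1252 is an assumption based on the "default" note in the table.

```rust
/// Hypothetical sketch: returns the Windows code page for a few well-known
/// LCIDs, one per code page in the table. Not an exhaustive mapping.
fn code_page_for_lcid_sketch(lcid: u32) -> Option<u16> {
    // Keep only the lower 16 bits (the language ID), dropping any sort ID.
    match lcid & 0xFFFF {
        0x041E => Some(874),  // Thai
        0x0411 => Some(932),  // Japanese
        0x0804 => Some(936),  // Chinese (Simplified, PRC)
        0x0412 => Some(949),  // Korean
        0x0404 => Some(950),  // Chinese (Traditional, Taiwan)
        0x0415 => Some(1250), // Polish
        0x0419 => Some(1251), // Russian
        0x0409 => Some(1252), // English (United States)
        0x0408 => Some(1253), // Greek
        0x041F => Some(1254), // Turkish
        0x040D => Some(1255), // Hebrew
        0x0401 => Some(1256), // Arabic (Saudi Arabia)
        0x0425 => Some(1257), // Estonian
        0x042A => Some(1258), // Vietnamese
        _ => None,
    }
}

/// Assumption: unknown LCIDs fall back to Windows-1252.
fn code_page_or_default(lcid: u32) -> u16 {
    code_page_for_lcid_sketch(lcid).unwrap_or(1252)
}
```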