A lightweight text encoding converter based on platform native APIs or libiconv.
§Usage
Use `convert` or `convert_lossy` to convert text between encodings.
```rust
use iconv_native::convert;
fn main() -> Result<(), iconv_native::ConvertError> {
    let output = convert(b"\x82\xb3\x83\x86\x82\xe8", "Shift_JIS", "UTF-16LE")?; // "さユり"
    assert_eq!(output, b"\x55\x30\xe6\x30\x8a\x30");
    Ok(())
}
```
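The expected output above is just さユり (U+3055, U+30E6, U+308A) in UTF-16LE. As a sanity check that does not depend on this crate, the same bytes can be produced with the standard library alone. `utf16le_bytes` is a hypothetical helper for illustration, not part of `iconv-native`.

```rust
/// Hypothetical helper: encodes a string as UTF-16LE bytes using only std.
fn utf16le_bytes(s: &str) -> Vec<u8> {
    s.encode_utf16()
        // Little-endian: the low byte of each 16-bit code unit comes first.
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}
```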
If the target encoding is UTF-8 and the output will be used as a Rust-native `String`, use `decode` or `decode_lossy` instead.
```rust
use iconv_native::decode;
fn main() -> Result<(), iconv_native::ConvertError> {
    let output = decode(b"\xa4\xaa\xa4\xe4\xa4\xb9\xa4\xdf", "GB18030")?; // "おやすみ"
    assert_eq!(output, "おやすみ");
    Ok(())
}
```
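When the bytes are already UTF-8, the step that `decode` saves over `convert` is turning raw bytes into a validated `String`. The std-only sketch below shows that step in a strict and a lossy form. The helper names are hypothetical, and std's lossy mode (replacing each invalid sequence with U+FFFD) is only an analogy: `decode_lossy` may treat invalid input differently.

```rust
use std::string::FromUtf8Error;

/// Hypothetical strict step, similar in spirit to `decode`:
/// any invalid UTF-8 sequence makes the whole call fail.
fn utf8_strict(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

/// Hypothetical lossy step, similar in spirit to `decode_lossy`:
/// std replaces each invalid sequence with U+FFFD instead of failing.
fn utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```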
These functions differ slightly in how they handle byte order marks (BOMs). See the documentation of each function for details.
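The exact BOM rules of each function are documented separately and are not reproduced here. As background, a BOM is a fixed byte prefix that identifies a Unicode encoding and its byte order. The hypothetical `detect_bom` helper below sketches how one is recognized using only the standard library; it is not part of `iconv-native` and does not describe what the crate's functions do with a BOM.

```rust
/// Hypothetical helper: if `bytes` starts with a Unicode byte order mark (BOM),
/// returns the encoding it indicates and the BOM length in bytes.
fn detect_bom(bytes: &[u8]) -> Option<(&'static str, usize)> {
    // Longer BOMs come first, because the UTF-32LE BOM begins with the UTF-16LE one.
    const BOMS: [(&[u8], &str); 5] = [
        (&[0x00, 0x00, 0xFE, 0xFF], "UTF-32BE"),
        (&[0xFF, 0xFE, 0x00, 0x00], "UTF-32LE"),
        (&[0xEF, 0xBB, 0xBF], "UTF-8"),
        (&[0xFE, 0xFF], "UTF-16BE"),
        (&[0xFF, 0xFE], "UTF-16LE"),
    ];
    for &(bom, name) in BOMS.iter() {
        if bytes.starts_with(bom) {
            return Some((name, bom.len()));
        }
    }
    None
}
```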
§Platforms
§Windows
By default, this crate uses the `MultiByteToWideChar` and `WideCharToMultiByte` functions, controlled by the `win32` feature. Since these functions do not support UTF-32, the `widestring` crate is used to convert between UTF-32 and UTF-16.
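Converting between UTF-32 and UTF-16 comes down to splitting code points above U+FFFF into surrogate pairs and joining them back. The sketch below does this with the standard library only, as a stand-in for the role `widestring` plays here. The function names are hypothetical and this is not the crate's actual code.

```rust
use std::char::{decode_utf16, DecodeUtf16Error};

/// Hypothetical UTF-32 -> UTF-16 step: validate each 32-bit unit as a Unicode
/// scalar value, then re-encode it as one or two 16-bit units.
/// Returns None if a unit is not a scalar value (e.g. a lone surrogate).
fn utf32_to_utf16(units: &[u32]) -> Option<Vec<u16>> {
    let mut out = Vec::with_capacity(units.len());
    for &u in units {
        let c = char::from_u32(u)?;
        let mut buf = [0u16; 2];
        out.extend_from_slice(c.encode_utf16(&mut buf));
    }
    Some(out)
}

/// Hypothetical UTF-16 -> UTF-32 step: combine surrogate pairs into scalar values.
/// Fails on an unpaired surrogate.
fn utf16_to_utf32(units: &[u16]) -> Result<Vec<u32>, DecodeUtf16Error> {
    decode_utf16(units.iter().copied())
        .map(|r| r.map(|c| c as u32))
        .collect()
}
```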
You may also disable default features and enable `libiconv` to use the libiconv library instead.
§Linux
On Linux with glibc, the built-in `iconv` is used by default, controlled by the `libc-iconv` feature. You may also disable default features and enable `libiconv` to use the libiconv library instead.
Other libcs may not have an `iconv` implementation compatible with glibc's (specifically, the `//IGNORE` and `//TRANSLIT` extensions and proper BOM handling), so the `libc-iconv` feature does not apply to them. Instead, the default `fallback-libiconv` feature applies and links to the libiconv library. Make sure libiconv is installed on the user's system.
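With glibc, these extensions are requested by appending suffixes to the target encoding name passed to `iconv_open` (its `tocode` argument), for example `ASCII//TRANSLIT//IGNORE`: `//TRANSLIT` approximates characters the target cannot represent, and `//IGNORE` drops characters that cannot be converted. The hypothetical helper below only builds such a name and never calls iconv; which suffixes `iconv-native` itself uses is not stated here.

```rust
/// Hypothetical helper: builds a glibc-style iconv target name by appending the
/// `//TRANSLIT` and/or `//IGNORE` suffixes to `to_encoding`.
fn glibc_tocode(to_encoding: &str, translit: bool, ignore: bool) -> String {
    let mut code = String::from(to_encoding);
    if translit {
        // Ask glibc to approximate unrepresentable characters.
        code.push_str("//TRANSLIT");
    }
    if ignore {
        // Ask glibc to skip characters that cannot be converted.
        code.push_str("//IGNORE");
    }
    code
}
```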
§macOS
Same as Linux with glibc. You may also disable default features and enable `libiconv` to use the libiconv library instead.
§Web (WASM)
Uses the `TextDecoder` and `TextEncoder` Web APIs. The `widestring` crate is used to handle conversions involving UTF-16 and UTF-32.
As per the Encoding Standard, a standard-compliant browser supports only UTF-8 in `TextEncoder`, so conversions to any encoding other than UTF-8/UTF-16/UTF-32 (including LE/BE variants) are not supported and result in an `UnknownConversion` error.
If full encoding support is required, consider importing a polyfill and enabling the `wasm-nonstandard-allow-legacy-encoding` feature, in which case most encodings will work. However, this is not guaranteed, since the behavior is not standard-compliant.
Conversions from legacy encodings are not affected by this limitation. See the Encoding Standard for more details.
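The target-encoding rule above can be summarized as a small predicate. This is a hypothetical model, not the crate's code: the function name is invented, `allow_legacy` stands in for the `wasm-nonstandard-allow-legacy-encoding` feature, label matching is assumed to be ASCII case-insensitive, and aliases such as `utf8` are not handled. It only concerns the target encoding, since conversions from legacy encodings are unaffected.

```rust
/// Hypothetical model of the Web rule: returns true if a conversion *to*
/// `to_encoding` would be attempted rather than rejected with `UnknownConversion`.
/// `allow_legacy` stands in for the `wasm-nonstandard-allow-legacy-encoding` feature;
/// even when it is true, success still depends on the polyfill.
fn web_target_attempted(to_encoding: &str, allow_legacy: bool) -> bool {
    // Targets that work without a polyfill: the UTF-8, UTF-16 and UTF-32 families.
    // Only these exact labels are listed; real label matching may accept aliases.
    const UNICODE_TARGETS: [&str; 7] = [
        "UTF-8", "UTF-16", "UTF-16LE", "UTF-16BE", "UTF-32", "UTF-32LE", "UTF-32BE",
    ];
    allow_legacy
        || UNICODE_TARGETS
            .iter()
            .any(|name| name.eq_ignore_ascii_case(to_encoding))
}
```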
§Other
On other platforms, the libiconv library is used by default, controlled by the `fallback-libiconv` feature.
§Feature flags
The following table summarizes the feature flags that control which underlying implementation `iconv-native` uses on each platform.
| Feature | Windows | GNU/Linux or GNU/Hurd, with glibc | macOS | Web (WASM) | Other |
|---|---|---|---|---|---|
| `win32` (default) | ✅ | | | | |
| `libc-iconv` (default) | | ✅ | ✅ | | |
| `web-encoding` (default) | | | | ✅ | |
| `libiconv` | ✅ | ✅ | ✅ | ❓ | ✅ |
| `fallback-libiconv` (default) | 🉑 | 🉑 | 🉑 | ❓ | 🉑 |
- ✅: The corresponding implementation will take effect on the platform. For each platform, there can be at most one ✅ feature enabled.
- 🉑: The corresponding implementation takes effect only if no ✅ feature is enabled on the platform.
- ❓: The corresponding implementation’s applicability is not known on the platform.
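The ✅/🉑 rules amount to a small per-platform resolution procedure. The sketch below models the table in plain Rust. `Platform` and `effective_feature` are hypothetical names, ❓ cells are treated as not applying, and in the real crate this selection happens at compile time through Cargo features rather than through a runtime function like this one.

```rust
/// Platforms, matching the columns of the feature table.
#[derive(Clone, Copy, Debug)]
enum Platform {
    Windows,
    GlibcLinux,
    MacOs,
    Web,
    Other,
}

/// Hypothetical model of the table: given the enabled Cargo features, returns the
/// feature whose implementation takes effect on `platform`, `Ok(None)` if none does,
/// or an error if more than one ✅ feature is enabled. ❓ cells are treated as not applying.
fn effective_feature(platform: Platform, enabled: &[&str]) -> Result<Option<&'static str>, String> {
    use Platform::*;
    // ✅ cells: features that take effect directly on this platform.
    let primary: &[&'static str] = match platform {
        Windows => &["win32", "libiconv"],
        GlibcLinux | MacOs => &["libc-iconv", "libiconv"],
        Web => &["web-encoding"],
        Other => &["libiconv"],
    };
    // 🉑 cells: used only when no ✅ feature is enabled.
    let fallback: &[&'static str] = match platform {
        Web => &[],
        _ => &["fallback-libiconv"],
    };
    let active: Vec<&'static str> = primary
        .iter()
        .copied()
        .filter(|f| enabled.contains(f))
        .collect();
    match active.as_slice() {
        [] => Ok(fallback.iter().copied().find(|f| enabled.contains(f))),
        [only] => Ok(Some(*only)),
        _ => Err(format!("more than one primary feature enabled: {:?}", active)),
    }
}
```

With the default feature set, this yields `win32` on Windows, `libc-iconv` on glibc Linux and macOS, `web-encoding` on the Web, and `fallback-libiconv` elsewhere.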
The following optional feature flags can be used to control the behavior of certain implementations:
- `wasm-nonstandard-allow-legacy-encoding`: Enable this feature to allow legacy encodings other than UTF-8/UTF-16/UTF-32 (including LE/BE variants) on the Web (WASM) platform. A polyfill is required for it to work.
Enums§
- `ConvertError`: Error representation for `decode` and `convert`.
- `ConvertLossyError`: Error representation for `decode_lossy` and `convert_lossy`.
Functions§
- `convert`: Converts a byte sequence of `from_encoding`-encoded text to `to_encoding`.
- `convert_lossy`: Converts a byte sequence of `from_encoding`-encoded text to `to_encoding`. Possibly includes invalid sequences or unrepresentable characters.
- `decode`: Converts text represented by a slice of bytes of a specified encoding to a `String`.
- `decode_lossy`: Converts text represented by a slice of bytes of a specified encoding to a `String`. Possibly includes invalid sequences.