Crate iconv_native

A lightweight text encoding converter based on platform native APIs or libiconv.

§Usage

Use convert or convert_lossy to convert text between encodings.

use iconv_native::convert;

let output = convert(b"\x82\xb3\x83\x86\x82\xe8", "Shift_JIS", "UTF-16LE")?;
assert_eq!(output, b"\x55\x30\xe6\x30\x8a\x30");
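The expected bytes in the example above can be checked by hand with the standard library alone. The sketch below assumes the Shift_JIS input decodes to the text さユり (my reading of the bytes, not something the crate documentation states) and encodes that text as UTF-16LE. The helper name `utf16le_bytes` is hypothetical and is not part of iconv_native.

```rust
/// Encodes `text` as UTF-16LE bytes without a BOM, using only `std`.
/// Hypothetical helper for illustration; not part of iconv_native.
fn utf16le_bytes(text: &str) -> Vec<u8> {
    text.encode_utf16()
        // Each UTF-16 code unit becomes two bytes, least significant first.
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}
```

Calling `utf16le_bytes("さユり")` yields the same six bytes that the `convert` example asserts.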

If the target encoding is UTF-8 and the output will be used as a Rust-native String, use decode or decode_lossy instead.

use iconv_native::decode;

let output = decode(b"\xa4\xaa\xa4\xe4\xa4\xb9\xa4\xdf", "GB18030")?;
assert_eq!(output, "おやすみ");

These functions differ slightly in how they handle the byte order mark (BOM). See the documentation of each function for details.

§Platforms

§Windows

By default this crate uses the MultiByteToWideChar and WideCharToMultiByte functions, controlled by the win32 feature. Since these functions do not support UTF-32, the widestring crate is used to convert between UTF-32 and UTF-16.
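To make the UTF-32 to UTF-16 step concrete, here is a minimal sketch of that conversion written against `std` only. It is not the crate's code and does not use widestring; the function names `utf32_to_utf16` and `utf16_to_utf32` are hypothetical.

```rust
/// Converts UTF-32 code points to UTF-16 code units.
/// Returns `None` if any value is not a valid Unicode scalar value.
fn utf32_to_utf16(code_points: &[u32]) -> Option<Vec<u16>> {
    let mut units = Vec::with_capacity(code_points.len());
    let mut buf = [0u16; 2];
    for &cp in code_points {
        // Rejects surrogates and values above U+10FFFF.
        let ch = char::from_u32(cp)?;
        // Writes one code unit, or a surrogate pair for code points above U+FFFF.
        units.extend_from_slice(ch.encode_utf16(&mut buf));
    }
    Some(units)
}

/// Converts UTF-16 code units to UTF-32 code points.
/// Returns `None` if the input contains an unpaired surrogate.
fn utf16_to_utf32(units: &[u16]) -> Option<Vec<u32>> {
    char::decode_utf16(units.iter().copied())
        .map(|result| result.ok().map(u32::from))
        .collect()
}
```

A code point outside the Basic Multilingual Plane, such as U+1F600, becomes the surrogate pair 0xD83D 0xDE00 in UTF-16 and is joined back into one value on the return trip.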

You may also disable default features and enable libiconv to use the libiconv library instead.
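As a sketch, the dependency entry for that setup might look like the following Cargo.toml fragment. The version requirement is a placeholder, since this page does not state one; the feature names are the ones listed under Feature flags. The same entry should apply wherever this page says libiconv can replace the default implementation.

```toml
[dependencies.iconv-native]
version = "*"             # placeholder: pin to the release you actually use
default-features = false  # turns off the features marked (default)
features = ["libiconv"]   # link against the libiconv library instead
```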

§Linux

On Linux with glibc, the built-in iconv is used by default, controlled by the libc-iconv feature. You may also disable default features and enable libiconv to use the libiconv library instead.

Other libcs may not have an iconv implementation that is compatible with glibc’s (specifically the //IGNORE and //TRANSLIT extensions and proper BOM handling), so the libc-iconv feature does not apply to them. By default, the fallback-libiconv feature applies and links to the libiconv library. Make sure libiconv is installed on the user’s system.

§macOS

Same as Linux with glibc. You may also disable default features and enable libiconv to use the libiconv library instead.

§Web (WASM)

Uses the TextDecoder and TextEncoder Web APIs. The widestring crate is used to handle UTF-16 and UTF-32 related conversions.

As per the Encoding Standard, a standard-compliant browser supports only UTF-8 when using TextEncoder, so conversions to any encoding other than UTF-8/UTF-16/UTF-32 (including LE/BE variants) are not supported and result in an UnknownConversion error. If full encoding support is required, consider importing a polyfill and enabling the wasm-nonstandard-allow-legacy-encoding feature, in which case most encodings will work. However, there is no guarantee, as this is not standard-compliant behavior.

Conversions from legacy encodings are not affected by this limitation. See Encoding Standard for more details.
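The rule for conversion targets on this platform can be modelled as a simple membership check. The sketch below is an illustration only: `is_unicode_target` is a hypothetical helper, and the list of labels is an assumption. This page does not say which spellings or aliases the crate itself accepts.

```rust
/// Returns true if `to_encoding` names a UTF-8/UTF-16/UTF-32 target,
/// including the LE/BE variants. Comparison is ASCII case-insensitive.
/// Hypothetical model of the limitation; not the crate's actual check.
fn is_unicode_target(to_encoding: &str) -> bool {
    // Assumed label spellings; aliases such as "UTF8" are not handled here.
    const UNICODE_TARGETS: [&str; 7] = [
        "UTF-8", "UTF-16", "UTF-16LE", "UTF-16BE", "UTF-32", "UTF-32LE", "UTF-32BE",
    ];
    UNICODE_TARGETS
        .iter()
        .any(|name| name.eq_ignore_ascii_case(to_encoding))
}
```

Under this model, a target such as Shift_JIS fails the check, which corresponds to the case where the crate reports UnknownConversion unless the polyfill feature is enabled. The source encoding is not checked at all, matching the note that conversions from legacy encodings are unaffected.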

§Other

On other platforms, the libiconv library is used by default, controlled by the fallback-libiconv feature.

§Feature flags

The following table summarizes the feature flags used to control the underlying implementation iconv-native uses on different platforms.

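| Feature | Windows | GNU/Linux or GNU/Hurd, with glibc | macOS | Web (WASM) | Other |
| --- | --- | --- | --- | --- | --- |
| win32 (default) | ✅ | | | | |
| libc-iconv (default) | | ✅ | ✅ | | |
| web-encoding (default) | | | | ✅ | |
| libiconv | ✅ | ✅ | ✅ | | |
| fallback-libiconv (default) | 🉑 | 🉑 | 🉑 | | 🉑 |
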
  • ✅: The corresponding implementation will take effect on the platform. For each platform, there can be at most one ✅ feature enabled.
  • 🉑: The corresponding implementation will not take effect unless no ✅ feature is enabled on the platform.
  • ❓: The corresponding implementation’s applicability is not known on the platform.
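The legend can be read as a small resolution rule: a single enabled ✅ feature wins, and a 🉑 feature applies only when no ✅ feature is enabled. The sketch below models that rule for one platform. It is not the crate's build logic, which presumably relies on Cargo features and conditional compilation; `Mark` and `resolve` are hypothetical names, and the Windows row used in the note after the code is an assumption pieced together from the Windows section and the table.

```rust
/// Marks from the legend: ✅ is `Primary`, 🉑 is `Fallback`.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mark {
    Primary,
    Fallback,
}

/// Resolves which enabled feature takes effect on one platform.
/// `row` lists the (feature, mark) pairs for that platform and
/// `enabled` lists the features that are switched on.
fn resolve<'a>(row: &[(&'a str, Mark)], enabled: &[&str]) -> Result<Option<&'a str>, String> {
    // Collects the enabled features on this platform that carry the given mark.
    let active = |mark: Mark| -> Vec<&'a str> {
        row.iter()
            .filter(|(name, m)| *m == mark && enabled.contains(name))
            .map(|(name, _)| *name)
            .collect()
    };
    let primary = active(Mark::Primary);
    match primary.len() {
        // No ✅ feature is enabled: a 🉑 feature may take effect.
        0 => Ok(active(Mark::Fallback).first().copied()),
        // Exactly one ✅ feature is enabled: it takes effect.
        1 => Ok(Some(primary[0])),
        // The legend allows at most one ✅ feature per platform.
        _ => Err(format!("more than one ✅ feature enabled: {:?}", primary)),
    }
}
```

For an assumed Windows row of win32 (✅), libiconv (✅) and fallback-libiconv (🉑), the default feature set resolves to win32, enabling only fallback-libiconv resolves to fallback-libiconv, and enabling both win32 and libiconv is reported as an error.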

The following optional feature flags can be used to control the behavior of certain implementations:

  • wasm-nonstandard-allow-legacy-encoding: Enable this feature to allow conversions to legacy encodings (anything other than UTF-8/UTF-16/UTF-32, including LE/BE variants) on the Web (WASM) platform. A polyfill is required for it to work.

§Enums

ConvertError
Error representation for decode and convert.
ConvertLossyError
Error representation for decode_lossy and convert_lossy.

§Functions

convert
Converts a byte sequence of from_encoding encoded text to to_encoding.
convert_lossy
Converts a byte sequence of from_encoding encoded text to to_encoding. Possibly includes invalid sequences or unrepresentable characters.
decode
Converts text represented by a slice of bytes of a specified encoding to a String.
decode_lossy
Converts text represented by a slice of bytes of a specified encoding to a String. Possibly includes invalid sequences.