Create a new decoder builder with a default configuration.
By default, no explicit encoding is used, but if a UTF-8 or UTF-16
BOM is detected, then the corresponding encoding is selected
automatically and transcoding is performed (where invalid sequences map
to the Unicode replacement codepoint).
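To make the BOM sniffing described above concrete, here is a minimal sketch using only the standard library. The `BomEncoding` enum and the `sniff_bom` function are hypothetical names introduced for illustration and are not part of this crate; the byte patterns are the standard UTF-8 and UTF-16 byte order marks.

```rust
/// Encodings that a byte order mark can identify in this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum BomEncoding {
    Utf8,
    Utf16Le,
    Utf16Be,
}

/// Hypothetical helper: inspect the start of a stream and report which
/// BOM, if any, is present, together with the BOM's length in bytes.
fn sniff_bom(bytes: &[u8]) -> Option<(BomEncoding, usize)> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some((BomEncoding::Utf8, 3))
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some((BomEncoding::Utf16Le, 2))
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some((BomEncoding::Utf16Be, 2))
    } else {
        // No BOM: in the default configuration nothing is transcoded.
        None
    }
}
```

When the sketch returns `None`, that corresponds to the default case in which no encoding is detected.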
Build a new decoder that wraps the given reader.
Build a new decoder that wraps the given reader and uses the given
buffer internally for transcoding.
This is useful for cases where it is advantageous to amortize
allocation. Namely, this method permits reusing a buffer for
subsequent decoders.
This returns an error if the buffer is smaller than 4 bytes (which is
too small to hold the maximum size of a single UTF-8 encoded codepoint).
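The 4-byte minimum follows from the fact that the largest Unicode scalar value encodes to four bytes in UTF-8. The sketch below models that check with the standard library only. `check_buffer`, `MAX_UTF8_LEN`, and the choice of `io::ErrorKind::InvalidInput` are assumptions made for illustration, not the crate's actual implementation.

```rust
use std::io;

/// Largest number of bytes a single codepoint occupies in UTF-8.
const MAX_UTF8_LEN: usize = 4;

/// Hypothetical stand-in for the size check described above: reject any
/// buffer that could not hold one maximally sized UTF-8 sequence.
fn check_buffer(buffer: &[u8]) -> io::Result<()> {
    if buffer.len() < MAX_UTF8_LEN {
        // The error kind is an assumption chosen for this sketch.
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!(
                "buffer of {} bytes is too small; need at least {}",
                buffer.len(),
                MAX_UTF8_LEN
            ),
        ));
    }
    Ok(())
}
```

A caller that creates many decoders could allocate one buffer up front, validate it once, and hand it to each decoder in turn.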
Set an explicit encoding to be used by this decoder.
When an explicit encoding is set, BOM sniffing is disabled and the
encoding provided will be used unconditionally. Errors in the encoded
bytes are replaced by the Unicode replacement codepoint.
By default, no explicit encoding is set.
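As an illustration of decoding with a fixed encoding and replacing errors, the following sketch decodes bytes as UTF-16LE unconditionally using only the standard library. `decode_utf16le_lossy` is a hypothetical helper, and the exact number of replacement codepoints this crate emits for a given malformed input may differ from what this sketch produces.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Hypothetical helper: treat the input as UTF-16LE no matter what it
/// starts with, mapping malformed code units to U+FFFD.
fn decode_utf16le_lossy(bytes: &[u8]) -> String {
    let units = bytes.chunks(2).map(|pair| match pair {
        [lo, hi] => u16::from_le_bytes([*lo, *hi]),
        // A trailing odd byte cannot form a code unit. Substitute an
        // unpaired surrogate so that it decodes to U+FFFD below.
        _ => 0xD800,
    });
    decode_utf16(units)
        .map(|unit| unit.unwrap_or(REPLACEMENT_CHARACTER))
        .collect()
}
```

Well-formed input decodes as expected, while an unpaired surrogate or a dangling byte becomes U+FFFD instead of producing an error.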
Enable UTF-8 passthru, even when a UTF-8 BOM is observed.
When an explicit encoding is not set (thereby invoking automatic
encoding detection via BOM sniffing), then a UTF-8 BOM will cause
UTF-8 transcoding to occur. In particular, if the source contains
invalid UTF-8 sequences, then they are replaced with the Unicode
replacement codepoint.
This transcoding may not be desirable. For example, the caller may
already have its own UTF-8 handling where invalid UTF-8 is
appropriately handled, in which case an additional transcoding
step is unnecessary work. Enabling this option will prevent
that extra transcoding step from occurring. In this case, the bytes
emitted by the reader are passed through unchanged (including the BOM)
and the caller will be responsible for handling any invalid UTF-8.
This example demonstrates the effect of enabling this option on data
that begins with a UTF-8 BOM but also contains invalid UTF-8 later in
the stream.
    extern crate encoding_rs;
    extern crate encoding_rs_io;

    use std::error::Error;
    use std::io::Read;

    use encoding_rs_io::DecodeReaderBytesBuilder;

    fn example() -> Result<(), Box<dyn Error>> {
        // A UTF-8 BOM, then "foo", an invalid byte (0xFF), then "bar".
        let source_data = &b"\xEF\xBB\xBFfoo\xFFbar"[..];
        let mut decoder = DecodeReaderBytesBuilder::new()
            .utf8_passthru(true)
            .build(source_data);

        let mut dest = vec![];
        decoder.read_to_end(&mut dest)?;
        // Both the BOM and the invalid byte are passed through unchanged.
        assert_eq!(dest, b"\xEF\xBB\xBFfoo\xFFbar");
        Ok(())
    }
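With passthru enabled, handling invalid UTF-8 falls to the caller. One possible approach, sketched here with the standard library only, is to split the bytes into the longest valid UTF-8 prefix and the remainder. `split_valid_prefix` is a hypothetical helper and is not provided by this crate.

```rust
use std::str;

/// Hypothetical caller-side handling: return the longest prefix that is
/// valid UTF-8, plus the remaining bytes starting at the first invalid
/// sequence (empty if the whole input is valid).
fn split_valid_prefix(bytes: &[u8]) -> (&str, &[u8]) {
    match str::from_utf8(bytes) {
        Ok(text) => (text, &bytes[bytes.len()..]),
        Err(err) => {
            let (valid, rest) = bytes.split_at(err.valid_up_to());
            // `valid_up_to` guarantees that this prefix is valid UTF-8.
            (str::from_utf8(valid).expect("prefix is valid UTF-8"), rest)
        }
    }
}
```

Applied to the passed-through bytes from the example above, the valid prefix still contains the BOM as U+FEFF, and the remainder begins at the `0xFF` byte.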
Whether or not to always strip a BOM if one is found.
When this is enabled, if a BOM is found at the beginning of a stream,
then it is stripped from the output. This applies even when
utf8_passthru is enabled.
This is disabled by default.
This example shows how to remove the BOM if it's present even when
utf8_passthru is enabled.
    extern crate encoding_rs;
    extern crate encoding_rs_io;

    use std::error::Error;
    use std::io::Read;

    use encoding_rs_io::DecodeReaderBytesBuilder;

    fn example() -> Result<(), Box<dyn Error>> {
        // A UTF-8 BOM, then "foo", an invalid byte (0xFF), then "bar".
        let source_data = &b"\xEF\xBB\xBFfoo\xFFbar"[..];
        let mut decoder = DecodeReaderBytesBuilder::new()
            .utf8_passthru(true)
            .strip_bom(true)
            .build(source_data);

        let mut dest = vec![];
        decoder.read_to_end(&mut dest)?;
        // The BOM is removed; the invalid byte is still passed through.
        assert_eq!(dest, b"foo\xFFbar");
        Ok(())
    }
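For readers who want to see the stripping step in isolation, the sketch below removes a leading UTF-8 BOM from a byte slice using only the standard library. `strip_utf8_bom` and `UTF8_BOM` are hypothetical names for illustration. The sketch covers only the UTF-8 BOM and leaves every other byte untouched.

```rust
/// The three-byte UTF-8 byte order mark.
const UTF8_BOM: &[u8] = b"\xEF\xBB\xBF";

/// Hypothetical helper: drop a leading UTF-8 BOM if present and return
/// the remaining bytes unchanged, including any invalid UTF-8.
fn strip_utf8_bom(bytes: &[u8]) -> &[u8] {
    bytes.strip_prefix(UTF8_BOM).unwrap_or(bytes)
}
```

Only a BOM at the very start of the input is removed. The same byte sequence appearing later in the stream is left alone.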
Give the highest precedence to the BOM, if one is found.
When this is enabled, and if a BOM is found, then the encoding
indicated by that BOM is used even if an explicit encoding has been
set via the encoding method.
This does not override utf8_passthru.
This is disabled by default.
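The precedence rules described for the encoding and BOM override options can be summarized as a small decision function. This is a simplified model built from the descriptions above rather than the crate's actual code: `choose_encoding` is a hypothetical name, encodings are represented as plain labels, and utf8_passthru is deliberately left out.

```rust
/// Hypothetical model of how an encoding is selected.
///
/// `explicit` is the encoding set by the caller, `bom` is the encoding
/// indicated by a BOM at the start of the stream, and `bom_override`
/// mirrors the option described above.
fn choose_encoding<'a>(
    explicit: Option<&'a str>,
    bom: Option<&'a str>,
    bom_override: bool,
) -> Option<&'a str> {
    match (explicit, bom) {
        // With the override enabled, a BOM beats the explicit encoding.
        (Some(_), Some(from_bom)) if bom_override => Some(from_bom),
        // Otherwise an explicit encoding is used unconditionally.
        (Some(encoding), _) => Some(encoding),
        // With no explicit encoding, fall back to BOM sniffing.
        (None, from_bom) => from_bom,
    }
}
```

In this model the override only matters when both an explicit encoding and a BOM are present; in every other case the result is the same as with the option disabled.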