Crate encoding_rs_io
This crate provides streaming transcoding by implementing Rust's I/O traits and delegating transcoding to the encoding_rs crate.
Currently, this crate only provides a means of transcoding from a source encoding (that is among the encodings supported by encoding_rs) to UTF-8 via an implementation of std::io::Read, where errors are handled by replacing invalid sequences with the Unicode replacement character. Future work may provide additional implementations for std::io::Write and/or implementations that make stronger guarantees about UTF-8 validity.
Example
This example shows how to create a decoder that transcodes UTF-16LE (the source) to UTF-8 (the destination).
extern crate encoding_rs;
extern crate encoding_rs_io;

use std::error::Error;
use std::io::Read;

use encoding_rs_io::DecodeReaderBytes;

fn example() -> Result<(), Box<dyn Error>> {
    let source_data = &b"\xFF\xFEf\x00o\x00o\x00b\x00a\x00r\x00"[..];
    // N.B. `source_data` can be any arbitrary io::Read implementation.
    let mut decoder = DecodeReaderBytes::new(source_data);

    let mut dest = String::new();
    // decoder implements the io::Read trait, so it can easily be plugged
    // into any consumer expecting an arbitrary reader.
    decoder.read_to_string(&mut dest)?;
    assert_eq!(dest, "foobar");
    Ok(())
}
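The replacement behavior mentioned above (invalid sequences become the Unicode replacement character) can be illustrated with the standard library alone. The following is a minimal sketch, not how this crate or encoding_rs is implemented: `decode_utf16le_lossy` is a hypothetical helper, it works on a complete in-memory slice rather than a stream, and it does no BOM handling.

```rust
/// Hypothetical helper: decodes UTF-16LE bytes to a `String`, substituting
/// U+FFFD for every invalid sequence instead of returning an error.
fn decode_utf16le_lossy(bytes: &[u8]) -> String {
    // Group the input into little-endian 16-bit code units.
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));

    // Unpaired surrogates surface as `Err`; map each one to U+FFFD.
    let mut out: String = char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();

    // A trailing odd byte cannot form a code unit; this sketch replaces it too.
    if bytes.len() % 2 == 1 {
        out.push(char::REPLACEMENT_CHARACTER);
    }
    out
}
```

Because errors are replaced rather than reported, the output is always valid UTF-8, at the cost of losing the information that the input was malformed.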
Future work
Currently, this crate only provides a way to get possibly valid UTF-8 from some source encoding. There are other transformations that may be useful that we could include in this crate. Namely:
- An encoder that accepts an arbitrary std::io::Write implementation and takes valid UTF-8 and transcodes it to a selected destination encoding. This encoder would implement std::fmt::Write.
- A decoder that accepts an arbitrary std::fmt::Write implementation and takes arbitrary bytes and transcodes them from a selected source encoding to valid UTF-8. This decoder would implement std::io::Write.
- An encoder that accepts an arbitrary UnicodeRead implementation and takes valid UTF-8 and transcodes it to a selected destination encoding. This encoder would implement std::io::Read.
- A decoder that accepts an arbitrary std::io::Read implementation and takes arbitrary bytes and transcodes them from a selected source encoding to valid UTF-8. This decoder would implement the UnicodeRead trait.
Where UnicodeRead is a hypothetical trait that does not yet exist. Its definition might look something like this:
trait UnicodeRead {
    fn read(&mut self, buf: &mut str) -> Result<usize>;
}
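To make the first transformation in the list above more concrete, here is a rough sketch of an encoder that wraps an arbitrary std::io::Write and itself implements std::fmt::Write. Everything in it is an assumption made for illustration: the names `Utf16LeEncoder` and `encode_utf16le` are hypothetical, the destination encoding is hard-coded to UTF-16LE, and only the standard library is used rather than encoding_rs.

```rust
use std::fmt;
use std::io;

/// Hypothetical encoder: wraps any `io::Write` and accepts only valid UTF-8
/// (as `&str`), writing it out as UTF-16LE.
struct Utf16LeEncoder<W: io::Write> {
    inner: W,
}

impl<W: io::Write> fmt::Write for Utf16LeEncoder<W> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for unit in s.encode_utf16() {
            // `fmt::Error` carries no payload, so the I/O error detail is lost.
            self.inner
                .write_all(&unit.to_le_bytes())
                .map_err(|_| fmt::Error)?;
        }
        Ok(())
    }
}

/// Convenience wrapper that exercises the encoder with an in-memory sink.
fn encode_utf16le(s: &str) -> Vec<u8> {
    let mut enc = Utf16LeEncoder { inner: Vec::new() };
    // Writing to a `Vec<u8>` does not fail, so the `expect` is not reachable here.
    fmt::Write::write_str(&mut enc, s).expect("writing to a Vec cannot fail");
    enc.inner
}
```

Taking `&str` as input is what lets the type system guarantee valid UTF-8 on the way in, which is the point of implementing std::fmt::Write rather than std::io::Write for this direction.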
Interestingly, none of the above transformations corresponds to DecodeReaderBytes. DecodeReaderBytes most closely corresponds to the last option, but instead of guaranteeing valid UTF-8 by implementing a trait like UnicodeRead, it implements std::io::Read, which pushes UTF-8 handling on to the caller. However, it turns out that this particular use case is important for operations like search, which can often be written in a way that doesn't assume UTF-8 validity but still benefits from it.
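As a rough illustration of a search that does not assume UTF-8 validity, the sketch below looks for a byte pattern in whatever an arbitrary std::io::Read produces. The function name `find_in_reader` is hypothetical, and for brevity it buffers the whole input in memory instead of searching incrementally.

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns the byte offset of the first occurrence of
/// `needle` in everything read from `rdr`. The bytes are never validated as
/// UTF-8, so invalid sequences in the input are simply skipped over.
fn find_in_reader<R: Read>(mut rdr: R, needle: &[u8]) -> io::Result<Option<usize>> {
    let mut haystack = Vec::new();
    rdr.read_to_end(&mut haystack)?;

    // `windows(0)` panics, so treat the empty needle as matching at offset 0.
    if needle.is_empty() {
        return Ok(Some(0));
    }
    Ok(haystack
        .windows(needle.len())
        .position(|window| window == needle))
}
```

A reader such as DecodeReaderBytes could be passed as `rdr`; the search then benefits from the input being mostly UTF-8 (a UTF-8 needle can match) without ever requiring it.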
It's not clear which of the above transformations is actually useful, but all of them could theoretically exist. There is more discussion on this topic here (and in particular, the above formulation was taken almost verbatim from Simon Sapin's comments): https://github.com/hsivonen/encoding_rs/issues/8
It is also perhaps worth stating that this crate very much intends to remain coupled to encoding_rs, which helps restrict the scope, but may be too biased toward Web-oriented encodings to solve grander encoding challenges. As such, it may very well be that this crate is actually a stepping stone to something with a larger scope. But first, we must learn.
Structs
DecodeReaderBytes | An implementation of io::Read that transcodes to UTF-8 in a streaming fashion. |
DecodeReaderBytesBuilder | A builder for constructing a byte oriented transcoder to UTF-8. |