Crate encoding_rs_rw
Space-efficient std::io::{Read, Write} wrappers for encoding_rs
This crate provides std::io::Read and std::io::Write implementations for encoding_rs::Decoder and encoding_rs::Encoder, respectively, to support Rust’s standard streaming API.
use std::{fs, io, io::prelude::*};
use encoding_rs::{EUC_JP, SHIFT_JIS};
use encoding_rs_rw::{DecodingReader, EncodingWriter};

fn main() -> io::Result<()> {
    // Decode the EUC-JP bytes of foo.txt into a UTF-8 String.
    let file_r = io::BufReader::new(fs::File::open("foo.txt")?);
    let mut reader = DecodingReader::new(file_r, EUC_JP.new_decoder());
    let mut utf8 = String::new();
    reader.read_to_string(&mut utf8)?;
    // Re-encode the text as Shift_JIS into bar.txt.
    let file_w = fs::File::create("bar.txt")?;
    let mut writer = EncodingWriter::new(file_w, SHIFT_JIS.new_encoder());
    write!(writer, "{}", utf8)?;
    writer.flush()?;
    Ok(())
}
This crate is an alternative to encoding_rs_io that provides a simpler API and more flexible error semantics.
This crate also provides a lossy variant of the decoding reader, which replaces malformed byte sequences with the replacement character (U+FFFD), and a with_unmappable_handler variant of the writer, which passes characters that the target encoding cannot represent to a user-supplied handler.
use std::{fs, io, io::prelude::*};
use encoding_rs::{EUC_KR, ISO_8859_7};
use encoding_rs_rw::{DecodingReader, EncodingWriter};

fn main() -> io::Result<()> {
    // Decode EUC-KR, replacing malformed sequences with U+FFFD.
    let file_r = io::BufReader::new(fs::File::open("baz.txt")?);
    let mut reader = DecodingReader::new(file_r, EUC_KR.new_decoder());
    let mut utf8 = String::new();
    reader.lossy().read_to_string(&mut utf8)?;
    // Encode as ISO-8859-7, writing each unmappable character as an
    // HTML numeric character reference such as "&#54620;".
    let file_w = fs::File::create("qux.txt")?;
    let mut writer = EncodingWriter::new(file_w, ISO_8859_7.new_encoder());
    {
        let mut writer =
            writer.with_unmappable_handler(|e, w| write!(w, "&#{};", u32::from(e.value())));
        write!(writer, "{}", utf8)?;
        writer.flush()?;
    }
    Ok(())
}
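The two policies described above have close standard-library analogues, which the following sketch uses to illustrate the semantics without depending on this crate. String::from_utf8_lossy applies the same U+FFFD replacement to malformed UTF-8. encode_ascii_with is a hypothetical helper, not part of this crate's API: it treats ASCII as the target encoding, so every non-ASCII character is unmappable and is handed to a closure that writes a substitute, here an HTML numeric character reference like the one in the example above. The crate's handler instead receives an error value and a writer (e and w in that example).

```rust
/// Hypothetical toy encoder (not this crate's API): ASCII is the target
/// encoding, so every non-ASCII char is unmappable and goes to `handler`.
fn encode_ascii_with<F>(input: &str, mut handler: F) -> Vec<u8>
where
    F: FnMut(char, &mut Vec<u8>),
{
    let mut out = Vec::with_capacity(input.len());
    for c in input.chars() {
        if c.is_ascii() {
            out.push(c as u8);
        } else {
            // Let the caller decide how to represent the unmappable char.
            handler(c, &mut out);
        }
    }
    out
}

fn main() {
    // Lossy decoding: 0xE9 starts a 3-byte sequence that '!' does not
    // continue, so std replaces the lone 0xE9 with a single U+FFFD.
    let decoded = String::from_utf8_lossy(b"caf\xE9!");
    assert_eq!(decoded, "caf\u{FFFD}!");

    // Unmappable handling: write 'é' (U+00E9) as an HTML numeric reference.
    let encoded = encode_ascii_with("café", |c, out| {
        out.extend_from_slice(format!("&#{};", u32::from(c)).as_bytes());
    });
    assert_eq!(encoded, b"caf&#233;");
}
```

Passing the output sink to the handler, rather than having it return a replacement string, lets the handler write any number of bytes without an intermediate allocation per character.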
Design
Implementing Rust’s Read and Write traits on top of a character-encoding converter essentially requires byte buffers before and after the converter: read and write must work even when the caller passes a buffer as small as one byte, whereas character encoders and decoders consume and produce multiple bytes at a time to handle multi-byte characters. The types in this crate keep small internal buffers so that they can still operate byte by byte, but they bypass those buffers and use the caller-supplied buffers whenever possible, minimizing double buffering and memory consumption.
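The buffering strategy can be illustrated with a stdlib-only sketch. It is not this crate's implementation: Latin1Reader is a hypothetical type, and ISO-8859-1 (Latin-1) stands in for a real decoder because each input byte maps to exactly one code point, which needs one or two UTF-8 bytes. The reader decodes directly from the source's BufRead buffer into the caller's buffer, and only when a two-byte character straddles the end of the caller's buffer does it park the leftover byte in a one-byte internal slot.

```rust
use std::io::{self, BufRead, Read};

/// Hypothetical decoding reader: ISO-8859-1 (Latin-1) in, UTF-8 out.
struct Latin1Reader<R> {
    inner: R,
    // Second byte of a 2-byte char that did not fit in the caller's
    // buffer. Latin-1 chars need at most 2 UTF-8 bytes, so 1 byte suffices.
    pending: Option<u8>,
}

impl<R: BufRead> Latin1Reader<R> {
    fn new(inner: R) -> Self {
        Latin1Reader { inner, pending: None }
    }
}

impl<R: BufRead> Read for Latin1Reader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        let mut written = 0;
        // Emit the byte left over from the previous call first.
        if let Some(b) = self.pending.take() {
            buf[0] = b;
            written = 1;
            if written == buf.len() {
                return Ok(written);
            }
        }
        // Decode straight from the source's buffer into the caller's buffer,
        // with no intermediate copy on either side.
        let input = self.inner.fill_buf()?;
        let mut consumed = 0;
        for &byte in input {
            if written == buf.len() {
                break;
            }
            let mut utf8 = [0u8; 2];
            let len = char::from(byte).encode_utf8(&mut utf8).len();
            let room = buf.len() - written;
            if len <= room {
                buf[written..written + len].copy_from_slice(&utf8[..len]);
                written += len;
            } else {
                // One byte of room for a 2-byte char: write the first byte
                // now and park the second in the small internal slot.
                buf[written] = utf8[0];
                self.pending = Some(utf8[1]);
                written += 1;
            }
            consumed += 1;
        }
        self.inner.consume(consumed);
        Ok(written)
    }
}

fn main() -> io::Result<()> {
    // &[u8] implements BufRead, so it can serve as the source.
    let mut reader = Latin1Reader::new(&b"caf\xE9"[..]);
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    assert_eq!(text, "café");
    Ok(())
}
```

Reading into a one-byte buffer exercises the leftover slot; reads into larger buffers normally bypass it. Because the sketch pulls input through BufRead::fill_buf, it needs no input-side buffer of its own either.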
Modules
- Miscellaneous types not intended for direct access by name.
Structs
- A reader wrapper that decodes an input byte stream into UTF-8.
- A writer wrapper that encodes a UTF-8 input byte stream into the specified encoding.
- The error type reported by DecodingReader and EncodingWriter when they encounter a malformed byte sequence.
- The error type reported by EncodingWriter when it encounters an unmappable character.
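Because Read::read and Write::write can only return io::Error, an error type like the two above can reach a caller through those traits only wrapped inside an io::Error. The sketch below shows the standard-library mechanism for that round trip. MalformedSketch and both helper functions are hypothetical stand-ins rather than this crate's types, and the choice of ErrorKind::InvalidData is an assumption. A caller recovers the typed error with io::Error::get_ref and downcast_ref.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical error type carrying the offending bytes.
#[derive(Debug)]
struct MalformedSketch {
    bytes: Vec<u8>,
}

impl fmt::Display for MalformedSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "malformed byte sequence {:?}", self.bytes)
    }
}

impl Error for MalformedSketch {}

/// Wraps the typed error in io::Error, as a Read or Write impl must.
/// The InvalidData kind is an assumption for this sketch.
fn report_malformed(bytes: &[u8]) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, MalformedSketch { bytes: bytes.to_vec() })
}

/// Returns the offending bytes if `err` wraps a MalformedSketch.
fn malformed_bytes(err: &io::Error) -> Option<&[u8]> {
    err.get_ref()?
        .downcast_ref::<MalformedSketch>()
        .map(|m| m.bytes.as_slice())
}

fn main() {
    let err = report_malformed(&[0xE9]);
    match malformed_bytes(&err) {
        Some(bytes) => println!("decoder rejected {:02X?}", bytes),
        None => println!("some other I/O error: {}", err),
    }
}
```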