Crate encoding_rs_rw

Space-efficient std::io::{Read, Write} wrappers for encoding_rs

This crate provides std::io::Read and std::io::Write implementations for encoding_rs::Decoder and encoding_rs::Encoder, respectively, to support Rust’s standard streaming API.

use std::{fs, io, io::prelude::*};

use encoding_rs::{EUC_JP, SHIFT_JIS};
use encoding_rs_rw::{DecodingReader, EncodingWriter};

let file_r = io::BufReader::new(fs::File::open("foo.txt")?);
let mut reader = DecodingReader::new(file_r, EUC_JP.new_decoder());
let mut utf8 = String::new();
reader.read_to_string(&mut utf8)?;

let file_w = fs::File::create("bar.txt")?;
let mut writer = EncodingWriter::new(file_w, SHIFT_JIS.new_encoder());
write!(writer, "{}", utf8)?;
writer.flush()?;

This crate is an alternative to encoding_rs_io but provides a simpler API and more flexible error semantics.

This crate also provides a lossy variant of the decoding reader, which replaces malformed byte sequences with replacement characters (U+FFFD), and a with_unmappable_handler variant of the encoding writer, which passes each unmappable character to the specified handler.

use std::{fs, io, io::prelude::*};

use encoding_rs::{EUC_KR, ISO_8859_7};
use encoding_rs_rw::{DecodingReader, EncodingWriter};

let file_r = io::BufReader::new(fs::File::open("baz.txt")?);
let mut reader = DecodingReader::new(file_r, EUC_KR.new_decoder());
let mut utf8 = String::new();
reader.lossy().read_to_string(&mut utf8)?;

let file_w = fs::File::create("qux.txt")?;
let mut writer = EncodingWriter::new(file_w, ISO_8859_7.new_encoder());
{
    let mut writer =
        writer.with_unmappable_handler(|e, w| write!(w, "&#{};", u32::from(e.value())));
    write!(writer, "{}", utf8)?;
    writer.flush()?;
}
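To make the two fallback behaviours concrete, here is a minimal std-only sketch of what the output looks like. It does not use this crate or encoding_rs: `decode_utf8_lossy`, `numeric_char_ref`, and `encode_ascii_with_ncr` are hypothetical helpers, the decoder is replaced by the standard library's lossy UTF-8 conversion, and the target encoding is assumed to be ASCII-only purely for illustration.

```rust
/// Hypothetical stand-in for lossy decoding: the standard library replaces
/// each malformed UTF-8 sequence with U+FFFD, the replacement character.
fn decode_utf8_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Hypothetical helper: formats a character as a decimal numeric character
/// reference, the same text the handler in the example above writes.
fn numeric_char_ref(c: char) -> String {
    format!("&#{};", u32::from(c))
}

/// Hypothetical stand-in for an encoder whose target encoding can represent
/// only ASCII: every other character is treated as unmappable and escaped.
fn encode_ascii_with_ncr(text: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for c in text.chars() {
        if c.is_ascii() {
            out.push(c as u8);
        } else {
            out.extend_from_slice(numeric_char_ref(c).as_bytes());
        }
    }
    out
}
```

Under these assumptions, `decode_utf8_lossy(b"a\xFFb")` yields `"a\u{FFFD}b"`, and `encode_ascii_with_ncr("a€b")` yields the bytes of `a&#8364;b`.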

§Design

Implementing Rust’s Read and Write traits for character-encoding conversion essentially requires byte buffers before and after the converter: read and write must support byte-by-byte operation, whereas character encoders and decoders consume and produce multiple bytes at a time to handle multi-byte characters. The types in this crate employ small internal buffers so that they can operate byte by byte, but they bypass those buffers and use the caller-supplied buffers whenever possible to minimize double buffering and memory consumption.
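The buffering trade-off can be illustrated with a minimal std-only sketch. This is not the crate's implementation: `CharReader` and `read_all_with_chunk` are hypothetical names, and the "converter" here merely emits the UTF-8 bytes of a sequence of characters. Whole characters are written straight into the caller's buffer when they fit; a 4-byte spill buffer is used only when the caller's buffer is too small for the next character.

```rust
use std::io::{self, Read};
use std::iter::Peekable;

/// Hypothetical reader that yields the UTF-8 bytes of a sequence of chars.
struct CharReader<I: Iterator<Item = char>> {
    chars: Peekable<I>,
    spill: [u8; 4],   // holds one character that did not fit in the caller's buffer
    spill_pos: usize, // next unread byte in `spill`
    spill_len: usize, // number of valid bytes in `spill`
}

impl<I: Iterator<Item = char>> CharReader<I> {
    fn new(chars: I) -> Self {
        Self { chars: chars.peekable(), spill: [0; 4], spill_pos: 0, spill_len: 0 }
    }
}

impl<I: Iterator<Item = char>> Read for CharReader<I> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let mut written = 0;
        // Drain bytes left over from a previously split character.
        while self.spill_pos < self.spill_len && written < buf.len() {
            buf[written] = self.spill[self.spill_pos];
            self.spill_pos += 1;
            written += 1;
        }
        if self.spill_pos < self.spill_len {
            return Ok(written); // caller's buffer is full
        }
        while let Some(&c) = self.chars.peek() {
            let n = c.len_utf8();
            if buf.len() - written >= n {
                // Fast path: encode directly into the caller's buffer.
                c.encode_utf8(&mut buf[written..]);
                written += n;
                self.chars.next();
            } else if written == 0 && !buf.is_empty() {
                // Slow path: encode into the spill buffer and copy what fits.
                c.encode_utf8(&mut self.spill);
                self.chars.next();
                let k = buf.len().min(n);
                buf[..k].copy_from_slice(&self.spill[..k]);
                self.spill_len = n;
                self.spill_pos = k;
                written = k;
                break;
            } else {
                break; // return what we have; the next call handles `c`
            }
        }
        Ok(written)
    }
}

/// Hypothetical helper: reads everything using a buffer of `chunk` bytes.
fn read_all_with_chunk(text: &str, chunk: usize) -> io::Result<Vec<u8>> {
    let mut reader = CharReader::new(text.chars());
    let mut buf = vec![0u8; chunk];
    let mut out = Vec::new();
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(out);
        }
        out.extend_from_slice(&buf[..n]);
    }
}
```

Calling `read_all_with_chunk(text, 1)` exercises the spill buffer for every multi-byte character, while a large `chunk` takes the direct path throughout; both return the same bytes as `text.as_bytes()`.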

Modules§

misc
Miscellaneous types not intended for direct access by name.

Structs§

DecodingReader
A reader wrapper that decodes an input byte stream into UTF-8.
EncodingWriter
A writer wrapper that encodes an input UTF-8 byte stream into the specified encoding.
MalformedError
The error type reported by DecodingReader and EncodingWriter when they encounter a malformed byte sequence.
UnmappableError
The error type reported by EncodingWriter when it encounters an unmappable character.