bytes2chars
lazily decodes utf-8 chars from bytes
provides lazy, fallible analogs to str::Chars (Utf8Chars) and str::CharIndices (Utf8CharIndices), as well as a lower-level push-based Utf8Decoder
installation
cargo add bytes2chars
design goals
- rich errors—what went wrong and where
- lazy
- no-std
- performance
quick start
prefer the iterators Utf8CharIndices or Utf8Chars if you have access to a byte iterator. Utf8Chars still tracks byte offsets internally for error context, so choosing it over Utf8CharIndices is purely a matter of convenience
if you receive bytes in chunks, use the push-based Utf8Decoder
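here is a std-only sketch of what lazy, fallible decoding over a byte iterator involves. `LazyCharIndices`, `DecodeError` and `ErrorKind` are hypothetical names for illustration, not the bytes2chars API. the iterator pulls bytes from any `Iterator<Item = u8>` only when asked for the next item. it applies the RFC 3629 byte ranges and reports both the kind of error and the byte offset where the bad sequence starts.

```rust
use std::iter::Peekable;

/// What went wrong (hypothetical error kinds for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    /// Byte can never start a sequence: a stray continuation, 0xC0/0xC1, or 0xF5..=0xFF.
    InvalidLead(u8),
    /// Byte is outside the range RFC 3629 allows at this position.
    InvalidContinuation(u8),
    /// Input ended in the middle of a sequence.
    Truncated,
}

/// What went wrong and where: `start` is the byte offset of the faulty sequence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DecodeError {
    pub start: usize,
    pub kind: ErrorKind,
}

/// Hypothetical lazy, fallible analog of `str::CharIndices` (not the bytes2chars API).
pub struct LazyCharIndices<I: Iterator<Item = u8>> {
    bytes: Peekable<I>,
    pos: usize, // offset of the next unread byte
}

impl<I: Iterator<Item = u8>> LazyCharIndices<I> {
    pub fn new(bytes: I) -> Self {
        Self { bytes: bytes.peekable(), pos: 0 }
    }
}

impl<I: Iterator<Item = u8>> Iterator for LazyCharIndices<I> {
    type Item = Result<(usize, char), DecodeError>;

    fn next(&mut self) -> Option<Self::Item> {
        let start = self.pos;
        let lead = self.bytes.next()?; // no bytes left: iteration ends
        self.pos += 1;
        // Sequence length and allowed range of the second byte, per RFC 3629 section 4.
        let (len, lo, hi): (usize, u8, u8) = match lead {
            0x00..=0x7F => return Some(Ok((start, char::from(lead)))),
            0xC2..=0xDF => (2, 0x80, 0xBF),
            0xE0 => (3, 0xA0, 0xBF), // rules out overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80, 0xBF),
            0xED => (3, 0x80, 0x9F), // rules out surrogates
            0xF0 => (4, 0x90, 0xBF), // rules out overlong 4-byte forms
            0xF1..=0xF3 => (4, 0x80, 0xBF),
            0xF4 => (4, 0x80, 0x8F), // caps at U+10FFFF
            _ => return Some(Err(DecodeError { start, kind: ErrorKind::InvalidLead(lead) })),
        };
        let mask: u8 = match len { 2 => 0x1F, 3 => 0x0F, _ => 0x07 };
        let mut cp = u32::from(lead & mask);
        for i in 1..len {
            // Only the second byte has a lead-specific range; later bytes are plain 0x80..=0xBF.
            let (min, max) = if i == 1 { (lo, hi) } else { (0x80, 0xBF) };
            match self.bytes.peek().copied() {
                None => return Some(Err(DecodeError { start, kind: ErrorKind::Truncated })),
                Some(b) if (min..=max).contains(&b) => {
                    self.bytes.next();
                    self.pos += 1;
                    cp = (cp << 6) | u32::from(b & 0x3F);
                }
                // Leave the offending byte unread so it starts the next sequence.
                Some(b) => {
                    return Some(Err(DecodeError { start, kind: ErrorKind::InvalidContinuation(b) }))
                }
            }
        }
        // The range checks above guarantee `cp` is a valid Unicode scalar value.
        Some(Ok((start, char::from_u32(cp).expect("validated above"))))
    }
}
```

the `Peekable` is the main design choice here. an invalid continuation byte is left unconsumed, so it is decoded as the start of the next sequence rather than silently dropped.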
examples
iterator api
let input = b"\xF0\x9F\xA6\x80 rust".iter().copied();
// decode into an iterator of chars and their positions
let indexed = from.?;
let expected = vec!;
assert_eq!;
// convenience wrapper to decode into an iterator of chars
let chars = from.?;
assert_eq!;
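for valid input, a CharIndices analog should report the same byte offsets as std. as a reference point, this std-only helper eagerly validates the buffer and then collects `(offset, char)` pairs. the name `std_char_indices` is hypothetical.

```rust
/// Eager reference using std only: validate everything first, then list
/// (byte offset, char) pairs. `std_char_indices` is a hypothetical helper name.
pub fn std_char_indices(bytes: &[u8]) -> Result<Vec<(usize, char)>, std::str::Utf8Error> {
    let text = std::str::from_utf8(bytes)?; // fails if any byte sequence is invalid
    Ok(text.char_indices().collect())
}
```

the crab emoji occupies bytes 0..4, so the space after it starts at offset 4. because `from_utf8` validates the whole buffer first, nothing is produced until every byte has been checked. that is the eagerness the lazy iterators avoid.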
push-based decoder
let mut decoder = new;
assert_eq!; // accumulating
assert_eq!;
assert_eq!;
assert_eq!; // complete
assert_eq!; // start new sequence
let err = Error ;
assert_eq!; // check for truncated sequence
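this std-only sketch shows what "push-based" means. `PushDecoder`, `PushError`, `push` and `finish` are hypothetical names, not bytes2chars' Utf8Decoder API. the flow follows the steps above: `Ok(None)` while a sequence is accumulating, a char once it completes, and an explicit end-of-input check for truncation.

```rust
/// Hypothetical error type for this sketch (not the bytes2chars `Error`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PushError {
    /// Byte cannot start a sequence (0x80..=0xBF or 0xF8..=0xFF).
    InvalidLead(u8),
    /// Expected a continuation byte (0b10xx_xxxx) but got this one.
    InvalidContinuation(u8),
    /// Complete sequence was overlong, a surrogate, or above U+10FFFF.
    InvalidSequence,
    /// `finish` was called with a partial sequence still buffered.
    Truncated { buffered: usize },
}

/// Hypothetical push-based decoder: feed one byte at a time.
#[derive(Debug, Default)]
pub struct PushDecoder {
    buf: [u8; 4],
    have: usize, // bytes buffered for the current sequence
    need: usize, // total length of the current sequence
}

impl PushDecoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns `Ok(None)` while a multi-byte sequence is still accumulating.
    pub fn push(&mut self, byte: u8) -> Result<Option<char>, PushError> {
        if self.have == 0 {
            // Sequence length from the lead byte's high bits. 0xC0/0xC1 and
            // 0xF5..=0xF7 are accepted here and rejected by std once complete.
            self.need = match byte {
                0x00..=0x7F => return Ok(Some(char::from(byte))),
                0xC0..=0xDF => 2,
                0xE0..=0xEF => 3,
                0xF0..=0xF7 => 4,
                _ => return Err(PushError::InvalidLead(byte)),
            };
        } else if byte & 0xC0 != 0x80 {
            // Not a continuation byte: drop the partial sequence; the caller may re-push `byte`.
            self.have = 0;
            return Err(PushError::InvalidContinuation(byte));
        }
        self.buf[self.have] = byte;
        self.have += 1;
        if self.have < self.need {
            return Ok(None);
        }
        // Sequence complete: let std reject overlongs, surrogates and values above U+10FFFF.
        let n = self.have;
        self.have = 0;
        match std::str::from_utf8(&self.buf[..n]) {
            Ok(s) => Ok(s.chars().next()),
            Err(_) => Err(PushError::InvalidSequence),
        }
    }

    /// Call at end of input to detect a truncated trailing sequence.
    pub fn finish(&self) -> Result<(), PushError> {
        if self.have == 0 { Ok(()) } else { Err(PushError::Truncated { buffered: self.have }) }
    }
}
```

this sketch only checks continuation bits while accumulating. it defers overlong, surrogate and range checks to `std::str::from_utf8` once the sequence is complete. a decoder aiming to report errors at the earliest byte would instead check the second byte against the lead-specific range from RFC 3629.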
rfc 3629 conformance
decoding requirements are formally specified in spec/utf8.md, derived from RFC 3629. requirements are linked to the implementation and tests using Tracey
conformance is validated against the flenniken utf-8 test suite
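RFC 3629 section 4 defines the valid byte sequences in ABNF. below, that grammar is transcribed directly into a std-only validator. it is a sketch for orientation, not the requirements in spec/utf8.md or the crate's implementation, and `is_rfc3629_utf8` is a hypothetical name. the grammar itself excludes overlong forms, surrogates and code points above U+10FFFF, so no separate checks are needed.

```rust
/// Returns true iff `bytes` matches the UTF8-octets grammar of RFC 3629, section 4.
pub fn is_rfc3629_utf8(bytes: &[u8]) -> bool {
    // UTF8-tail = %x80-BF
    let tail = |b: u8| (0x80..=0xBF).contains(&b);
    let mut i = 0;
    while i < bytes.len() {
        // (sequence length, allowed range for the second byte)
        let (len, lo, hi): (usize, u8, u8) = match bytes[i] {
            0x00..=0x7F => { i += 1; continue; }          // UTF8-1
            0xC2..=0xDF => (2, 0x80, 0xBF),               // UTF8-2
            0xE0 => (3, 0xA0, 0xBF),                      // %xE0 %xA0-BF UTF8-tail
            0xE1..=0xEC | 0xEE..=0xEF => (3, 0x80, 0xBF), // %xE1-EC / %xEE-EF 2( UTF8-tail )
            0xED => (3, 0x80, 0x9F),                      // %xED %x80-9F UTF8-tail
            0xF0 => (4, 0x90, 0xBF),                      // %xF0 %x90-BF 2( UTF8-tail )
            0xF1..=0xF3 => (4, 0x80, 0xBF),               // %xF1-F3 3( UTF8-tail )
            0xF4 => (4, 0x80, 0x8F),                      // %xF4 %x80-8F 2( UTF8-tail )
            _ => return false,                            // 0x80-0xC1 and 0xF5-0xFF never appear
        };
        // Truncated sequence: not enough bytes left.
        let Some(seq) = bytes.get(i..i + len) else { return false };
        if !(lo..=hi).contains(&seq[1]) || !seq[2..].iter().all(|&b| tail(b)) {
            return false;
        }
        i += len;
    }
    true
}
```

its verdicts can be cross-checked against `std::str::from_utf8`, which enforces the same rules.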
comparison with alternatives
the unique benefit bytes2chars provides is rich error context
see BENCHMARKS.md for throughput comparisons. bytes2chars still has a ways to go with perf!
std::str::from_utf8
eager, and its error reports a byte range but not the specific cause
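std's `Utf8Error` exposes `valid_up_to()` and `error_len()`, which together give the range described above. the hypothetical helper below returns that pair, or `None` for valid input.

```rust
/// Hypothetical helper: what std reports for invalid input. The first value is
/// where the valid prefix ends; the second is how many bytes the error spans
/// (`None` means the input ended in the middle of a sequence).
pub fn std_error_span(bytes: &[u8]) -> Option<(usize, Option<usize>)> {
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(e) => Some((e.valid_up_to(), e.error_len())),
    }
}
```

an overlong `\xC0\xAF` and a stray `\xFF` both come back as `Some((0, Some(1)))`. the caller learns where decoding failed but not why. `error_len() == None` only signals that input ended mid-sequence.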
utf8-decode
also lazy. its error provides a range but not the specific cause. does not provide a push-based decoder
bstr::ByteSlice::chars
also lazy. swallows errors by substituting U+FFFD for invalid sequences. does not provide a push-based decoder. really fast
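to make "swallows errors" concrete without depending on bstr, this std-only sketch uses `String::from_utf8_lossy`. that function performs the same kind of U+FFFD substitution. the wrapper name `lossy` is hypothetical.

```rust
/// Hypothetical wrapper: lossy decoding replaces invalid bytes with U+FFFD,
/// so the error (its position and its cause) is no longer observable.
pub fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

a stray `0xFF` and a genuine U+FFFD in the input decode to the same string, so neither the position nor the cause of the error survives.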