§bytes2chars
lazily decodes utf-8 chars from bytes
provides lazy, fallible analogs to str::Chars (Utf8Chars) and str::CharIndices (Utf8CharIndices), as well as a lower-level push-based Utf8Decoder
§installation
cargo add bytes2chars
§design goals
- rich errors—what went wrong and where
- lazy
- no-std
- performance
§quick start
prefer iterators like Utf8CharIndices or Utf8Chars if you have access to a byte iterator. Utf8Chars still tracks byte positions for error context, so it’s purely a convenience wrapper
if you receive bytes in chunks, use the push-based Utf8Decoder
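to see why chunked input calls for a stateful decoder, here is a minimal sketch that uses only std, not bytes2chars. the helper name validate_chunks_independently is hypothetical. it shows that a multi-byte char split across a chunk boundary makes both chunks look invalid when each is checked alone, even though the joined bytes are valid

```rust
use std::str;

/// hypothetical helper: validates each chunk on its own, the way a naive
/// chunked reader might. returns one bool per chunk, true if that chunk
/// alone is valid utf-8
fn validate_chunks_independently(chunks: &[&[u8]]) -> Vec<bool> {
    chunks
        .iter()
        .map(|chunk| str::from_utf8(chunk).is_ok())
        .collect()
}

fn main() {
    // "🦀 rust" with the 4-byte crab split across two chunks
    let chunks: [&[u8]; 2] = [b"\xF0\x9F", b"\xA6\x80 rust"];

    // the first chunk ends mid-sequence and the second starts with a
    // continuation byte, so neither is valid on its own
    assert_eq!(validate_chunks_independently(&chunks), vec![false, false]);

    // the joined bytes decode fine
    let joined = chunks.concat();
    assert_eq!(str::from_utf8(&joined), Ok("🦀 rust"));
}
```

a push-based decoder avoids this by carrying the partial sequence across pushes instead of requiring every chunk to end on a char boundary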
§examples
§iterator api
use bytes2chars::{Result, Utf8CharIndices, Utf8Chars};

let input = b"\xF0\x9F\xA6\x80 rust".iter().copied();
// decode into an iterator of chars and their positions
let indexed = Utf8CharIndices::from(input.clone()).collect::<Result<Vec<_>>>()?;
let expected = vec![(0, '🦀'), (4, ' '), (5, 'r'), (6, 'u'), (7, 's'), (8, 't')];
assert_eq!(indexed, expected);
// convenience wrapper to decode into an iterator of chars
let chars = Utf8Chars::from(input).collect::<Result<String>>()?;
assert_eq!(chars, "🦀 rust");
§push based decoder
use bytes2chars::{Error, ErrorKind, Utf8Decoder};

let mut decoder = Utf8Decoder::new(0);
assert_eq!(decoder.push(0xF0), None); // accumulating
assert_eq!(decoder.push(0x9F), None);
assert_eq!(decoder.push(0xA6), None);
assert_eq!(decoder.push(0x80), Some(Ok((0, '🦀')))); // complete
assert_eq!(decoder.push(0xF0), None); // start new sequence
let err = Error {
range: 4..5,
kind: ErrorKind::UnfinishedSequence,
};
assert_eq!(decoder.finish(), Err(err)); // check for truncated sequence
§rfc 3629 conformance
decoding requirements are formally specified in spec/utf8.md, derived from RFC 3629. requirements are linked to implementation and tests using Tracey
conformance is validated against the flenniken utf-8 test suite
§comparison with alternatives
the unique benefit bytes2chars provides is rich error context
see BENCHMARKS.md for throughput comparisons. bytes2chars still has a ways to go with perf!
§std::str::from_utf8
eager. error context provides a range but not a particular cause
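as a rough illustration of what "a range but not a cause" means, the sketch below uses only std. the helper name std_error_range is hypothetical. it pulls out the two pieces of context that Utf8Error exposes, valid_up_to and error_len, and shows that two different kinds of invalid input produce the same report

```rust
use std::str;

/// hypothetical helper: for invalid input, returns where the valid prefix
/// ends and, if known, how many bytes the invalid sequence spans.
/// returns None when the input is valid utf-8
fn std_error_range(bytes: &[u8]) -> Option<(usize, Option<usize>)> {
    str::from_utf8(bytes)
        .err()
        .map(|e| (e.valid_up_to(), e.error_len()))
}

fn main() {
    // a byte that can never appear in utf-8
    assert_eq!(std_error_range(b"ab\xFFcd"), Some((2, Some(1))));

    // an encoded surrogate (U+D800): a different cause, the same report
    assert_eq!(std_error_range(b"ab\xED\xA0\x80"), Some((2, Some(1))));

    // a truncated 4-byte sequence: error_len is None at end of input
    assert_eq!(std_error_range(b"ab\xF0\x9F"), Some((2, None)));

    // valid input produces no error
    assert_eq!(std_error_range(b"ok"), None);
}
```

the first two cases are indistinguishable from the error alone, which is the gap a separate error kind is meant to fill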
§utf8-decode
also lazy. error provides a range but not a particular cause. does not provide a push based decoder
§bstr::ByteSlice::chars
also lazy. swallows errors. does not provide a push based decoder. really fast
Structs§
- Error: invalid utf-8 at bytes {range:?}: {kind}
- Utf8CharIndices: fallible analog to CharIndices, backed by a byte iterator
- Utf8Chars: fallible analog to Chars, backed by a byte iterator
- Utf8Decoder: push based UTF-8 decoder that tracks byte positions
Enums§
Type Aliases§
- Result: a specialized core::result::Result for utf8 decoding