//! # bytes2chars
//!
//! lazily decodes utf-8 [`char`][char]s from bytes
//!
//! provides lazy, fallible analogs to [`str::Chars`][str-chars] ([`Utf8Chars`][utf8-chars]) and [`str::CharIndices`][str-char-indices] ([`Utf8CharIndices`][utf8-char-indices]), as well as a lower-level push-based [`Utf8Decoder`][utf8-decoder]
//!
//! [char]: https://doc.rust-lang.org/stable/std/primitive.char.html
//! [str-chars]: https://doc.rust-lang.org/stable/std/str/struct.Chars.html
//! [str-char-indices]: https://doc.rust-lang.org/stable/std/str/struct.CharIndices.html
//! [utf8-chars]: https://docs.rs/bytes2chars/latest/bytes2chars/struct.Utf8Chars.html
//! [utf8-char-indices]: https://docs.rs/bytes2chars/latest/bytes2chars/struct.Utf8CharIndices.html
//! [utf8-decoder]: https://docs.rs/bytes2chars/latest/bytes2chars/struct.Utf8Decoder.html
//!
//! ## design goals
//!
//! - rich errors—what went wrong and where
//! - lazy
//! - `no_std`
//! - performance
//!
//! ## quick start
//!
//! prefer iterators like [`Utf8CharIndices`][utf8-char-indices] or [`Utf8Chars`][utf8-chars] if you have access to a byte iterator. [`Utf8Chars`][utf8-chars] still tracks byte offsets for error context, so it is purely a convenience wrapper that drops the indices
//!
//! if you receive bytes in chunks, use the push-based [`Utf8Decoder`][utf8-decoder]
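//!
//! as a rough illustration of what chunked input involves, the std-only sketch below (`decode_chunks` is a hypothetical helper, not part of this crate) buffers bytes and relies on `Utf8Error::error_len()` returning `None` when a chunk ends mid-sequence, carrying the incomplete tail into the next chunk. its error offsets are relative to the buffered bytes rather than the whole stream
//!
//! ```rust
//! use std::str;
//!
//! /// hypothetical helper: decode byte chunks using only `std`, carrying an
//! /// incomplete trailing sequence over to the next chunk
//! fn decode_chunks(chunks: &[&[u8]]) -> Result<String, str::Utf8Error> {
//!     let mut out = String::new();
//!     let mut pending: Vec<u8> = Vec::new();
//!     for chunk in chunks {
//!         pending.extend_from_slice(chunk);
//!         match str::from_utf8(&pending) {
//!             Ok(s) => {
//!                 out.push_str(s);
//!                 pending.clear();
//!             }
//!             // `error_len() == None`: the buffer ends mid-sequence, so keep
//!             // the valid prefix and carry the incomplete tail forward
//!             Err(e) if e.error_len().is_none() => {
//!                 let valid = e.valid_up_to();
//!                 // the prefix is known to be valid, so this cannot fail
//!                 out.push_str(str::from_utf8(&pending[..valid]).unwrap());
//!                 pending.drain(..valid);
//!             }
//!             // any other error is a genuinely invalid sequence
//!             Err(e) => return Err(e),
//!         }
//!     }
//!     // bytes still pending here form a truncated sequence and fail validation
//!     out.push_str(str::from_utf8(&pending)?);
//!     Ok(out)
//! }
//! ```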
//!
//! ## examples
//!
//! ### iterator api
//!
//! ```rust
//! # use bytes2chars::{Result, Utf8CharIndices, Utf8Chars};
//! # fn main() -> Result<()> {
//! let input = b"\xF0\x9F\xA6\x80 rust".iter().copied();
//!
//! // decode into an iterator of chars and their positions
//! let indexed = Utf8CharIndices::from(input.clone()).collect::<Result<Vec<_>>>()?;
//! let expected = vec![(0, '🦀'), (4, ' '), (5, 'r'), (6, 'u'), (7, 's'), (8, 't')];
//! assert_eq!(indexed, expected);
//!
//! // convenience wrapper to decode into an iterator of chars
//! let chars = Utf8Chars::from(input).collect::<Result<String>>()?;
//! assert_eq!(chars, "🦀 rust");
//! # Ok(())
//! # }
//! ```
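//!
//! for valid input these pairs are exactly what `str::char_indices` yields. the std route is eager, though: the whole slice must be validated before the first pair is produced. a std-only sketch of that eager counterpart, using a hypothetical `std_char_indices` helper:
//!
//! ```rust
//! use std::str::{self, Utf8Error};
//!
//! /// hypothetical helper: an eager std counterpart to `Utf8CharIndices`,
//! /// validating the entire slice before yielding any `(index, char)` pair
//! fn std_char_indices(bytes: &[u8]) -> Result<Vec<(usize, char)>, Utf8Error> {
//!     Ok(str::from_utf8(bytes)?.char_indices().collect())
//! }
//! ```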
//!
//! ### error handling
//!
//! ```rust
//! # use bytes2chars::{Error, ErrorKind, Result, Utf8Chars};
//! # fn main() -> Result<()> {
//! let err = Utf8Chars::from(b"hello \x80 world".iter().copied())
//!     .collect::<Result<String>>()
//!     .unwrap_err();
//!
//! assert_eq!(
//!     err,
//!     Error {
//!         range: 6..7,
//!         kind: ErrorKind::InvalidLead(0x80)
//!     }
//! );
//! assert_eq!(
//!     err.to_string(),
//!     "invalid utf-8 at bytes 6..7: byte 0x80 cannot start a UTF-8 sequence"
//! );
//! # Ok(())
//! # }
//! ```
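//!
//! the `InvalidLead(0x80)` kind above reflects that `0x80` is a continuation byte and cannot begin a sequence. a sketch of how a lead byte can be classified under RFC 3629 follows; `sequence_len` is a hypothetical helper, and this crate's exact error taxonomy may differ
//!
//! ```rust
//! /// hypothetical helper: the sequence length announced by a lead byte per
//! /// RFC 3629, or `None` if the byte can never start a sequence
//! fn sequence_len(lead: u8) -> Option<usize> {
//!     match lead {
//!         0x00..=0x7F => Some(1), // ascii
//!         0xC2..=0xDF => Some(2), // 0xC0 and 0xC1 could only encode overlong ascii
//!         0xE0..=0xEF => Some(3),
//!         0xF0..=0xF4 => Some(4), // 0xF5 and above would exceed U+10FFFF
//!         // 0x80..=0xBF are continuation bytes; 0xC0, 0xC1, 0xF5..=0xFF never appear
//!         _ => None,
//!     }
//! }
//! ```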
//!
//! ### push-based decoder
//!
//! ```rust
//! # use bytes2chars::Utf8Decoder;
//! # fn main() -> bytes2chars::Result<()> {
//! let mut decoder = Utf8Decoder::new(0);
//! assert_eq!(decoder.push(0xF0), None); // accumulating
//! assert_eq!(decoder.push(0x9F), None);
//! assert_eq!(decoder.push(0xA6), None);
//! assert_eq!(decoder.push(0x80), Some(Ok((0, '🦀')))); // complete
//! decoder.finish()?; // check for truncated sequence
//!
//! # Ok(())
//! # }
//! ```
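//!
//! the decoder above yields nothing until the last byte of a sequence arrives. the std-only sketch below illustrates that accumulate-then-emit idea with a small state machine. `MiniDecoder` and `MiniError` are hypothetical names, not this crate's types, and the sketch simplifies in three ways: it checks overlongs, surrogates, and the U+10FFFF limit only once a sequence completes (the RFC 3629 grammar allows rejecting them at the second byte), it drops a byte that interrupts a sequence instead of re-decoding it, and it tracks no byte offsets
//!
//! ```rust
//! /// hypothetical, simplified push-based decoder (not this crate's
//! /// `Utf8Decoder`): accumulate bits until a sequence completes, then emit
//! #[derive(Default)]
//! struct MiniDecoder {
//!     code_point: u32, // bits accumulated so far
//!     needed: u8,      // continuation bytes still expected
//!     len: u8,         // total length of the current sequence
//! }
//!
//! #[derive(Debug, PartialEq)]
//! enum MiniError {
//!     InvalidLead(u8),
//!     ExpectedContinuation(u8),
//!     Invalid(u32), // overlong, surrogate, or above U+10FFFF
//!     Truncated,
//! }
//!
//! impl MiniDecoder {
//!     fn push(&mut self, byte: u8) -> Option<Result<char, MiniError>> {
//!         if self.needed == 0 {
//!             // a new sequence starts: the lead byte fixes its length
//!             let (len, bits) = match byte {
//!                 0x00..=0x7F => return Some(Ok(byte as char)),
//!                 0xC2..=0xDF => (2, byte & 0x1F),
//!                 0xE0..=0xEF => (3, byte & 0x0F),
//!                 0xF0..=0xF4 => (4, byte & 0x07),
//!                 _ => return Some(Err(MiniError::InvalidLead(byte))),
//!             };
//!             self.code_point = u32::from(bits);
//!             self.len = len;
//!             self.needed = len - 1;
//!             return None; // accumulating
//!         }
//!         if byte & 0xC0 != 0x80 {
//!             // for brevity the interrupting byte is dropped, not re-decoded
//!             self.needed = 0;
//!             return Some(Err(MiniError::ExpectedContinuation(byte)));
//!         }
//!         self.code_point = (self.code_point << 6) | u32::from(byte & 0x3F);
//!         self.needed -= 1;
//!         if self.needed > 0 {
//!             return None; // accumulating
//!         }
//!         // smallest code point each length may encode; anything lower is overlong
//!         let min = match self.len {
//!             2 => 0x80,
//!             3 => 0x800,
//!             _ => 0x1_0000,
//!         };
//!         let cp = self.code_point;
//!         // `char::from_u32` rejects surrogates and values above U+10FFFF
//!         Some(match char::from_u32(cp) {
//!             Some(c) if cp >= min => Ok(c),
//!             _ => Err(MiniError::Invalid(cp)),
//!         })
//!     }
//!
//!     /// report a sequence that was still incomplete when input ended
//!     fn finish(&self) -> Result<(), MiniError> {
//!         if self.needed == 0 { Ok(()) } else { Err(MiniError::Truncated) }
//!     }
//! }
//! ```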
//!
//! ## rfc 3629 conformance
//!
//! decoding requirements are formally specified in [`spec/utf8.md`][spec],
//! derived from [RFC 3629](https://datatracker.ietf.org/doc/html/rfc3629). requirements are linked to the implementation and tests using [Tracey][tracey]
//!
//! conformance is validated against the [flenniken utf-8 test suite][utf8tests]
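//!
//! the core of RFC 3629's decoding rules is the well-formed byte-sequence grammar in its section 4. a direct transcription of that grammar into a hypothetical `well_formed_len` helper (an illustration, not this crate's implementation) shows that overlong encodings, surrogates, and code points above U+10FFFF are all excluded by narrowing the allowed range of the second byte
//!
//! ```rust
//! /// hypothetical helper transcribing the RFC 3629 section 4 grammar: the
//! /// length of the well-formed sequence at the start of `bytes`, or `None`
//! fn well_formed_len(bytes: &[u8]) -> Option<usize> {
//!     // UTF8-tail = %x80-BF
//!     let tail = |i: usize| matches!(bytes.get(i), Some(0x80..=0xBF));
//!     // narrower second-byte ranges exclude overlong forms, surrogates
//!     // (U+D800..U+DFFF), and code points above U+10FFFF
//!     let second = |lo: u8, hi: u8| matches!(bytes.get(1), Some(b) if (lo..=hi).contains(b));
//!     let (len, second_ok) = match *bytes.first()? {
//!         0x00..=0x7F => return Some(1), // UTF8-1
//!         0xC2..=0xDF => (2, tail(1)),   // UTF8-2
//!         0xE0 => (3, second(0xA0, 0xBF)), // UTF8-3
//!         0xE1..=0xEC | 0xEE..=0xEF => (3, tail(1)),
//!         0xED => (3, second(0x80, 0x9F)),
//!         0xF0 => (4, second(0x90, 0xBF)), // UTF8-4
//!         0xF1..=0xF3 => (4, tail(1)),
//!         0xF4 => (4, second(0x80, 0x8F)),
//!         _ => return None, // 0x80..=0xC1 and 0xF5..=0xFF never start a sequence
//!     };
//!     // every byte after the second must be a UTF8-tail
//!     (second_ok && (2..len).all(tail)).then_some(len)
//! }
//! ```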
//!
//! [spec]: ../../spec/utf8.md
//! [tracey]: https://tracey.bearcove.eu/
//! [utf8tests]: https://github.com/flenniken/utf8tests
//!
//! ## alternatives
//!
//! ### [`std::str::from_utf8`](https://doc.rust-lang.org/std/str/fn.from_utf8.html)
//!
//! eager, and its error reports a byte range but not the specific cause
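//!
//! for comparison, on the input from the error handling example `std` reports where the bad bytes start (`valid_up_to`) and how many there are (`error_len`), which together give the same `6..7` range, but nothing like `ErrorKind::InvalidLead`; its only extra signal is `error_len() == None` for input that ends mid-sequence. a small sketch using a hypothetical `std_error_info` helper:
//!
//! ```rust
//! use std::str;
//!
//! /// hypothetical helper: the position information `std` reports on failure,
//! /// as `(valid_up_to, error_len)`; `error_len` is `None` when the input ends
//! /// mid-sequence
//! fn std_error_info(bytes: &[u8]) -> Option<(usize, Option<usize>)> {
//!     str::from_utf8(bytes)
//!         .err()
//!         .map(|e| (e.valid_up_to(), e.error_len()))
//! }
//! ```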
//!
//! ### [`utf8-decode`](https://docs.rs/utf8-decode/latest/utf8_decode/index.html)
//!
//! also lazy. its error reports a byte range but not the specific cause, and it does not provide a push-based decoder
pub use Utf8Decoder;
pub use ;
pub use ;