This crate provides functions to read UTF-8 text from any type implementing io::BufRead,
without waiting for newline delimiters. The functions are exposed through a trait, BufRead,
take advantage of the reader's buffering, and return either a &str or a char. Each has an
associated iterator, and some also have an equivalent to a Map iterator that avoids
allocation and cloning.
§Quick Start
The simplest way to read a file using this crate is something along these lines:
    use utf8_bufread::BufRead;
    use std::io::{Cursor, ErrorKind};
    use std::borrow::Cow;

    // Reader may be any type implementing io::BufRead
    // We'll just use a cursor wrapping a slice for this example
    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    loop { // Loop until EOF
        match reader.read_str() {
            Ok(s) => {
                if s.is_empty() {
                    break; // EOF
                }
                // Do something with `s` ...
                print!("{}", s);
            }
            Err(e) => {
                // We should try again if we get interrupted
                if e.kind() != ErrorKind::Interrupted {
                    break;
                }
            }
        }
    }

§Reading arbitrary-length string slices
The read_str function returns a &str of arbitrary length (up to the reader's buffer
capacity) read from the inner reader. No data is cloned unless a valid codepoint is cut off
at the end of the reader's buffer. Its associated iterator can be obtained by calling
str_iter. Since that iterator clones the data at each iteration, str_map is also provided
to give access to the data without cloning.
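To illustrate the buffer-boundary case described above, here is a minimal sketch using only the standard library. It is not this crate's implementation: `read_utf8_chunk` is a hypothetical helper, and unlike read_str it always returns an owned String to keep the borrowing simple. It returns whatever valid UTF-8 is currently buffered, and only falls back to byte-by-byte reassembly when a codepoint is split across two buffer fills.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str;

/// Hypothetical helper (not part of utf8_bufread): returns the next chunk of
/// valid UTF-8 available from `reader`, or an empty string at EOF.
fn read_utf8_chunk<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let buf = reader.fill_buf()?;
    if buf.is_empty() {
        return Ok(String::new()); // EOF
    }

    // Length of the longest valid UTF-8 prefix of the buffered bytes.
    let valid_len = match str::from_utf8(buf) {
        Ok(s) => s.len(),
        Err(e) => e.valid_up_to(),
    };
    if valid_len > 0 {
        // Fast path: hand out everything that is already valid.
        let chunk = str::from_utf8(&buf[..valid_len])
            .expect("prefix was validated above")
            .to_owned();
        reader.consume(valid_len);
        return Ok(chunk);
    }

    // Slow path: the buffer starts with an incomplete or invalid codepoint.
    // Gather up to 4 bytes, possibly across several buffer refills.
    let mut pending = [0u8; 4];
    let mut len = 0;
    loop {
        let byte = match reader.fill_buf()?.first() {
            Some(&b) => b,
            None => {
                return Err(io::Error::new(
                    ErrorKind::UnexpectedEof,
                    "stream ended inside a codepoint",
                ))
            }
        };
        reader.consume(1);
        pending[len] = byte;
        len += 1;
        match str::from_utf8(&pending[..len]) {
            Ok(s) => return Ok(s.to_owned()),
            // `error_len()` is `Some` for bytes that can never become valid.
            Err(e) if e.error_len().is_some() || len == pending.len() => {
                return Err(io::Error::new(ErrorKind::InvalidData, e));
            }
            // Still incomplete: keep reading.
            Err(_) => {}
        }
    }
}
```

Calling this helper in a loop over a reader with a very small buffer, such as `std::io::BufReader::with_capacity(4, text.as_bytes())`, yields several short chunks whose concatenation equals the input. As in the Quick Start, a caller would normally retry when the error kind is Interrupted.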
§Reading codepoints
The read_char function returns a char read from the inner reader. Its associated
iterator can be obtained by calling char_iter.
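As a rough illustration of what reading a single codepoint from a buffered reader involves, the sketch below uses only the standard library. It is an assumption-laden stand-in, not the crate's read_char: `read_one_char` is a hypothetical name, and signalling EOF with `Ok(None)` is a choice made here for the example. The idea is to peek at the lead byte, derive the encoded width from it, then read exactly that many bytes and validate them.

```rust
use std::io::{self, BufRead, ErrorKind, Read};
use std::str;

/// Hypothetical helper (not part of utf8_bufread): reads one codepoint,
/// returning `Ok(None)` at EOF.
fn read_one_char<R: BufRead>(reader: &mut R) -> io::Result<Option<char>> {
    // Peek at the first buffered byte without consuming it.
    let first = match reader.fill_buf()?.first() {
        Some(&b) => b,
        None => return Ok(None), // EOF
    };

    // The lead byte determines how many bytes the codepoint occupies.
    let width = match first {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => {
            return Err(io::Error::new(
                ErrorKind::InvalidData,
                "invalid UTF-8 lead byte",
            ))
        }
    };

    // `read_exact` refills the buffer as needed, so a codepoint split
    // across two buffer fills is handled transparently.
    let mut bytes = [0u8; 4];
    reader.read_exact(&mut bytes[..width])?;

    // Full validation also rejects bad continuation bytes and overlong forms.
    let s = str::from_utf8(&bytes[..width])
        .map_err(|e| io::Error::new(ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}
```

A stream that ends in the middle of a codepoint surfaces here as an UnexpectedEof error from `read_exact`.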
§Iterator types
This crate provides several structs, each offering a different way of iterating over the inner reader's data:
- StrIter and [CodepointIter] clone the data on each iteration, but use an Rc to check whether the returned String buffer is still in use. If it is not, the buffer is re-used to avoid re-allocating.

    use utf8_bufread::BufRead;
    use std::io::Cursor;

    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    for s in reader.str_iter().filter_map(|r| r.ok()) {
        // Do something with s ...
        print!("{}", s);
    }

- StrMap and [CodepointMap] give access to the read data without cloning, but the data cannot then be passed to further iterator adapters.

    use utf8_bufread::BufRead;
    use std::io::Cursor;

    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count: usize = reader.str_map(|s| s.len()).filter_map(Result::ok).sum();
    println!("There are {} valid utf-8 bytes in {}", count, s);

- CharIter is similar to StrIter and the others, except that it relies on char implementing Copy and thus needs neither a buffer nor the "Rc trick".

    use utf8_bufread::BufRead;
    use std::io::Cursor;

    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count = reader.char_iter().filter_map(Result::ok).filter(|c| c.is_lowercase()).count();
    assert_eq!(count, 9);

All these iterators may read data until EOF or an invalid codepoint is found. If valid
codepoints are read from the inner reader, they will be returned before reporting an error.
After encountering an error or EOF, they always return None. They always ignore any
Interrupted error.
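The "Rc trick" mentioned for StrIter can be sketched in isolation. The example below is an illustration under stated assumptions, not the crate's code: `WordIter` is a hypothetical iterator over whitespace-separated words of an in-memory string, and yielding `Rc<String>` items is a choice made for this sketch. It uses `Rc::get_mut`, which only succeeds when no other Rc points to the same allocation, to decide whether the previous buffer can be overwritten in place.

```rust
use std::rc::Rc;
use std::str::SplitWhitespace;

/// Hypothetical iterator (not part of utf8_bufread) that hands out
/// `Rc<String>` items and re-uses its buffer when the caller has dropped
/// the previously returned item.
struct WordIter<'a> {
    words: SplitWhitespace<'a>,
    buf: Rc<String>,
}

impl<'a> WordIter<'a> {
    fn new(text: &'a str) -> Self {
        WordIter {
            words: text.split_whitespace(),
            buf: Rc::new(String::new()),
        }
    }
}

impl<'a> Iterator for WordIter<'a> {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Self::Item> {
        let word = self.words.next()?;
        match Rc::get_mut(&mut self.buf) {
            // We hold the only reference: overwrite the buffer in place.
            Some(buf) => {
                buf.clear();
                buf.push_str(word);
            }
            // The caller still holds the previous item: allocate a new one.
            None => self.buf = Rc::new(word.to_owned()),
        }
        Some(Rc::clone(&self.buf))
    }
}
```

In a plain `for` loop each item is dropped before the next call to `next`, so a single allocation is re-used throughout. Collecting the items into a Vec keeps every item alive and forces a fresh allocation per element.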
Structs§
- CharIter: An iterator over chars of an instance of io::BufRead, created by char_iter; see its documentation for more details.
- Error: The error type for operations of the BufRead trait and associated iterators.
- StrIter: An iterator over string slices of an instance of io::BufRead, created by str_iter; see its documentation for more details.
- StrMap: A mapping iterator over string slices of an instance of io::BufRead, created by str_map; see its documentation for more details.
Traits§
- BufRead: A trait implemented for all types implementing io::BufRead, providing functions to read UTF-8 text streams without waiting for newline delimiters.