Crate utf8_bufread
This crate provides functions to read UTF-8 text from any type implementing io::BufRead through a trait, BufRead, without waiting for newline delimiters. These functions take advantage of buffering and return either &str or chars. Each has an associated iterator, and some also have an equivalent to a Map iterator that avoids allocation and cloning.
Quick Start
The simplest way to read a file using this crate may be something along the following:
    use utf8_bufread::BufRead;
    use std::io::{Cursor, ErrorKind};
    use std::borrow::Cow;

    // Reader may be any type implementing io::BufRead
    // We'll just use a cursor wrapping a slice for this example
    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    loop {
        // Loop until EOF
        match reader.read_str() {
            Ok(s) => {
                if s.is_empty() {
                    break; // EOF
                }
                // Do something with `s` ...
                print!("{}", s);
            }
            Err(e) => {
                // We should try again if we get interrupted
                if e.kind() != ErrorKind::Interrupted {
                    break;
                }
            }
        }
    }
Reading arbitrary-length string slices
The read_str function returns a &str of arbitrary length (up to the reader’s buffer capacity) read from the inner reader, without cloning data, unless a valid codepoint ends up cut at the end of the reader’s buffer. Its associated iterator can be obtained by calling str_iter, and since it involves cloning the data at each iteration, str_map is also provided.
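To make the "codepoint cut at the end of the buffer" case concrete, here is a minimal sketch built only on std::io::BufRead. It is not this crate's implementation: read_utf8_chunk is a hypothetical helper, and unlike read_str it always returns an owned String. It returns the longest valid prefix of the current buffer, and gathers bytes one at a time only when the buffer starts with an incomplete codepoint.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of utf8_bufread): returns the next chunk of
/// valid UTF-8 available from `reader`, or an empty string at EOF.
fn read_utf8_chunk<R: BufRead>(reader: &mut R) -> io::Result<String> {
    // Bytes of a codepoint that straddles two buffer fills.
    let mut pending: Vec<u8> = Vec::new();
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            // EOF: fine if nothing is pending, an error if a codepoint was cut short.
            return if pending.is_empty() {
                Ok(String::new())
            } else {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated codepoint"))
            };
        }
        if pending.is_empty() {
            // Fast path: hand back the longest valid prefix of the buffer.
            let valid = match std::str::from_utf8(buf) {
                Ok(s) => s.len(),
                Err(e) => e.valid_up_to(),
            };
            if valid > 0 {
                let out = String::from_utf8_lossy(&buf[..valid]).into_owned();
                reader.consume(valid);
                return Ok(out);
            }
        }
        // Slow path: the buffer starts with an incomplete (or invalid) codepoint,
        // so move a single byte into `pending` and re-check.
        pending.push(buf[0]);
        reader.consume(1);
        match std::str::from_utf8(&pending) {
            Ok(s) => return Ok(s.to_owned()),
            // `error_len() == Some(_)` means the bytes can never become valid.
            Err(e) if e.error_len().is_some() => {
                return Err(io::Error::new(io::ErrorKind::InvalidData, e));
            }
            // `error_len() == None` means more bytes are needed.
            Err(_) => {}
        }
    }
}
```

With a deliberately tiny buffer, such as std::io::BufReader::with_capacity(4, text.as_bytes()), multi-byte characters regularly end up split across fills, which exercises the slow path; concatenating the returned chunks should reproduce the input.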
Reading codepoints
The read_char function returns a char read from the inner reader. Its associated iterator can be obtained by calling char_iter.
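As an illustration of what decoding a single codepoint from a buffered reader involves, the following sketch uses only std. The helper read_one_char is hypothetical and its signature (Ok(None) at EOF) is an assumption made for this example, not the crate's read_char API.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not part of utf8_bufread): decodes one `char` from
/// `reader`, returning `Ok(None)` at EOF.
fn read_one_char<R: BufRead>(reader: &mut R) -> io::Result<Option<char>> {
    // A UTF-8 encoded codepoint is at most four bytes long.
    let mut bytes = [0u8; 4];
    let mut len = 0;
    loop {
        let buf = reader.fill_buf()?;
        if buf.is_empty() {
            return if len == 0 {
                Ok(None) // clean EOF
            } else {
                Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated codepoint"))
            };
        }
        // Take one more byte and check whether the sequence is complete yet.
        bytes[len] = buf[0];
        len += 1;
        reader.consume(1);
        match std::str::from_utf8(&bytes[..len]) {
            Ok(s) => return Ok(s.chars().next()),
            // The sequence can never become valid: report it.
            Err(e) if e.error_len().is_some() => {
                return Err(io::Error::new(io::ErrorKind::InvalidData, e));
            }
            // Incomplete so far: keep reading.
            Err(_) => {}
        }
    }
}
```

Because a char is Copy, nothing needs to be borrowed from the reader's buffer once the value has been decoded.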
Iterator types
This crate provides several structs for several ways of iterating over the inner reader’s data:
StrIter and CodepointIter clone the data on each iteration, but use an Rc to check whether the returned String buffer is still in use. If it is not, the buffer is re-used to avoid re-allocating.

    use utf8_bufread::BufRead;
    use std::io::Cursor;

    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    for s in reader.str_iter().filter_map(|r| r.ok()) {
        // Do something with s ...
        print!("{}", s);
    }

StrMap and CodepointMap give access to the read data without cloning, but the data cannot then be passed on to further iterator adapters.

    use utf8_bufread::BufRead;
    use std::io::Cursor;

    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count: usize = reader.str_map(|s| s.len()).filter_map(Result::ok).sum();
    println!("There are {} valid utf-8 bytes in {}", count, s);

CharIter is similar to StrIter and the others, except that it relies on chars implementing Copy and thus needs neither a buffer nor the “Rc trick”.

    use utf8_bufread::BufRead;
    use std::io::Cursor;

    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count = reader.char_iter().filter_map(Result::ok).filter(|c| c.is_lowercase()).count();
    assert_eq!(count, 9);
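The “Rc trick” mentioned above can be sketched with std alone. The ReusingIter type below is hypothetical and iterates over whitespace-separated words rather than over a reader; it only illustrates the idea, assuming the check is done with Rc::get_mut, which succeeds only when no other Rc points at the buffer.

```rust
use std::rc::Rc;

/// Hypothetical iterator (not part of utf8_bufread) that yields each word of
/// a text as an `Rc<String>`, re-using its buffer whenever the caller has
/// already dropped the previously returned item.
struct ReusingIter<'a> {
    words: std::str::SplitWhitespace<'a>,
    buf: Rc<String>,
}

impl<'a> ReusingIter<'a> {
    fn new(text: &'a str) -> Self {
        ReusingIter {
            words: text.split_whitespace(),
            buf: Rc::new(String::new()),
        }
    }
}

impl<'a> Iterator for ReusingIter<'a> {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Rc<String>> {
        let word = self.words.next()?;
        match Rc::get_mut(&mut self.buf) {
            // We hold the only reference: overwrite the buffer in place.
            Some(s) => {
                s.clear();
                s.push_str(word);
            }
            // The caller kept the previous item alive: allocate a new buffer.
            None => self.buf = Rc::new(word.to_owned()),
        }
        Some(Rc::clone(&self.buf))
    }
}
```

In a plain for loop each item is dropped before the next call to next, so the same allocation is re-used throughout; collecting the items into a Vec keeps them alive and forces a fresh allocation per item.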
All these iterators read data until EOF is reached or an invalid codepoint is found. Valid codepoints read from the inner reader are returned before an error is reported. After encountering an error or EOF, the iterators always return None. They always ignore any Interrupted error.
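The ordering described here (valid codepoints first, then a single error, then None forever) can be modelled on an in-memory byte slice. ValidThenError is a hypothetical type written for this illustration, not one of the crate's iterators, and it does not model Interrupted errors since a slice never produces them.

```rust
use std::str::{Chars, Utf8Error};

/// Hypothetical iterator (not part of utf8_bufread): yields every valid char
/// that precedes the first invalid byte, then the error once, then `None`.
struct ValidThenError<'a> {
    valid: Chars<'a>,
    error: Option<Utf8Error>,
}

impl<'a> ValidThenError<'a> {
    fn new(bytes: &'a [u8]) -> Self {
        match std::str::from_utf8(bytes) {
            Ok(s) => ValidThenError { valid: s.chars(), error: None },
            Err(e) => {
                // Everything before `valid_up_to()` is guaranteed to be valid UTF-8.
                let prefix = std::str::from_utf8(&bytes[..e.valid_up_to()])
                    .expect("prefix was already validated");
                ValidThenError { valid: prefix.chars(), error: Some(e) }
            }
        }
    }
}

impl<'a> Iterator for ValidThenError<'a> {
    type Item = Result<char, Utf8Error>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.valid.next() {
            Some(c) => Some(Ok(c)),
            // `take()` leaves `None` behind, so the error is reported only once.
            None => self.error.take().map(Err),
        }
    }
}
```

Note that bytes following the invalid one are never yielded, which matches the "always return None after an error" behaviour described above.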
Structs
CharIter | An iterator over chars of an instance of |
Error | The error type for operations of the |
StrIter | An iterator over string slices of an instance of |
StrMap | A mapping iterator over string slices of an instance of |
Traits
BufRead | A trait implemented for all types implementing |