Crate utf8_bufread

This crate provides functions to read UTF-8 text from any type implementing io::BufRead through a trait, BufRead, without waiting for newline delimiters. These functions take advantage of buffering and return either &str or chars. Each has an associated iterator, and some also have a Map-like counterpart that avoids allocation and cloning.

Quick Start

The simplest way to read a file using this crate is something along the lines of the following:

use utf8_bufread::BufRead;
use std::io::{Cursor, ErrorKind};

// Reader may be any type implementing io::BufRead
// We'll just use a cursor wrapping a slice for this example
let mut reader = Cursor::new("Löwe 老虎 Léopard");
loop { // Loop until EOF
    match reader.read_str() {
        Ok(s) => {
            if s.is_empty() {
                break; // EOF
            }
            // Do something with `s` ...
            print!("{}", s);
        }
        Err(e) => {
            // We should try again if we get interrupted
            if e.kind() != ErrorKind::Interrupted {
                break;
            }
        }
    }
}

Reading arbitrary-length string slices

The read_str function returns a &str of arbitrary length (up to the reader’s buffer capacity) read from the inner reader. It does so without cloning data, unless a valid codepoint ends up cut at the end of the reader’s buffer. Its associated iterator can be obtained by calling str_iter. Since that iterator clones the data at each iteration, str_map is also provided.
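The "codepoint cut at the end of the buffer" case can be illustrated with the standard library alone. The following is a minimal sketch, not this crate's implementation: split_incomplete is a hypothetical helper that reports how many trailing bytes of a buffer belong to a codepoint that has not been fully read yet, using std::str::from_utf8 and Utf8Error.

```rust
use std::str;

/// Hypothetical helper (not part of utf8_bufread): splits `buf` into its
/// longest valid UTF-8 prefix and the number of trailing bytes that belong to
/// an incomplete codepoint. Returns `None` if `buf` contains invalid bytes.
fn split_incomplete(buf: &[u8]) -> Option<(&str, usize)> {
    match str::from_utf8(buf) {
        // The whole buffer is valid: nothing is cut.
        Ok(s) => Some((s, 0)),
        Err(e) => {
            // `error_len()` is `Some` for genuinely invalid bytes and `None`
            // when the input simply ended in the middle of a codepoint.
            if e.error_len().is_some() {
                return None;
            }
            let valid = e.valid_up_to();
            // This prefix was just validated, so decoding it again succeeds.
            let prefix = str::from_utf8(&buf[..valid]).ok()?;
            Some((prefix, buf.len() - valid))
        }
    }
}
```

For example, the first two bytes of "Löwe" are 'L' followed by the first half of 'ö', so the helper returns the prefix "L" and one leftover byte. A reader in that situation has to carry the leftover bytes over to the next buffer fill, which is where copying becomes unavoidable.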

Reading codepoints

The read_char function returns a char read from the inner reader. Its associated iterator can be obtained by calling char_iter.
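To give an idea of what reading a single codepoint from a buffered reader involves, here is a rough sketch built only on the standard library's io::BufRead (fill_buf and consume). The function read_one_char is hypothetical and is not how this crate implements read_char; it reads one byte at a time for simplicity and reports EOF as Ok(None), which is an assumption of this sketch.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str;

/// Hypothetical helper (not part of utf8_bufread): reads one `char` from any
/// `std::io::BufRead`. Returns `Ok(None)` at EOF.
fn read_one_char<R: BufRead>(reader: &mut R) -> io::Result<Option<char>> {
    // A UTF-8 encoded codepoint is at most 4 bytes long.
    let mut bytes = [0u8; 4];
    let mut len = 0;
    loop {
        let byte = match reader.fill_buf() {
            Ok(buf) => match buf.first() {
                Some(&b) => b,
                // EOF before reading anything: no more chars.
                None if len == 0 => return Ok(None),
                // EOF in the middle of a codepoint.
                None => {
                    return Err(io::Error::new(
                        ErrorKind::UnexpectedEof,
                        "incomplete codepoint",
                    ))
                }
            },
            // Retry when interrupted.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        reader.consume(1);
        bytes[len] = byte;
        len += 1;
        match str::from_utf8(&bytes[..len]) {
            Ok(s) => return Ok(s.chars().next()),
            // The sequence is valid so far but not complete: read more.
            Err(e) if e.error_len().is_none() && len < bytes.len() => continue,
            Err(_) => return Err(io::Error::new(ErrorKind::InvalidData, "invalid UTF-8")),
        }
    }
}
```

Because the bytes are accumulated in a small local array, the sketch also works when a codepoint straddles two buffer fills, for instance with a std::io::BufReader whose capacity is a single byte.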

Iterator types

This crate provides several structs, each offering a different way of iterating over the inner reader’s data:

  • StrIter and [CodepointIter] clone the data on each iteration, but use an Rc to check if the returned String buffer is still used. If not, it is re-used to avoid re-allocating.
    use utf8_bufread::BufRead;
    use std::io::Cursor;
    
    let mut reader = Cursor::new("Löwe 老虎 Léopard");
    for s in reader.str_iter().filter_map(|r| r.ok()) {
        // Do something with s ...
        print!("{}", s);
    }
  • StrMap and [CodepointMap] give access to the read data without cloning, but that data cannot then be passed on to further iterator adapters.
    use utf8_bufread::BufRead;
    use std::io::Cursor;
    
    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count: usize = reader.str_map(|s| s.len()).filter_map(Result::ok).sum();
    println!("There are {} valid UTF-8 bytes in {}", count, s);
  • CharIter is similar to StrIter and others, except that it relies on char implementing Copy and thus needs neither a buffer nor the “Rc trick”.
    use utf8_bufread::BufRead;
    use std::io::Cursor;
    
    let s = "Löwe 老虎 Léopard";
    let mut reader = Cursor::new(s);
    let count = reader.char_iter().filter_map(Result::ok).filter(|c| c.is_lowercase()).count();
    assert_eq!(count, 9);
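
The “Rc trick” mentioned in the list above can be sketched with the standard library alone. The ReusingIter type below is hypothetical and only illustrates the idea, assuming items are handed out as Rc<String>: before writing the next item, the iterator uses Rc::get_mut to check whether the caller still holds the previous one, and re-uses the allocation if not.

```rust
use std::rc::Rc;
use std::str::SplitWhitespace;

/// Hypothetical iterator (not part of utf8_bufread) yielding the words of a
/// string as `Rc<String>`, re-using its buffer whenever possible.
struct ReusingIter<'a> {
    words: SplitWhitespace<'a>,
    buf: Rc<String>,
}

impl<'a> ReusingIter<'a> {
    fn new(text: &'a str) -> Self {
        ReusingIter {
            words: text.split_whitespace(),
            buf: Rc::new(String::new()),
        }
    }
}

impl<'a> Iterator for ReusingIter<'a> {
    type Item = Rc<String>;

    fn next(&mut self) -> Option<Rc<String>> {
        let word = self.words.next()?;
        match Rc::get_mut(&mut self.buf) {
            // We hold the only reference: the previously returned item has
            // been dropped, so its allocation can be re-used.
            Some(s) => {
                s.clear();
                s.push_str(word);
            }
            // The caller still holds the previous item: leave it untouched
            // and allocate a fresh buffer.
            None => self.buf = Rc::new(word.to_owned()),
        }
        Some(Rc::clone(&self.buf))
    }
}
```

In a plain for loop each item is dropped before the next call to next, so the buffer is re-used on every iteration. If the caller collects the items instead, each one keeps its own allocation and its contents are never overwritten.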

All these iterators may read data until EOF or an invalid codepoint is found. If valid codepoints are read from the inner reader, they will be returned before reporting an error. After encountering an error or EOF, they always return None. They always ignore any Interrupted error.

Structs

CharIter

An iterator over chars of an instance of io::BufRead, created by char_iter. See its documentation for more details.

Error

The error type for operations of the BufRead trait and associated iterators.

StrIter

An iterator over string slices of an instance of io::BufRead, created by str_iter. See its documentation for more details.

StrMap

A mapping iterator over string slices of an instance of io::BufRead, created by str_map. See its documentation for more details.

Traits

BufRead

A trait implemented for all types implementing io::BufRead, providing functions to read utf-8 text streams without waiting for newline delimiters.