Trait utf8_bufread::BufRead

pub trait BufRead: io::BufRead {
    fn read_utf8(&mut self, buf: &mut String) -> Result<usize> { ... }
    fn with_utf8_chunk<F>(&mut self, f: F) -> Result<usize>
    where
        F: FnOnce(&str),
    { ... }
    fn map_utf8<F, T>(&mut self, map: F) -> ChunkIter<'_, Self, F, &'_ str, T>
    where
        F: FnMut(&str) -> T,
    { ... }
    fn map_utf8_results<F, T>(
        &mut self,
        map: F
    ) -> ChunkIter<'_, Self, F, Result<&'_ str>, T>
    where
        F: FnMut(Result<&str>) -> T,
    { ... }
    fn iter_utf8(
        &mut self
    ) -> ChunkIter<'_, Self, fn(_: &str) -> String, &'_ str, String> { ... }
    fn iter_utf8_results(
        &mut self
    ) -> ChunkIter<'_, Self, fn(_: Result<&str>) -> Result<String>, Result<&'_ str>, Result<String>> { ... }
}

Notable traits for ChunkIter

impl<R, F, T> Iterator for ChunkIter<'_, R, F, &str, T> where
    R: BufRead,
    F: FnMut(&str) -> T,
type Item = T;

impl<R, F, T> Iterator for ChunkIter<'_, R, F, Result<&str>, T> where
    R: BufRead,
    F: FnMut(Result<&str>) -> T,
type Item = T;

A trait implemented for all types implementing io::BufRead, providing functions to read UTF-8 text streams without waiting for newline delimiters.

Provided methods

fn read_utf8(&mut self, buf: &mut String) -> Result<usize>

Reads some bytes from the inner reader and appends them, decoded as UTF-8, to the provided buf. Returns the number of bytes read as an io::Result<usize>.

This function calls with_utf8_chunk and pushes the passed &str onto buf (which means it copies the bytes); see its documentation for more details.

Errors

This function follows the same error policy as with_utf8_chunk.

Examples

use utf8_bufread::BufRead;
use std::io::{BufReader, ErrorKind};

// "foo\nbar" + some invalid bytes
// We give the buffer more than enough capacity to be able to read all the bytes in one
// call
let mut reader = BufReader::with_capacity(
    16,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96, 0x0].as_ref(),
);
let mut buf = String::new();

// On the first read_utf8() call, we will read up to the first byte of the invalid
// codepoint (ie "foo\nbar")
let n_read = reader
    .read_utf8(&mut buf)
    .expect("We will get all the valid bytes without error");
assert_eq!("foo\nbar", buf.as_str());
assert_eq!(7, n_read);

// Then on the second call we get the InvalidData error caused by the Utf8Error,
// as there are no bytes forming valid codepoints left
let read_err = reader.read_utf8(&mut buf).expect_err("We will get an error");
assert_eq!(ErrorKind::InvalidData, read_err.kind());
assert_eq!(7, buf.len());  // no byte appended to buf

fn with_utf8_chunk<F>(&mut self, f: F) -> Result<usize> where
    F: FnOnce(&str), 

Reads some bytes from the inner reader and calls the provided function with a reference to the read data as a UTF-8 str. Returns the number of bytes read as an io::Result<usize>.

f is called if and only if we read a non-zero amount of valid UTF-8 bytes.

If the operation is successful, this function returns the number of bytes read. Note this may not be the number of chars read, as UTF-8 is a variable-length encoding.
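As an illustration of why the byte count and the char count can differ, the std-only snippet below compares str::len, which counts bytes, with chars().count(), and lists char::len_utf8 for a few characters. The helper names byte_and_char_counts and utf8_lengths are hypothetical, chosen for this example only.

```rust
/// Hypothetical helper for illustration: returns the number of bytes and
/// the number of chars (Unicode scalar values) in a string slice.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // str::len() counts UTF-8 bytes; chars().count() counts codepoints
    (s.len(), s.chars().count())
}

/// Hypothetical helper: the UTF-8 encoded length of each char, in order.
/// Every value is between 1 and 4.
fn utf8_lengths(s: &str) -> Vec<usize> {
    s.chars().map(char::len_utf8).collect()
}
```

For instance, byte_and_char_counts("héllo") returns (6, 5), since é is encoded on two bytes.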

If this function returns Ok(0), the stream has reached EOF.

This function reads bytes from the underlying stream until its buffer is full, an invalid or incomplete codepoint is found, or EOF is reached. All the codepoints read up to that point (up to EOF if it was reached, but excluding the invalid or incomplete codepoint if one was found) are then passed as f’s argument. Because f only borrows the data, this may allow you to work with the str without cloning it.
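The chunking behaviour described above can be approximated with std alone. The following is a simplified sketch, not the crate's implementation: the function name with_chunk_sketch is hypothetical, and unlike with_utf8_chunk it reports every failure to decode a leading codepoint as ErrorKind::InvalidData, without special handling for incomplete codepoints at EOF or buffers that are too small.

```rust
use std::io::{self, BufRead};
use std::str;

/// Hypothetical, simplified std-only sketch (not the crate's code): decode the
/// longest valid UTF-8 prefix of the reader's buffer, pass it to `f` without
/// copying, then consume exactly those bytes.
fn with_chunk_sketch<R: BufRead, F: FnOnce(&str)>(reader: &mut R, f: F) -> io::Result<usize> {
    let buf = reader.fill_buf()?;
    if buf.is_empty() {
        return Ok(0); // EOF: `f` is not called
    }
    // Length of the valid prefix; stops before an invalid or incomplete codepoint
    let valid = match str::from_utf8(buf) {
        Ok(s) => s.len(),
        Err(e) => e.valid_up_to(),
    };
    if valid == 0 {
        // Simplification: every undecodable leading byte is reported as InvalidData
        return Err(io::Error::new(io::ErrorKind::InvalidData, "no valid UTF-8 at buffer start"));
    }
    // The unwrap cannot fail: the first `valid` bytes were just validated
    f(str::from_utf8(&buf[..valid]).unwrap());
    reader.consume(valid);
    Ok(valid)
}
```

Run against the byte slice used in the examples below with a 16-byte buffer, the first call passes "foo\nbar" to the closure and returns Ok(7); the second call returns an InvalidData error without calling the closure.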

This function avoids the usual issues of using BufRead::read_line(&mut self, &mut String) or BufRead::lines(self) on big text files without newline delimiters: it will not load the whole file into memory.
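For contrast, this std-only sketch shows the issue mentioned above: BufRead::read_line keeps reading until it sees a newline or EOF, so a single call on 10,000 bytes without any newline returns all of them in one String, even through a 16-byte BufReader. The helper read_line_len is a made-up name for this illustration.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper for illustration: reads one "line" with std's
/// BufRead::read_line and returns how many bytes it pulled in.
fn read_line_len<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new();
    // read_line only stops at b'\n' or EOF, so newline-free input
    // is accumulated entirely in `line`
    reader.read_line(&mut line)
}
```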

The number of bytes read depends on the size of the underlying buffer, as well as on previous calls. It cannot exceed the size of the buffer, unless the buffer is not big enough to fit a Unicode codepoint.
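To observe that the buffer size bounds each read, the std-only helper below (first_fill_len, a hypothetical name) reports how many bytes the first BufRead::fill_buf call returns for a given BufReader capacity. It assumes the data comes from an in-memory byte slice, which hands over as many bytes as the buffer can hold on each read.

```rust
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: length of the slice returned by the first `fill_buf`
/// call on a `BufReader` of the given capacity over `data`.
fn first_fill_len(capacity: usize, data: &[u8]) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(capacity, data);
    // fill_buf never returns more bytes than the internal buffer can hold
    Ok(reader.fill_buf()?.len())
}
```

With the input "foo\nbar", a capacity of 4 yields 4 bytes and a capacity of 16 yields all 7.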

Errors

This function will immediately return any errors returned by fill_buf.

If the internal call to from_utf8 returns a Utf8Error, all valid codepoints read before the error are passed to f and no error is returned, unless no valid codepoints were read. This way no valid data is lost, and the error will be returned on the next call.

If the first codepoint encountered by from_utf8 is invalid, an error of kind ErrorKind::InvalidData, caused by a Utf8Error, is returned. You can still read bytes from this reader, but any conversion to UTF-8 will fail.

If EOF is reached in the middle of an incomplete codepoint, an error of kind ErrorKind::UnexpectedEof is returned.
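The three rules above map onto the information carried by std's Utf8Error: valid_up_to() gives the length of the valid prefix, and error_len() is Some(_) for an invalid byte sequence but None when the input ends in the middle of a codepoint. The classify function below is a hypothetical illustration of that mapping, assuming the bytes passed in are everything left before EOF; it is not the crate's code.

```rust
use std::io::ErrorKind;
use std::str;

/// Hypothetical mapping, for illustration only: given the bytes left before
/// EOF, returns the length of the valid prefix that would be handed out, or
/// the error kind that would be reported.
fn classify(bytes: &[u8]) -> Result<usize, ErrorKind> {
    match str::from_utf8(bytes) {
        Ok(s) => Ok(s.len()),
        // Some valid codepoints come first: return them, report the error later
        Err(e) if e.valid_up_to() > 0 => Ok(e.valid_up_to()),
        // error_len() is Some(_) when an invalid byte sequence was found
        Err(e) if e.error_len().is_some() => Err(ErrorKind::InvalidData),
        // error_len() is None when the input ended in the middle of a codepoint
        Err(_) => Err(ErrorKind::UnexpectedEof),
    }
}
```

On the example bytes used throughout this page, classify returns Ok(7); on the remaining bytes [0x9f, 0x92, 0x96, 0x0] it returns Err(InvalidData), and on the lone lead byte 0xc3 (the first half of "é") it returns Err(UnexpectedEof).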

Note that this function will return an error of kind ErrorKind::InvalidInput if the buffer of this reader is too small to read a Unicode codepoint. Currently, a buffer of size 1 will always fail when reading any non-ASCII codepoint, and a buffer of size 2 may or may not cause this function to fail. A buffer of size 3 will allow this function to read any codepoint correctly.

Examples

use utf8_bufread::BufRead;
use std::io::{BufReader, ErrorKind};

// "foo\nbar" + some invalid bytes
// We give the buffer more than enough capacity to be able to read all the bytes in one
// call
let mut reader = BufReader::with_capacity(
    16,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96, 0x0].as_ref(),
);
// We will store data in this buffer while inside passed closure
let mut buf = String::new();

// On the first with_utf8_chunk() call, we read up to the first byte of the invalid
// codepoint (i.e. "foo\nbar")
let n_read = reader
    .with_utf8_chunk(|s| buf.push_str(s))
    .expect("We will get all the valid bytes without error");
assert_eq!("foo\nbar", buf.as_str());
assert_eq!(7, n_read);

// Then on the second call we get the InvalidData error caused by the Utf8Error,
// as there are no bytes forming valid codepoints left.
// The passed closure will not be called
let mut is_called = false;
let read_err = reader.with_utf8_chunk(|_| {is_called = true;})
    .expect_err("We will get an error");
assert_eq!(ErrorKind::InvalidData, read_err.kind());
assert!(!is_called);

fn map_utf8<F, T>(&mut self, map: F) -> ChunkIter<'_, Self, F, &'_ str, T>
where
    F: FnMut(&str) -> T, 

Takes a closure and creates an Iterator which calls that closure on each read chunk of data.

This is equivalent to calling with_utf8_chunk in a loop.

The created iterator stops when reaching EOF or an invalid UTF-8 byte. If you wish to know the cause, see map_utf8_results.
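The loop this iterator corresponds to can be sketched with std only. In the following, map_chunks is a hypothetical helper that collects into a Vec instead of returning a lazy iterator, and, as a simplification, it stops on any I/O error or undecodable leading byte; each chunk is handed to map as a &str borrowed from the reader's buffer.

```rust
use std::io::BufRead;
use std::str;

/// Hypothetical std-only stand-in for "with_utf8_chunk in a loop":
/// calls `map` on each borrowed &str chunk and collects the results.
fn map_chunks<R: BufRead, T>(reader: &mut R, mut map: impl FnMut(&str) -> T) -> Vec<T> {
    let mut out = Vec::new();
    loop {
        let buf = match reader.fill_buf() {
            Ok(buf) => buf,
            Err(_) => break, // I/O error: stop, without reporting the cause
        };
        let valid = match str::from_utf8(buf) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        if valid == 0 {
            break; // EOF (empty buffer) or no decodable codepoint at the front
        }
        // The chunk is borrowed straight from the reader's buffer: no copy here
        out.push(map(str::from_utf8(&buf[..valid]).unwrap()));
        reader.consume(valid);
    }
    out
}
```

With a 4-byte buffer over the example bytes, the chunk lengths are [4, 3], which sum to the 7 asserted in the example below.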

Examples

use utf8_bufread::BufRead;
use std::io::BufReader;

// "foo\nbar" + some invalid bytes
// We do not give the buffer enough capacity to read the whole slice in one call, just to
// make it iterate more than once for this example
let mut reader = BufReader::with_capacity(
    4,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96, 0x0].as_ref(),
);

// We read all the data we can, and sum the substrings length
assert_eq!(7usize, reader.map_utf8(|s| s.len()).sum());

fn map_utf8_results<F, T>(
    &mut self,
    map: F
) -> ChunkIter<'_, Self, F, Result<&'_ str>, T>
where
    F: FnMut(Result<&str>) -> T, 

Takes a closure and creates an Iterator which calls that closure on each read chunk of data with either an Ok containing the read &str, or the error returned by with_utf8_chunk.

The created iterator stops when reaching EOF, or right after passing the first error (for example one caused by an invalid UTF-8 byte) to the closure.
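One way to fold the items produced by such an iterator is to sum them into an io::Result, which stops at the first Err. The helper total_len below is a hypothetical std-only illustration of that idiom, independent of this crate.

```rust
use std::io;

/// Hypothetical helper: sums per-chunk lengths. Summing an iterator of
/// `io::Result<usize>` into `io::Result<usize>` returns the first error met.
fn total_len<I: IntoIterator<Item = io::Result<usize>>>(chunks: I) -> io::Result<usize> {
    chunks.into_iter().sum()
}
```

total_len(vec![Ok(4), Ok(3)]) returns Ok(7), while any Err item makes the whole sum return that error.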

Examples

use utf8_bufread::BufRead;
use std::io::{BufReader, ErrorKind};

// "foo\nbar" + some invalid bytes
// We do not give the buffer enough capacity to read the whole slice in one call, just to
// make it iterate more than once for this example
let mut reader = BufReader::with_capacity(
    4,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96, 0x0].as_ref(),
);

let err = reader
    // Take the length of each string, or pass the error through
    .map_utf8_results(|r| r.map(|s| s.len()))
    // Sum string lengths; summing into an io::Result returns the first error encountered
    // (the iterator also stops after returning an error)
    .sum::<std::io::Result<usize>>()
    // We are getting an error since we have invalid bytes
    .unwrap_err();
assert_eq!(ErrorKind::InvalidData, err.kind());

fn iter_utf8(
    &mut self
) -> ChunkIter<'_, Self, fn(_: &str) -> String, &'_ str, String>

Creates an Iterator over the chunks of UTF-8 data read by this reader.

This is equivalent to creating a new String and calling read_utf8 in a loop.

The created iterator will stop when reaching EOF or an invalid UTF-8 byte. If you wish to know the cause, see iter_utf8_results.

Note that the returned iterator always clones the data read from the reader, regardless of whether it is later thrown away.
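To make the cloning cost concrete, here is a std-only sketch of an iterator in the spirit of iter_utf8, under stated assumptions: utf8_chunks is a hypothetical name, it ends the iteration silently on I/O errors or undecodable leading bytes, and it copies each valid chunk into a fresh String before consuming it from the reader.

```rust
use std::io::BufRead;
use std::str;

/// Hypothetical sketch of an iter_utf8-like iterator using only std.
/// Every chunk is copied into a new String, whether or not it is used later.
fn utf8_chunks<R: BufRead>(mut reader: R) -> impl Iterator<Item = String> {
    std::iter::from_fn(move || {
        // I/O errors simply end the iteration in this simplified sketch
        let buf = reader.fill_buf().ok()?;
        let valid = match str::from_utf8(buf) {
            Ok(s) => s.len(),
            Err(e) => e.valid_up_to(),
        };
        if valid == 0 {
            return None; // EOF, or no decodable codepoint at the front
        }
        // This is the per-chunk copy the note above refers to
        let chunk = str::from_utf8(&buf[..valid]).unwrap().to_owned();
        reader.consume(valid);
        Some(chunk)
    })
}
```

With a 4-byte buffer over the example bytes it yields "foo\n" and then "bar".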

Examples

Note that the following example clones each read chunk twice.

use utf8_bufread::BufRead;
use std::io::BufReader;

// "foo\nbar" + some invalid bytes
// We do not give the buffer enough capacity to read the whole slice in one call, just to
// make it iterate more than once for this example
let mut reader = BufReader::with_capacity(
    4,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96, 0x0].as_ref(),
);

// Getting all valid data until EOF or invalid codepoint
let text: String = reader.iter_utf8().collect();
assert_eq!("foo\nbar", text.as_str());

fn iter_utf8_results(
    &mut self
) -> ChunkIter<'_, Self, fn(_: Result<&str>) -> Result<String>, Result<&'_ str>, Result<String>>

Creates an Iterator over the chunks of UTF-8 data read by this reader, yielding each chunk as an Ok(String), or the error returned by with_utf8_chunk.

This is equivalent to creating a new String and calling read_utf8 in a loop.

Note that the returned iterator always clones the data read from the reader, regardless of whether it is later thrown away.

Examples

Note that the following example still clones each read chunk once.

use utf8_bufread::BufRead;
use std::io::{BufReader, ErrorKind};

// "foo\nbar" + some invalid bytes
// We do not give the buffer enough capacity to read the whole slice in one call, just to
// make it iterate more than once for this example
let mut reader = BufReader::with_capacity(
    4,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96, 0x0].as_ref(),
);

// We just take the last element, which should be the error caused by the invalid bytes
let err = reader.iter_utf8_results().last().unwrap();
assert!(err.is_err());
assert_eq!(ErrorKind::InvalidData, err.unwrap_err().kind());

Implementors

impl<R: io::BufRead> BufRead for R
