Trait utf8_bufread::BufRead

pub trait BufRead: io::BufRead {
    fn read_utf8(&mut self, buf: &mut String) -> Result<usize> { ... }
}

A trait implemented for all types implementing io::BufRead, providing functions to read utf-8 text streams without waiting for newline delimiters.

Provided methods

fn read_utf8(&mut self, buf: &mut String) -> Result<usize>

Read a number of bytes less than or equal to the capacity of its buffer, and push their UTF-8 representation into the provided buf. It returns the number of bytes read as an io::Result<usize>.

This function reads bytes from the underlying stream until its buffer is full, an invalid or incomplete codepoint is found, or EOF is reached. All codepoints read up to that point, excluding the invalid or incomplete codepoint (if any), are then appended to the provided buf.

If the operation is successful, this function returns the number of bytes read. Note that this may differ from the number of chars read, as UTF-8 is a variable-length encoding.
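The difference between bytes and chars can be seen with the standard library alone. The helper below is a hypothetical illustration, not part of this crate:

```rust
/// Returns `(byte length, char count)` for a string slice.
/// Hypothetical helper, used only to illustrate bytes versus chars.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // `len()` counts UTF-8 bytes; `chars().count()` counts Unicode scalar values.
    (s.len(), s.chars().count())
}
```

For pure ASCII input such as "foo\nbar" both numbers are equal; any non-ASCII codepoint makes the byte count larger than the char count.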

If this function returns Ok(0), the stream has reached EOF.

This function avoids the usual issue with using BufRead::read_line(&mut self, &mut String) or BufRead::lines(self) on large text files without newline delimiters: it will not load the whole file into memory.
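A minimal sketch of the chunked-reading pattern this enables, written against std only so that it is self-contained. Both read_utf8_sketch and chunk_sizes are hypothetical names: the first is a simplified stand-in that follows the behaviour described on this page, not the crate's actual implementation, and it does not attempt to handle a codepoint split across two buffer fills.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str;

/// Hypothetical std-only stand-in for `read_utf8` (an assumption, not the crate's code).
/// Appends the longest valid UTF-8 prefix of the reader's current buffer to `buf`.
fn read_utf8_sketch<R: BufRead>(reader: &mut R, buf: &mut String) -> io::Result<usize> {
    let n = {
        // Errors from `fill_buf` are returned immediately.
        let bytes = reader.fill_buf()?;
        match str::from_utf8(bytes) {
            Ok(s) => {
                buf.push_str(s);
                s.len()
            }
            // Some valid codepoints precede the problem: return them, report no error yet.
            Err(e) if e.valid_up_to() > 0 => {
                let valid = &bytes[..e.valid_up_to()];
                // This prefix was already validated by `from_utf8`, so it cannot fail.
                buf.push_str(str::from_utf8(valid).expect("validated prefix"));
                valid.len()
            }
            // Nothing valid at the front of the buffer: surface the error.
            Err(e) => return Err(io::Error::new(ErrorKind::InvalidData, e)),
        }
    };
    reader.consume(n);
    Ok(n)
}

/// Reads a stream chunk by chunk until `Ok(0)` (EOF), returning each chunk's size.
/// Only one buffer's worth of text is held in memory at a time.
fn chunk_sizes<R: BufRead>(mut reader: R) -> io::Result<Vec<usize>> {
    let mut sizes = Vec::new();
    let mut chunk = String::new();
    loop {
        chunk.clear(); // reuse the same allocation for every chunk
        match read_utf8_sketch(&mut reader, &mut chunk)? {
            0 => return Ok(sizes),
            n => sizes.push(n),
        }
    }
}
```

With a BufReader of capacity 4 wrapping the 15 ASCII bytes of "no newline here", chunk_sizes yields chunks of at most 4 bytes each, however long the input is. With the real trait in scope, the loop body would call reader.read_utf8(&mut chunk) instead of the stand-in.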

Errors

This function will immediately return any errors returned by fill_buf.

If the internal call to from_utf8 returns a Utf8Error, all valid codepoints read before the error are appended to buf and no error is returned, unless no valid codepoint was read. This ensures that no valid data is lost; the error is returned on the next call.

If the first codepoint encountered by from_utf8 is invalid or incomplete, an ErrorKind::InvalidData error caused by a Utf8Error is returned. This error cannot be recovered from: you will have to read bytes manually to determine whether it was caused by an invalid codepoint in the middle of the file or by an incomplete codepoint due to an early EOF.
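One possible way to do that manual inspection is sketched below, using only std. The Utf8Problem enum and the classify function are hypothetical names introduced for illustration; the distinction itself relies on the standard Utf8Error::error_len method, which returns None when the input ends in the middle of a codepoint and Some(len) when the bytes are outright invalid.

```rust
use std::str;

/// Hypothetical classification of the bytes left over after an `InvalidData` error.
#[derive(Debug, PartialEq)]
enum Utf8Problem {
    /// The bytes are valid UTF-8 after all.
    Valid,
    /// An invalid sequence starts at the first bad position; `len` bytes can be skipped.
    Invalid { len: usize },
    /// The input ends in the middle of a codepoint (for example a truncated file).
    Incomplete,
}

/// Classifies `bytes` by inspecting the first UTF-8 error, if any.
fn classify(bytes: &[u8]) -> Utf8Problem {
    match str::from_utf8(bytes) {
        Ok(_) => Utf8Problem::Valid,
        Err(e) => match e.error_len() {
            Some(len) => Utf8Problem::Invalid { len },
            None => Utf8Problem::Incomplete,
        },
    }
}
```

The bytes to classify could be obtained from the reader with fill_buf. Note that Incomplete only indicates a truncated stream if no further bytes follow; that still has to be checked against the underlying source.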

Examples

use utf8_bufread::BufRead;
use std::io::{BufReader, ErrorKind};

// "foo\nbar" + some invalid bytes
// We give the buffer more than enough capacity to be able to read all the bytes in one
// call
let mut reader = BufReader::with_capacity(
    16,
    [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96].as_ref(),
);
let mut buf = String::new();

// On the first read_utf8() call, we will read up to, but not including, the first byte
// of the invalid codepoint (i.e. "foo\nbar")
let n_read = reader
    .read_utf8(&mut buf)
    .expect("We will get all the valid bytes without error");
assert_eq!("foo\nbar", buf.as_str());
assert_eq!(7, n_read);

// Then on the second call we will get the InvalidData error caused by the Utf8Error,
// as there are no bytes forming valid codepoints left
let read_err = reader.read_utf8(&mut buf).expect_err("We will get an error");
assert_eq!(ErrorKind::InvalidData, read_err.kind());
assert_eq!(7, buf.len());  // no byte appended to buf

Implementors

impl<R: io::BufRead> BufRead for R
