utf8-bufread 0.1.4

Functions for BufRead to read large text files without worrying about newlines
# UTF-8 Buffered Reader

Provides a `read_utf8` function for all types implementing
[`BufRead`][BufRead], making it possible to read text files, even huge ones
without newline delimiters, without loading them into memory all at once.


# Usage

Add this crate as a dependency in your `Cargo.toml`:
```toml
[dependencies]
utf8-bufread = "0.1.4"
```

This will allow you to use the function `read_utf8` on any object
implementing [`std::io::BufRead`][BufRead]. This function essentially reads a
stream and pushes the data to a [`String`][String]:

```rust
use utf8_bufread::BufRead;
use std::io::BufReader;

fn main() {
  // Reader may be any type implementing io::BufRead
  // We'll just use a BufReader wrapping a slice for this 
  // example
  let mut reader = BufReader::<&[u8]>::new("💖".as_ref());
  // The string we'll use to store the text of the read file
  let mut text = String::new();
  loop { // Loop until EOF
    match reader.read_utf8(&mut text) {
      Ok(0) => break, // EOF
      Ok(_) => continue,
      // io::Error, possibly wrapping a Utf8Error
      Err(e) => panic!("{}", e),
    }
  }
  assert_eq!("💖", text.as_str());
}
```

A common issue encountered when using the standard Rust library to read large
text files is that these may have extremely long lines or no newline
delimiters at all. This makes [`BufRead::read_line`][read_line] or
[`BufRead::lines`][lines] load a large amount of data into memory, which may
not be desirable.
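
To make the issue concrete, here is a small sketch that uses only the standard library. The helper name `bytes_loaded_by_read_line` is made up for this illustration. It shows that a single `read_line` call pulls the whole input into the `String` when there is no newline, regardless of how small the reader's internal buffer is:

```rust
use std::io::{BufRead, BufReader};

/// Hypothetical helper for this example: performs a single `read_line`
/// call through a `BufReader` of the given capacity and returns how many
/// bytes that one call appended to the `String`.
fn bytes_loaded_by_read_line(data: &str, capacity: usize) -> usize {
    let mut reader = BufReader::with_capacity(capacity, data.as_bytes());
    let mut line = String::new();
    reader
        .read_line(&mut line)
        .expect("reading from an in-memory slice should not fail")
}

fn main() {
    // 1 KiB of text without a single newline delimiter
    let data = "a".repeat(1024);
    // The internal buffer only holds 16 bytes, yet read_line keeps
    // refilling it until it finds '\n' or reaches EOF
    let loaded = bytes_loaded_by_read_line(&data, 16);
    assert_eq!(1024, loaded);
    println!("read_line loaded {} bytes in one call", loaded);
}
```

With a multi-gigabyte file instead of a 1 KiB string, the same call would try to hold the whole file in memory.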

The function `read_utf8`, on the other hand, will read at most as much data
as fits in the reader's buffer in a single call.
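
The bound comes from the reader's buffer itself. The sketch below uses only the standard library and assumes that `read_utf8` works on one buffer fill at a time, which the description above suggests but does not spell out. The helper `chunk_sizes` is hypothetical. It records how many bytes `fill_buf` hands out per call, which never exceeds the capacity of the `BufReader`:

```rust
use std::io::{BufRead, BufReader};

/// Hypothetical helper for this example: drains `data` through a
/// `BufReader` of the given capacity and returns the size of every chunk
/// exposed by `fill_buf`.
fn chunk_sizes(data: &[u8], capacity: usize) -> Vec<usize> {
    let mut reader = BufReader::with_capacity(capacity, data);
    let mut sizes = Vec::new();
    loop {
        // fill_buf only refills the internal buffer when it is empty
        let len = reader
            .fill_buf()
            .expect("reading from an in-memory slice should not fail")
            .len();
        if len == 0 {
            break; // EOF
        }
        sizes.push(len);
        // Mark the chunk as processed so the next call refills the buffer
        reader.consume(len);
    }
    sizes
}

fn main() {
    // 40 bytes read through a 16-byte buffer
    let sizes = chunk_sizes(&[b'a'; 40], 16);
    assert_eq!(vec![16, 16, 8], sizes);
    println!("chunk sizes: {:?}", sizes);
}
```

Memory use per call is therefore bounded by the capacity chosen for the reader, not by the length of a line.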

If a valid UTF-8 codepoint is read, it will **always** be pushed. If an invalid
or incomplete codepoint is read, the function will first push all the valid
bytes read, and an [`InvalidData`][InvalidData] error will be returned on the
next call:

```rust
use utf8_bufread::BufRead;
use std::io::{BufReader, ErrorKind};

fn main() {
  // We give the buffer more than enough capacity to be
  // able to read all the bytes in one call
  let mut reader = BufReader::with_capacity(
    16,
    [ // "foo\nbar" + some invalid bytes
      0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96
    ].as_ref(),
  );
  let mut buf = String::new();
 
  // On the first read_utf8() call, we will read up to the
  // first byte of the invalid codepoint (ie "foo\nbar")
  let n_read = reader
    .read_utf8(&mut buf)
    .expect("We will get all the valid bytes");
  assert_eq!("foo\nbar", buf.as_str());
  assert_eq!(7, n_read);
 
  // Then on the second call we will get the InvalidData
  // error caused by the Utf8Error, as there are no
  // bytes forming valid codepoints left
  let read_err = reader
    .read_utf8(&mut buf)
    .expect_err("We will get an error");
  assert_eq!(ErrorKind::InvalidData, read_err.kind());
  assert_eq!(7, buf.len());  // no byte appended to buf
}
```
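
For readers curious how the "push the valid bytes first, fail on the next call" behaviour can arise, here is a rough sketch built only on the standard library. It is **not** this crate's implementation: `read_utf8_sketch` is a hypothetical function, and it assumes that a codepoint split across two buffer fills is simply reported as invalid.

```rust
use std::io::{self, BufRead, BufReader, ErrorKind};

/// Hypothetical stand-in for `read_utf8`: appends the valid UTF-8 prefix
/// of the reader's current buffer to `out` and returns the number of bytes
/// appended (0 means EOF).
fn read_utf8_sketch<R: BufRead>(reader: &mut R, out: &mut String) -> io::Result<usize> {
    let available = reader.fill_buf()?;
    if available.is_empty() {
        return Ok(0); // EOF
    }
    let valid = match std::str::from_utf8(available) {
        // The whole buffer is valid UTF-8
        Ok(s) => s,
        // Only a prefix is valid: keep it, leave the rest in the buffer
        Err(e) if e.valid_up_to() > 0 => {
            std::str::from_utf8(&available[..e.valid_up_to()])
                .expect("prefix was already validated")
        }
        // Nothing valid at the start of the buffer: report InvalidData
        Err(e) => return Err(io::Error::new(ErrorKind::InvalidData, e)),
    };
    let n = valid.len();
    out.push_str(valid);
    // Only the valid bytes are consumed, so the invalid ones are seen
    // again (and rejected) on the next call
    reader.consume(n);
    Ok(n)
}

fn main() {
    // "foo\nbar" followed by three stray continuation bytes
    let bytes = [0x66u8, 0x6f, 0x6f, 0xa, 0x62, 0x61, 0x72, 0x9f, 0x92, 0x96];
    let mut reader = BufReader::with_capacity(16, bytes.as_ref());
    let mut buf = String::new();

    let n = read_utf8_sketch(&mut reader, &mut buf).expect("valid prefix");
    assert_eq!(7, n);
    assert_eq!("foo\nbar", buf.as_str());

    let err = read_utf8_sketch(&mut reader, &mut buf).expect_err("invalid bytes");
    assert_eq!(ErrorKind::InvalidData, err.kind());
}
```

The key design point in this sketch is that only the validated prefix is consumed, so an error never discards text that was already readable.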


# Work in progress

This crate is fairly new, and for now only provides the `read_utf8` function,
with a rather simple implementation. In the near future these features should
be added:

- A lossy and unchecked version of `read_utf8` (see
  [`from_utf8_lossy`][from_utf8_lossy] &
  [`from_utf8_unchecked`][from_utf8_unchecked]).
- A `char`s iterator from the buffer, and its lossy version.
- I'm open to suggestions, if you have ideas 😉

**This also means it may have a pretty unstable API**

I am also looking for a way for `read_utf8` to return a `&str` instead of a
`String`, meaning the reader would stay borrowed until the returned reference
goes out of scope, so that the user can choose whether to clone the data or
not. For the moment, the read codepoints are always cloned into a new `String`.

Finally, I want to test and benchmark this crate.

Given I'm not the most experienced developer, you are very welcome to
submit merge requests [here](https://gitlab.com/Austreelis/utf8-bufread).


# License

Utf8-BufRead is distributed under the terms of the Apache License 2.0, see the
[LICENSE](https://gitlab.com/Austreelis/utf8-bufread/-/blob/main/LICENSE)
file in the root directory of this repository.

[BufRead]: https://doc.rust-lang.org/std/io/trait.BufRead.html
[String]: https://doc.rust-lang.org/nightly/alloc/string/struct.String.html
[read_line]: https://doc.rust-lang.org/nightly/std/io/trait.BufRead.html#method.read_line
[lines]: https://doc.rust-lang.org/nightly/std/io/trait.BufRead.html#method.lines
[InvalidData]: https://doc.rust-lang.org/nightly/std/io/enum.ErrorKind.html#variant.InvalidData
[from_utf8_lossy]: https://doc.rust-lang.org/nightly/alloc/string/struct.String.html#method.from_utf8_lossy
[from_utf8_unchecked]: https://doc.rust-lang.org/nightly/alloc/string/struct.String.html#method.from_utf8_unchecked