utf8-bufread 1.0.0

# UTF-8 Buffered Reader

This crate provides functions to read UTF-8 text from any
type implementing [`io::BufRead`][io::BufRead] through a
trait, [`BufRead`][BufRead], without waiting for newline
delimiters. These functions take advantage of buffering and
return either `&`[`str`][str] slices or [`char`][char]s. Each has
an associated iterator, and some also have an equivalent of a
[`Map`][Map] iterator that avoids allocation and cloning.

[![crates.io](https://img.shields.io/crates/v/utf8-bufread.svg)](https://crates.io/crates/utf8-bufread)
[![docs.rs](https://docs.rs/utf8-bufread/badge.svg)](https://docs.rs/utf8-bufread/latest/utf8_bufread/)
[![build status](https://gitlab.com/Austreelis/utf8-bufread/badges/main/pipeline.svg)](https://gitlab.com/Austreelis/utf8-bufread/-/commits/main)

# Usage

Add this crate as a dependency in your `Cargo.toml`:
```toml
[dependencies]
utf8-bufread = "1.0.0"
```

The simplest way to read a file using this crate may be
something like the following:

```rust
use std::io::{Cursor, ErrorKind};
use utf8_bufread::BufRead;

// `reader` may be any type implementing `io::BufRead`;
// this example uses a cursor wrapping a string slice
let mut reader = Cursor::new("Löwe 老虎 Léopard");
loop { // Loop until EOF
    match reader.read_str() {
        Ok(s) => {
            if s.is_empty() {
                break; // EOF
            }
            // Do something with `s` ...
            print!("{}", s);
        }
        Err(e) => {
            // We should try again if we get interrupted
            if e.kind() != ErrorKind::Interrupted {
                break;
            }
        }
    }
}
```

## Reading arbitrary-length string slices

The [`read_str`][read_str] function returns a
`&`[`str`][str] of arbitrary length (up to the reader's
buffer capacity) read from the inner reader. It does not
clone any data, unless a valid codepoint ends up cut at the
end of the reader's buffer. Its associated iterator can be
obtained by calling [`str_iter`][str_iter]; since that
iterator clones the data at each iteration,
[`str_map`][str_map] is also provided.
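To illustrate the cut-codepoint case, here is a minimal sketch using only the standard library; it is not this crate's implementation. `std::str::from_utf8` reports how many leading bytes are valid, and `Utf8Error::error_len` returning `None` signals that the buffer merely ends partway through a codepoint, rather than containing invalid data. The `Tail` enum and the `split_valid_prefix` function are hypothetical names chosen for this example.

```rust
use std::str;

/// What follows the longest valid UTF-8 prefix of a buffer
/// (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum Tail {
    /// The whole buffer was valid UTF-8.
    Empty,
    /// The buffer ends with the first `n` bytes of a codepoint that was
    /// cut off: more input is needed to decode it.
    Incomplete(usize),
    /// The bytes right after the prefix are not valid UTF-8.
    Invalid,
}

/// Splits `buf` into its longest valid UTF-8 prefix, borrowed without
/// copying, and a description of the bytes that follow it.
fn split_valid_prefix(buf: &[u8]) -> (&str, Tail) {
    match str::from_utf8(buf) {
        Ok(text) => (text, Tail::Empty),
        Err(e) => {
            let valid = e.valid_up_to();
            // `valid_up_to` guarantees this prefix is valid, so this cannot fail.
            let text = str::from_utf8(&buf[..valid]).unwrap();
            let tail = match e.error_len() {
                // `None`: the input ended in the middle of a codepoint.
                None => Tail::Incomplete(buf.len() - valid),
                // `Some(_)`: an invalid byte sequence was found.
                Some(_) => Tail::Invalid,
            };
            (text, tail)
        }
    }
}
```

In this sketch only the `Incomplete` bytes would need to be copied aside before reading more; the prefix itself can be handed out as a borrowed `&str`.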

## Reading codepoints

The [`read_char`][read_char] function returns a
[`char`][char] read from the inner reader. Its associated
iterator can be obtained by calling
[`char_iter`][char_iter].
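A `char` reader can be layered on top of any `io::BufRead` through a blanket-implemented extension trait. The following is a minimal sketch of that idea using only the standard library; `ReadCharSketch`, `read_char_sketch` and `utf8_len` are hypothetical names, and the crate's actual [`read_char`][read_char] may behave differently, for instance in how it reports errors.

```rust
use std::io::{self, BufRead, ErrorKind, Read};

/// Number of bytes in a UTF-8 sequence, deduced from its first byte,
/// or `None` if that byte cannot start a sequence.
fn utf8_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None,
    }
}

/// Hypothetical extension trait, implemented below for every `io::BufRead`.
trait ReadCharSketch: BufRead {
    /// Reads a single `char`, returning `Ok(None)` at EOF.
    fn read_char_sketch(&mut self) -> io::Result<Option<char>> {
        // Peek at the first buffered byte without consuming it.
        let first = match self.fill_buf()?.first() {
            Some(&byte) => byte,
            None => return Ok(None),
        };
        // An invalid lead byte is reported and left unconsumed.
        let len = utf8_len(first)
            .ok_or_else(|| io::Error::new(ErrorKind::InvalidData, "invalid UTF-8 lead byte"))?;
        // `read_exact` refills the buffer as needed, so a codepoint cut at
        // the end of the buffer is still read whole.
        let mut bytes = [0u8; 4];
        self.read_exact(&mut bytes[..len])?;
        let text = std::str::from_utf8(&bytes[..len])
            .map_err(|e| io::Error::new(ErrorKind::InvalidData, e))?;
        Ok(text.chars().next())
    }
}

impl<R: BufRead> ReadCharSketch for R {}
```

Because `read_exact` keeps pulling from the inner reader, this sketch also copes with a codepoint that straddles two buffer fills, e.g. when reading through `BufReader::with_capacity(1, ...)`.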

## Iterator types

This crate provides several iterator structs, each suited to
a different way of iterating over the inner reader's data:
- [`StrIter`][StrIter] and
  [`CodepointIter`][CodepointIter] clone the data on each
  iteration, but use an [`Rc`][Rc] to check whether the
  returned [`String`][String] buffer is still in use. If it
  is not, it is reused to avoid reallocating.
```rust
use std::io::Cursor;
use utf8_bufread::BufRead;

let mut reader = Cursor::new("Löwe 老虎 Léopard");
for s in reader.str_iter().filter_map(|r| r.ok()) {
    // Do something with s ...
    print!("{}", s);
}
```
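The "`Rc` trick" can be sketched with the standard library alone. In this hypothetical `RecyclingBuffer` (not a type from this crate), the producer keeps a clone of the last `Rc<String>` it handed out; when `Rc::strong_count` shows that the caller has dropped theirs, the `String` is cleared and refilled instead of a new one being allocated.

```rust
use std::rc::Rc;

/// Hypothetical helper illustrating the "`Rc` trick": it hands out
/// `Rc<String>`s but keeps a clone of the last one, so it can tell when
/// the caller has dropped it and recycle the allocation.
struct RecyclingBuffer {
    last: Option<Rc<String>>,
}

impl RecyclingBuffer {
    fn new() -> Self {
        RecyclingBuffer { last: None }
    }

    /// Copies `data` into a shared `String`, reusing the previous one if
    /// no one else still holds a reference to it.
    fn share(&mut self, data: &str) -> Rc<String> {
        let mut rc = match self.last.take() {
            // Only our own clone is left: the caller dropped theirs.
            Some(rc) if Rc::strong_count(&rc) == 1 => rc,
            // First call, or the previous string is still in use.
            _ => Rc::new(String::with_capacity(data.len())),
        };
        // No weak references are ever created, so `get_mut` succeeds here.
        let s = Rc::get_mut(&mut rc).expect("uniquely owned");
        s.clear();
        s.push_str(data);
        self.last = Some(Rc::clone(&rc));
        rc
    }
}
```

The trade-off is one extra reference count per item, in exchange for skipping an allocation whenever callers drop each string before asking for the next.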
- [`StrMap`][StrMap] and [`CodepointMap`][CodepointMap]
  give access to the read data without allocating or
  copying it, but that data can only be used inside the
  mapping closure and cannot be passed to further iterator
  adapters.
```rust
use std::io::Cursor;
use utf8_bufread::BufRead;

let s = "Löwe 老虎 Léopard";
let mut reader = Cursor::new(s);
let count: usize = reader
    .str_map(|s| s.len())
    .filter_map(Result::ok)
    .sum();
println!("There are {} valid UTF-8 bytes in {}", count, s);
```
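A map-style iterator can avoid copies because its closure only borrows the reader's buffer for the duration of each call. Below is a minimal std-only sketch of that design; `StrMapSketch` and `str_map_sketch` are hypothetical names, not this crate's API, and error recovery is deliberately simplified. Each valid chunk returned by `fill_buf` is lent to the closure, and bytes are only copied (at most 4) when a codepoint is cut at the end of the buffer. The bound `F: FnMut(&str) -> T` is what keeps `T` from borrowing the chunk, which is why the data itself cannot flow into later adapters.

```rust
use std::io::{self, BufRead, ErrorKind, Read};
use std::str;

/// Hypothetical `Map`-like iterator: each valid chunk of the reader's
/// buffer is lent to `f` as a `&str`, and only `f`'s result is yielded.
struct StrMapSketch<R, F> {
    reader: R,
    f: F,
    done: bool,
}

fn str_map_sketch<R: BufRead, T, F: FnMut(&str) -> T>(reader: R, f: F) -> StrMapSketch<R, F> {
    StrMapSketch { reader, f, done: false }
}

impl<R: BufRead, T, F: FnMut(&str) -> T> Iterator for StrMapSketch<R, F> {
    type Item = io::Result<T>;

    fn next(&mut self) -> Option<io::Result<T>> {
        if self.done {
            return None;
        }
        let buf = match self.reader.fill_buf() {
            Ok(buf) => buf,
            Err(e) => {
                self.done = true;
                return Some(Err(e));
            }
        };
        if buf.is_empty() {
            self.done = true; // EOF
            return None;
        }
        let (valid, error_len) = match str::from_utf8(buf) {
            Ok(_) => (buf.len(), None),
            Err(e) => (e.valid_up_to(), e.error_len()),
        };
        if valid > 0 {
            // Common case: lend the valid prefix of the buffer, no copy.
            let text = str::from_utf8(&buf[..valid]).unwrap();
            let out = (self.f)(text);
            self.reader.consume(valid);
            return Some(Ok(out));
        }
        if error_len.is_some() {
            self.done = true;
            return Some(Err(io::Error::new(ErrorKind::InvalidData, "invalid UTF-8")));
        }
        // A codepoint is cut at the end of the buffer, so `buf[0]` is a valid
        // lead byte: copy the codepoint into a local array, letting
        // `read_exact` refill the buffer.
        let len = match buf[0] {
            0xF0..=0xFF => 4,
            0xE0..=0xEF => 3,
            _ => 2,
        };
        let mut bytes = [0u8; 4];
        if let Err(e) = self.reader.read_exact(&mut bytes[..len]) {
            self.done = true; // e.g. EOF in the middle of a codepoint
            return Some(Err(e));
        }
        match str::from_utf8(&bytes[..len]) {
            Ok(text) => Some(Ok((self.f)(text))),
            Err(e) => {
                self.done = true;
                Some(Err(io::Error::new(ErrorKind::InvalidData, e)))
            }
        }
    }
}
```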
- [`CharIter`][CharIter] is similar to [`StrIter`][StrIter]
  and the others, except that it relies on [`char`][char]s
  implementing [`Copy`][Copy], and thus needs neither a
  buffer nor the "`Rc` trick".
```rust
use std::io::Cursor;
use utf8_bufread::BufRead;

let s = "Löwe 老虎 Léopard";
let mut reader = Cursor::new(s);
let count = reader
    .char_iter()
    .filter_map(Result::ok)
    .filter(|c| c.is_lowercase())
    .count();
assert_eq!(count, 9);
```

All these iterators read data until they reach EOF or find an
invalid codepoint. If valid codepoints are read from the
inner reader, they *will* be returned before the error is
reported. After encountering an error or EOF, they always
return [`None`][option]. They always ignore
[`Interrupted`][Interrupted] errors.
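These termination rules (report an error once, then return `None` forever; stop at EOF; skip `Interrupted` errors) can be captured by a small adapter. This is a minimal sketch over a hypothetical `read` closure returning `io::Result<Option<T>>`, where `Ok(None)` stands for EOF; `FusedReads` is an illustrative name, not a type from this crate.

```rust
use std::io;

/// Hypothetical adapter: `read` returns `Ok(Some(item))`, `Ok(None)` at
/// EOF, or an error.
struct FusedReads<F> {
    read: F,
    finished: bool,
}

impl<T, F: FnMut() -> io::Result<Option<T>>> Iterator for FusedReads<F> {
    type Item = io::Result<T>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.finished {
            return None; // always `None` after EOF or an error
        }
        loop {
            match (self.read)() {
                Ok(Some(item)) => return Some(Ok(item)),
                Ok(None) => {
                    self.finished = true; // EOF
                    return None;
                }
                // Interrupted reads are simply retried.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => {
                    self.finished = true; // report the error once, then stop
                    return Some(Err(e));
                }
            }
        }
    }
}
```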


# Work in progress

This crate is still a work in progress. Part of its API can 
be considered stable:
- [`read_str`][read_str], [`read_codepoint`][read_codepoint] and [`read_char`][read_char]'s behavior and signature.
- [`str_iter`][str_iter], [`str_map`][str_map], [`codepoints_iter`][codepoints_iter], [`codepoints_map`][codepoints_map]
  and [`char_iter`][char_iter]'s behavior and signature.
- [`StrIter`][StrIter], [`StrMap`][StrMap], [`CodepointIter`][CodepointIter], [`CodepointMap`][CodepointMap] and
  [`CharIter`][CharIter]'s API.

However some features are still considered unstable:
- [`Error`](Error)'s behavior, particularly regarding its [`kind`](kind) and how it avoids
  data loss (see [`leftovers`](leftovers)).

And some features still have to be added:
- Lossy and unchecked versions of `read_*` (see
  [`from_utf8_lossy`][from_utf8_lossy] &
  [`from_utf8_unchecked`][from_utf8_unchecked]).
- (Optional) Support for grapheme clusters using the [`unicode-segmentation`][unicode-segmentation]
  crate, in the same fashion as [`read_codepoint`][read_codepoint].
- I'm open to suggestions if you have any 😉

Since I'm far from the most experienced developer, you are
very welcome to submit issues and merge requests
[here](https://gitlab.com/Austreelis/utf8-bufread).


# License

Utf8-BufRead is distributed under the terms of the Apache 
License 2.0, see the 
[LICENSE](https://gitlab.com/Austreelis/utf8-bufread/-/blob/main/LICENSE)
file in the root directory of this repository.

[io::BufRead]: https://doc.rust-lang.org/std/io/trait.BufRead.html
[str]: https://doc.rust-lang.org/std/primitive.str.html
[char]: https://doc.rust-lang.org/std/primitive.char.html
[Map]: https://doc.rust-lang.org/std/iter/struct.Map.html
[Rc]: https://doc.rust-lang.org/std/rc/struct.Rc.html
[String]: https://doc.rust-lang.org/std/string/struct.String.html
[Copy]: https://doc.rust-lang.org/std/marker/trait.Copy.html
[option]: https://doc.rust-lang.org/std/option/index.html
[Interrupted]: https://doc.rust-lang.org/std/io/enum.ErrorKind.html#variant.Interrupted
[from_utf8_lossy]: https://doc.rust-lang.org/nightly/alloc/string/struct.String.html#method.from_utf8_lossy
[from_utf8_unchecked]: https://doc.rust-lang.org/nightly/alloc/string/struct.String.html#method.from_utf8_unchecked
[unicode-segmentation]: https://docs.rs/unicode-segmentation/latest/unicode_segmentation/index.html

[BufRead]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html
[read_str]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.read_str
[str_iter]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.str_iter
[str_map]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.str_map
[read_codepoint]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.read_codepoint
[codepoints_iter]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.codepoints_iter
[codepoints_map]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.codepoints_map
[read_char]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.read_char
[char_iter]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/trait.BufRead.html#method.char_iter
[StrIter]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/struct.StrIter.html
[StrMap]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/struct.StrMap.html
[CodepointIter]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/struct.CodepointIter.html
[CodepointMap]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/struct.CodepointMap.html
[CharIter]: https://docs.rs/utf8-bufread/1.0.0/utf8_bufread/struct.CharIter.html