UTF-8 Buffered Reader
Provides alternatives to BufRead::read_line and BufRead::lines that allow getting UTF-8 strings but do not stop on newline delimiters, to avoid loading large amounts of data in memory when reading files with few newlines.
Usage
Add this crate as a dependency in your Cargo.toml:
[dependencies]
utf8-bufread = "0.1.5"
This will allow you to use the BufRead trait provided by this crate, which is automatically implemented on any type implementing std::io::BufRead.
This trait provides functions to read UTF-8 strings from a stream, but none of those functions guarantee that the chunk of data read will end on a newline delimiter (unlike BufRead::read_line or BufRead::lines). This allows you to use buffered readers and std::io::BufRead's API on a large stream without worrying about loading a huge amount of data into memory if there is no newline delimiter.
The functions of this trait are centered around BufRead::with_utf8_chunk, which takes a closure that is passed the string slice of UTF-8 data read from the inner reader, and returns an io::Result of the number of bytes read, in the same fashion as most functions of std::io's traits and structs.
The string slice may be of arbitrary length and may stop at any point in the
stream, but will always contain valid UTF-8.
The trait also provides functions to append to a provided buffer and to iterate over read chunks.
use utf8_bufread::BufRead;
use std::io::BufReader;
use std::fs::File;
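To make the chunking behaviour described above concrete, here is a minimal sketch, using only the standard library, of how a closure-based chunk read of this kind could work. It is not this crate's implementation: the function name with_utf8_chunk_sketch and the choice of ErrorKind::InvalidData are assumptions made for illustration, and the sketch does not handle a codepoint split across two buffer refills.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str;

/// Hypothetical helper (not part of the crate's API): passes the longest
/// valid UTF-8 prefix of the reader's current buffer to `f`, consumes it,
/// and returns the number of bytes consumed. No newline is ever awaited.
fn with_utf8_chunk_sketch<R, F>(reader: &mut R, mut f: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&str),
{
    // Look at whatever the reader currently has buffered.
    let buf = reader.fill_buf()?;
    if buf.is_empty() {
        // End of stream: nothing was read.
        return Ok(0);
    }

    // Find how many leading bytes form valid UTF-8.
    let valid_len = match str::from_utf8(buf) {
        Ok(s) => s.len(),
        Err(e) => e.valid_up_to(),
    };
    if valid_len == 0 {
        // Assumption: report invalid or incomplete data as InvalidData.
        // A codepoint split across two refills also ends up here; a real
        // implementation would need to read more bytes before giving up.
        return Err(io::Error::new(
            ErrorKind::InvalidData,
            "invalid or incomplete UTF-8 sequence",
        ));
    }

    // The prefix was validated just above, so this cannot fail.
    let chunk = str::from_utf8(&buf[..valid_len]).expect("prefix was validated");
    f(chunk);

    // Mark the processed bytes as read so the next call sees fresh data.
    reader.consume(valid_len);
    Ok(valid_len)
}
```

With an in-memory reader such as a byte slice over "h\u{e9}llo" (no newline), this sketch hands the whole string to the closure in a single call and returns Ok(6), since the accented letter occupies two bytes.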
If a valid UTF-8 codepoint is read, it will always be processed, whether it is passed to a closure or appended to a provided buffer. If an invalid or incomplete codepoint is read, the functions of this crate will first process all the valid bytes read, and a relevant io::Error will be returned on the next call.
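The "valid bytes first, error on the next call" behaviour can be sketched with the standard library alone. The function below is a hypothetical stand-in, not the crate's read_utf8: its name, its signature, and the ErrorKind::InvalidData error kind are assumptions chosen for illustration.

```rust
use std::io::{self, BufRead, ErrorKind};
use std::str;

/// Hypothetical helper (not the crate's API): appends the longest valid
/// UTF-8 prefix of the reader's current buffer to `out` and returns the
/// number of bytes appended. Invalid data is only reported once every
/// valid byte before it has been handed over.
fn read_utf8_sketch<R: BufRead>(reader: &mut R, out: &mut String) -> io::Result<usize> {
    let bytes = reader.fill_buf()?;

    // Length of the leading run of valid UTF-8 in the buffered bytes.
    let valid_len = match str::from_utf8(bytes) {
        Ok(s) => s.len(),
        Err(e) => e.valid_up_to(),
    };

    if valid_len == 0 && !bytes.is_empty() {
        // Nothing valid is left in front of the bad bytes: report it now.
        // Assumption: InvalidData is used as the error kind.
        return Err(io::Error::new(
            ErrorKind::InvalidData,
            "invalid or incomplete UTF-8 sequence",
        ));
    }

    // Process the valid part first; the error, if any, waits for the next call.
    let valid = str::from_utf8(&bytes[..valid_len]).expect("prefix was validated");
    out.push_str(valid);
    reader.consume(valid_len);
    Ok(valid_len)
}
```

Reading from the byte slice b"ab\xFFcd", the first call appends "ab" and returns Ok(2); the second call finds the invalid byte 0xFF at the front of the buffer and returns an error, leaving the output string unchanged.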
Work in progress
This crate is fairly new, and for now only provides a limited API, with a rather simple implementation. In the near future these features should be added:
- A lossy and unchecked version of read_utf8 (see from_utf8_lossy & from_utf8_unchecked).
- A chars iterator from the buffer, and its lossy version.
- I'm open to suggestion, if you have ideas 😉
This also means it may have a pretty unstable API.
As I am far from the most experienced developer, you are very welcome to submit pull requests.
License
Utf8-BufRead is distributed under the terms of the Apache License 2.0; see the LICENSE file in the root directory of this repository.