UTF-8 Buffered Reader
Provides a read_utf8 function for all types implementing BufRead, so that you can read text files without worrying about loading huge files that have no newline delimiters into memory.
Usage
Add this crate as a dependency in your Cargo.toml:
[]
= "0.1.1"
This will allow you to use the function read_utf8 on any object implementing std::io::BufRead. This function essentially reads a stream and returns a UTF-8 String:
use BufRead;
use BufReader;
assert_eq!;
A common issue encountered when using the Rust standard library to read large text files is that these may have extremely long lines or no newline delimiters at all. This makes BufRead::read_line or BufRead::lines load a large amount of data into memory, which may not be desirable.
The function read_utf8, on the other hand, will only read up until the reader's buffer is full.
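To make the difference concrete, here is a minimal sketch that uses only the standard library. The helper names bytes_per_fill and bytes_per_read_line are made up for this illustration and are not part of this crate. It compares how much data a single fill_buf call exposes with how much a single read_line call pulls into memory when the input has no newline.

```rust
use std::io::{BufRead, BufReader};

/// Number of bytes exposed by one `fill_buf` call on a reader
/// with the given buffer capacity (hypothetical helper).
fn bytes_per_fill(data: &[u8], capacity: usize) -> usize {
    let mut reader = BufReader::with_capacity(capacity, data);
    // `fill_buf` never returns more than the internal buffer can hold.
    reader.fill_buf().map(|buf| buf.len()).unwrap_or(0)
}

/// Number of bytes loaded into a String by one `read_line` call
/// (hypothetical helper).
fn bytes_per_read_line(data: &[u8], capacity: usize) -> usize {
    let mut reader = BufReader::with_capacity(capacity, data);
    let mut line = String::new();
    // Without a newline, `read_line` keeps reading until end of input.
    reader.read_line(&mut line).unwrap_or(0)
}
```

With 10,000 bytes of newline-free input and a 64-byte buffer, bytes_per_fill reports 64 while bytes_per_read_line reports 10,000: a buffer-bounded read keeps memory use tied to the buffer capacity rather than to the input.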
If valid UTF-8 is read, it will always be returned. If an invalid or incomplete codepoint is read, the function will first return all the valid bytes read, and an InvalidData error will be returned on the next call:
use BufRead;
use ;
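The behaviour described above can be approximated with the standard library alone. The following is a sketch under assumptions, not this crate's implementation: read_utf8_sketch is a hypothetical name, and the sketch treats a codepoint cut off at the end of the buffer the same way as an invalid one.

```rust
use std::io::{self, BufRead, ErrorKind};

/// Hypothetical stand-in for the behaviour described in this README;
/// this is NOT the crate's actual code.
fn read_utf8_sketch<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let text = {
        // Look at whatever is currently buffered, without consuming it yet.
        let buf = reader.fill_buf()?;
        let valid = match std::str::from_utf8(buf) {
            // The whole buffer is valid UTF-8.
            Ok(s) => s,
            // Some valid bytes precede the problem: return only those for now.
            Err(e) if e.valid_up_to() > 0 => {
                std::str::from_utf8(&buf[..e.valid_up_to()])
                    .expect("prefix was already validated")
            }
            // The buffer starts with an invalid or incomplete sequence.
            Err(e) => return Err(io::Error::new(ErrorKind::InvalidData, e)),
        };
        valid.to_owned()
    };
    // Mark only the returned bytes as read; the bad bytes stay buffered,
    // so the next call reports the error.
    reader.consume(text.len());
    Ok(text)
}
```

Called twice on the bytes "ab", 0xFF, "cd", this sketch first returns "ab" and then fails with an InvalidData error, matching the two-step behaviour described above.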
Work in progress
This crate is fairly new, and for now only provides the read_utf8 function, with a rather simple implementation. In the near future these features should be added:
- A lossy and an unchecked version of read_utf8 (see from_utf8_lossy & from_utf8_unchecked).
- An iterator over the chars of the buffer, and its lossy version.
- I'm open to suggestions, if you have ideas 😉
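To give an idea of what a lossy variant might look like, here is a rough sketch built on String::from_utf8_lossy from the standard library. The name read_utf8_lossy_sketch is hypothetical and nothing here is a committed design; in particular, this naive version would also replace a codepoint that is merely split across two buffer fills.

```rust
use std::io::{self, BufRead};

/// Hypothetical lossy read: invalid sequences become U+FFFD
/// instead of producing an error. Not part of this crate.
fn read_utf8_lossy_sketch<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let buf = reader.fill_buf()?;
    // Copy the buffered bytes into a String, replacing invalid sequences.
    let text = String::from_utf8_lossy(buf).into_owned();
    let used = buf.len();
    // Everything that was buffered has been converted, so consume it all.
    reader.consume(used);
    Ok(text)
}
```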
I am also looking for a way for read_utf8 to return a &str instead of a String, meaning the reader stays borrowed until the returned reference goes out of scope, so that users can choose whether or not to clone the data. For the moment, the read codepoints are always cloned into a new String.
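One possible shape for such a borrowing API, sketched with the standard library only, is shown here. The name peek_utf8_sketch is hypothetical, and this is an exploration of the idea rather than a planned signature. It returns a &str pointing into the reader's own buffer and leaves it to the caller to consume the bytes afterwards.

```rust
use std::io::{self, BufRead, ErrorKind};

/// Hypothetical borrowing read: the returned &str points into the
/// reader's buffer, so the reader stays borrowed while it is in use.
fn peek_utf8_sketch<R: BufRead>(reader: &mut R) -> io::Result<&str> {
    let buf = reader.fill_buf()?;
    // No copy is made; invalid data is reported as InvalidData.
    std::str::from_utf8(buf).map_err(|e| io::Error::new(ErrorKind::InvalidData, e))
}
```

A caller would read the string, record its length, let the reference go out of scope, and then call consume with that length. The cost of this design is that advancing the reader becomes the caller's job, which is easy to forget.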
Finally, I want to test and benchmark this crate.
Since I'm not the most experienced developer, you are very welcome to submit pull requests.
License
Utf8-BufRead is distributed under the terms of the Apache License 2.0; see the LICENSE file in the root directory of this repository.