§buf_read_write
This crate contains the BufStream struct, a combination of std::io::BufReader
and std::io::BufWriter.
§Motivation
When reading or writing files in Rust, it is important to wrap std::fs::File
in BufReader or BufWriter. Failing to do so can cause poor performance when
data is read or written in small chunks, because each individual read or write becomes
an operating system call.
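The effect can be illustrated without touching the file system. The sketch below is only an illustration: CountingWriter is a hypothetical stand-in for a file that counts how many times its write method is called, and the 8192-byte capacity is an arbitrary choice for the example.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a file: counts how often `write` is called.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` single bytes straight to the sink and returns the call count.
fn unbuffered_calls(n: usize) -> io::Result<usize> {
    let mut sink = CountingWriter { calls: 0 };
    for _ in 0..n {
        sink.write_all(&[0u8])?;
    }
    Ok(sink.calls)
}

/// Runs the same workload through a BufWriter and returns the call count.
fn buffered_calls(n: usize) -> io::Result<usize> {
    let mut writer = BufWriter::with_capacity(8192, CountingWriter { calls: 0 });
    for _ in 0..n {
        writer.write_all(&[0u8])?;
    }
    // Push whatever is still buffered down to the sink.
    writer.flush()?;
    Ok(writer.get_ref().calls)
}

fn main() -> io::Result<()> {
    println!("unbuffered: {} calls", unbuffered_calls(1000)?);
    println!("buffered:   {} calls", buffered_calls(1000)?);
    Ok(())
}
```

With a real std::fs::File in place of CountingWriter, each of those calls would be an operating system call.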
Some applications need to both read from and write to the same file. Unfortunately, BufReader
only supports reading, and BufWriter only supports writing. The two cannot be easily
combined.
This crate attempts to resolve this by introducing BufStream, a type that allows
both buffered reading and buffered writing.
§Design decisions
The following design decisions have been made for this crate:
- It requires the underlying object to implement std::io::Read, std::io::Write, and std::io::Seek. Reading and writing the same file is mostly useful only together with seeking, and requiring all three traits simplifies the design.
- It shares one buffer between reading and writing. Reads of data that has just been written are therefore satisfied directly from the buffer. It also means that writing at one place in the file, then moving to a different place and reading, invalidates the buffer (after writing its contents back correctly to the backing implementation).
- buf_read_write is not a disk cache. Reads and writes larger than the buffer size are satisfied by bypassing the buffer. The purpose of buf_read_write is only to provide acceptable performance for small reads and writes.
- Buffered reads assume the file is being traversed forward. Reading at position 2000 with a buffer size of 1000 results in a request to the backing implementation for bytes 2000..3000.
- All writes behave like std::io::Write::write_all. This simplifies the implementation, and is often what you want for disk IO (the main use case for this library). (std::io::BufWriter also effectively does this when it flushes its IO buffer.)
- Seeks are not always passed on to the backing implementation immediately. Instead, a seek is issued before each read if one is required. This avoids extra seeks that might otherwise be needed when the buffer has to be flushed. Note: SeekFrom::End() does cause a flush and an immediate call to the backing implementation, because a seek is needed to determine the end of the stream.
- This crate does not attempt to support files larger than 2^64 bytes. Seeking directly this far is always impossible because of type ranges, but this crate additionally does not support writing past this limit, even if no seeks occur. Because of how large 2^64 is, this is unlikely to be a problem in practice.
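The deferred-seek decision can be sketched with the standard library alone. This is not the crate's implementation: LazySeek, seek_to, and seeks_issued are hypothetical names, only absolute positions are handled, and there is no buffering at all. The sketch only shows a seek request being recorded and then forwarded, if still needed, right before a read.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical wrapper (not the crate's BufStream) that defers seeks.
struct LazySeek<T> {
    inner: T,
    /// Position the caller believes the stream is at.
    logical_pos: u64,
    /// Position the backing object is actually at.
    inner_pos: u64,
    /// Number of seeks forwarded to the backing object.
    seeks_issued: usize,
}

impl<T: Read + Seek> LazySeek<T> {
    /// Assumes `inner` is positioned at offset 0.
    fn new(inner: T) -> Self {
        LazySeek { inner, logical_pos: 0, inner_pos: 0, seeks_issued: 0 }
    }

    /// Records the target position without touching the backing object.
    fn seek_to(&mut self, pos: u64) {
        self.logical_pos = pos;
    }

    /// Issues a seek only if the backing object is at the wrong position.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.inner_pos != self.logical_pos {
            self.inner.seek(SeekFrom::Start(self.logical_pos))?;
            self.inner_pos = self.logical_pos;
            self.seeks_issued += 1;
        }
        let n = self.inner.read(buf)?;
        self.logical_pos += n as u64;
        self.inner_pos += n as u64;
        Ok(n)
    }
}

/// Three seek requests followed by two reads; returns the forwarded seek
/// count and the bytes that were read.
fn demo() -> io::Result<(usize, Vec<u8>)> {
    let data: Vec<u8> = (0..=255u8).collect();
    let mut stream = LazySeek::new(Cursor::new(data));
    stream.seek_to(10);
    stream.seek_to(200);
    stream.seek_to(42);

    let mut out = Vec::new();
    let mut buf = [0u8; 3];
    for _ in 0..2 {
        let n = stream.read(&mut buf)?;
        out.extend_from_slice(&buf[..n]);
    }
    Ok((stream.seeks_issued, out))
}

fn main() -> io::Result<()> {
    let (seeks, bytes) = demo()?;
    println!("seeks forwarded: {seeks}, bytes read: {bytes:?}");
    Ok(())
}
```

Only the last of the three requested positions reaches the backing object, and the second read needs no seek because the positions already agree.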
§Implementation
- An extensive test suite exists, including automatic chaos testing, exhaustive testing for simple cases, and mutation testing with cargo mutants.
- No unsafe code is used.
- buf_read_write has no dependencies (apart from dev-dependencies).
- Note that when mixing writes, reads and seeks, the buffer is reused. The dirty region of the buffer is tracked using a single range. As a consequence, if a large chunk is read and a single byte is modified at both the head and the tail of this chunk, the entire buffer is written to the backing implementation when the buffer is flushed. For disk IO this can be acceptable, since writing a whole buffer may be as fast as writing two smaller buffers. If this behavior is not desired, consider flushing the buffer between such writes.
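The single-range bookkeeping described in the last point can be sketched as follows. DirtyRange and its methods are hypothetical names chosen for illustration, and offsets are relative to the start of the buffer. The crate's actual data structure may differ.

```rust
use std::ops::Range;

/// Hypothetical dirty-region tracker: one range covering every modified byte.
#[derive(Default)]
struct DirtyRange {
    range: Option<Range<usize>>,
}

impl DirtyRange {
    /// Marks `len` bytes starting at `offset` as modified by widening the range.
    fn mark(&mut self, offset: usize, len: usize) {
        if len == 0 {
            return;
        }
        let end = offset + len;
        self.range = Some(match self.range.take() {
            Some(r) => r.start.min(offset)..r.end.max(end),
            None => offset..end,
        });
    }

    /// Number of bytes a flush would have to write back.
    fn bytes_to_flush(&self) -> usize {
        self.range.as_ref().map_or(0, |r| r.len())
    }

    /// Resets the tracker, as a flush would.
    fn clear(&mut self) {
        self.range = None;
    }
}

/// Modify the first and last byte of a 1000-byte chunk, then flush once.
fn cost_single_flush() -> usize {
    let mut dirty = DirtyRange::default();
    dirty.mark(0, 1);
    dirty.mark(999, 1);
    dirty.bytes_to_flush()
}

/// Same modifications, but with a flush between them.
fn cost_with_intermediate_flush() -> usize {
    let mut dirty = DirtyRange::default();
    dirty.mark(0, 1);
    let first = dirty.bytes_to_flush();
    dirty.clear();
    dirty.mark(999, 1);
    first + dirty.bytes_to_flush()
}

fn main() {
    println!("single flush:       {} bytes", cost_single_flush());
    println!("intermediate flush: {} bytes", cost_with_intermediate_flush());
}
```

Tracking one range keeps the bookkeeping trivial. The price is that two distant one-byte edits cover everything between them, which is why flushing between such writes reduces the amount written.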
§Structs
- BufStream: Buffering reader/writer.