bufhash/lib.rs
1//! Buffered hashing facilities.
2//!
3//! This crate provides facilities for writing hashers that take advantage of buffering
4//! in various ways. Most notably, it provides the concept of **partitioned** hashers,
5//! which are a class of hashers that split data into fixed-sized chunks before hashing.
6//! This crate, and all of the facilities therein, are written in 100% Rust and are
7//! designed to integrate with the existing Rust traits, such as [`std::hash::Hasher`].
8//!
9//! ## Overview
10//!
11//! Partitioned hashers are a class of hashers that work on fixed-size chunks of data
12//! (for example, [MurmurHash3]). Fundamentally, the facilities provided under the
13//! [`partitioned`] namespace exist to solve the problem of accounting while
14//! partitioning data during incremental hashing. Underneath the hood, partitioned
15//! hashers keep track of chunking streaming inputs into the appropriate sizes and
16//! storing excess data within an internal buffer to be used in the next
17//! [`write()`](std::hash::Hasher::write) call.
18//!
19//! For example, consider the following invocation of a fictional hashing algorithm that
20//! works in chunks of four (4) bytes at a time:
21//!
22//! ```ignore
23//! let mut hasher = MyHasher::default();
24//!
25//! hasher.write(b"Hi,");
26//! hasher.write(b"there ");
27//! hasher.write(b"world!");
28//!
29//! println!("Result: 0x{:X}", hasher.finish());
30//! ```
31//!
32//! In this example, the data being hashed is incrementally fed to the hasher over three
33//! different [`write()`](std::hash::Hasher::write) calls. Recalling that our hasher
34//! works on four byte chunks, the first three characters in the byte string literal
35//! (i.e., `b"Hi,"`) are not sufficient to fill out a full block of data to be hashed.
36//! However, we similarly do not want to treat these three characters as if they are the
37//! end of the data stream, as more data may be coming (and, in fact, _is_ in our
38//! example. With the facilities provided by the Rust standard library, it is up to the
39//! implementer to keep track of these various edge cases and ensure the hasher works
40//! appropriately.
41//!
42//! To ease implementation of hashers that fit the aforementioned profile, [partitioned
43//! hashers](crate::partitioned::Hasher) take care of the necessary accounting behind
44//! the scenes and allow the implementer to simply focus on a fixed-size interface for
45//! hashing.
46//!
47//! ## Implementing a Partitioned Hasher
48//!
49//! Let's look at the core [`bufhash::partitioned::Hasher`](crate::partitioned::Hasher)
50//! trait to ascertain how writing partitioned hashers is different than writing an
51//! implementation for a [`std::hash::Hasher`].
52//!
53//! ```
54//! pub trait Hasher<const S: usize> {
55//! fn write(&mut self, bytes: &[u8; S]);
56//! fn finish(&self, bytes: &[u8]) -> u64;
57//! }
58//! ```
59//!
60//! You'll find the same two methods as provided in [`std::hash::Hash`] with a few,
61//! minor (but important!) differences:
62//!
63//! * The trait has a generic parameter `const S: usize`, which defines the number of
64//! bytes per partition. In the case of our example above, we would set this to four
65//! (4) bytes.
66//! * The [`write()`](crate::partitioned::Hasher::write) method takes a fixed-size array
67//! of bytes (`&[u8; S]`). The `bytes` parameter is _guaranteed_ to be exactly the
68//! size of the generic parameter, allowing the implementer to forgo size checking.
69//! * The [`finish()`](crate::partitioned::Hasher::finish) method also provides a
70//! `bytes` parameter as a byte slice. This parameter are the bytes left over in the
71//! internal buffer that need to be handled before the hasher is finished: it is
72//! guaranteed to have a size less than the generic parameter.
73//!
74//! As an example, let's implement a partitioned hasher that (a) interprets the byte
75//! stream as little-endian `u64`s that must be added together and then (b) shifts the
76//! result left by the number of bytes remaining in the buffer when `finish()` is called
77//! (not a particularly good hashing algorithm, but it will work for our demonstration
78//! purposes).
79//!
80//! ```
81//! #[derive(Debug, Default)]
82//! pub struct Simple(u64);
83//!
84//! impl bufhash::partitioned::Hasher<8> for Simple {
85//! fn write(&mut self, bytes: &[u8; 8]) {
86//! let data = u64::from_le_bytes(*bytes);
87//! self.0 = self.0.wrapping_add(data);
88//! }
89//!
90//! fn finish(&self, bytes: &[u8]) -> u64 {
91//! self.0 << bytes.len()
92//! }
93//! }
94//! ```
95//!
96//! As you can see, this is much more straightforward to implement and maintain that
97//! handling the accounting yourself.
98//!
99//! To use this new implementation with the [`std::hash::Hasher`] interface, you can use
100//! the [`PartitionedHasher`] adapter like so:
101//!
102//! ```
103//! use std::hash::Hasher as _;
104//! # #[derive(Debug, Default)]
105//! # pub struct Simple(u64);
106//! #
107//! # impl bufhash::partitioned::Hasher<8> for Simple {
108//! # fn write(&mut self, bytes: &[u8; 8]) {
109//! # let data = u64::from_le_bytes(*bytes);
110//! # self.0 = self.0.wrapping_add(data);
111//! # }
112//# !
113//! # fn finish(&self, bytes: &[u8]) -> u64 {
114//! # self.0 << bytes.len()
115//! # }
116//! # }
117//!
118//! let mut hasher = bufhash::PartitionedHasher::new(Simple::default());
119//! hasher.write(b"Hello, world!");
120//!
121//! println!("Result: {:#X}", hasher.finish());
122//! ```
123//!
124//! [MurmurHash3]: https://en.wikipedia.org/wiki/MurmurHash#MurmurHash3
125
126#![warn(missing_docs)]
127#![warn(rust_2018_idioms)]
128#![warn(rust_2021_compatibility)]
129#![warn(missing_debug_implementations)]
130#![warn(rustdoc::broken_intra_doc_links)]
131
132pub mod partitioned;
133
134pub use partitioned::PartitionedHasher;