Struct blake3::Hasher

pub struct Hasher { /* private fields */ }

An incremental hash state that can accept any number of writes.

The rayon and mmap Cargo features enable additional methods on this type related to multithreading and memory-mapped IO.

When the traits-preview Cargo feature is enabled, this type implements several commonly used traits from the digest crate. However, those traits aren’t stable, and they’re expected to change in incompatible ways before that crate reaches 1.0. For that reason, this crate makes no SemVer guarantees for this feature, and callers who use it should expect breaking changes between patch versions.

Examples

// Hash an input incrementally.
let mut hasher = blake3::Hasher::new();
hasher.update(b"foo");
hasher.update(b"bar");
hasher.update(b"baz");
assert_eq!(hasher.finalize(), blake3::hash(b"foobarbaz"));

// Extended output. OutputReader also implements Read and Seek.
let mut output = [0; 1000];
let mut output_reader = hasher.finalize_xof();
output_reader.fill(&mut output);
assert_eq!(&output[..32], blake3::hash(b"foobarbaz").as_bytes());

Implementations

impl Hasher

pub fn new() -> Self

Construct a new Hasher for the regular hash function.

pub fn new_keyed(key: &[u8; 32]) -> Self

Construct a new Hasher for the keyed hash function. See keyed_hash.

pub fn new_derive_key(context: &str) -> Self

Construct a new Hasher for the key derivation function. See derive_key. The context string should be hardcoded, globally unique, and application-specific.

pub fn reset(&mut self) -> &mut Self

Reset the Hasher to its initial state.

This is functionally the same as overwriting the Hasher with a new one, using the same key or context string if any.

pub fn update(&mut self, input: &[u8]) -> &mut Self

Add input bytes to the hash state. You can call this any number of times.

This method is always single-threaded. For multithreading support, see update_rayon (enabled with the rayon Cargo feature).

Note that the degree of SIMD parallelism that update can use is limited by the size of this input buffer. See update_reader.

pub fn finalize(&self) -> Hash

Finalize the hash state and return the Hash of the input.

This method is idempotent. Calling it twice will give the same result. You can also add more input and finalize again.

pub fn finalize_xof(&self) -> OutputReader

Finalize the hash state and return an OutputReader, which can supply any number of output bytes.

This method is idempotent. Calling it twice will give the same result. You can also add more input and finalize again.

pub fn count(&self) -> u64

Return the total number of bytes hashed so far.

pub fn update_reader(&mut self, reader: impl Read) -> Result<&mut Self>

As update, but reading from a std::io::Read implementation.

Hasher implements std::io::Write, so it’s possible to use std::io::copy to update a Hasher from any reader. Unfortunately, this standard approach can limit performance, because copy currently uses an internal 8 KiB buffer that isn’t big enough to take advantage of all SIMD instruction sets. (In particular, AVX-512 needs a 16 KiB buffer.) update_reader avoids this performance problem and is slightly more convenient.

The internal buffer size this method uses may change at any time, and it may be different for different targets. The only guarantee is that it will be large enough for all of this crate’s SIMD implementations on the current platform.

The most common implementer of std::io::Read might be std::fs::File, but note that memory mapping can be faster than this method for hashing large files. See update_mmap and update_mmap_rayon, which require the mmap and (for the latter) rayon Cargo features.

This method requires the std Cargo feature, which is enabled by default.

Example
// Hash standard input.
fn main() -> std::io::Result<()> {
    let mut hasher = blake3::Hasher::new();
    hasher.update_reader(std::io::stdin().lock())?;
    println!("{}", hasher.finalize());
    Ok(())
}

pub fn update_rayon(&mut self, input: &[u8]) -> &mut Self

As update, but using Rayon-based multithreading internally.

This method is gated by the rayon Cargo feature, which is disabled by default but enabled on docs.rs.

To get any performance benefit from multithreading, the input buffer needs to be large. As a rule of thumb on x86_64, update_rayon is slower than update for inputs under 128 KiB. That threshold varies quite a lot across different processors, and it’s important to benchmark your specific use case. See also the performance warning associated with update_mmap_rayon.

If you already have a large buffer in memory, and you want to hash it with multiple threads, this method is a good option. However, reading a file into memory just to call this method can be a performance mistake, both because it requires lots of memory and because single-threaded reads can be slow. For hashing whole files, see update_mmap_rayon, which is gated by both the rayon and mmap Cargo features.

pub fn update_mmap(&mut self, path: impl AsRef<Path>) -> Result<&mut Self>

As update, but reading the contents of a file using memory mapping.

Not all files can be memory mapped, and memory mapping small files can be slower than reading them the usual way. In those cases, this method will fall back to standard file IO. The heuristic for whether to use memory mapping is currently very simple (file size >= 16 KiB), and it might change at any time.

Like update, this method is single-threaded. In this author’s experience, memory mapping improves single-threaded performance by ~10% for large files that are already in cache. This probably varies between platforms, and as always it’s a good idea to benchmark your own use case. In comparison, the multithreaded update_mmap_rayon method can have a much larger impact on performance.

There’s a correctness reason that this method takes Path instead of File: reading from a memory-mapped file ignores the seek position of the original file handle (it neither respects the current position nor updates the position). This difference in behavior would’ve caused update_mmap and update_reader to give different answers and have different side effects in some cases. Taking a Path avoids this problem by making it clear that a new File is opened internally.

This method requires the mmap Cargo feature, which is disabled by default but enabled on docs.rs.

Example
fn main() -> std::io::Result<()> {
    let path = std::path::Path::new("file.dat");
    let mut hasher = blake3::Hasher::new();
    hasher.update_mmap(path)?;
    println!("{}", hasher.finalize());
    Ok(())
}

pub fn update_mmap_rayon(&mut self, path: impl AsRef<Path>) -> Result<&mut Self>

As update_rayon, but reading the contents of a file using memory mapping. This is the default behavior of b3sum.

For large files that are likely to be in cache, this can be much faster than single-threaded hashing. When benchmarks report that BLAKE3 is 10x or 20x faster than other cryptographic hashes, this is usually what they’re measuring. However…

Performance Warning: There are cases where multithreading hurts performance. The worst case is a large file on a spinning disk, where simultaneous reads from multiple threads can cause “thrashing” (i.e. the disk spends more time seeking around than reading data). Windows tends to be somewhat worse about this, in part because it’s less likely than Linux to keep very large files in cache. More generally, if your CPU cores are already busy, then multithreading will add overhead without improving performance. If your code runs in different environments that you don’t control and can’t measure, then unfortunately there’s no one-size-fits-all answer for whether multithreading is a good idea.

The memory mapping behavior of this function is the same as update_mmap, and the heuristic for when to fall back to standard file IO might change at any time.

This method requires both the mmap and rayon Cargo features, which are disabled by default but enabled on docs.rs.

Example
fn main() -> std::io::Result<()> {
    let path = std::path::Path::new("big_file.dat");
    let mut hasher = blake3::Hasher::new();
    hasher.update_mmap_rayon(path)?;
    println!("{}", hasher.finalize());
    Ok(())
}

Trait Implementations

impl Clone for Hasher

fn clone(&self) -> Hasher

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Hasher

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Hasher

fn default() -> Self

Returns the “default value” for a type.

impl Write for Hasher

fn write(&mut self, input: &[u8]) -> Result<usize>

This is equivalent to update.

fn flush(&mut self) -> Result<()>

Flush this output stream, ensuring that all intermediately buffered contents reach their destination.

fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> Result<usize, Error>

Like write, except that it writes from a slice of buffers.

fn is_write_vectored(&self) -> bool

🔬This is a nightly-only experimental API. (can_vector)
Determines if this Writer has an efficient write_vectored implementation.

fn write_all(&mut self, buf: &[u8]) -> Result<(), Error>

Attempts to write an entire buffer into this writer.

fn write_all_vectored(&mut self, bufs: &mut [IoSlice<'_>]) -> Result<(), Error>

🔬This is a nightly-only experimental API. (write_all_vectored)
Attempts to write multiple buffers into this writer.

fn write_fmt(&mut self, fmt: Arguments<'_>) -> Result<(), Error>

Writes a formatted string into this writer, returning any error encountered.

fn by_ref(&mut self) -> &mut Self
where Self: Sized,

Creates a “by reference” adapter for this instance of Write.

impl Zeroize for Hasher

fn zeroize(&mut self)

Zero out this object from memory using Rust intrinsics which ensure the zeroization operation is not “optimized away” by the compiler.

Auto Trait Implementations

impl Freeze for Hasher

impl RefUnwindSafe for Hasher

impl Send for Hasher

impl Sync for Hasher

impl Unpin for Hasher

impl UnwindSafe for Hasher

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.