Hash a file by path. Uses I/O pipelining for large files on Linux,
mmap with HUGEPAGE hints as fallback, single-read for small files,
and streaming read for non-regular files.
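A minimal sketch of how that dispatch might look, assuming the `sha2` crate for digests; the size cutoff is hypothetical, and plain streaming stands in for the pipelined and mmap paths:

```rust
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Hypothetical cutoff; the real size classes are implementation details.
const SMALL_FILE_LIMIT: u64 = 64 * 1024;

fn hash_file(path: &Path) -> io::Result<[u8; 32]> {
    let mut file = File::open(path)?;
    let meta = file.metadata()?;
    // Pipes, sockets, devices: the reported size is meaningless, so stream.
    if !meta.is_file() {
        return hash_streaming(&mut file);
    }
    if meta.len() <= SMALL_FILE_LIMIT {
        // Small regular file: one read, then hash in a single shot.
        let mut buf = Vec::with_capacity(meta.len() as usize);
        file.read_to_end(&mut buf)?;
        return Ok(Sha256::digest(&buf).into());
    }
    // Large file: plain streaming stands in for the pipelined/mmap paths here.
    hash_streaming(&mut file)
}

fn hash_streaming(file: &mut File) -> io::Result<[u8; 32]> {
    let mut hasher = Sha256::new();
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            return Ok(hasher.finalize().into());
        }
        hasher.update(&buf[..n]);
    }
}
```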
Hash a file without fstat: just open, read until EOF, hash.
For many-file workloads (100+ tiny files), skipping fstat saves ~5µs/file.
Uses a two-tier buffer strategy: a small stack buffer (4KB) for the initial read,
falling back to a larger stack buffer (64KB) or a streaming hash for bigger files.
For the benchmark's 55-byte files, a single read() fits in the 4KB buffer and the
data is hashed immediately.
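A sketch of that fast path, again assuming `sha2`; the buffer sizes mirror the tiers described above:

```rust
use sha2::{Digest, Sha256};
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

fn hash_file_no_stat(path: &Path) -> io::Result<[u8; 32]> {
    let mut file = File::open(path)?; // no fstat: go straight to read()
    let mut small = [0u8; 4096];
    let n = read_full(&mut file, &mut small)?;
    if n < small.len() {
        // EOF inside the 4KB buffer (the common tiny-file case):
        // a single read() was enough, hash immediately.
        return Ok(Sha256::digest(&small[..n]).into());
    }
    // File is larger than 4KB: keep the bytes we already have and
    // stream the rest through a bigger buffer.
    let mut hasher = Sha256::new();
    hasher.update(&small);
    let mut big = [0u8; 64 * 1024];
    loop {
        let n = file.read(&mut big)?;
        if n == 0 {
            return Ok(hasher.finalize().into());
        }
        hasher.update(&big[..n]);
    }
}

// Read until the buffer is full or EOF; a single read() may return short.
fn read_full(file: &mut File, buf: &mut [u8]) -> io::Result<usize> {
    let mut filled = 0;
    while filled < buf.len() {
        let n = file.read(&mut buf[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(filled)
}
```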
Batch-hash multiple files with SHA-256/MD5 using work-stealing parallelism.
Files are sorted by size (largest first) so the biggest files start processing
immediately. Each worker thread grabs the next unprocessed file via an atomic index,
eliminating tail latency from uneven file sizes.
Returns results in input order.
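The scheduling idea, sketched with plain std threads and an atomic cursor (the real code may layer this on rayon instead); `sha2` is assumed, and a Mutex around the results keeps the sketch simple since hashing dominates the lock cost:

```rust
use sha2::{Digest, Sha256};
use std::cmp::Reverse;
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

fn hash_batch(paths: &[PathBuf]) -> Vec<io::Result<[u8; 32]>> {
    // Largest first, so the biggest jobs start immediately and
    // cannot become the tail of the schedule.
    let mut order: Vec<usize> = (0..paths.len()).collect();
    order.sort_by_key(|&i| Reverse(fs::metadata(&paths[i]).map(|m| m.len()).unwrap_or(0)));

    let next = AtomicUsize::new(0);
    let results = Mutex::new((0..paths.len()).map(|_| None).collect::<Vec<_>>());

    thread::scope(|s| {
        for _ in 0..thread::available_parallelism().map_or(1, |n| n.get()) {
            s.spawn(|| loop {
                // Each worker claims the next unprocessed file; Relaxed is
                // enough because the counter only hands out indices.
                let slot = next.fetch_add(1, Ordering::Relaxed);
                let Some(&i) = order.get(slot) else { break };
                let res = fs::read(&paths[i]).map(|data| Sha256::digest(&data).into());
                results.lock().unwrap()[i] = Some(res);
            });
        }
    });

    // Report in input order regardless of completion order.
    results.into_inner().unwrap().into_iter().map(Option::unwrap).collect()
}
```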
Parse a BSD-style tag line: "ALGO (filename) = hash".
Returns (expected_hash, filename, optional_bits).
bits is the hash length parsed from the algo name (e.g., BLAKE2b-256 -> Some(256)).
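A possible parser shape; the exact signature is a guess:

```rust
/// Sketch: parse "ALGO (filename) = hash" into (expected_hash, filename, bits).
fn parse_bsd_tag_line(line: &str) -> Option<(&str, &str, Option<usize>)> {
    let (algo, rest) = line.split_once(" (")?;
    // Split at the last ") = " so a ')' inside the filename survives.
    let idx = rest.rfind(") = ")?;
    let (filename, hash) = (&rest[..idx], &rest[idx + 4..]);
    // Length suffix on the algo name, if any: "BLAKE2b-256" -> Some(256).
    let bits = algo.rsplit_once('-').and_then(|(_, b)| b.parse().ok());
    Some((hash, filename, bits))
}
```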
Issue readahead hints for a list of file paths to warm the page cache.
Uses POSIX_FADV_WILLNEED, which is non-blocking, so hints for a whole batch of
files can be issued cheaply up front. Only issues hints for files >= 1MB; small
files are read fast enough that the fadvise syscall overhead isn't worth it.
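A sketch using the `libc` crate; `posix_fadvise` with an offset and length of 0 hints the whole file, and the readahead it starts survives closing the descriptor because the page cache is per-inode:

```rust
use std::fs::File;
use std::os::fd::AsRawFd;
use std::path::Path;

const READAHEAD_MIN: u64 = 1024 * 1024; // 1MB, per the rule above

/// Best-effort page-cache warm-up; errors are deliberately ignored.
fn issue_readahead_hints(paths: &[&Path]) {
    for path in paths {
        let Ok(file) = File::open(path) else { continue };
        let Ok(meta) = file.metadata() else { continue };
        if meta.len() < READAHEAD_MIN {
            continue; // tiny files: the syscall costs more than it saves
        }
        // Non-blocking hint: the kernel kicks off readahead and returns.
        let _ = unsafe {
            libc::posix_fadvise(file.as_raw_fd(), 0, 0, libc::POSIX_FADV_WILLNEED)
        };
    }
}
```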
Check if parallel hashing is worthwhile for the given file paths.
Always parallelize with 2+ files: rayon's thread pool is lazily initialized
once and reused, so per-file work-stealing overhead is negligible (~1µs).
Removing the stat()-based size check eliminates N extra syscalls for N files.
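The check itself then reduces to a count, roughly:

```rust
use std::path::Path;

fn should_parallelize(paths: &[&Path]) -> bool {
    // No per-file stat(): file sizes no longer factor into the decision.
    paths.len() >= 2
}
```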
Build and write the standard GNU hash output line in a single write() call.
Format: "hash  filename\n" (two spaces) or "hash *filename\n" (binary mode).
For escaped filenames, the line gets a leading backslash: "\hash  escaped_filename\n".
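A sketch of building the line in one buffer; the two-space text separator, the '*' binary marker, and the leading backslash follow the GNU convention, while the function shape is hypothetical:

```rust
use std::io::{self, Write};

/// Build the whole GNU-style line in memory, then emit it via one write_all().
fn write_gnu_line(
    out: &mut impl Write,
    hash: &str,
    filename: &str,
    binary: bool,
    escaped: bool,
) -> io::Result<()> {
    let mut line = String::with_capacity(hash.len() + filename.len() + 4);
    if escaped {
        line.push('\\'); // leading backslash marks an escaped filename
    }
    line.push_str(hash);
    line.push_str(if binary { " *" } else { "  " });
    line.push_str(filename);
    line.push('\n');
    out.write_all(line.as_bytes())
}
```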