Module hash

Source

Structs§

CheckOptions: Options for check mode.
CheckResult: Result of check mode verification.

Enums§

HashAlgorithm: Supported hash algorithms.

Functions§

blake2b_hash_data: Hash raw data with BLAKE2b variable output length. output_bytes is the output size in bytes (e.g., 32 for 256-bit).
blake2b_hash_file: Hash a file with BLAKE2b variable output length. Uses mmap for large files (zero-copy), single-read for small files, and streaming read as fallback.
blake2b_hash_files_many: Batch-hash multiple files with BLAKE2b using multi-buffer SIMD.
blake2b_hash_files_parallel: Batch-hash multiple files with BLAKE2b using the best strategy for the workload. Samples a few files to estimate total data size. For small workloads, uses single-core SIMD batch hashing (blake2b_hash_files_many) to avoid stat and thread spawn overhead. For larger workloads, uses multi-core work-stealing parallelism where each worker calls blake2b_hash_file (with I/O pipelining for large files on Linux). Returns results in input order.
blake2b_hash_reader: Hash a reader with BLAKE2b variable output length. Uses thread-local buffer for cache-friendly streaming.
blake2b_hash_stdin: Hash stdin with BLAKE2b variable output length. Tries fadvise if stdin is a regular file (shell redirect), then streams.
check_file: Verify checksums from a check file. Each line should be “hash filename” or “hash *filename” or “ALGO (filename) = hash”.
hash_bytes: Compute hash of a byte slice directly (zero-copy fast path).
hash_bytes_to_buf: Hash data and write hex result directly into an output buffer. Returns the number of hex bytes written. Avoids String allocation on the critical single-file fast path. out must be at least 128 bytes for BLAKE2b (64 * 2), 64 for SHA256, 32 for MD5.
hash_file: Hash a file by path. Uses I/O pipelining for large files on Linux, mmap with HUGEPAGE hints as fallback, single-read for small files, and streaming read for non-regular files.
hash_file_nostat: Hash a file without fstat — just open, read until EOF, hash. For many-file workloads (100+ tiny files), skipping fstat saves ~5µs/file. Uses a two-tier buffer strategy: small stack buffer (4KB) for the initial read, then falls back to a larger stack buffer (64KB) or streaming hash for bigger files. For benchmark’s 55-byte files: one read() fills the 4KB buffer, hash immediately.
hash_file_raw: Hash a single file using raw Linux syscalls for minimum overhead. Bypasses Rust’s File abstraction entirely: raw open/fstat/read/close. For the single-file fast path, this eliminates OpenOptions builder, CString heap allocation, File wrapper overhead, and Read trait dispatch.
hash_file_raw_to_buf: Hash a single file using raw syscalls and write hex directly to output buffer. Returns number of hex bytes written. This is the absolute minimum-overhead path for single-file hashing: raw open + fstat + read + hash + hex encode, with zero String allocation.
hash_files_batch: Batch-hash multiple files: pre-read all files into memory in parallel, then hash all data in parallel. Optimal for many small files where per-file overhead (open/read/close syscalls) dominates over hash computation.
hash_files_parallel: Batch-hash multiple files with SHA-256/MD5 using work-stealing parallelism. Files are sorted by size (largest first) so the biggest files start processing immediately. Each worker thread grabs the next unprocessed file via atomic index, eliminating tail latency from uneven file sizes. Returns results in input order.
hash_files_parallel_fast: Fast parallel hash for multi-file workloads. Skips the stat-all-and-sort phase of hash_files_parallel() and uses hash_file_nostat() per worker to minimize per-file syscall overhead. For 100 tiny files, this eliminates ~200 stat() calls (100 from the sort phase + 100 from open_and_stat inside each worker). Returns results in input order.
hash_reader: Compute hash of data from a reader, returning hex string.
hash_stdin: Hash stdin. Uses fadvise for file redirects, streaming for pipes.
parse_check_line: Parse a checksum line in any supported format.
parse_check_line_tag: Parse a BSD-style tag line: “ALGO (filename) = hash” Returns (expected_hash, filename, optional_bits). bits is the hash length parsed from the algo name (e.g., BLAKE2b-256 -> Some(256)).
print_hash: Print hash result in GNU format: “hash filename\n” Uses raw byte writes to avoid std::fmt overhead.
print_hash_tag: Print hash result in BSD tag format: “ALGO (filename) = hash\n”
print_hash_tag_b2sum: Print hash in BSD tag format with BLAKE2b length info: “BLAKE2b (filename) = hash” for 512-bit, or “BLAKE2b-256 (filename) = hash” for other lengths.
print_hash_tag_b2sum_zero: Print hash in BSD tag format with BLAKE2b length info and NUL terminator.
print_hash_tag_zero: Print hash in BSD tag format with NUL terminator.
print_hash_zero: Print hash in GNU format with NUL terminator instead of newline.
readahead_files: Issue readahead hints for a list of file paths to warm the page cache. Uses POSIX_FADV_WILLNEED which is non-blocking and batches efficiently. Only issues hints for files >= 1MB; small files are read fast enough that the fadvise syscall overhead isn’t worth it.
readahead_files_all: Issue readahead hints for ALL file paths (no size threshold). For multi-file benchmarks, even small files benefit from batched readahead.
should_use_parallel: Check if parallel hashing is worthwhile for the given file paths. Always parallelize with 2+ files — rayon’s thread pool is lazily initialized once and reused, so per-file work-stealing overhead is negligible (~1µs). Removing the stat()-based size check eliminates N extra syscalls for N files.
write_hash_line: Build and write the standard GNU hash output line in a single write() call. Format: “hash filename\n” or “hash *filename\n” (binary mode). For escaped filenames: “\hash escaped_filename\n”.
write_hash_tag_line: Build and write BSD tag format output in a single write() call. Format: “ALGO (filename) = hash\n”

Module hash

Module hash Copy item path

Structs§

Enums§

Functions§

Module hash