pub fn is_zero_chunk(chunk: &[u8]) -> bool
Checks if a chunk consists entirely of zero bytes.
This function efficiently detects all-zero chunks to enable sparse block optimization. Zero chunks are common in VM images (unallocated space), memory dumps (zero-initialized pages), and sparse files.
§Algorithm
Uses Rust’s iterator `all()` combinator, which:
- Short-circuits on the first non-zero byte (early exit)
- Compiles to SIMD instructions on modern CPUs (autovectorization)
- Typically processes 16-64 bytes per instruction (SSE2: 16, AVX2: 32, AVX-512: 64)
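A minimal sketch of such an implementation (presumably equivalent to the crate's body, though not copied from its source):

```rust
/// Sketch: `all()` short-circuits on the first non-zero byte and
/// autovectorizes well under optimization.
pub fn is_zero_chunk(chunk: &[u8]) -> bool {
    chunk.iter().all(|&b| b == 0)
}
```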
§Parameters
- `chunk`: Byte slice to check
  - Empty slices return `true` (vacuous truth)
  - Typical size: 16 KiB - 256 KiB (configurable block size)
§Returns
- `true`: All bytes are zero (sparse block, use `create_zero_block`)
- `false`: At least one non-zero byte (normal block, compress and write)
§Performance
Modern CPUs with SIMD support achieve excellent throughput:
- SIMD-optimized: ~10-20 GB/s (memory bandwidth limited)
- Scalar fallback: ~1-2 GB/s (without SIMD)
- Typical overhead: <1% of total packing time
The check is always worth performing given the massive space savings for zero blocks.
§Examples
§Basic Usage
```rust
use hexz_core::ops::write::is_zero_chunk;

let zeros = vec![0u8; 65536];
assert!(is_zero_chunk(&zeros));

let data = vec![0u8, 1u8, 0u8];
assert!(!is_zero_chunk(&data));

let empty: &[u8] = &[];
assert!(is_zero_chunk(empty)); // Empty is considered "all zeros"
```
§Packing Loop Integration
```rust
for (idx, chunk) in chunks.iter().enumerate() {
    let info = if is_zero_chunk(chunk) {
        // Fast path: No compression, no write, just metadata
        create_zero_block(chunk.len() as u32)
    } else {
        // Slow path: Compress, write, create metadata
        write_block(
            &mut out,
            chunk,
            idx as u64,
            &mut offset,
            None::<&mut StandardHashTable>,
            &compressor,
            None,
            &hasher,
            &mut hash_buf,
            &mut compress_buf,
            &mut encrypt_buf,
        )?
    };
    index_blocks.push(info);
}
```
§Benchmarking Zero Detection
```rust
use hexz_core::ops::write::is_zero_chunk;
use std::hint::black_box;
use std::time::Instant;

let chunk = vec![0u8; 64 * 1024 * 1024]; // 64 MiB

let start = Instant::now();
for _ in 0..100 {
    // black_box keeps the optimizer from eliding the pure call
    black_box(is_zero_chunk(black_box(&chunk)));
}
let elapsed = start.elapsed();

let throughput = (64.0 * 100.0) / elapsed.as_secs_f64(); // MiB/s
println!("Zero detection: {:.1} GiB/s", throughput / 1024.0);
```
§SIMD Optimization
On x86-64 with AVX2, the compiler typically generates code like:
```asm
        vpxor     ymm0, ymm0, ymm0   ; Zero register
loop:
        vmovdqu   ymm1, [rsi]        ; Load 32 bytes
        vpcmpeqb  ymm2, ymm1, ymm0   ; Compare each byte with zero
        vpmovmskb eax, ymm2          ; Extract comparison mask
        cmp       eax, 0xFFFFFFFF    ; All 32 bytes zero?
        jne       found_nonzero      ; Early exit if not
        add       rsi, 32            ; Advance pointer
        jmp       loop
```
This processes 32 bytes per iteration (~1-2 cycles on modern CPUs).
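For comparison, here is a hypothetical explicit-SIMD variant written with `std::arch` intrinsics, roughly mirroring the assembly above (a sketch only; the crate relies on autovectorization instead, and `is_zero_chunk_avx` is an illustrative name):

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn is_zero_chunk_avx(chunk: &[u8]) -> bool {
    use std::arch::x86_64::*;
    let mut i = 0;
    // Scan 32-byte lanes; vptest sets ZF when (v & v) == 0.
    while i + 32 <= chunk.len() {
        // SAFETY: i + 32 <= len, so the unaligned 32-byte load is in bounds.
        let v = unsafe { _mm256_loadu_si256(chunk.as_ptr().add(i) as *const __m256i) };
        // _mm256_testz_si256 returns 1 iff (v & v) == 0, i.e. all bytes zero.
        if unsafe { _mm256_testz_si256(v, v) } == 0 {
            return false; // Non-zero byte somewhere in this lane
        }
        i += 32;
    }
    // Scalar tail for the last < 32 bytes
    chunk[i..].iter().all(|&b| b == 0)
}
```

A caller would gate this behind `is_x86_feature_detected!("avx")` and fall back to the portable version otherwise.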
§Edge Cases
- Empty chunks: Return `true` (vacuous truth, no non-zero bytes)
- Single byte: Works correctly, no special handling needed
- Unaligned chunks: SIMD code handles unaligned loads transparently
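These behaviors are easy to verify directly:

```rust
use hexz_core::ops::write::is_zero_chunk;

assert!(is_zero_chunk(&[]));    // empty: vacuously true
assert!(is_zero_chunk(&[0]));   // single zero byte
assert!(!is_zero_chunk(&[7]));  // single non-zero byte

// An unaligned sub-slice of an aligned buffer works the same.
let buf = vec![0u8; 4096];
assert!(is_zero_chunk(&buf[1..4095]));
```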
§Alternative Implementations
Other possible implementations (not currently used):
- Manual SIMD: Use `std::arch` for explicit SIMD (faster but less portable; cf. the sketch in the SIMD section above)
- Chunked comparison: Process in 8-byte chunks with `u64` casts (faster scalar; sketched below)
- Bitmap scan: Use the CPU’s `bsf`/`tzcnt` to skip zero regions (complex)
Current implementation relies on compiler autovectorization, which works well in practice and maintains portability.
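To make the chunked-comparison option concrete, here is a hypothetical sketch (`is_zero_chunk_u64` is an illustrative name, not part of the crate):

```rust
/// Hypothetical scalar alternative: check 8 bytes at a time via u64 views.
fn is_zero_chunk_u64(chunk: &[u8]) -> bool {
    // SAFETY: every byte pattern is a valid u64, so reinterpreting the
    // aligned middle of the slice is sound.
    let (head, middle, tail) = unsafe { chunk.align_to::<u64>() };
    head.iter().all(|&b| b == 0)
        && middle.iter().all(|&w| w == 0)
        && tail.iter().all(|&b| b == 0)
}
```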
§Correctness
This function is pure and infallible:
- No side effects (read-only operation)
- No panics (iterator `all()` is safe for all inputs)
- No undefined behavior (all byte patterns are valid)
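One cheap way to gain confidence in these properties is to compare against a naive reference loop (a test sketch under that assumption, not taken from the crate's test suite):

```rust
use hexz_core::ops::write::is_zero_chunk;

/// Reference implementation for differential testing.
fn naive_is_zero(chunk: &[u8]) -> bool {
    for &b in chunk {
        if b != 0 {
            return false;
        }
    }
    true
}

#[test]
fn zero_check_matches_naive_loop() {
    let cases: Vec<Vec<u8>> = vec![
        vec![],
        vec![0],
        vec![1],
        vec![0; 4097],
        {
            let mut v = vec![0u8; 4097];
            v[4096] = 1; // non-zero byte in the final position
            v
        },
    ];
    for case in &cases {
        assert_eq!(is_zero_chunk(case), naive_is_zero(case));
    }
}
```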