Function create_zero_block 

pub fn create_zero_block(logical_len: u32) -> BlockInfo

Creates a zero-block descriptor without writing data to disk.

Zero blocks (all-zero chunks) are a special case optimized for space efficiency. Instead of compressing and storing zeros, we create a metadata-only descriptor that signals to the reader to return zeros without performing any I/O.

§Sparse Data Optimization

Many VM disk images and memory dumps contain large regions of zeros:

  • Unallocated disk space: File systems often zero-initialize blocks
  • Memory pages: Unused or zero-initialized memory
  • Sparse files: Holes in sparse file systems

Storing these zeros (even compressed) wastes space:

  • LZ4-compressed zeros: ~100 bytes per 64 KiB block (~0.15% of original)
  • Uncompressed zeros: 64 KiB per block (100%)
  • Metadata-only: 20 bytes per block (~0.03%)

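The metadata approach saves 99.97% of space relative to storing zero blocks uncompressed, and about 80% relative to the LZ4 figure above (20 bytes instead of ~100).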
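The percentages above follow directly from the per-block sizes. A small sketch of the arithmetic, using only the figures quoted in this section (the function names are illustrative and not part of hexz_core):

```rust
/// Per-block storage cost as a percentage of the original block size.
fn percent_of_block(stored_bytes: u64, block_size: u64) -> f64 {
    stored_bytes as f64 / block_size as f64 * 100.0
}

/// Total bytes needed to represent `zero_blocks` zero blocks
/// at `bytes_per_block` bytes each.
fn total_cost(zero_blocks: u64, bytes_per_block: u64) -> u64 {
    zero_blocks * bytes_per_block
}
```

For example, 1 GiB of zeros is 16,384 blocks of 64 KiB, so `total_cost(16_384, 20)` gives 327,680 bytes (320 KiB) of descriptors, against 1 GiB if the zeros were stored uncompressed.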

§Descriptor Format

Zero blocks are identified by a special BlockInfo signature:

  • offset = 0: Invalid physical offset (data region starts at ≥512)
  • length = 0: No physical storage
  • logical_len = N: Original zero block size in bytes
  • checksum = 0: No checksum needed (zeros are deterministic)

Readers recognize this pattern and synthesize zeros without I/O.
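As an illustration of the signature above, the sketch below models a descriptor and the check a reader might apply. `Descriptor` is a local stand-in, not the real `hexz_core::format::index::BlockInfo`. The field types (`u64` offset, `u32` for the rest) are an assumption, chosen so that the serialized fields add up to the 20 bytes quoted earlier.

```rust
/// Hypothetical stand-in for `BlockInfo`; the field types are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Descriptor {
    offset: u64,      // physical offset; 0 is never a valid data offset
    length: u32,      // stored (compressed) length in bytes
    logical_len: u32, // original (uncompressed) length in bytes
    checksum: u32,    // checksum of the stored data
}

impl Descriptor {
    /// Builds a metadata-only descriptor for `logical_len` bytes of zeros.
    fn zero(logical_len: u32) -> Self {
        Descriptor { offset: 0, length: 0, logical_len, checksum: 0 }
    }

    /// True when the descriptor matches the zero-block signature
    /// (offset = 0 and length = 0).
    fn is_zero_block(&self) -> bool {
        self.offset == 0 && self.length == 0
    }
}
```

Because real data never lives at offset 0, the two-field test cannot collide with a descriptor for stored data.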

§Parameters

  • logical_len: Size of the zero block in bytes
    • Typically matches block_size (e.g., 65536 for 64 KiB blocks)
    • Can vary with content-defined chunking
    • Must be > 0 (zero-length blocks are invalid)
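The examples on this page convert with `chunk.len() as u32`, which silently truncates lengths above `u32::MAX`. Where chunk sizes are not known to be small, a checked conversion that also rejects empty chunks is one option. `checked_logical_len` is a hypothetical helper, not part of hexz_core:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a chunk length to a valid `logical_len`.
/// Returns `None` for empty chunks and for lengths that do not fit in `u32`.
fn checked_logical_len(len: usize) -> Option<u32> {
    match u32::try_from(len) {
        Ok(0) | Err(_) => None,
        Ok(n) => Some(n),
    }
}
```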

§Returns

BlockInfo descriptor with zero-block semantics:

  • offset = 0
  • length = 0
  • logical_len = logical_len
  • checksum = 0

§Examples

§Detecting and Creating Zero Blocks

use hexz_core::ops::write::{create_zero_block, is_zero_chunk};

let chunk = vec![0u8; 65536]; // 64 KiB of zeros

if is_zero_chunk(&chunk) {
    let info = create_zero_block(chunk.len() as u32);
    assert_eq!(info.offset, 0);
    assert_eq!(info.length, 0);
    assert_eq!(info.logical_len, 65536);
    println!("Zero block: No storage required!");
}

§Usage in Packing Loop

for (idx, chunk) in chunks.iter().enumerate() {
    let info = if is_zero_chunk(chunk) {
        // Optimize: No compression, no write
        create_zero_block(chunk.len() as u32)
    } else {
        // Normal path: Compress and write
        write_block(
            &mut out, chunk, idx as u64, &mut offset,
            None::<&mut StandardHashTable>, &compressor, None, &hasher,
            &mut hash_buf, &mut compress_buf, &mut encrypt_buf,
        )?
    };
    // Add info to index page...
}

§Performance

  • Time complexity: O(1) (no I/O, no computation)
  • Space complexity: O(1) (fixed-size struct)
  • Typical savings: 99.97% vs. uncompressed zeros; about 80% vs. LZ4-compressed zeros (20 bytes instead of ~100)

§Reader Behavior

When a reader encounters a zero block (offset=0, length=0):

  1. Recognize zero-block pattern from metadata
  2. Allocate buffer of size logical_len
  3. Fill buffer with zeros (optimized memset)
  4. Return buffer to caller

No decompression, decryption, or checksum verification is performed.
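The steps above can be sketched as follows. `read_block` and its `load_stored` parameter are hypothetical names used for illustration; the real reader API is not shown on this page.

```rust
/// Hypothetical read path: synthesizes zeros for zero blocks and otherwise
/// defers to `load_stored`, a caller-supplied stand-in for the normal
/// read, decrypt, decompress, and verify path.
fn read_block<F>(offset: u64, length: u32, logical_len: u32, load_stored: F) -> Vec<u8>
where
    F: FnOnce(u64, u32) -> Vec<u8>,
{
    if offset == 0 && length == 0 {
        // Zero-block pattern recognized: return a zero-filled buffer of
        // `logical_len` bytes without calling `load_stored` at all.
        vec![0u8; logical_len as usize]
    } else {
        load_stored(offset, length)
    }
}
```

A call such as `read_block(0, 0, 65536, ...)` returns 64 KiB of zeros and never invokes the closure, which is what makes the zero-block path free of I/O.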

§Interaction with Deduplication

Zero blocks do not participate in deduplication:

  • They are never written to disk → no physical offset → no dedup entry
  • Each zero block gets its own metadata descriptor
  • This is fine: metadata is cheap (20 bytes per block), and since all zero blocks have the same content there is no stored data to share

§Interaction with Encryption

Zero blocks work correctly with encryption:

  • They are detected before compression/encryption
  • Encrypted snapshots still use zero-block optimization
  • Readers synthesize zeros without decryption

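This is safe in the sense that a zero block contains no secret data, so skipping encryption exposes no plaintext. The descriptor does mark the block as all zeros, so anyone who can read the index can see which blocks are zero.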

§Validation

IMPORTANT: This function does NOT validate that the original chunk was actually all zeros. The caller is responsible for calling is_zero_chunk first.

If a non-zero chunk is incorrectly marked as a zero block, readers will return zeros instead of the original data (silent data corruption).
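One way to make this misuse harder is to pair the check and the constructor in a single helper. The sketch below uses local stand-ins (`all_zeros`, `ZeroBlock`, `try_zero_block`) rather than the real `is_zero_chunk` and `BlockInfo`, whose definitions are not shown on this page:

```rust
use std::convert::TryFrom;

/// Stand-in for `is_zero_chunk`: true when every byte is zero.
fn all_zeros(chunk: &[u8]) -> bool {
    chunk.iter().all(|&b| b == 0)
}

/// Stand-in for a zero-block descriptor; only `logical_len` varies.
#[derive(Debug, PartialEq, Eq)]
struct ZeroBlock {
    logical_len: u32,
}

/// Returns a descriptor only if `chunk` is non-empty, its length fits in
/// `u32`, and every byte is zero. Any other chunk yields `None`, so the
/// caller must take the normal compress-and-write path.
fn try_zero_block(chunk: &[u8]) -> Option<ZeroBlock> {
    let logical_len = u32::try_from(chunk.len()).ok()?;
    if logical_len == 0 || !all_zeros(chunk) {
        return None;
    }
    Some(ZeroBlock { logical_len })
}
```

With a wrapper of this shape, a descriptor can only be obtained for a chunk that has actually been scanned, at the cost of making the scan mandatory on every call.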