Function write_block 

pub fn write_block<W: Write>(
    out: &mut W,
    chunk: &[u8],
    block_idx: u64,
    current_offset: &mut u64,
    dedup_map: Option<&mut StandardHashTable>,
    compressor: &dyn Compressor,
    encryptor: Option<&dyn Encryptor>,
    hasher: &dyn ContentHasher,
    hash_buf: &mut [u8; 32],
    compress_buf: &mut Vec<u8>,
    encrypt_buf: &mut Vec<u8>,
) -> Result<BlockInfo>

Writes a compressed and optionally encrypted block to the output stream.

This function implements the complete block transformation pipeline: compression, optional encryption, checksum computation, deduplication, and physical write. It returns a BlockInfo descriptor suitable for inclusion in an index page.

§Transformation Pipeline

  1. Compression: Compress raw chunk using provided compressor (LZ4 or Zstd)
  2. Encryption (optional): Encrypt compressed data with AES-256-GCM using block_idx as nonce
  3. Checksum: Compute CRC32 of final data for integrity verification
  4. Deduplication (optional; skipped when encrypting):
    • Compute BLAKE3 hash of final data
    • Check dedup_map for existing block with same hash
    • If found: Reuse existing offset, skip write
    • If new: Write block, record offset in dedup_map
  5. Write: Append final data to output at current_offset
  6. Metadata: Create and return BlockInfo with offset, length, checksum
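
The control flow can be summarized in a simplified sketch. The closure parameters (compress, encrypt, crc32, content_hash) and the SketchBlockInfo struct are illustrative stand-ins rather than the crate's Compressor, Encryptor, and ContentHasher traits or its BlockInfo type; only the ordering and the offset/dedup bookkeeping mirror the steps above.

use std::collections::HashMap;
use std::io::Write;

struct SketchBlockInfo {
    offset: u64,
    length: u64,
    logical_len: u64,
    checksum: u32,
}

fn write_block_sketch<W: Write>(
    out: &mut W,
    chunk: &[u8],
    current_offset: &mut u64,
    dedup_map: Option<&mut HashMap<[u8; 32], u64>>,
    compress: &dyn Fn(&[u8]) -> Vec<u8>,
    encrypt: Option<&dyn Fn(&[u8]) -> Vec<u8>>,
    crc32: &dyn Fn(&[u8]) -> u32,
    content_hash: &dyn Fn(&[u8]) -> [u8; 32],
) -> std::io::Result<SketchBlockInfo> {
    // 1. Compress the raw chunk.
    let mut data = compress(chunk);

    // 2. Optionally encrypt the compressed bytes.
    let encrypted = encrypt.is_some();
    if let Some(enc) = encrypt {
        data = enc(&data);
    }

    // 3. Checksum the final (stored) bytes.
    let checksum = crc32(&data);

    // 4. Deduplicate only when not encrypting: an identical stored block reuses
    //    the first occurrence's offset and skips the physical write.
    let mut offset = *current_offset;
    let mut reuse_existing = false;
    if !encrypted {
        if let Some(map) = dedup_map {
            let hash = content_hash(&data);
            if let Some(&existing) = map.get(&hash) {
                offset = existing;
                reuse_existing = true;
            } else {
                map.insert(hash, offset);
            }
        }
    }

    // 5. Physical write; the offset only advances for blocks actually written.
    if !reuse_existing {
        out.write_all(&data)?;
        *current_offset += data.len() as u64;
    }

    // 6. Metadata for the index page.
    Ok(SketchBlockInfo {
        offset,
        length: data.len() as u64,
        logical_len: chunk.len() as u64,
        checksum,
    })
}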

§Parameters

  • out: Output writer implementing Write trait

    • Typically a File or BufWriter<File>
    • Must support write_all for atomic block writes
  • chunk: Uncompressed chunk data (raw bytes)

    • Typical size: 16 KiB - 256 KiB (configurable)
    • Must not be empty (undefined behavior for zero-length chunks)
  • block_idx: Global block index (zero-based)

    • Used as encryption nonce (must be unique per snapshot)
    • Monotonically increases across all streams
    • Must not reuse indices within same encrypted snapshot (breaks security)
  • current_offset: Mutable reference to current physical file offset

    • Updated after successful write: *current_offset += bytes_written
    • Not updated on error (file state undefined)
    • Not updated for deduplicated blocks (reuses existing offset)
  • dedup_map: Optional deduplication hash table

    • Some(&mut map): Enable dedup, use this map
    • None: Disable dedup, always write
    • Ignored if encryptor.is_some() (encryption prevents dedup)
    • Maps BLAKE3 hash → physical offset of first occurrence
  • compressor: Compression algorithm implementation

    • Typically Lz4Compressor or ZstdCompressor
    • Must implement Compressor trait
  • encryptor: Optional encryption implementation

    • Some(enc): Encrypt compressed data with AES-256-GCM
    • None: Store compressed data unencrypted
    • Must implement Encryptor trait
  • hasher: Content hasher for deduplication

    • Typically Blake3Hasher
    • Must implement ContentHasher trait
    • Used only when dedup_map is Some and encryptor is None
  • hash_buf: Reusable 32-byte buffer for hash output

    • Avoids allocation on every hash computation
    • Only used when dedup is enabled
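  • compress_buf: Reusable scratch buffer for compressed data

    • Avoids allocation on every block write (same rationale as hash_buf)
  • encrypt_buf: Reusable scratch buffer for encrypted data

    • Avoids allocation on every block write
    • Relevant only when encryptor is Some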

§Returns

  • Ok(BlockInfo): Block written successfully, metadata returned

    • offset: Physical byte offset where block starts
    • length: Compressed (and encrypted) size in bytes
    • logical_len: Original uncompressed size
    • checksum: CRC32 of final data (compressed + encrypted)
  • Err(Error::Io): I/O error during write

    • Disk full, permission denied, device error
    • File state undefined (partial write may have occurred)
  • Err(Error::Compression): Compression failed

    • Rare; usually indicates library bug or corrupted input
  • Err(Error::Encryption): Encryption failed

    • Rare; usually indicates crypto library bug

§Examples

§Basic Usage (No Encryption, No Dedup)

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;

let mut out = File::create("output.hxz")?;
let mut offset = 512u64; // After header
let chunk = vec![0x42; 65536]; // 64 KiB of data
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];

let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();

let info = write_block(
    &mut out,
    &chunk,
    0,              // block_idx
    &mut offset,
    None::<&mut StandardHashTable>, // No dedup
    &compressor,
    None,           // No encryption
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;

println!("Block written at offset {}, size {}", info.offset, info.length);

§With Deduplication

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;

let mut out = File::create("output.hxz")?;
let mut offset = 512u64;
let mut dedup_map = StandardHashTable::new();
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();

// Write first block
let chunk1 = vec![0xAA; 65536];
let info1 = write_block(
    &mut out,
    &chunk1,
    0,
    &mut offset,
    Some(&mut dedup_map),
    &compressor,
    None,
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;
println!("Block 0: offset={}, written", info1.offset);

// Write duplicate block (same content)
let chunk2 = vec![0xAA; 65536];
let info2 = write_block(
    &mut out,
    &chunk2,
    1,
    &mut offset,
    Some(&mut dedup_map),
    &compressor,
    None,
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;
println!("Block 1: offset={}, deduplicated (no write)", info2.offset);
assert_eq!(info1.offset, info2.offset); // Same offset, block reused

§With Encryption

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::encryption::AesGcmEncryptor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_common::crypto::KeyDerivationParams;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;

let mut out = File::create("output.hxz")?;
let mut offset = 512u64;
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];

// Initialize encryptor
let params = KeyDerivationParams::default();
let encryptor = AesGcmEncryptor::new(
    b"strong_password",
    &params.salt,
    params.iterations,
)?;

let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();

let chunk = vec![0x42; 65536];
let info = write_block(
    &mut out,
    &chunk,
    0,
    &mut offset,
    None::<&mut StandardHashTable>, // Dedup disabled (encryption prevents it)
    &compressor,
    Some(&encryptor),
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;

println!("Encrypted block: offset={}, length={}", info.offset, info.length);

§Performance

  • Compression: Dominates runtime (~2 GB/s LZ4, ~500 MB/s Zstd)
  • Encryption: ~1-2 GB/s (hardware AES-NI)
  • Hashing: ~3.2 GB/s (BLAKE3, only when dedup is enabled)
  • I/O: Typically not the bottleneck (buffered writes, ~3 GB/s sequential)
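
As a rough back-of-the-envelope using the figures above, a 64 KiB block costs about 33 µs of LZ4 compression (65,536 B ÷ ~2 GB/s), about 20 µs of BLAKE3 hashing when dedup is enabled (÷ ~3.2 GB/s), and about 33-65 µs of AES-GCM when encrypting (÷ 1-2 GB/s). An unencrypted, deduplicating write therefore spends on the order of 50-60 µs of CPU per block, or roughly 1 GB/s of per-core throughput before I/O.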

§Deduplication Effectiveness

Deduplication is most effective when:

  • Fixed-size blocks: Same content → same boundaries → same hash
  • Unencrypted: Encryption produces unique ciphertext per block (different nonces)
  • Redundant data: Duplicate files, repeated patterns, copy-on-write filesystems

Deduplication is ineffective when:

  • Shifted data: Insertions or deletions move every later fixed-size boundary, so identical content no longer aligns (the shift resilience of content-defined chunking does not apply to fixed-size blocks)
  • Compressed input: Pre-compressed data has low redundancy
  • Unique data: No duplicate blocks to detect

§Security Considerations

§Block Index as Nonce

When encrypting, block_idx is used as part of the AES-GCM nonce. CRITICAL:

  • Never reuse block_idx values within the same encrypted snapshot
  • Nonce reuse breaks AES-GCM security (allows plaintext recovery)
  • Each logical block must have a unique index
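
For illustration, one conventional way to widen a 64-bit block index into the 96-bit nonce AES-GCM expects is shown below. The crate's actual construction inside AesGcmEncryptor may differ, but the property that matters is the same: a unique block_idx yields a unique nonce, and a repeated block_idx yields a repeated nonce.

fn nonce_from_block_idx(block_idx: u64) -> [u8; 12] {
    // 4 zero bytes of padding followed by the big-endian block index.
    let mut nonce = [0u8; 12];
    nonce[4..].copy_from_slice(&block_idx.to_be_bytes());
    nonce
}

assert_ne!(nonce_from_block_idx(0), nonce_from_block_idx(1));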

§Deduplication and Encryption

Deduplication is automatically disabled when encrypting because:

  • Each block has a unique nonce → unique ciphertext
  • BLAKE3(ciphertext1) ≠ BLAKE3(ciphertext2) even if plaintext is identical
  • Attempting dedup with encryption wastes CPU (hashing) without space savings
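
The effect can be demonstrated directly with the RustCrypto aes-gcm and blake3 crates (used here in place of the hexz_core wrappers, whose exact APIs may differ): encrypting the same plaintext under two nonces yields two different ciphertexts, so their dedup hashes never collide.

use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::{Aes256Gcm, Key, Nonce};

let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(&[7u8; 32]));
let plaintext = b"identical block contents";

// Same plaintext, two nonces (as two different block indices would produce).
let ct0 = cipher.encrypt(Nonce::from_slice(&[0u8; 12]), plaintext.as_ref()).unwrap();
let ct1 = cipher.encrypt(Nonce::from_slice(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1u8]), plaintext.as_ref()).unwrap();

assert_ne!(ct0, ct1);                               // unique ciphertext per nonce
assert_ne!(blake3::hash(&ct0), blake3::hash(&ct1)); // so the dedup hashes cannot match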

§Thread Safety

This function is not thread-safe with respect to the output writer:

  • Concurrent calls with the same out writer will interleave writes (corruption)
  • Concurrent calls with different writers to the same file will corrupt the file

For parallel writing, use separate output files or implement external synchronization.

The dedup_map must also be externally synchronized for concurrent access.
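
One possible shape for that external synchronization, sketched below, is a single Mutex around all of the per-writer mutable state (output file, offset, dedup map, scratch buffers), so that worker threads producing chunks in parallel still serialize the actual writes. WriterState is an illustrative name, not a type provided by the crate.

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use std::fs::File;
use std::sync::Mutex;

// Everything write_block mutates lives behind one lock.
struct WriterState {
    out: File,
    offset: u64,
    dedup: StandardHashTable,
    hash_buf: [u8; 32],
    compress_buf: Vec<u8>,
    encrypt_buf: Vec<u8>,
}

let writer = Mutex::new(WriterState {
    out: File::create("output.hxz")?,
    offset: 512,
    dedup: StandardHashTable::new(),
    hash_buf: [0u8; 32],
    compress_buf: Vec::new(),
    encrypt_buf: Vec::new(),
});
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;

// In each worker: hold the lock for the whole call so the physical write and
// the offset/dedup updates stay consistent.
let (chunk, block_idx) = (vec![0x42u8; 65536], 0u64);
{
    let mut state = writer.lock().unwrap();
    let WriterState { out, offset, dedup, hash_buf, compress_buf, encrypt_buf } = &mut *state;
    let _info = write_block(
        out,
        &chunk,
        block_idx,
        offset,
        Some(dedup),
        &compressor,
        None,
        &hasher,
        hash_buf,
        compress_buf,
        encrypt_buf,
    )?;
}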