Function write_block 

pub fn write_block<W: Write>(
    out: &mut W,
    chunk: &[u8],
    block_idx: u64,
    current_offset: &mut u64,
    dedup_map: Option<&mut StandardHashTable>,
    compressor: &dyn Compressor,
    encryptor: Option<&dyn Encryptor>,
    hasher: &dyn ContentHasher,
    hash_buf: &mut [u8; 32],
    compress_buf: &mut Vec<u8>,
    encrypt_buf: &mut Vec<u8>,
) -> Result<BlockInfo>

Writes a compressed and optionally encrypted block to the output stream.

This function implements the complete block transformation pipeline: compression, optional encryption, checksum computation, deduplication, and physical write. It returns a BlockInfo descriptor suitable for inclusion in an index page.

§Transformation Pipeline

  1. Compression: Compress raw chunk using provided compressor (LZ4 or Zstd)
  2. Encryption (optional): Encrypt compressed data with AES-256-GCM using block_idx as nonce
  3. Checksum: Compute CRC32 of final data for integrity verification
  4. Deduplication (optional; skipped when encrypting):
    • Compute BLAKE3 hash of final data
    • Check dedup_map for existing block with same hash
    • If found: Reuse existing offset, skip write
    • If new: Write block, record offset in dedup_map
  5. Write: Append final data to output at current_offset
  6. Metadata: Create and return BlockInfo with offset, length, checksum
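
The control flow can be summarized in a simplified sketch. The closure parameters (compress, encrypt, crc32, content_hash) and the SketchBlockInfo struct are illustrative stand-ins rather than the crate's Compressor, Encryptor, and ContentHasher traits or its BlockInfo type; only the ordering and the offset/dedup bookkeeping mirror the steps above.

use std::collections::HashMap;
use std::io::Write;

struct SketchBlockInfo {
    offset: u64,
    length: u64,
    logical_len: u64,
    checksum: u32,
}

fn write_block_sketch<W: Write>(
    out: &mut W,
    chunk: &[u8],
    current_offset: &mut u64,
    dedup_map: Option<&mut HashMap<[u8; 32], u64>>,
    compress: &dyn Fn(&[u8]) -> Vec<u8>,
    encrypt: Option<&dyn Fn(&[u8]) -> Vec<u8>>,
    crc32: &dyn Fn(&[u8]) -> u32,
    content_hash: &dyn Fn(&[u8]) -> [u8; 32],
) -> std::io::Result<SketchBlockInfo> {
    // 1. Compress the raw chunk.
    let mut data = compress(chunk);

    // 2. Optionally encrypt the compressed bytes.
    let encrypted = encrypt.is_some();
    if let Some(enc) = encrypt {
        data = enc(&data);
    }

    // 3. Checksum the final (stored) bytes.
    let checksum = crc32(&data);

    // 4. Deduplicate only when not encrypting: an identical stored block reuses
    //    the first occurrence's offset and skips the physical write.
    let mut offset = *current_offset;
    let mut reuse_existing = false;
    if !encrypted {
        if let Some(map) = dedup_map {
            let hash = content_hash(&data);
            if let Some(&existing) = map.get(&hash) {
                offset = existing;
                reuse_existing = true;
            } else {
                map.insert(hash, offset);
            }
        }
    }

    // 5. Physical write; the offset only advances for blocks actually written.
    if !reuse_existing {
        out.write_all(&data)?;
        *current_offset += data.len() as u64;
    }

    // 6. Metadata for the index page.
    Ok(SketchBlockInfo {
        offset,
        length: data.len() as u64,
        logical_len: chunk.len() as u64,
        checksum,
    })
}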

§Parameters

  • out: Output writer implementing Write trait

    • Typically a File or BufWriter<File>
    • Must support write_all for atomic block writes
  • chunk: Uncompressed chunk data (raw bytes)

    • Typical size: 16 KiB - 256 KiB (configurable)
    • Must not be empty (undefined behavior for zero-length chunks)
  • block_idx: Global block index (zero-based)

    • Used as encryption nonce (must be unique per snapshot)
    • Monotonically increases across all streams
    • Must not reuse indices within same encrypted snapshot (breaks security)
  • current_offset: Mutable reference to current physical file offset

    • Updated after successful write: *current_offset += bytes_written
    • Not updated on error (file state undefined)
    • Not updated for deduplicated blocks (reuses existing offset)
  • dedup_map: Optional deduplication hash table

    • Some(&mut map): Enable dedup, use this map
    • None: Disable dedup, always write
    • Ignored if encryptor.is_some() (encryption prevents dedup)
    • Maps BLAKE3 hash → physical offset of first occurrence
  • compressor: Compression algorithm implementation

    • Typically Lz4Compressor or ZstdCompressor
    • Must implement Compressor trait
  • encryptor: Optional encryption implementation

    • Some(enc): Encrypt compressed data with AES-256-GCM
    • None: Store compressed data unencrypted
    • Must implement Encryptor trait
  • hasher: Content hasher for deduplication

    • Typically Blake3Hasher
    • Must implement ContentHasher trait
    • Used only when dedup_map is Some and encryptor is None
  • hash_buf: Reusable 32-byte buffer for hash output

    • Avoids allocation on every hash computation
    • Only used when dedup is enabled
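  • compress_buf: Reusable scratch buffer for compressed data

    • Avoids allocation on every block write (same rationale as hash_buf)
  • encrypt_buf: Reusable scratch buffer for encrypted data

    • Avoids allocation on every block write
    • Relevant only when encryptor is Some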

§Returns

  • Ok(BlockInfo): Block written successfully, metadata returned

    • offset: Physical byte offset where block starts
    • length: Compressed (and encrypted) size in bytes
    • logical_len: Original uncompressed size
    • checksum: CRC32 of final data (compressed + encrypted)
  • Err(Error::Io): I/O error during write

    • Disk full, permission denied, device error
    • File state undefined (partial write may have occurred)
  • Err(Error::Compression): Compression failed

    • Rare; usually indicates library bug or corrupted input
  • Err(Error::Encryption): Encryption failed

    • Rare; usually indicates crypto library bug

§Examples

§Basic Usage (No Encryption, No Dedup)

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;

let mut out = File::create("output.hxz")?;
let mut offset = 512u64; // After header
let chunk = vec![0x42; 65536]; // 64 KiB of data
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];

let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();

let info = write_block(
    &mut out,
    &chunk,
    0,              // block_idx
    &mut offset,
    None::<&mut StandardHashTable>, // No dedup
    &compressor,
    None,           // No encryption
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;

println!("Block written at offset {}, size {}", info.offset, info.length);

§With Deduplication

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;

let mut out = File::create("output.hxz")?;
let mut offset = 512u64;
let mut dedup_map = StandardHashTable::new();
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();

// Write first block
let chunk1 = vec![0xAA; 65536];
let info1 = write_block(
    &mut out,
    &chunk1,
    0,
    &mut offset,
    Some(&mut dedup_map),
    &compressor,
    None,
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;
println!("Block 0: offset={}, written", info1.offset);

// Write duplicate block (same content)
let chunk2 = vec![0xAA; 65536];
let info2 = write_block(
    &mut out,
    &chunk2,
    1,
    &mut offset,
    Some(&mut dedup_map),
    &compressor,
    None,
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;
println!("Block 1: offset={}, deduplicated (no write)", info2.offset);
assert_eq!(info1.offset, info2.offset); // Same offset, block reused

§With Encryption

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::encryption::AesGcmEncryptor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_common::crypto::KeyDerivationParams;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;

let mut out = File::create("output.hxz")?;
let mut offset = 512u64;
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];

// Initialize encryptor
let params = KeyDerivationParams::default();
let encryptor = AesGcmEncryptor::new(
    b"strong_password",
    &params.salt,
    params.iterations,
)?;

let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();

let chunk = vec![0x42; 65536];
let info = write_block(
    &mut out,
    &chunk,
    0,
    &mut offset,
    None::<&mut StandardHashTable>, // Dedup disabled (encryption prevents it)
    &compressor,
    Some(&encryptor),
    &hasher,
    &mut hash_buf,
    &mut compress_buf,
    &mut encrypt_buf,
)?;

println!("Encrypted block: offset={}, length={}", info.offset, info.length);

§Performance

  • Compression: Dominates runtime (~2 GB/s LZ4, ~500 MB/s Zstd)
  • Encryption: ~1-2 GB/s (hardware AES-NI)
  • Hashing: ~3.2 GB/s (BLAKE3, only when dedup is enabled)
  • I/O: Typically not the bottleneck (buffered writes, ~3 GB/s sequential)
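
As a rough back-of-the-envelope using the figures above, a 64 KiB block costs about 33 µs of LZ4 compression (65,536 B ÷ ~2 GB/s), about 20 µs of BLAKE3 hashing when dedup is enabled (÷ ~3.2 GB/s), and about 33-65 µs of AES-GCM when encrypting (÷ 1-2 GB/s). An unencrypted, deduplicating write therefore spends on the order of 50-60 µs of CPU per block, or roughly 1 GB/s of per-core throughput before I/O.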

§Deduplication Effectiveness

Deduplication is most effective when:

  • Fixed-size blocks: Same content → same boundaries → same hash
  • Unencrypted: Encryption produces unique ciphertext per block (different nonces)
  • Redundant data: Duplicate files, repeated patterns, copy-on-write filesystems

Deduplication is ineffective when:

  • Shifted data: Insertions or deletions move every later fixed-size boundary, so identical content no longer aligns (the shift resilience of content-defined chunking does not apply to fixed-size blocks)
  • Compressed input: Pre-compressed data has low redundancy
  • Unique data: No duplicate blocks to detect

§Security Considerations

§Block Index as Nonce

When encrypting, block_idx is used as part of the AES-GCM nonce. CRITICAL:

  • Never reuse block_idx values within the same encrypted snapshot
  • Nonce reuse breaks AES-GCM security (allows plaintext recovery)
  • Each logical block must have a unique index
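
For illustration, one conventional way to widen a 64-bit block index into the 96-bit nonce AES-GCM expects is shown below. The crate's actual construction inside AesGcmEncryptor may differ, but the property that matters is the same: a unique block_idx yields a unique nonce, and a repeated block_idx yields a repeated nonce.

fn nonce_from_block_idx(block_idx: u64) -> [u8; 12] {
    // 4 zero bytes of padding followed by the big-endian block index.
    let mut nonce = [0u8; 12];
    nonce[4..].copy_from_slice(&block_idx.to_be_bytes());
    nonce
}

assert_ne!(nonce_from_block_idx(0), nonce_from_block_idx(1));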

§Deduplication and Encryption

Deduplication is automatically disabled when encrypting because:

  • Each block has a unique nonce → unique ciphertext
  • BLAKE3(ciphertext1) ≠ BLAKE3(ciphertext2) even if plaintext is identical
  • Attempting dedup with encryption wastes CPU (hashing) without space savings
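
The effect can be demonstrated directly with the RustCrypto aes-gcm and blake3 crates (used here in place of the hexz_core wrappers, whose exact APIs may differ): encrypting the same plaintext under two nonces yields two different ciphertexts, so their dedup hashes never collide.

use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::{Aes256Gcm, Key, Nonce};

let cipher = Aes256Gcm::new(Key::<Aes256Gcm>::from_slice(&[7u8; 32]));
let plaintext = b"identical block contents";

// Same plaintext, two nonces (as two different block indices would produce).
let ct0 = cipher.encrypt(Nonce::from_slice(&[0u8; 12]), plaintext.as_ref()).unwrap();
let ct1 = cipher.encrypt(Nonce::from_slice(&[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1u8]), plaintext.as_ref()).unwrap();

assert_ne!(ct0, ct1);                               // unique ciphertext per nonce
assert_ne!(blake3::hash(&ct0), blake3::hash(&ct1)); // so the dedup hashes cannot match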

§Thread Safety

This function is not thread-safe with respect to the output writer:

  • Concurrent calls with the same out writer will interleave writes (corruption)
  • Concurrent calls with different writers to the same file will corrupt the file

For parallel writing, use separate output files or implement external synchronization.

The dedup_map must also be externally synchronized for concurrent access.
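
One possible shape for that external synchronization, sketched below, is a single Mutex around all of the per-writer mutable state (output file, offset, dedup map, scratch buffers), so that worker threads producing chunks in parallel still serialize the actual writes. WriterState is an illustrative name, not a type provided by the crate.

use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use std::fs::File;
use std::sync::Mutex;

// Everything write_block mutates lives behind one lock.
struct WriterState {
    out: File,
    offset: u64,
    dedup: StandardHashTable,
    hash_buf: [u8; 32],
    compress_buf: Vec<u8>,
    encrypt_buf: Vec<u8>,
}

let writer = Mutex::new(WriterState {
    out: File::create("output.hxz")?,
    offset: 512,
    dedup: StandardHashTable::new(),
    hash_buf: [0u8; 32],
    compress_buf: Vec::new(),
    encrypt_buf: Vec::new(),
});
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;

// In each worker: hold the lock for the whole call so the physical write and
// the offset/dedup updates stay consistent.
let (chunk, block_idx) = (vec![0x42u8; 65536], 0u64);
{
    let mut state = writer.lock().unwrap();
    let WriterState { out, offset, dedup, hash_buf, compress_buf, encrypt_buf } = &mut *state;
    let _info = write_block(
        out,
        &chunk,
        block_idx,
        offset,
        Some(dedup),
        &compressor,
        None,
        &hasher,
        hash_buf,
        compress_buf,
        encrypt_buf,
    )?;
}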