pub fn write_block<W: Write>(
out: &mut W,
chunk: &[u8],
block_idx: u64,
current_offset: &mut u64,
dedup_map: Option<&mut StandardHashTable>,
compressor: &dyn Compressor,
encryptor: Option<&dyn Encryptor>,
hasher: &dyn ContentHasher,
hash_buf: &mut [u8; 32],
compress_buf: &mut Vec<u8>,
encrypt_buf: &mut Vec<u8>,
) -> Result<BlockInfo>
Writes a compressed and optionally encrypted block to the output stream.
This function implements the complete block transformation pipeline: compression,
optional encryption, checksum computation, deduplication, and physical write.
It returns a BlockInfo descriptor suitable for inclusion in an index page.
§Transformation Pipeline
- Compression: Compress the raw chunk using the provided compressor (LZ4 or Zstd)
- Encryption (optional): Encrypt the compressed data with AES-256-GCM, using `block_idx` as the nonce
- Checksum: Compute CRC32 of the final data for integrity verification
- Deduplication (optional, skipped when encrypting):
  - Compute the BLAKE3 hash of the final data
  - Check `dedup_map` for an existing block with the same hash
  - If found: reuse the existing offset, skip the write
  - If new: write the block, record its offset in `dedup_map`
- Write: Append the final data to the output at `current_offset`
- Metadata: Create and return a `BlockInfo` with offset, length, and checksum (see the sketch after this list)
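The control flow can be summarized with a standalone sketch. The code below uses stock crates (`lz4_flex`, `crc32fast`, `blake3`) and a plain `HashMap` in place of hexz's `Compressor`, `ContentHasher`, and `StandardHashTable`, and omits the encryption step; it illustrates the order of operations, not hexz's actual implementation.
use std::collections::HashMap;
use std::io::Write;
// Standalone sketch of the pipeline above; not hexz's implementation.
// Returns (offset, length, checksum) instead of a full BlockInfo.
fn sketch_write_block<W: Write>(
    out: &mut W,
    chunk: &[u8],
    current_offset: &mut u64,
    dedup_map: Option<&mut HashMap<[u8; 32], u64>>,
) -> std::io::Result<(u64, u32, u32)> {
    // 1. Compression (encryption step omitted in this sketch)
    let data = lz4_flex::compress_prepend_size(chunk);
    // 2. Checksum of the final on-disk bytes
    let checksum = crc32fast::hash(&data);
    // 3. Deduplication: hash the final bytes and reuse an earlier offset on a match
    if let Some(map) = dedup_map {
        let hash = *blake3::hash(&data).as_bytes();
        if let Some(&existing) = map.get(&hash) {
            return Ok((existing, data.len() as u32, checksum)); // reuse, no write
        }
        map.insert(hash, *current_offset);
    }
    // 4. Physical write and offset bookkeeping
    out.write_all(&data)?;
    let offset = *current_offset;
    *current_offset += data.len() as u64;
    // 5. Metadata for the index page
    Ok((offset, data.len() as u32, checksum))
}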
§Parameters
- `out`: Output writer implementing the `Write` trait
  - Typically a `File` or `BufWriter<File>`
  - Must support `write_all` for atomic block writes
- `chunk`: Uncompressed chunk data (raw bytes)
  - Typical size: 16 KiB - 256 KiB (configurable)
  - Must not be empty (undefined behavior for zero-length chunks)
- `block_idx`: Global block index (zero-based)
  - Used as the encryption nonce (must be unique per snapshot)
  - Monotonically increases across all streams
  - Must not reuse indices within the same encrypted snapshot (breaks security)
- `current_offset`: Mutable reference to the current physical file offset
  - Updated after a successful write: `*current_offset += bytes_written`
  - Not updated on error (file state undefined)
  - Not updated for deduplicated blocks (reuses the existing offset)
- `dedup_map`: Optional deduplication hash table
  - `Some(&mut map)`: Enable dedup, use this map
  - `None`: Disable dedup, always write
  - Ignored if `encryptor.is_some()` (encryption prevents dedup)
  - Maps BLAKE3 hash → physical offset of the first occurrence
- `compressor`: Compression algorithm implementation
  - Typically `Lz4Compressor` or `ZstdCompressor`
  - Must implement the `Compressor` trait
- `encryptor`: Optional encryption implementation
  - `Some(enc)`: Encrypt compressed data with AES-256-GCM
  - `None`: Store compressed data unencrypted
  - Must implement the `Encryptor` trait
- `hasher`: Content hasher for deduplication
  - Typically `Blake3Hasher`
  - Must implement the `ContentHasher` trait
  - Used only when `dedup_map` is `Some` and `encryptor` is `None`
- `hash_buf`: Reusable buffer for hash output (must be ≥ 32 bytes)
  - Avoids allocation on every hash computation
  - Only used when dedup is enabled
- `compress_buf`: Reusable scratch buffer for the compressed output
  - Grown as needed and reused across calls to avoid per-block allocation (see the loop sketch after this list)
- `encrypt_buf`: Reusable scratch buffer for the encrypted output
  - Used only when encryption is enabled
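The three scratch buffers are intended to be allocated once and reused for every block. A sketch of the typical write loop, using the same types as the examples below:
use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;
let mut out = File::create("output.hxz")?;
let mut offset = 512u64; // After header
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
// Allocate the scratch buffers once; every call reuses them.
let mut hash_buf = [0u8; 32];
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();
let chunks: Vec<Vec<u8>> = vec![vec![0x42; 65536], vec![0x43; 65536]];
for (idx, chunk) in chunks.iter().enumerate() {
    let info = write_block(
        &mut out,
        chunk,
        idx as u64, // unique, monotonically increasing block index
        &mut offset,
        None::<&mut StandardHashTable>,
        &compressor,
        None,
        &hasher,
        &mut hash_buf,
        &mut compress_buf,
        &mut encrypt_buf,
    )?;
    println!("block {}: offset={}, length={}", idx, info.offset, info.length);
}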
§Returns
- `Ok(BlockInfo)`: Block written successfully, metadata returned
  - `offset`: Physical byte offset where the block starts
  - `length`: Compressed (and encrypted) size in bytes
  - `logical_len`: Original uncompressed size
  - `checksum`: CRC32 of the final data (compressed + encrypted)
- `Err(Error::Io)`: I/O error during write
  - Disk full, permission denied, device error
  - File state undefined (a partial write may have occurred)
- `Err(Error::Compression)`: Compression failed
  - Rare; usually indicates a library bug or corrupted input
- `Err(Error::Encryption)`: Encryption failed
  - Rare; usually indicates a crypto library bug
§Examples
§Basic Usage (No Encryption, No Dedup)
use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;
let mut out = File::create("output.hxz")?;
let mut offset = 512u64; // After header
let chunk = vec![0x42; 65536]; // 64 KiB of data
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();
let info = write_block(
&mut out,
&chunk,
0, // block_idx
&mut offset,
None::<&mut StandardHashTable>, // No dedup
&compressor,
None, // No encryption
&hasher,
&mut hash_buf,
&mut compress_buf,
&mut encrypt_buf,
)?;
println!("Block written at offset {}, size {}", info.offset, info.length);§With Deduplication
use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;
let mut out = File::create("output.hxz")?;
let mut offset = 512u64;
let mut dedup_map = StandardHashTable::new();
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();
// Write first block
let chunk1 = vec![0xAA; 65536];
let info1 = write_block(
&mut out,
&chunk1,
0,
&mut offset,
Some(&mut dedup_map),
&compressor,
None,
&hasher,
&mut hash_buf,
&mut compress_buf,
&mut encrypt_buf,
)?;
println!("Block 0: offset={}, written", info1.offset);
// Write duplicate block (same content)
let chunk2 = vec![0xAA; 65536];
let info2 = write_block(
&mut out,
&chunk2,
1,
&mut offset,
Some(&mut dedup_map),
&compressor,
None,
&hasher,
&mut hash_buf,
&mut compress_buf,
&mut encrypt_buf,
)?;
println!("Block 1: offset={}, deduplicated (no write)", info2.offset);
assert_eq!(info1.offset, info2.offset); // Same offset, block reused
§With Encryption
use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::encryption::AesGcmEncryptor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_common::crypto::KeyDerivationParams;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;
let mut out = File::create("output.hxz")?;
let mut offset = 512u64;
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
let mut hash_buf = [0u8; 32];
// Initialize encryptor
let params = KeyDerivationParams::default();
let encryptor = AesGcmEncryptor::new(
b"strong_password",
&params.salt,
params.iterations,
)?;
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();
let chunk = vec![0x42; 65536];
let info = write_block(
&mut out,
&chunk,
0,
&mut offset,
None::<&mut StandardHashTable>, // Dedup disabled (encryption prevents it)
&compressor,
Some(&encryptor),
&hasher,
&mut hash_buf,
&mut compress_buf,
&mut encrypt_buf,
)?;
println!("Encrypted block: offset={}, length={}", info.offset, info.length);§Performance
- Compression: Dominates runtime (~2 GB/s for LZ4, ~500 MB/s for Zstd)
- Encryption: ~1-2 GB/s (with hardware AES-NI)
- Hashing: ~3.2 GB/s (BLAKE3 for dedup)
- I/O: Typically not the bottleneck (buffered writes, ~3 GB/s sequential)
§Deduplication Effectiveness
Deduplication is most effective when:
- Fixed-size blocks: Same content → same boundaries → same hash
- Unencrypted: Encryption produces unique ciphertext per block (different nonces)
- Redundant data: Duplicate files, repeated patterns, copy-on-write filesystems
Deduplication is ineffective when:
- Content-defined chunking: Small shifts cause different boundaries
- Compressed input: Pre-compressed data has low redundancy
- Unique data: No duplicate blocks to detect
§Security Considerations
§Block Index as Nonce
When encrypting, `block_idx` is used as part of the AES-GCM nonce (see the sketch after this list). CRITICAL:
- Never reuse `block_idx` values within the same encrypted snapshot
- Nonce reuse breaks AES-GCM security (allows plaintext recovery)
- Each logical block must have a unique index
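One common way to expand a 64-bit block index into a 96-bit AES-GCM nonce is shown below. This is a hypothetical construction for illustration only; hexz's exact nonce layout may differ. The point is that each `block_idx` maps to exactly one nonce, so reusing an index means reusing a nonce.
// Hypothetical nonce derivation (illustration only; not necessarily hexz's scheme).
// A 96-bit AES-GCM nonce built from the 64-bit block index: two different
// indices can never collide, but the same index always yields the same nonce.
fn nonce_for_block(block_idx: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..8].copy_from_slice(&block_idx.to_le_bytes());
    nonce
}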
§Deduplication and Encryption
Deduplication is automatically disabled when encrypting because:
- Each block has a unique nonce → unique ciphertext
- BLAKE3(ciphertext1) ≠ BLAKE3(ciphertext2) even if plaintext is identical
- Attempting dedup with encryption wastes CPU (hashing) without space savings (illustrated below)
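A standalone illustration of the second point, using the aes-gcm and blake3 crates directly rather than hexz's `Encryptor`/`ContentHasher` (API usage assumes aes-gcm 0.10): encrypting identical plaintext under two different nonces yields different ciphertexts, so a hash-based dedup map can never match them.
use aes_gcm::aead::{Aead, KeyInit};
use aes_gcm::Aes256Gcm;
// Illustration only: stock aes-gcm + blake3, not hexz's own types.
let cipher = Aes256Gcm::new(&[0x11u8; 32].into());
let plaintext = vec![0xAA; 4096];
// Same key, same plaintext, two different nonces (as two block indices would produce).
let ct0 = cipher.encrypt(&[0u8; 12].into(), plaintext.as_slice()).unwrap();
let ct1 = cipher.encrypt(&[1u8; 12].into(), plaintext.as_slice()).unwrap();
// The ciphertexts differ, so their content hashes differ: dedup cannot see through encryption.
assert_ne!(ct0, ct1);
assert_ne!(blake3::hash(&ct0), blake3::hash(&ct1));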
§Thread Safety
This function is not thread-safe with respect to the output writer:
- Concurrent calls with the same `out` writer will interleave writes (corruption)
- Concurrent calls with different writers to the same file will also corrupt the file
For parallel writing, use separate output files or implement external synchronization.
The dedup_map must also be externally synchronized for concurrent access.
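A minimal sketch of the external-synchronization approach, assuming the same types as the examples above: the writer, the current offset, and the dedup map live behind one Mutex so each write_block call is serialized, while the scratch buffers stay per-thread.
use hexz_core::ops::write::write_block;
use hexz_core::algo::compression::Lz4Compressor;
use hexz_core::algo::hashing::blake3::Blake3Hasher;
use hexz_core::algo::dedup::hash_table::StandardHashTable;
use std::fs::File;
use std::sync::Mutex;
// Everything that must not be interleaved lives behind a single lock.
struct Shared {
    out: File,
    offset: u64,
    dedup: StandardHashTable,
}
let shared = Mutex::new(Shared {
    out: File::create("output.hxz")?,
    offset: 512,
    dedup: StandardHashTable::new(),
});
let compressor = Lz4Compressor::new();
let hasher = Blake3Hasher;
// Scratch buffers are cheap and per-thread; never share them across threads.
let mut hash_buf = [0u8; 32];
let mut compress_buf = Vec::new();
let mut encrypt_buf = Vec::new();
let chunk = vec![0x42; 65536];
let block_idx = 0; // each thread must still use globally unique indices
// Hold the lock for the duration of a single write_block call.
{
    let mut guard = shared.lock().unwrap();
    let Shared { out, offset, dedup } = &mut *guard;
    let _info = write_block(
        out,
        &chunk,
        block_idx,
        offset,
        Some(dedup),
        &compressor,
        None,
        &hasher,
        &mut hash_buf,
        &mut compress_buf,
        &mut encrypt_buf,
    )?;
}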