hexz_core/ops/
write.rs

1//! Low-level write operations for Hexz snapshots.
2//!
3//! This module provides the foundational building blocks for writing compressed,
4//! encrypted, and deduplicated blocks to snapshot files. These functions implement
5//! the core write semantics used by higher-level pack operations while remaining
6//! independent of the packing workflow.
7//!
8//! # Module Purpose
9//!
10//! The write operations module serves as the bridge between the high-level packing
11//! pipeline and the raw file I/O layer. It encapsulates the logic for:
12//!
13//! - **Block Writing**: Transform raw chunks into compressed, encrypted blocks
14//! - **Deduplication**: Detect and eliminate redundant blocks via content hashing
15//! - **Zero Optimization**: Handle sparse data efficiently without storage
16//! - **Metadata Generation**: Create BlockInfo descriptors for index building
17//!
18//! # Design Philosophy
19//!
20//! These functions are designed to be composable, stateless, and easily testable.
21//! They operate on raw byte buffers and writers without knowledge of the broader
22//! packing context (progress reporting, stream management, index organization).
23//!
24//! This separation enables:
25//! - Unit testing of write logic in isolation
26//! - Reuse in different packing strategies (single-stream, multi-threaded, streaming)
27//! - Clear separation of concerns (write vs. orchestration)
28//!
29//! # Write Operation Semantics
30//!
31//! ## Block Transformation Pipeline
32//!
33//! Each block undergoes a multi-stage transformation before being written:
34//!
35//! ```text
36//! Raw Chunk (input)
37//!      ↓
38//! ┌────────────────┐
39//! │ Compression    │ → Compress using LZ4 or Zstd
40//! └────────────────┘   (reduces size, increases CPU)
41//!      ↓
42//! ┌────────────────┐
43//! │ Encryption     │ → Optional AES-256-GCM with block_idx nonce
44//! └────────────────┘   (confidentiality + integrity)
45//!      ↓
46//! ┌────────────────┐
47//! │ Checksum       │ → CRC32 of final data (fast integrity check)
48//! └────────────────┘
49//!      ↓
50//! ┌────────────────┐
51//! │ Deduplication  │ → BLAKE3 hash lookup (skip write if duplicate)
52//! └────────────────┘   (disabled for encrypted data)
53//!      ↓
54//! ┌────────────────┐
55//! │ Write          │ → Append to output file at current offset
56//! └────────────────┘
57//!      ↓
58//! BlockInfo (metadata: offset, length, checksum)
59//! ```
60//!
61//! ## Write Behavior and Atomicity
62//!
63//! ### Single Block Writes
64//!
//! Individual block writes via [`write_block`] are only as atomic as the
//! underlying writer and file system make them; this module adds no guarantees:
//!
//! - **Buffered writes**: Data passes through the writer and the OS page cache
//! - **No fsync**: This module never flushes or syncs; durability is the caller's job
//! - **Partial write handling**: `write_all` either writes every byte or returns an error
//! - **Crash behavior**: A partial block may be left on disk if the process crashes mid-write
72//!
73//! ### Deduplication State
74//!
75//! The deduplication map is maintained externally (by the caller). This design allows:
76//! - **Flexibility**: Caller controls when/if to enable deduplication
77//! - **Memory control**: Map lifetime and size managed by orchestration layer
78//! - **Consistency**: Map updates are immediately visible to subsequent writes
79//!
80//! ### Offset Management
81//!
//! The `current_offset` parameter is advanced only after `write_all` returns
//! successfully (a plain `u64` update, not an atomic operation). This ensures:
84//! - **Sequential allocation**: Blocks are laid out contiguously in file
85//! - **No gaps**: Every byte between header and master index is utilized
86//! - **Predictable layout**: Physical offset increases monotonically
87//!
88//! ## Block Allocation Strategy
89//!
90//! Blocks are allocated sequentially in the order they are written:
91//!
92//! ```text
93//! File Layout:
94//! ┌──────────────┬──────────┬──────────┬──────────┬─────────────┐
95//! │ Header (512B)│ Block 0  │ Block 1  │ Block 2  │ Index Pages │
96//! └──────────────┴──────────┴──────────┴──────────┴─────────────┘
97//!  ↑             ↑          ↑          ↑
98//!  0             512        512+len0   512+len0+len1
99//!
100//! current_offset advances after each write:
101//! - Initial: 512 (after header)
102//! - After Block 0: 512 + len0
103//! - After Block 1: 512 + len0 + len1
104//! - After Block 2: 512 + len0 + len1 + len2
105//! ```
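//!
//! The bookkeeping above can be sketched with standard-library types only. This
//! is an illustrative model, not the code in this module: `append_block` and
//! `layout` are hypothetical helpers, and the 512-byte header size is taken from
//! the diagram.
//!
//! ```rust
//! use std::io::{self, Write};
//!
//! /// Hypothetical helper: appends `data` and returns the offset it was written at.
//! /// The offset advances only after `write_all` has succeeded.
//! fn append_block<W: Write>(out: &mut W, offset: &mut u64, data: &[u8]) -> io::Result<u64> {
//!     let start = *offset;
//!     out.write_all(data)?;
//!     *offset += data.len() as u64;
//!     Ok(start)
//! }
//!
//! /// Hypothetical helper: lays out `blocks` after a 512-byte header and returns
//! /// each block's start offset plus the final value of the running offset.
//! fn layout(blocks: &[&[u8]]) -> io::Result<(Vec<u64>, u64)> {
//!     let mut out: Vec<u8> = Vec::new(); // in-memory stand-in for the file
//!     let mut offset = 512u64; // first byte after the header
//!     let mut offsets = Vec::with_capacity(blocks.len());
//!     for block in blocks {
//!         offsets.push(append_block(&mut out, &mut offset, block)?);
//!     }
//!     Ok((offsets, offset))
//! }
//! ```
//!
//! For blocks of 10, 20, and 5 bytes, `layout` yields start offsets 512, 522, and
//! 542, and a final offset of 547.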
106//!
107//! ### Deduplication Impact
108//!
109//! When deduplication detects a duplicate block:
110//! - **No physical write**: Block is not written to disk
111//! - **Offset reuse**: BlockInfo references the existing block's offset
112//! - **Space savings**: Multiple logical blocks share one physical block
113//! - **Transparency**: Readers cannot distinguish between deduplicated and unique blocks
114//!
115//! Example with deduplication:
116//!
117//! ```text
118//! Logical Blocks: [A, B, A, C, B]
119//! Physical Blocks: [A, B, C]
120//!                   ↑  ↑     ↑
121//!                   │  │     └─ Block 3 (unique)
122//!                   │  └─ Block 1 (unique)
123//!                   └─ Block 0 (unique)
124//!
125//! BlockInfo for logical block 2: offset = offset_of(A), length = len(A)
126//! BlockInfo for logical block 4: offset = offset_of(B), length = len(B)
127//! ```
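//!
//! A minimal model of this bookkeeping, assuming a plain `HashMap` in place of
//! `StandardHashTable` and using the chunk bytes themselves as the key where the
//! real code uses a BLAKE3 digest. `dedup_offsets` is a hypothetical name, and
//! block sizes are uncompressed lengths for simplicity:
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Returns the physical offset recorded for each logical chunk, plus the
//! /// number of bytes physically written. Duplicates reuse the first offset.
//! fn dedup_offsets(chunks: &[&[u8]], start: u64) -> (Vec<u64>, u64) {
//!     let mut seen: HashMap<&[u8], u64> = HashMap::new();
//!     let mut next = start;
//!     let mut offsets = Vec::with_capacity(chunks.len());
//!     for &chunk in chunks {
//!         let off = *seen.entry(chunk).or_insert_with(|| {
//!             let off = next;
//!             next += chunk.len() as u64; // only new content consumes space
//!             off
//!         });
//!         offsets.push(off);
//!     }
//!     (offsets, next - start)
//! }
//! ```
//!
//! With `A`, `B`, and `C` of 4, 6, and 2 bytes and a start offset of 512, the
//! logical sequence `[A, B, A, C, B]` maps to offsets `[512, 516, 512, 522, 516]`
//! and only 12 bytes are written.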
128//!
129//! ## Buffer Management
130//!
//! This module does not own any long-lived buffers. All buffers are:
//!
//! - **Caller-allocated**: Input chunks are provided by the caller
//! - **Reusable scratch space**: [`write_block`] compresses into `compress_buf` and
//!   encrypts into `encrypt_buf`, both supplied by the caller
//! - **No pooling**: There is no internal pool and no garbage collector; a buffer is
//!   freed when its owner drops it
//!
//! For high-performance scenarios, callers should consider:
//! - Reusing chunk buffers across iterations
//! - Reusing the same `compress_buf` and `encrypt_buf` for every call
//! - Batching writes (for example through a `BufWriter`) to amortize syscall overhead
141//!
142//! ## Flush Behavior
143//!
144//! Functions in this module do NOT flush data to disk. Flushing is the caller's
145//! responsibility and typically occurs:
146//!
147//! - After writing all blocks and indices (in [`pack_snapshot`](crate::ops::pack::pack_snapshot))
148//! - Before closing the output file
149//! - Never during block writing (to maximize write batching)
150//!
151//! This design allows the OS to batch writes for optimal I/O performance.
152//!
153//! # Error Handling and Recovery
154//!
155//! ## Error Categories
156//!
157//! Write operations can fail for several reasons:
158//!
159//! ### I/O Errors
160//!
161//! - **Disk full**: No space for compressed block (`ENOSPC`)
162//! - **Permission denied**: Writer lacks write permission (`EACCES`)
163//! - **Device error**: Hardware failure, I/O timeout (`EIO`)
164//!
165//! These surface as `Error::Io` wrapping the underlying `std::io::Error`.
166//!
167//! ### Compression Errors
168//!
169//! - **Compression failure**: Compressor returns error (rare, usually indicates bug)
170//! - **Incompressible data**: Not an error; stored with expansion
171//!
172//! These surface as `Error::Compression`.
173//!
174//! ### Encryption Errors
175//!
176//! - **Cipher initialization failure**: Invalid state (should not occur in practice)
177//! - **Encryption failure**: Crypto operation fails (indicates library bug)
178//!
179//! These surface as `Error::Encryption`.
180//!
181//! ## Error Recovery
182//!
183//! Write operations provide **no automatic recovery**. On error:
184//!
185//! - **Function returns immediately**: No cleanup or rollback
186//! - **File state undefined**: Partial data may be written
187//! - **Caller responsibility**: Must handle error and clean up
188//!
189//! Typical error handling pattern in pack operations:
190//!
191//! ```text
//! match write_block(...) {
//!     Ok(info) => {
//!         // Success: Add info to index, continue
//!     }
//!     Err(e) => {
//!         // Failure: Log error, delete partial output file, return error to caller.
//!         // Ignore the cleanup result so it cannot mask the original error.
//!         let _ = std::fs::remove_file(output);
//!         return Err(e);
//!     }
//! }
202//! ```
203//!
204//! ## Partial Write Handling
205//!
//! `Write::write_all` keeps writing until the whole block is written or an error
//! occurs; it is not atomic:
207//!
208//! - **Success**: Entire block written, offset updated
209//! - **Failure**: Partial write may occur, but error is returned
210//! - **No retry**: Caller must handle retries if desired
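//!
//! The failure case can be reproduced with a toy writer. The sketch below assumes
//! a hypothetical `LimitedWriter` that accepts a fixed number of bytes and then
//! fails, which is roughly how a full disk behaves. `try_write` is likewise an
//! illustrative helper:
//!
//! ```rust
//! use std::io::{self, Write};
//!
//! /// Hypothetical writer that stores at most `capacity` bytes, then errors.
//! struct LimitedWriter {
//!     data: Vec<u8>,
//!     capacity: usize,
//! }
//!
//! impl Write for LimitedWriter {
//!     fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
//!         let room = self.capacity - self.data.len();
//!         if room == 0 {
//!             return Err(io::Error::new(io::ErrorKind::Other, "no space left"));
//!         }
//!         let n = room.min(buf.len());
//!         self.data.extend_from_slice(&buf[..n]);
//!         Ok(n)
//!     }
//!
//!     fn flush(&mut self) -> io::Result<()> {
//!         Ok(())
//!     }
//! }
//!
//! /// Writes `block` and reports (write succeeded, bytes that reached the writer).
//! fn try_write(capacity: usize, block: &[u8]) -> (bool, usize) {
//!     let mut w = LimitedWriter { data: Vec::new(), capacity };
//!     let ok = w.write_all(block).is_ok();
//!     (ok, w.data.len())
//! }
//! ```
//!
//! With a capacity of 4 and a 10-byte block, `try_write` reports a failed write
//! with 4 bytes already stored, which is why the caller must treat the output as
//! invalid after an error.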
211//!
212//! # Performance Characteristics
213//!
214//! ## Write Throughput
215//!
216//! Block write performance is dominated by compression:
217//!
218//! - **LZ4**: ~2 GB/s (minimal overhead)
219//! - **Zstd level 3**: ~200-500 MB/s (depends on data)
220//! - **Encryption**: ~1-2 GB/s (hardware AES-NI)
//! - **BLAKE3 hashing**: ~3.2 GB/s (for deduplication)
222//!
223//! Typical bottleneck: Compression CPU time.
224//!
225//! ## Deduplication Overhead
226//!
227//! BLAKE3 hashing adds ~5-10% overhead to write operations:
228//!
//! - **Hash computation**: ~3.2 GB/s throughput (BLAKE3 tree-hashed)
230//! - **Hash table lookup**: O(1) average, ~50-100 ns per lookup
231//! - **Memory usage**: ~48 bytes per unique block
232//!
233//! For datasets with <10% duplication, deduplication overhead may exceed savings.
234//! Consider disabling dedup for unique data.
235//!
236//! ## Zero Block Detection
237//!
238//! [`is_zero_chunk`] uses SIMD-optimized comparison on modern CPUs:
239//!
240//! - **Throughput**: ~10-20 GB/s (memory bandwidth limited)
241//! - **Overhead**: Negligible (~5-10 cycles per 64-byte cache line)
242//!
243//! Zero detection is always worth enabling for sparse data.
244//!
245//! # Memory Usage
246//!
247//! Per-block memory allocation:
248//!
249//! - **Input chunk**: Caller-provided (typically 64 KiB)
250//! - **Compression output**: ~1.5× chunk size worst case (incompressible data)
251//! - **Encryption output**: compression_size + 28 bytes (AES-GCM overhead)
252//! - **Dedup hash**: 32 bytes (BLAKE3 digest)
253//!
//! Total scratch space per write: ~100-150 KiB, held in the caller's reusable buffers between calls.
255//!
256//! # Examples
257//!
258//! See individual function documentation for usage examples.
259//!
260//! # Future Enhancements
261//!
262//! Potential improvements to write operations:
263//!
//! - **Buffer pooling**: Share scratch buffers across threads or writers (reuse by a single caller is already supported)
265//! - **Async I/O**: Use `tokio` or `io_uring` for overlapped writes
266//! - **Parallel writes**: Write multiple blocks concurrently (requires coordination)
267//! - **Write-ahead logging**: Enable atomic commits for crash safety
268
269use hexz_common::Result;
270use std::io::Write;
271
272use crate::algo::compression::Compressor;
273use crate::algo::dedup::hash_table::StandardHashTable;
274use crate::algo::encryption::Encryptor;
275use crate::algo::hashing::ContentHasher;
276use crate::format::index::BlockInfo;
277
278/// Writes a compressed and optionally encrypted block to the output stream.
279///
280/// This function implements the complete block transformation pipeline: compression,
281/// optional encryption, checksum computation, deduplication, and physical write.
282/// It returns a `BlockInfo` descriptor suitable for inclusion in an index page.
283///
284/// # Transformation Pipeline
285///
286/// 1. **Compression**: Compress raw chunk using provided compressor (LZ4 or Zstd)
287/// 2. **Encryption** (optional): Encrypt compressed data with AES-256-GCM using block_idx as nonce
288/// 3. **Checksum**: Compute CRC32 of final data for integrity verification
289/// 4. **Deduplication** (optional, not for encrypted):
///    - Compute BLAKE3 hash of the uncompressed chunk (independent of the compressor)
291///    - Check dedup_map for existing block with same hash
292///    - If found: Reuse existing offset, skip write
293///    - If new: Write block, record offset in dedup_map
294/// 5. **Write**: Append final data to output at current_offset
295/// 6. **Metadata**: Create and return BlockInfo with offset, length, checksum
296///
297/// # Parameters
298///
/// - `out`: Output writer implementing `Write` trait
///   - Typically a `File` or `BufWriter<File>`
///   - Each block is emitted with a single `write_all` call
///
/// - `chunk`: Uncompressed chunk data (raw bytes)
///   - Typical size: 16 KiB - 256 KiB (configurable)
///   - Should not be empty (behavior for zero-length chunks is unspecified)
///   - Length must fit in `u32` (it is cast without a check)
306///
307/// - `block_idx`: Global block index (zero-based)
308///   - Used as encryption nonce (must be unique per snapshot)
309///   - Monotonically increases across all streams
310///   - Must not reuse indices within same encrypted snapshot (breaks security)
311///
312/// - `current_offset`: Mutable reference to current physical file offset
313///   - Updated after successful write: `*current_offset += bytes_written`
314///   - Not updated on error (file state undefined)
315///   - Not updated for deduplicated blocks (reuses existing offset)
316///
317/// - `dedup_map`: Optional deduplication hash table
318///   - `Some(&mut map)`: Enable dedup, use this map
319///   - `None`: Disable dedup, always write
320///   - Ignored if `encryptor.is_some()` (encryption prevents dedup)
321///   - Maps BLAKE3 hash → physical offset of first occurrence
322///
323/// - `compressor`: Compression algorithm implementation
324///   - Typically `Lz4Compressor` or `ZstdCompressor`
325///   - Must implement [`Compressor`] trait
326///
327/// - `encryptor`: Optional encryption implementation
328///   - `Some(enc)`: Encrypt compressed data with AES-256-GCM
329///   - `None`: Store compressed data unencrypted
330///   - Must implement [`Encryptor`] trait
331///
332/// - `hasher`: Content hasher for deduplication
333///   - Typically `Blake3Hasher`
334///   - Must implement [`ContentHasher`] trait
335///   - Used only when dedup_map is Some and encryptor is None
336///
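/// - `hash_buf`: Reusable 32-byte buffer for hash output
///   - Avoids allocation on every hash computation
///   - Overwritten only when dedup runs, but always copied into `BlockInfo::hash`;
///     with dedup disabled or encryption enabled the returned hash is whatever
///     the buffer held on entry
///
/// - `compress_buf`: Reusable scratch buffer that receives the compressed chunk
///
/// - `encrypt_buf`: Reusable scratch buffer that receives the ciphertext
///   - Only used when `encryptor` is `Some`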
340///
341/// # Returns
342///
343/// - `Ok(BlockInfo)`: Block written successfully, metadata returned
344///   - `offset`: Physical byte offset where block starts
345///   - `length`: Compressed (and encrypted) size in bytes
346///   - `logical_len`: Original uncompressed size
///   - `checksum`: CRC32 of final data (compressed + encrypted)
///   - `hash`: Copy of `hash_buf` (the chunk's content hash when dedup ran)
348///
349/// - `Err(Error::Io)`: I/O error during write
350///   - Disk full, permission denied, device error
351///   - File state undefined (partial write may have occurred)
352///
353/// - `Err(Error::Compression)`: Compression failed
354///   - Rare; usually indicates library bug or corrupted input
355///
356/// - `Err(Error::Encryption)`: Encryption failed
357///   - Rare; usually indicates crypto library bug
358///
359/// # Examples
360///
361/// ## Basic Usage (No Encryption, No Dedup)
362///
363/// ```no_run
364/// use hexz_core::ops::write::write_block;
365/// use hexz_core::algo::compression::Lz4Compressor;
366/// use hexz_core::algo::hashing::blake3::Blake3Hasher;
367/// use hexz_core::algo::dedup::hash_table::StandardHashTable;
368/// use std::fs::File;
369///
370/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
371/// let mut out = File::create("output.hxz")?;
372/// let mut offset = 512u64; // After header
373/// let chunk = vec![0x42; 65536]; // 64 KiB of data
374/// let compressor = Lz4Compressor::new();
375/// let hasher = Blake3Hasher;
376/// let mut hash_buf = [0u8; 32];
377///
378/// let mut compress_buf = Vec::new();
379/// let mut encrypt_buf = Vec::new();
380///
381/// let info = write_block(
382///     &mut out,
383///     &chunk,
384///     0,              // block_idx
385///     &mut offset,
386///     None::<&mut StandardHashTable>, // No dedup
387///     &compressor,
388///     None,           // No encryption
389///     &hasher,
390///     &mut hash_buf,
391///     &mut compress_buf,
392///     &mut encrypt_buf,
393/// )?;
394///
395/// println!("Block written at offset {}, size {}", info.offset, info.length);
396/// # Ok(())
397/// # }
398/// ```
399///
400/// ## With Deduplication
401///
402/// ```no_run
403/// use hexz_core::ops::write::write_block;
404/// use hexz_core::algo::compression::Lz4Compressor;
405/// use hexz_core::algo::hashing::blake3::Blake3Hasher;
406/// use hexz_core::algo::dedup::hash_table::StandardHashTable;
407/// use std::fs::File;
408///
409/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
410/// let mut out = File::create("output.hxz")?;
411/// let mut offset = 512u64;
412/// let mut dedup_map = StandardHashTable::new();
413/// let compressor = Lz4Compressor::new();
414/// let hasher = Blake3Hasher;
415/// let mut hash_buf = [0u8; 32];
416/// let mut compress_buf = Vec::new();
417/// let mut encrypt_buf = Vec::new();
418///
419/// // Write first block
420/// let chunk1 = vec![0xAA; 65536];
421/// let info1 = write_block(
422///     &mut out,
423///     &chunk1,
424///     0,
425///     &mut offset,
426///     Some(&mut dedup_map),
427///     &compressor,
428///     None,
429///     &hasher,
430///     &mut hash_buf,
431///     &mut compress_buf,
432///     &mut encrypt_buf,
433/// )?;
434/// println!("Block 0: offset={}, written", info1.offset);
435///
436/// // Write duplicate block (same content)
437/// let chunk2 = vec![0xAA; 65536];
438/// let info2 = write_block(
439///     &mut out,
440///     &chunk2,
441///     1,
442///     &mut offset,
443///     Some(&mut dedup_map),
444///     &compressor,
445///     None,
446///     &hasher,
447///     &mut hash_buf,
448///     &mut compress_buf,
449///     &mut encrypt_buf,
450/// )?;
451/// println!("Block 1: offset={}, deduplicated (no write)", info2.offset);
452/// assert_eq!(info1.offset, info2.offset); // Same offset, block reused
453/// # Ok(())
454/// # }
455/// ```
456///
457/// ## With Encryption
458///
459/// ```no_run
460/// use hexz_core::ops::write::write_block;
461/// use hexz_core::algo::compression::Lz4Compressor;
462/// use hexz_core::algo::encryption::AesGcmEncryptor;
463/// use hexz_core::algo::hashing::blake3::Blake3Hasher;
464/// use hexz_common::crypto::KeyDerivationParams;
465/// use hexz_core::algo::dedup::hash_table::StandardHashTable;
466/// use std::fs::File;
467///
468/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
469/// let mut out = File::create("output.hxz")?;
470/// let mut offset = 512u64;
471/// let compressor = Lz4Compressor::new();
472/// let hasher = Blake3Hasher;
473/// let mut hash_buf = [0u8; 32];
474///
475/// // Initialize encryptor
476/// let params = KeyDerivationParams::default();
477/// let encryptor = AesGcmEncryptor::new(
478///     b"strong_password",
479///     &params.salt,
480///     params.iterations,
481/// )?;
482///
483/// let mut compress_buf = Vec::new();
484/// let mut encrypt_buf = Vec::new();
485///
486/// let chunk = vec![0x42; 65536];
487/// let info = write_block(
488///     &mut out,
489///     &chunk,
490///     0,
491///     &mut offset,
492///     None::<&mut StandardHashTable>, // Dedup disabled (encryption prevents it)
493///     &compressor,
494///     Some(&encryptor),
495///     &hasher,
496///     &mut hash_buf,
497///     &mut compress_buf,
498///     &mut encrypt_buf,
499/// )?;
500///
501/// println!("Encrypted block: offset={}, length={}", info.offset, info.length);
502/// # Ok(())
503/// # }
504/// ```
505///
506/// # Performance
507///
508/// - **Compression**: Dominates runtime (~2 GB/s LZ4, ~500 MB/s Zstd)
509/// - **Encryption**: ~1-2 GB/s (hardware AES-NI)
/// - **Hashing**: ~3.2 GB/s (BLAKE3 for dedup)
511/// - **I/O**: Typically not bottleneck (buffered writes, ~3 GB/s sequential)
512///
513/// # Deduplication Effectiveness
514///
515/// Deduplication is most effective when:
/// - **Aligned duplicates**: Identical content at identical block boundaries → same hash
/// - **Unencrypted**: Encryption produces unique ciphertext per block (different nonces)
/// - **Redundant data**: Duplicate files, repeated patterns, copy-on-write filesystems
///
/// Deduplication is ineffective when:
/// - **Shifted data**: With fixed-size blocks, an insertion moves every later boundary,
///   so identical content no longer lines up (content-defined chunking mitigates this)
522/// - **Compressed input**: Pre-compressed data has low redundancy
523/// - **Unique data**: No duplicate blocks to detect
524///
525/// # Security Considerations
526///
527/// ## Block Index as Nonce
528///
529/// When encrypting, `block_idx` is used as part of the AES-GCM nonce. **CRITICAL**:
530/// - Never reuse `block_idx` values within the same encrypted snapshot
/// - Nonce reuse under the same key breaks AES-GCM (leaks the XOR of plaintexts and enables forgeries)
532/// - Each logical block must have a unique index
533///
534/// ## Deduplication and Encryption
535///
536/// Deduplication is automatically disabled when encrypting because:
/// - Each block has a unique nonce → unique ciphertext, even for identical plaintext
/// - No two stored blocks are byte-identical, so there is nothing to share on disk
/// - When an encryptor is supplied, `dedup_map` is ignored and no hash is computed
540///
541/// # Thread Safety
542///
/// This function is **not thread-safe** with respect to the output writer:
/// - `out`, `current_offset`, and the buffers are `&mut`, so safe Rust cannot issue
///   concurrent calls that share them without external locking
/// - Concurrent calls with different writers to the same file will corrupt the file
546///
547/// For parallel writing, use separate output files or implement external synchronization.
548///
549/// The dedup_map must also be externally synchronized for concurrent access.
550#[allow(clippy::too_many_arguments)]
551pub fn write_block<W: Write>(
552    out: &mut W,
553    chunk: &[u8],
554    block_idx: u64,
555    current_offset: &mut u64,
556    dedup_map: Option<&mut StandardHashTable>,
557    compressor: &dyn Compressor,
558    encryptor: Option<&dyn Encryptor>,
559    hasher: &dyn ContentHasher,
560    hash_buf: &mut [u8; 32],
561    compress_buf: &mut Vec<u8>,
562    encrypt_buf: &mut Vec<u8>,
563) -> Result<BlockInfo> {
564    // Compress the chunk into reusable buffer
565    compressor.compress_into(chunk, compress_buf)?;
566
567    // Encrypt if requested, using reusable buffer
568    let final_data: &[u8] = if let Some(enc) = encryptor {
569        enc.encrypt_into(compress_buf, block_idx, encrypt_buf)?;
570        encrypt_buf
571    } else {
572        compress_buf
573    };
574
575    let checksum = crc32fast::hash(final_data);
576    let chunk_len = chunk.len() as u32;
577    let final_len = final_data.len() as u32;
578
579    // Handle deduplication (only if not encrypting)
580    let offset = if encryptor.is_some() {
581        // No dedup for encrypted data
582        let off = *current_offset;
583        out.write_all(final_data)?;
584        *current_offset += final_len as u64;
585        off
586    } else if let Some(map) = dedup_map {
587        // Hash directly into the fixed-size buffer (no runtime bounds check).
588        // Hash the UNCOMPRESSED data for consistent deduplication across compression algorithms.
589        *hash_buf = hasher.hash_fixed(chunk);
590
591        if let Some(existing_offset) = map.get(hash_buf) {
592            // Block already exists, reuse it — no copy needed on hit
593            existing_offset
594        } else {
595            // New block: copy hash_buf only on miss (insert needs owned key)
596            let off = *current_offset;
597            map.insert(*hash_buf, off);
598            out.write_all(final_data)?;
599            *current_offset += final_len as u64;
600            off
601        }
602    } else {
603        // No dedup, just write
604        let off = *current_offset;
605        out.write_all(final_data)?;
606        *current_offset += final_len as u64;
607        off
608    };
609
610    Ok(BlockInfo {
611        offset,
612        length: final_len,
613        logical_len: chunk_len,
614        checksum,
615        hash: *hash_buf,
616    })
617}
618
619/// Creates a zero-block descriptor without writing data to disk.
620///
621/// Zero blocks (all-zero chunks) are a special case optimized for space efficiency.
622/// Instead of compressing and storing zeros, we create a metadata-only descriptor
623/// that signals to the reader to return zeros without performing any I/O.
624///
625/// # Sparse Data Optimization
626///
627/// Many VM disk images and memory dumps contain large regions of zeros:
628/// - **Unallocated disk space**: File systems often zero-initialize blocks
629/// - **Memory pages**: Unused or zero-initialized memory
630/// - **Sparse files**: Holes in sparse file systems
631///
632/// Storing these zeros (even compressed) wastes space:
633/// - **LZ4-compressed zeros**: ~100 bytes per 64 KiB block (~0.15% of original)
634/// - **Uncompressed zeros**: 64 KiB per block (100%)
635/// - **Metadata-only**: 20 bytes per block (~0.03%)
636///
637/// The metadata approach saves 99.97% of space for zero blocks.
638///
639/// # Descriptor Format
640///
641/// Zero blocks are identified by a special BlockInfo signature:
642/// - `offset = 0`: Invalid physical offset (data region starts at ≥512)
643/// - `length = 0`: No physical storage
644/// - `logical_len = N`: Original zero block size in bytes
/// - `checksum = 0`: No checksum needed (zeros are deterministic)
/// - `hash = [0; 32]`: No content hash is recorded
646///
647/// Readers recognize this pattern and synthesize zeros without I/O.
648///
649/// # Parameters
650///
651/// - `logical_len`: Size of the zero block in bytes
652///   - Typically matches block_size (e.g., 65536 for 64 KiB blocks)
653///   - Can vary with content-defined chunking
654///   - Must be > 0 (zero-length blocks are invalid)
655///
656/// # Returns
657///
658/// `BlockInfo` descriptor with zero-block semantics:
659/// - `offset = 0`
660/// - `length = 0`
661/// - `logical_len = logical_len`
/// - `checksum = 0`
/// - `hash = [0u8; 32]`
663///
664/// # Examples
665///
666/// ## Detecting and Creating Zero Blocks
667///
668/// ```
669/// use hexz_core::ops::write::{is_zero_chunk, create_zero_block};
670/// use hexz_core::format::index::BlockInfo;
671///
672/// let chunk = vec![0u8; 65536]; // 64 KiB of zeros
673///
674/// if is_zero_chunk(&chunk) {
675///     let info = create_zero_block(chunk.len() as u32);
676///     assert_eq!(info.offset, 0);
677///     assert_eq!(info.length, 0);
678///     assert_eq!(info.logical_len, 65536);
679///     println!("Zero block: No storage required!");
680/// }
681/// ```
682///
683/// ## Usage in Packing Loop
684///
685/// ```no_run
686/// # use hexz_core::ops::write::{is_zero_chunk, create_zero_block, write_block};
687/// # use hexz_core::algo::compression::Lz4Compressor;
688/// # use hexz_core::algo::hashing::blake3::Blake3Hasher;
689/// # use hexz_core::algo::dedup::hash_table::StandardHashTable;
690/// # use std::fs::File;
691/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
692/// # let mut out = File::create("output.hxz")?;
693/// # let mut offset = 512u64;
694/// # let compressor = Lz4Compressor::new();
695/// # let hasher = Blake3Hasher;
696/// # let mut hash_buf = [0u8; 32];
697/// # let mut compress_buf = Vec::new();
698/// # let mut encrypt_buf = Vec::new();
699/// # let chunks: Vec<Vec<u8>> = vec![];
700/// for (idx, chunk) in chunks.iter().enumerate() {
701///     let info = if is_zero_chunk(chunk) {
702///         // Optimize: No compression, no write
703///         create_zero_block(chunk.len() as u32)
704///     } else {
705///         // Normal path: Compress and write
706///         write_block(&mut out, chunk, idx as u64, &mut offset, None::<&mut StandardHashTable>, &compressor, None, &hasher, &mut hash_buf, &mut compress_buf, &mut encrypt_buf)?
707///     };
708///     // Add info to index page...
709/// }
710/// # Ok(())
711/// # }
712/// ```
713///
714/// # Performance
715///
716/// - **Time complexity**: O(1) (no I/O, no computation)
717/// - **Space complexity**: O(1) (fixed-size struct)
/// - **Typical savings**: 99.97% vs. uncompressed zeros (about 80% vs. LZ4-compressed zeros, using the figures above)
719///
720/// # Reader Behavior
721///
722/// When a reader encounters a zero block (offset=0, length=0):
723/// 1. Recognize zero-block pattern from metadata
724/// 2. Allocate buffer of size `logical_len`
725/// 3. Fill buffer with zeros (optimized memset)
726/// 4. Return buffer to caller
727///
728/// No decompression, decryption, or checksum verification is performed.
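///
/// The reader-side check can be sketched as follows. `Descriptor` is a hypothetical
/// stand-in for `BlockInfo` with only the fields used here, and `read_zero_block` is
/// an illustrative name, not part of this crate:
///
/// ```rust
/// /// Hypothetical subset of the block descriptor.
/// struct Descriptor {
///     offset: u64,
///     length: u32,
///     logical_len: u32,
/// }
///
/// /// Returns the synthesized zeros for a zero block, or `None` for a normal
/// /// block that must be read from disk.
/// fn read_zero_block(info: &Descriptor) -> Option<Vec<u8>> {
///     if info.offset == 0 && info.length == 0 {
///         Some(vec![0u8; info.logical_len as usize])
///     } else {
///         None
///     }
/// }
/// ```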
729///
730/// # Interaction with Deduplication
731///
732/// Zero blocks do not participate in deduplication:
733/// - They are never written to disk → no physical offset → no dedup entry
734/// - Each zero block gets its own metadata descriptor
735/// - This is fine: Metadata is cheap (20 bytes), and all zero blocks have same content
736///
737/// # Interaction with Encryption
738///
739/// Zero blocks work correctly with encryption:
740/// - They are detected **before** compression/encryption
741/// - Encrypted snapshots still use zero-block optimization
742/// - Readers synthesize zeros without decryption
743///
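/// Block metadata therefore marks which blocks are all-zero; if the index is stored in
/// the clear, that pattern is visible even for encrypted snapshots. The contents of
/// non-zero blocks remain confidential.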
745///
746/// # Validation
747///
748/// **IMPORTANT**: This function does NOT validate that the original chunk was actually
749/// all zeros. The caller is responsible for calling [`is_zero_chunk`] first.
750///
751/// If a non-zero chunk is incorrectly marked as a zero block, readers will return
752/// zeros instead of the original data (silent data corruption).
753pub fn create_zero_block(logical_len: u32) -> BlockInfo {
754    BlockInfo {
755        offset: 0,
756        length: 0,
757        logical_len,
758        checksum: 0,
759        hash: [0u8; 32],
760    }
761}
762
/// Convenience wrapper for `write_block` that creates the hasher and allocates the
/// hash and scratch buffers internally.
///
/// This is a simpler API for tests and one-off writes. For hot paths (like snapshot
/// packing loops), use `write_block` directly with a reused hasher and reused buffers.
767#[allow(dead_code)]
768fn write_block_simple<W: Write>(
769    out: &mut W,
770    chunk: &[u8],
771    block_idx: u64,
772    current_offset: &mut u64,
773    dedup_map: Option<&mut StandardHashTable>,
774    compressor: &dyn Compressor,
775    encryptor: Option<&dyn Encryptor>,
776) -> Result<BlockInfo> {
777    use crate::algo::hashing::blake3::Blake3Hasher;
778    let hasher = Blake3Hasher;
779    let mut hash_buf = [0u8; 32];
780    let mut compress_buf = Vec::new();
781    let mut encrypt_buf = Vec::new();
782    write_block(
783        out,
784        chunk,
785        block_idx,
786        current_offset,
787        dedup_map,
788        compressor,
789        encryptor,
790        &hasher,
791        &mut hash_buf,
792        &mut compress_buf,
793        &mut encrypt_buf,
794    )
795}
796
797/// Checks if a chunk consists entirely of zero bytes.
798///
799/// This function efficiently detects all-zero chunks to enable sparse block optimization.
800/// Zero chunks are common in VM images (unallocated space), memory dumps (zero-initialized
801/// pages), and sparse files.
802///
803/// # Algorithm
804///
/// Uses Rust's iterator `all()` combinator, which:
/// - Short-circuits on first non-zero byte (early exit)
/// - May be compiled to SIMD instructions on modern CPUs (autovectorization)
/// - Processes 16-64 bytes per instruction when vectorized (SSE2/AVX2/AVX-512)
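///
/// A minimal sketch of that approach, assuming the body is a single `all()` call.
/// `is_zero_chunk_sketch` is a hypothetical stand-in, not necessarily the exact
/// implementation of this function:
///
/// ```rust
/// /// Returns `true` when every byte is zero; stops at the first non-zero byte.
/// fn is_zero_chunk_sketch(chunk: &[u8]) -> bool {
///     chunk.iter().all(|&b| b == 0)
/// }
/// ```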
///
/// # Parameters
///
/// - `chunk`: Byte slice to check
///   - Empty slices return `true` (vacuous truth)
///   - Typical size: 16 KiB - 256 KiB (configurable block size)
///
/// # Returns
///
/// - `true`: All bytes are zero (sparse block, use [`create_zero_block`])
/// - `false`: At least one non-zero byte (normal block, compress and write)
///
/// # Performance
///
/// Rough, hardware-dependent estimates (measure on the target machine):
///
/// - **SIMD-optimized**: ~10-20 GB/s (memory bandwidth limited)
/// - **Scalar fallback**: ~1-2 GB/s (without SIMD)
/// - **Typical overhead**: <1% of total packing time
///
/// The check is cheap relative to compression, and every zero block it finds skips
/// compression and storage entirely, so it is worth performing on every chunk.
///
/// # Examples
///
/// ## Basic Usage
///
/// ```
/// use hexz_core::ops::write::is_zero_chunk;
///
/// let zeros = vec![0u8; 65536];
/// assert!(is_zero_chunk(&zeros));
///
/// let data = vec![0u8, 1u8, 0u8];
/// assert!(!is_zero_chunk(&data));
///
/// let empty: &[u8] = &[];
/// assert!(is_zero_chunk(empty)); // Empty is considered "all zeros"
/// ```
///
/// ## Packing Loop Integration
///
/// ```no_run
/// # use hexz_core::ops::write::{is_zero_chunk, create_zero_block, write_block};
/// # use hexz_core::algo::compression::Lz4Compressor;
/// # use hexz_core::algo::hashing::blake3::Blake3Hasher;
/// # use hexz_core::format::index::BlockInfo;
/// # use hexz_core::algo::dedup::hash_table::StandardHashTable;
/// # use std::fs::File;
/// # fn main() -> Result<(), Box<dyn std::error::Error>> {
/// # let mut out = File::create("output.hxz")?;
/// # let mut offset = 512u64;
/// # let compressor = Lz4Compressor::new();
/// # let hasher = Blake3Hasher;
/// # let mut hash_buf = [0u8; 32];
/// # let mut compress_buf = Vec::new();
/// # let mut encrypt_buf = Vec::new();
/// # let mut index_blocks = Vec::new();
/// # let chunks: Vec<Vec<u8>> = vec![];
/// for (idx, chunk) in chunks.iter().enumerate() {
///     let info = if is_zero_chunk(chunk) {
///         // Fast path: No compression, no write, just metadata
///         create_zero_block(chunk.len() as u32)
///     } else {
///         // Slow path: Compress, write, create metadata
///         write_block(
///             &mut out,
///             chunk,
///             idx as u64,
///             &mut offset,
///             None::<&mut StandardHashTable>,
///             &compressor,
///             None,
///             &hasher,
///             &mut hash_buf,
///             &mut compress_buf,
///             &mut encrypt_buf,
///         )?
///     };
///     index_blocks.push(info);
/// }
/// # Ok(())
/// # }
/// ```
///
/// ## Benchmarking Zero Detection
///
/// ```
/// use hexz_core::ops::write::is_zero_chunk;
/// use std::hint::black_box;
/// use std::time::Instant;
///
/// let chunk = vec![0u8; 64 * 1024 * 1024]; // 64 MiB
/// let start = Instant::now();
///
/// for _ in 0..100 {
///     // `black_box` keeps the optimizer from eliding the call or hoisting it
///     // out of the loop.
///     black_box(is_zero_chunk(black_box(&chunk)));
/// }
///
/// let elapsed = start.elapsed();
/// let throughput = (64.0 * 100.0) / elapsed.as_secs_f64(); // MiB/s
/// println!("Zero detection: {:.1} GiB/s", throughput / 1024.0);
/// ```
///
/// # SIMD Optimization
///
/// On x86-64 with AVX2, a vectorized scan looks roughly like the following
/// (illustrative only; the length check and tail handling are omitted, and actual
/// compiler output varies):
///
/// ```text
/// vpxor    ymm0, ymm0, ymm0    ; Zero register
/// scan:
///   vmovdqu  ymm1, [rsi]        ; Load 32 bytes (unaligned load)
///   vpcmpeqb ymm2, ymm1, ymm0   ; Compare each byte with zero
///   vpmovmskb eax, ymm2         ; Extract comparison mask
///   cmp      eax, 0xFFFFFFFF    ; All 32 bytes zero?
///   jne      found_nonzero      ; Early exit if not
///   add      rsi, 32            ; Advance pointer
///   jmp      scan               ; Next 32 bytes
/// ```
///
/// This processes 32 bytes per iteration (~1-2 cycles on modern CPUs).
///
/// # Edge Cases
///
/// - **Empty chunks**: Return `true` (vacuous truth, no non-zero bytes)
/// - **Single byte**: Works correctly, no special handling needed
/// - **Unaligned chunks**: SIMD code handles unaligned loads transparently
///
/// # Alternative Implementations
///
/// Other possible implementations (not currently used):
///
/// 1. **Manual SIMD**: Use `std::arch` for explicit SIMD (potentially faster but less portable)
/// 2. **Chunked comparison**: Process in 8-byte chunks as `u64` words (faster scalar path)
/// 3. **Bitmap scan**: Use the CPU's `bsf`/`tzcnt` to skip zero regions (complex)
///
/// The current implementation relies on compiler autovectorization, which works well
/// in practice and maintains portability.
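///
/// As an illustration of option 2, here is a minimal sketch of a word-at-a-time
/// scan using only safe code. `is_zero_chunk_u64` is a hypothetical name, not part
/// of this module, and whether it outperforms the iterator version depends on the
/// compiler and target, so benchmark before adopting it.
///
/// ```rust
/// // Hypothetical alternative: read eight bytes at a time as a `u64` and
/// // compare each word with zero.
/// fn is_zero_chunk_u64(chunk: &[u8]) -> bool {
///     let mut words = chunk.chunks_exact(8);
///     // `by_ref()` keeps `words` usable so the remainder can be read afterwards.
///     let body_is_zero = words.by_ref().all(|w| {
///         // `chunks_exact(8)` guarantees `w.len() == 8`, so this copy cannot panic.
///         let mut bytes = [0u8; 8];
///         bytes.copy_from_slice(w);
///         u64::from_ne_bytes(bytes) == 0
///     });
///     // Up to 7 trailing bytes are checked one at a time.
///     body_is_zero && words.remainder().iter().all(|&b| b == 0)
/// }
/// ```
///
/// The sketch keeps the same contract as the iterator version: empty input returns
/// `true`, and a non-zero byte anywhere (including the trailing bytes) returns `false`.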
///
/// # Correctness
///
/// This function is pure and infallible:
/// - No side effects (read-only operation)
/// - No panics (iterator `all()` is safe for all inputs)
/// - No undefined behavior (all byte patterns are valid)
pub fn is_zero_chunk(chunk: &[u8]) -> bool {
    chunk.iter().all(|&b| b == 0)
}

#[cfg(test)]
mod tests {
    use super::*;
    use crate::algo::compression::{Lz4Compressor, ZstdCompressor};
    use crate::algo::encryption::AesGcmEncryptor;
    use std::io::Cursor;

    /// Convenience wrapper that calls `write_block_simple` with no dedup map.
    fn write_block_no_dedup<W: Write>(
        out: &mut W,
        chunk: &[u8],
        block_idx: u64,
        current_offset: &mut u64,
        compressor: &dyn Compressor,
        encryptor: Option<&dyn Encryptor>,
    ) -> Result<BlockInfo> {
        write_block_simple(
            out,
            chunk,
            block_idx,
            current_offset,
            None::<&mut StandardHashTable>,
            compressor,
            encryptor,
        )
    }

    #[test]
    fn test_is_zero_chunk_all_zeros() {
        let chunk = vec![0u8; 1024];
        assert!(is_zero_chunk(&chunk));
    }

    #[test]
    fn test_is_zero_chunk_with_nonzero() {
        let mut chunk = vec![0u8; 1024];
        chunk[512] = 1; // Single non-zero byte
        assert!(!is_zero_chunk(&chunk));
    }

    #[test]
    fn test_is_zero_chunk_all_nonzero() {
        let chunk = vec![0xFFu8; 1024];
        assert!(!is_zero_chunk(&chunk));
    }

    #[test]
    fn test_is_zero_chunk_empty() {
        let chunk: Vec<u8> = vec![];
        assert!(is_zero_chunk(&chunk)); // Vacuous truth
    }

    #[test]
    fn test_is_zero_chunk_single_zero() {
        let chunk = vec![0u8];
        assert!(is_zero_chunk(&chunk));
    }

    #[test]
    fn test_is_zero_chunk_single_nonzero() {
        let chunk = vec![1u8];
        assert!(!is_zero_chunk(&chunk));
    }

    #[test]
    fn test_create_zero_block() {
        let logical_len = 65536;
        let info = create_zero_block(logical_len);

        assert_eq!(info.offset, 0);
        assert_eq!(info.length, 0);
        assert_eq!(info.logical_len, logical_len);
        assert_eq!(info.checksum, 0);
    }

    #[test]
    fn test_create_zero_block_various_sizes() {
        for size in [1, 16, 1024, 4096, 65536, 1048576] {
            let info = create_zero_block(size);
            assert_eq!(info.offset, 0);
            assert_eq!(info.length, 0);
            assert_eq!(info.logical_len, size);
            assert_eq!(info.checksum, 0);
        }
    }

    #[test]
    fn test_write_block_basic_lz4() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64; // Start after header
        let chunk = vec![0xAAu8; 4096];
        let compressor = Lz4Compressor::new();

        let result = write_block_no_dedup(&mut output, &chunk, 0, &mut offset, &compressor, None);

        assert!(result.is_ok());
        let info = result.unwrap();

        // Verify offset updated
        assert!(offset > 512);

        // Verify block info
        assert_eq!(info.offset, 512);
        assert!(info.length > 0); // Compressed data written
        assert_eq!(info.logical_len, 4096);
        assert!(info.checksum != 0);

        // Verify data was written
        let written = output.into_inner();
        assert_eq!(written.len(), (offset - 512) as usize);
    }

    #[test]
    fn test_write_block_basic_zstd() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let chunk = vec![0xAAu8; 4096];
        let compressor = ZstdCompressor::new(3, None);

        let result = write_block_no_dedup(&mut output, &chunk, 0, &mut offset, &compressor, None);

        assert!(result.is_ok());
        let info = result.unwrap();

        assert_eq!(info.offset, 512);
        assert!(info.length > 0);
        assert_eq!(info.logical_len, 4096);
    }

    #[test]
    fn test_write_block_incompressible_data() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;

        // Varied byte pattern intended to compress poorly. Note that it repeats
        // every 256 bytes, so a compressor may still shrink it.
        let chunk: Vec<u8> = (0..4096).map(|i| ((i * 7 + 13) % 256) as u8).collect();
        let compressor = Lz4Compressor::new();

        let result = write_block_no_dedup(&mut output, &chunk, 0, &mut offset, &compressor, None);

        assert!(result.is_ok());
        let info = result.unwrap();

        // The output may be smaller or larger than the input, so only verify
        // that the write succeeded and produced data.
        assert_eq!(info.logical_len, chunk.len() as u32);
        assert!(info.length > 0);
    }

    #[test]
    fn test_write_block_with_dedup_unique_blocks() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let mut dedup_map = StandardHashTable::new();
        let compressor = Lz4Compressor::new();

        // Write first block
        let chunk1 = vec![0xAAu8; 4096];
        let info1 = write_block_simple(
            &mut output,
            &chunk1,
            0,
            &mut offset,
            Some(&mut dedup_map),
            &compressor,
            None,
        )
        .unwrap();

        let offset_after_block1 = offset;

        // Write second unique block
        let chunk2 = vec![0xBBu8; 4096];
        let info2 = write_block_simple(
            &mut output,
            &chunk2,
            1,
            &mut offset,
            Some(&mut dedup_map),
            &compressor,
            None,
        )
        .unwrap();

        // Both blocks should be written
        assert_eq!(info1.offset, 512);
        assert_eq!(info2.offset, offset_after_block1);
        assert!(offset > offset_after_block1);

        // Dedup map should have 2 entries
        assert_eq!(dedup_map.len(), 2);
    }

    #[test]
    fn test_write_block_with_dedup_duplicate_blocks() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let mut dedup_map = StandardHashTable::new();
        let compressor = Lz4Compressor::new();

        // Write first block
        let chunk1 = vec![0xAAu8; 4096];
        let info1 = write_block_simple(
            &mut output,
            &chunk1,
            0,
            &mut offset,
            Some(&mut dedup_map),
            &compressor,
            None,
        )
        .unwrap();

        let offset_after_block1 = offset;

        // Write duplicate block (same content)
        let chunk2 = vec![0xAAu8; 4096];
        let info2 = write_block_simple(
            &mut output,
            &chunk2,
            1,
            &mut offset,
            Some(&mut dedup_map),
            &compressor,
            None,
        )
        .unwrap();

        // Second block should reuse first block's offset
        assert_eq!(info1.offset, info2.offset);
        assert_eq!(info1.length, info2.length);
        assert_eq!(info1.checksum, info2.checksum);

        // Offset should not advance (no write)
        assert_eq!(offset, offset_after_block1);

        // Dedup map should have 1 entry (deduplicated)
        assert_eq!(dedup_map.len(), 1);
    }

    #[test]
    fn test_write_block_with_encryption() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let chunk = vec![0xAAu8; 4096];
        let compressor = Lz4Compressor::new();

        // Create encryptor
        let salt = [0u8; 32];
        let encryptor = AesGcmEncryptor::new(b"test_password", &salt, 100000).unwrap();

        let result = write_block_no_dedup(
            &mut output,
            &chunk,
            0,
            &mut offset,
            &compressor,
            Some(&encryptor),
        );

        assert!(result.is_ok());
        let info = result.unwrap();

        // Encrypted data should be larger than compressed (adds GCM tag)
        assert!(info.length > 16); // At least tag overhead
        assert_eq!(info.logical_len, 4096);
    }

    #[test]
    fn test_write_block_encryption_disables_dedup() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let mut dedup_map = StandardHashTable::new();
        let compressor = Lz4Compressor::new();
        let salt = [0u8; 32];
        let encryptor = AesGcmEncryptor::new(b"test_password", &salt, 100000).unwrap();

        // Write first encrypted block
        let chunk1 = vec![0xAAu8; 4096];
        let info1 = write_block_simple(
            &mut output,
            &chunk1,
            0,
            &mut offset,
            Some(&mut dedup_map),
            &compressor,
            Some(&encryptor),
        )
        .unwrap();

        let offset_after_block1 = offset;

        // Write second encrypted block (same content, different nonce)
        let chunk2 = vec![0xAAu8; 4096];
        let info2 = write_block_simple(
            &mut output,
            &chunk2,
            1,
            &mut offset,
            Some(&mut dedup_map),
            &compressor,
            Some(&encryptor),
        )
        .unwrap();

        // Both blocks should be written (no dedup with encryption)
        assert_eq!(info1.offset, 512);
        assert_eq!(info2.offset, offset_after_block1);
        assert!(offset > offset_after_block1);

        // Dedup map should be empty (encryption disables dedup)
        assert_eq!(dedup_map.len(), 0);
    }

    #[test]
    fn test_write_block_multiple_sequential() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let compressor = Lz4Compressor::new();

        let mut expected_offset = 512u64;

        // Write 10 blocks sequentially
        for i in 0..10 {
            let chunk = vec![i as u8; 4096];
            let info = write_block_no_dedup(&mut output, &chunk, i, &mut offset, &compressor, None)
                .unwrap();

            assert_eq!(info.offset, expected_offset);
            expected_offset += info.length as u64;
        }

        assert_eq!(offset, expected_offset);
    }

    #[test]
    fn test_write_block_preserves_logical_length() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let compressor = Lz4Compressor::new();

        for size in [128, 1024, 4096, 65536] {
            let chunk = vec![0xAAu8; size];
            let info = write_block_no_dedup(&mut output, &chunk, 0, &mut offset, &compressor, None)
                .unwrap();

            assert_eq!(info.logical_len, size as u32);
        }
    }

    #[test]
    fn test_write_block_checksum_differs() {
        let mut output1 = Cursor::new(Vec::new());
        let mut output2 = Cursor::new(Vec::new());
        let mut offset1 = 512u64;
        let mut offset2 = 512u64;
        let compressor = Lz4Compressor::new();

        let chunk1 = vec![0xAAu8; 4096];
        let chunk2 = vec![0xBBu8; 4096];

        let info1 = write_block_no_dedup(&mut output1, &chunk1, 0, &mut offset1, &compressor, None)
            .unwrap();

        let info2 = write_block_no_dedup(&mut output2, &chunk2, 0, &mut offset2, &compressor, None)
            .unwrap();

        // Different input data should produce different checksums
        assert_ne!(info1.checksum, info2.checksum);
    }

    #[test]
    fn test_write_block_empty_chunk() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let chunk: Vec<u8> = vec![];
        let compressor = Lz4Compressor::new();

        let result = write_block_no_dedup(&mut output, &chunk, 0, &mut offset, &compressor, None);

        // Should handle empty chunk
        assert!(result.is_ok());
        let info = result.unwrap();
        assert_eq!(info.logical_len, 0);
    }

    #[test]
    fn test_write_block_large_block() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let chunk = vec![0xAAu8; 1024 * 1024]; // 1 MiB
        let compressor = Lz4Compressor::new();

        let result = write_block_no_dedup(&mut output, &chunk, 0, &mut offset, &compressor, None);

        assert!(result.is_ok());
        let info = result.unwrap();
        assert_eq!(info.logical_len, 1024 * 1024);
        // Highly compressible data should compress well
        assert!(info.length < info.logical_len);
    }

    #[test]
    fn test_integration_zero_detection_and_write() {
        let mut output = Cursor::new(Vec::new());
        let mut offset = 512u64;
        let compressor = Lz4Compressor::new();

        let zero_chunk = vec![0u8; 4096];
        let data_chunk = vec![0xAAu8; 4096];

        // Process zero chunk
        let zero_info = if is_zero_chunk(&zero_chunk) {
            create_zero_block(zero_chunk.len() as u32)
        } else {
            write_block_no_dedup(&mut output, &zero_chunk, 0, &mut offset, &compressor, None)
                .unwrap()
        };

        // Process data chunk
        let data_info = if is_zero_chunk(&data_chunk) {
            create_zero_block(data_chunk.len() as u32)
        } else {
            write_block_no_dedup(&mut output, &data_chunk, 1, &mut offset, &compressor, None)
                .unwrap()
        };

        // Zero block should not be written
        assert_eq!(zero_info.offset, 0);
        assert_eq!(zero_info.length, 0);

        // Data block should be written
        assert_eq!(data_info.offset, 512);
        assert!(data_info.length > 0);
    }
}