pub struct ZstdCompressor { /* private fields */ }
Zstandard compressor with optional pre-trained dictionary.
This compressor wraps the zstd crate and provides both raw compression
(no dictionary) and dictionary-enhanced compression for improved ratios on
structured data.
§Dictionary Lifecycle
When a dictionary is provided:
- The dictionary bytes are copied internally when the compressor is constructed
- Both encoder and decoder dictionaries are parsed from the copied bytes
- The dictionary memory is freed when the compressor is dropped
- Multiple compressor instances can be constructed from the same dictionary bytes
This design trades a per-instance copy of the dictionary for simplicity and safety. In typical Hexz usage, one compressor instance exists per snapshot file, so the overhead is ~450 KB per open snapshot (110 KB dict × ~4x internal structures).
§Thread Safety
ZstdCompressor is Send + Sync. Compression and decompression operations do not
mutate the compressor state, allowing safe concurrent use from multiple threads.
Each operation allocates its own temporary encoder/decoder.
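Because every operation takes `&self`, a single compressor can be shared across threads behind an `Arc` with no locking. The sketch below illustrates the sharing pattern with a hypothetical stand-in type (`Codec` is not part of this crate) so it stays self-contained:

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for ZstdCompressor: Send + Sync, methods take &self.
struct Codec;

impl Codec {
    // Placeholder for compress(); real code would run the zstd path here.
    fn compress(&self, data: &[u8]) -> Vec<u8> {
        data.to_vec()
    }
}

// Shares one codec across worker threads via Arc. No Mutex is needed
// because compression never mutates the codec.
fn compress_in_parallel(codec: Arc<Codec>, blocks: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    let handles: Vec<_> = blocks
        .into_iter()
        .map(|block| {
            let codec = Arc::clone(&codec);
            thread::spawn(move || codec.compress(&block))
        })
        .collect();
    // Joining in spawn order preserves block order in the output.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

In real code, `Codec` would be a shared `ZstdCompressor` instance reused across all workers.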
§Constraints
- Dictionary compatibility: Blocks compressed with a dictionary MUST be decompressed with the exact same dictionary bytes. Attempting to decompress with a different or missing dictionary will fail with a compression error.
- Level consistency: The compression level is baked into the encoder dictionary when the compressor is constructed. Changing the level requires constructing a new compressor; the dictionary bytes themselves can be reused.
§Examples
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
// Create compressor without dictionary
let compressor = ZstdCompressor::new(3, None);
let data = b"test data";
let compressed = compressor.compress(data).unwrap();
let decompressed = compressor.decompress(&compressed).unwrap();
assert_eq!(data.as_slice(), decompressed.as_slice());
Implementations§
impl ZstdCompressor
pub fn new(level: i32, dict: Option<Vec<u8>>) -> Self
Creates a new Zstandard compressor with the specified compression level and optional dictionary.
§Parameters
- level - Compression level from 1 to 22:
  - 1: Fastest compression (~450 MB/s), lower ratio
  - 3: Default, balanced speed/ratio (~350 MB/s)
  - 9: High compression (~85 MB/s), good ratio
  - 19-22: Maximum compression (~10-30 MB/s), best ratio
- dict - Optional pre-trained dictionary bytes:
  - None: Use raw zstd compression
  - Some(dict_bytes): Use dictionary-enhanced compression
§Dictionary Handling
When a dictionary is provided, this function:
- Copies the dictionary bytes internally via EncoderDictionary::copy
- Parses the bytes into native zstd encoder/decoder dictionaries
- Manages the dictionary lifetime automatically (no leaks)
The dictionary memory (~110 KB for typical dictionaries) is properly managed and freed when the compressor is dropped. This provides:
- Proper memory management without leaks
- Dictionary reuse across millions of blocks
- Memory safety with automatic cleanup
§Memory Usage
Approximate memory overhead per compressor instance:
- No dictionary: ~10 KB (minimal bookkeeping)
- With 110 KB dictionary: ~450 KB (copied dictionary bytes + encoder/decoder structures)
§Examples
use hexz_core::algo::compression::zstd::ZstdCompressor;
// Fast compression, no dictionary
let fast = ZstdCompressor::new(1, None);
// Balanced compression with dictionary
let dict = vec![0u8; 1024]; // Placeholder dictionary
let balanced = ZstdCompressor::new(3, Some(dict));
// Maximum compression for archival
let max = ZstdCompressor::new(22, None);
§Performance Notes
Creating a compressor is relatively expensive (~1 ms with dictionary due to parsing). Reuse compressor instances rather than creating them per-operation.
pub fn train(samples: &[Vec<u8>], max_size: usize) -> Result<Vec<u8>>
Trains a Zstandard dictionary from representative sample blocks.
Dictionary training analyzes a collection of sample data to identify common patterns, sequences, and statistical distributions. The resulting dictionary acts as a “seed” for the compressor, enabling better compression ratios on small blocks that would otherwise lack sufficient data to build effective models.
§Training Algorithm
The training process:
- Concatenates all samples into a training corpus
- Analyzes byte-level patterns using suffix arrays and frequency analysis
- Selects the most valuable patterns up to max_size bytes
- Optimizes dictionary layout for fast lookup during compression
- Returns the trained dictionary as a byte vector
This is a CPU-intensive operation (O(n log n) where n is total sample bytes) and should be done once during snapshot creation, not per-block.
§Parameters
- samples - A slice of representative data blocks. Requirements:
  - Minimum count: 10 samples (20+ recommended, 50+ ideal)
  - Minimum total size: 100x max_size (e.g., 10 MB for a 100 KB dictionary)
  - Representativeness: Must match production data patterns
  - Diversity: Include a variety of structures, not just repeated copies
- max_size - Maximum dictionary size in bytes. Recommendations:
  - Small blocks (16-32 KB): 64 KB dictionary
  - Medium blocks (64 KB): 110 KB dictionary (zstd's recommended max)
  - Large blocks (128+ KB): Diminishing returns, consider skipping the dictionary
§Returns
Returns Ok(Vec<u8>) containing the trained dictionary bytes, or Err if training fails.
The actual dictionary size may be less than max_size if fewer patterns were found.
§Errors
Returns Error::Compression if:
- Samples are empty or too small (less than ~1 KB total)
- max_size is invalid (0 or excessively large)
- The internal zstd training algorithm fails (corrupted samples, out of memory)
§Performance Characteristics
Training time on AMD Ryzen 9 5950X:
- 1 MB samples, 64 KB dict: ~50 ms
- 10 MB samples, 110 KB dict: ~200 ms
- 100 MB samples, 110 KB dict: ~2 seconds
Training is approximately O(n log n) in total sample size.
§Compression Ratio Impact
Expected compression ratio improvements with trained dictionary vs. raw compression:
| Block Size | Raw Zstd-3 | With Dict | Improvement |
|---|---|---|---|
| 16 KB | 1.5x | 2.4x | +60% |
| 32 KB | 2.1x | 3.2x | +52% |
| 64 KB | 2.8x | 3.9x | +39% |
| 128 KB | 3.2x | 3.7x | +16% |
| 256 KB+ | 3.5x | 3.6x | +3% |
Measured on typical VM disk image blocks (ext4 filesystem data).
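The improvement column is derived directly from the two ratio columns. As a sanity check (illustrative helper, not crate API):

```rust
/// Percentage improvement of the dictionary ratio over the raw ratio,
/// e.g. 1.5x -> 2.4x is a +60% improvement.
fn improvement_pct(raw_ratio: f64, dict_ratio: f64) -> f64 {
    (dict_ratio / raw_ratio - 1.0) * 100.0
}
```

Applied to the 64 KB row, 3.9 / 2.8 gives roughly +39%, matching the table.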
§Examples
§Basic Training
use hexz_core::algo::compression::zstd::ZstdCompressor;
// Collect 20 representative 64 KB blocks
let samples: Vec<Vec<u8>> = (0..20)
.map(|i| vec![((i * 13) % 256) as u8; 65536])
.collect();
// Train 110 KB dictionary
let dict = ZstdCompressor::train(&samples, 110 * 1024)?;
println!("Trained dictionary: {} bytes", dict.len());
// Use dictionary for compression
let compressor = ZstdCompressor::new(3, Some(dict));
§Training from File Samples
use hexz_core::algo::compression::zstd::ZstdCompressor;
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};
// Read samples from disk (e.g., sampled from VM disk image)
let mut samples = Vec::new();
let mut file = File::open("disk.raw")?;
let file_size = file.metadata()?.len();
let block_size = 65536;
let sample_count = 50;
let step = file_size / sample_count;
for i in 0..sample_count {
    let mut buffer = vec![0u8; block_size];
    let offset = i * step;
    // Seek to the evenly spaced offset and read one full block
    file.seek(SeekFrom::Start(offset))?;
    file.read_exact(&mut buffer)?;
    samples.push(buffer);
}
let dict = ZstdCompressor::train(&samples, 110 * 1024)?;
println!("Trained from {} samples: {} bytes", samples.len(), dict.len());
§Validating Dictionary Quality
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
let samples: Vec<Vec<u8>> = vec![vec![42u8; 32768]; 30];
let dict = ZstdCompressor::train(&samples, 64 * 1024)?;
// Compare compression ratios
let without_dict = ZstdCompressor::new(3, None);
let with_dict = ZstdCompressor::new(3, Some(dict));
let test_data = vec![42u8; 32768];
let compressed_raw = without_dict.compress(&test_data)?;
let compressed_dict = with_dict.compress(&test_data)?;
let improvement = (compressed_raw.len() as f64 / compressed_dict.len() as f64 - 1.0) * 100.0;
println!("Dictionary improved compression by {:.1}%", improvement);
// If improvement < 10%, dictionary may not be beneficial
§When Dictionary Training Fails or Performs Poorly
Dictionary training may produce poor results if:
- Samples are unrepresentative: Training on zeros, compressing real data
- Data is random/encrypted: No patterns exist to learn
- Samples are too few: Less than 10 samples or less than 100x dict size
- Data is already highly compressible: Text/logs may not benefit
- Blocks are too large: 256 KB+ blocks have enough context without dictionary
If dictionary compression performs worse than raw compression, fall back to
ZstdCompressor::new(level, None).
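That fallback decision can be made mechanically from a single test block compressed both ways. A sketch of the check (illustrative, not crate API), using the same improvement formula as the validation example above:

```rust
/// Decides whether to keep dictionary compression, given the compressed
/// sizes of the same test block with and without the dictionary.
/// Keeps the dictionary only when it saves at least 10%.
fn keep_dictionary(raw_len: usize, dict_len: usize) -> bool {
    let improvement = (raw_len as f64 / dict_len as f64 - 1.0) * 100.0;
    improvement >= 10.0
}
```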
§Memory Usage During Training
Temporary memory allocated during training:
- Input buffer: Sum of all sample sizes (e.g., 10 MB for 50 × 200 KB samples)
- Working memory: ~10x max_size (e.g., ~1.1 MB for a 110 KB dict)
- Output dictionary: max_size (e.g., 110 KB)
Total peak memory: input_size + 10×max_size. For typical usage (10 MB samples, 110 KB dict), peak memory is ~12 MB.
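As arithmetic, the peak figure works out as follows (illustrative helper, not crate API):

```rust
/// Rough peak-memory estimate for dictionary training: the input corpus
/// plus roughly 10x the requested dictionary size of working memory.
fn training_peak_bytes(total_sample_bytes: u64, max_size: u64) -> u64 {
    total_sample_bytes + 10 * max_size
}
```

For 10 MB of samples and a 110 KB dictionary this is about 11.6 MB, i.e. the ~12 MB quoted above.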
Trait Implementations§
impl Compressor for ZstdCompressor
fn compress(&self, data: &[u8]) -> Result<Vec<u8>>
Compresses a block of data using Zstandard compression.
This method compresses data using the compression level and dictionary
configured during construction. The output is a self-contained compressed
block in zstd frame format.
§Parameters
- data - The uncompressed input data to compress. Can be any size from 0 bytes to multiple gigabytes, though blocks of 64 KB to 1 MB are typical in Hexz.
§Returns
Returns Ok(Vec<u8>) containing the compressed data. The compressed size depends on:
- Input data compressibility (random data: ~100%, structured data: 20-50%)
- Compression level (higher levels = smaller output, slower compression)
- Dictionary usage (can reduce output by 10-40% for small blocks)
§Errors
Returns Error::Compression if:
- Internal zstd encoder initialization fails (rare, typically OOM)
- Compression process fails (extremely rare with valid input)
§Dictionary Behavior
- With dictionary: Uses streaming encoder with pre-parsed dictionary for maximum throughput. The dictionary is not embedded in the output; the decompressor must have the same dictionary.
- Without dictionary: Uses simple one-shot encoding; zstd builds its compression model from the input data itself.
§Performance
Approximate throughput on modern hardware (AMD Ryzen 9 5950X):
- Level 1: ~450 MB/s
- Level 3: ~350 MB/s (default)
- Level 9: ~85 MB/s
- Level 19: ~28 MB/s
Dictionary overhead: ~5% slower than raw compression due to initialization.
§Examples
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
let compressor = ZstdCompressor::new(3, None);
let data = b"Hello, world! Compression test data.";
let compressed = compressor.compress(data).unwrap();
println!("Compressed {} bytes to {} bytes", data.len(), compressed.len());
// Compressed data is self-contained and can be stored/transmitted
§Thread Safety
This method can be called concurrently from multiple threads on the same
ZstdCompressor instance. Each call creates an independent encoder.
fn decompress(&self, data: &[u8]) -> Result<Vec<u8>>
Decompresses a Zstandard-compressed block into a new buffer.
This method reverses the compression performed by compress(), restoring the
original uncompressed data. The decompressed output is allocated dynamically
based on the compressed frame’s metadata.
§Parameters
- data - The compressed input data in zstd frame format. Must have been compressed by a compatible ZstdCompressor (same dictionary, any level).
§Returns
Returns Ok(Vec<u8>) containing the decompressed data. The output size is
determined by the compressed frame’s content size field (embedded during
compression).
§Errors
Returns Error::Compression if:
- data is not valid zstd-compressed data (corrupted or wrong format)
- data was compressed with a dictionary, but this compressor has no dictionary
- data was compressed without a dictionary, but this compressor has a dictionary
- data was compressed with a different dictionary than this compressor
- Internal decompression fails (checksum mismatch, corrupted data)
§Dictionary Compatibility
Critical: Dictionary-compressed data MUST be decompressed with the exact same dictionary bytes. The zstd frame records the dictionary ID; a mismatched dictionary causes decompression to fail with an error.
| Compressed With | Decompressed With | Result |
|---|---|---|
| No dictionary | No dictionary | Success |
| Dictionary A | Dictionary A | Success |
| No dictionary | Dictionary A | Error |
| Dictionary A | No dictionary | Error |
| Dictionary A | Dictionary B | Error |
§Performance
Decompression speed is independent of compression level (level affects only compression time). Typical throughput on modern hardware:
- Without dictionary: ~1100 MB/s
- With dictionary: ~950 MB/s (10% overhead from dictionary lookups)
Decompression is roughly 3x faster than compression at level 3.
§Memory Allocation
This method allocates a new Vec<u8> to hold the decompressed output. For
hot paths where the decompressed size is known, consider using decompress_into()
to reuse buffers and avoid allocations.
§Examples
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
let compressor = ZstdCompressor::new(3, None);
let original = b"Test data for compression";
let compressed = compressor.compress(original).unwrap();
let decompressed = compressor.decompress(&compressed).unwrap();
assert_eq!(original.as_slice(), decompressed.as_slice());
§Thread Safety
This method can be called concurrently from multiple threads on the same
ZstdCompressor instance. Each call creates an independent decoder.
fn decompress_into(&self, data: &[u8], out: &mut [u8]) -> Result<usize>
Decompresses a Zstandard-compressed block into a caller-provided buffer.
This is a zero-allocation variant of decompress() that writes decompressed
data directly into a pre-allocated buffer. This is ideal for hot paths where
the decompressed size is known and buffers can be reused across multiple
decompression operations.
§Parameters
- data - The compressed input data in zstd frame format. Must have been compressed by a compatible ZstdCompressor (same dictionary, any level).
- out - The output buffer to receive decompressed bytes. Must be large enough to hold the entire decompressed payload.
§Returns
Returns Ok(usize) containing the number of bytes written to out. This is
always ≤ out.len().
§Errors
Returns Error::Compression if:
- data is not valid zstd-compressed data
- Dictionary mismatch (same rules as decompress())
- out is too small to hold the decompressed data (buffer overflow protection)
- Internal decompression fails (checksum mismatch, corrupted data)
§Buffer Sizing
The output buffer must be large enough to hold the full decompressed payload. If the buffer is too small, decompression will fail with an error rather than truncating output.
To determine the required size:
- If you compressed the data, you know the original size
- If reading from Hexz snapshots, the block size is in the index
- The zstd frame header contains the content size (can be parsed)
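When neither the original size nor the snapshot index is at hand, the content size can be read straight out of the frame header. The sketch below hand-parses the header fields defined by RFC 8878; it is illustrative only, and production code should prefer the zstd library's own frame-inspection routines:

```rust
/// Reads the declared decompressed size from a zstd frame header
/// (RFC 8878). Returns None if the frame does not declare a content
/// size or the header is malformed.
fn frame_content_size(frame: &[u8]) -> Option<u64> {
    const MAGIC: [u8; 4] = [0x28, 0xB5, 0x2F, 0xFD];
    if frame.len() < 5 || frame[..4] != MAGIC {
        return None;
    }
    let desc = frame[4]; // Frame_Header_Descriptor
    let fcs_flag = desc >> 6; // Frame_Content_Size_flag (bits 7-6)
    let single_segment = desc & 0b0010_0000 != 0; // bit 5
    let did_size = [0usize, 1, 2, 4][(desc & 0b11) as usize]; // Dictionary_ID field
    // Window_Descriptor (1 byte) is present only for multi-segment frames.
    let pos = 5 + usize::from(!single_segment) + did_size;
    let fcs_size = match fcs_flag {
        0 if single_segment => 1, // 1-byte field exists only in single-segment mode
        0 => return None,         // content size not recorded in this frame
        1 => 2,
        2 => 4,
        _ => 8,
    };
    let bytes = frame.get(pos..pos + fcs_size)?;
    let mut value = 0u64;
    for (i, b) in bytes.iter().enumerate() {
        value |= u64::from(*b) << (8 * i); // fields are little-endian
    }
    // The 2-byte variant stores value - 256 (it covers sizes 256..65791).
    if fcs_size == 2 {
        value += 256;
    }
    Some(value)
}
```

Note that zstd may omit the content-size field entirely (the `None` case), so callers still need a fallback such as the block size stored in the snapshot index.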
§Performance
This method avoids heap allocation of the output buffer, making it suitable for high-throughput scenarios:
- With reused buffer: 0 allocations per decompression
- Throughput: Same as decompress() (~1000 MB/s)
- Latency: ~5% lower than decompress() due to the eliminated allocation
Recommended usage pattern for hot paths:
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
let compressor = ZstdCompressor::new(3, None);
let original = vec![42u8; 65536]; // 64 KB data
let compressed = compressor.compress(&original).unwrap();
let mut reusable_buffer = vec![0u8; 65536]; // 64 KB buffer
// Reuse buffer for multiple decompressions
for _ in 0..1000 {
let size = compressor.decompress_into(&compressed, &mut reusable_buffer).unwrap();
// Process reusable_buffer[..size]
}
§Examples
§Basic Usage
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
let compressor = ZstdCompressor::new(3, None);
let original = vec![42u8; 1024];
let compressed = compressor.compress(&original).unwrap();
// Decompress into pre-allocated buffer
let mut output = vec![0u8; 1024];
let size = compressor.decompress_into(&compressed, &mut output).unwrap();
assert_eq!(size, 1024);
assert_eq!(output, original);
§Buffer Too Small
use hexz_core::algo::compression::{Compressor, zstd::ZstdCompressor};
let compressor = ZstdCompressor::new(3, None);
let original = vec![42u8; 1024];
let compressed = compressor.compress(&original).unwrap();
// Provide insufficient buffer
let mut small_buffer = vec![0u8; 512];
let result = compressor.decompress_into(&compressed, &mut small_buffer);
// Fails with an error rather than truncating: 512 bytes cannot hold the 1024-byte output
assert!(result.is_err());
§Thread Safety
This method can be called concurrently from multiple threads on the same
ZstdCompressor instance, provided each thread uses its own output buffer.
impl Debug for ZstdCompressor
fn fmt(&self, f: &mut Formatter<'_>) -> Result
Formats the compressor for debugging output.
Displays the compression level and whether a dictionary is present, without exposing sensitive dictionary contents.
§Output Format
ZstdCompressor { level: 3, has_dict: true }
§Examples
use hexz_core::algo::compression::zstd::ZstdCompressor;
let compressor = ZstdCompressor::new(5, None);
println!("{:?}", compressor);
// Outputs: ZstdCompressor { level: 5, has_dict: false }
Auto Trait Implementations§
impl Freeze for ZstdCompressor
impl RefUnwindSafe for ZstdCompressor
impl Send for ZstdCompressor
impl Sync for ZstdCompressor
impl Unpin for ZstdCompressor
impl UnsafeUnpin for ZstdCompressor
impl UnwindSafe for ZstdCompressor
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.