pub struct HuffmanEncoder { /* private fields */ }
Optimized Huffman encoder for literal compression.
Implementations§
impl HuffmanEncoder
pub fn build(data: &[u8]) -> Option<Self>
Build a Huffman encoder from literal data.
Uses a SIMD-accelerated histogram when available. Returns None if the data cannot be efficiently Huffman-compressed.
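As a rough illustration of the histogram step, the scalar sketch below counts byte frequencies and applies one simple "is Huffman worthwhile" check. The names `histogram` and `worth_huffman` are hypothetical, and the rule used here (at least two distinct symbols) is only an assumption; the actual criteria that make `build` return None are not documented above.

```rust
/// Count how often each byte value occurs (scalar fallback, no SIMD).
fn histogram(data: &[u8]) -> [usize; 256] {
    let mut counts = [0usize; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    counts
}

/// Assumed heuristic: Huffman coding needs at least two distinct symbols.
/// Empty input and single-symbol input are rejected.
fn worth_huffman(data: &[u8]) -> bool {
    let counts = histogram(data);
    counts.iter().filter(|&&c| c > 0).count() >= 2
}
```

A real encoder would likely also compare the estimated Huffman size against the raw size before committing to compression.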
pub fn from_weights(weights: &[u8]) -> Option<Self>
Build a Huffman encoder from pre-defined weights.
This allows using custom Huffman tables instead of building from data. Useful when you have pre-trained weights from dictionary compression or want to reuse weights across multiple blocks.
§Parameters
weights: Array of 256 weights (one per byte value). A weight of 0 means the symbol is not present. A weight w > 0 means code_length = max_bits + 1 - w.
§Returns
Returns Some(encoder) if the weights are valid, None otherwise.
§Example
use haagenti_zstd::huffman::HuffmanEncoder;
// Define weights for symbols 'a' (97), 'b' (98), 'c' (99)
let mut weights = vec![0u8; 256];
weights[97] = 3; // 'a' - highest weight (shortest code)
weights[98] = 2; // 'b' - medium weight
weights[99] = 1; // 'c' - lowest weight (longest code)
let encoder = HuffmanEncoder::from_weights(&weights).unwrap();
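The weight rule from the Parameters section (code_length = max_bits + 1 - w) can be sketched as a small standalone function. `code_length` is a hypothetical helper, not part of this crate, and how the encoder determines max_bits is not stated here, so it is passed in as a parameter.

```rust
/// Map a weight to a code length using code_length = max_bits + 1 - w.
/// Returns None for weight 0 (symbol absent) or for a weight larger than
/// `max_bits`, which would give a code length below 1.
fn code_length(weight: u8, max_bits: u8) -> Option<u8> {
    match weight {
        0 => None,
        w if w > max_bits => None,
        // Written as (max_bits - w) + 1 so the arithmetic cannot overflow.
        w => Some((max_bits - w) + 1),
    }
}
```

Assuming max_bits = 3 for the weights in the example above, 'a' (weight 3) gets a 1-bit code, 'b' (weight 2) a 2-bit code, and 'c' (weight 1) a 3-bit code.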
pub fn encode(&self, literals: &[u8]) -> Vec<u8>
Encode literals using optimized bit packing.
Uses a 64-bit accumulator for efficient byte-aligned writes. Chunked reverse processing and software prefetching keep cache behavior efficient even though the literals must be iterated in reverse.
§Performance Optimizations
- Processes in 64-byte cache-line chunks (reverse chunk order, forward within chunk)
- Software prefetching brings next chunk into L1 cache ahead of time
- 64-bit accumulator with branchless 32-bit flushes
- Unrolled inner loop for better ILP
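To make the accumulator idea concrete, here is a minimal sketch of a bit writer that packs codes into a 64-bit accumulator and flushes 32 bits at a time. `BitWriter` is a hypothetical type, not the crate's implementation. It uses an ordinary branch for the flush rather than the branchless form described above, and it omits chunking, prefetching, unrolling, and any stream framing.

```rust
/// Packs variable-length codes LSB-first into a byte vector.
struct BitWriter {
    acc: u64,  // pending bits, lowest bits written first
    bits: u32, // number of valid bits in `acc` (always < 32 between calls)
    out: Vec<u8>,
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { acc: 0, bits: 0, out: Vec::new() }
    }

    /// Append the low `len` bits of `code`; `len` must be at most 16.
    fn put(&mut self, code: u32, len: u32) {
        assert!(len <= 16);
        let mask = (1u32 << len) - 1;
        self.acc |= ((code & mask) as u64) << self.bits;
        self.bits += len;
        // Flush one 32-bit word once enough bits have accumulated.
        if self.bits >= 32 {
            self.out.extend_from_slice(&(self.acc as u32).to_le_bytes());
            self.acc >>= 32;
            self.bits -= 32;
        }
    }

    /// Write out any remaining bits, zero-padded to a whole byte.
    fn finish(mut self) -> Vec<u8> {
        while self.bits > 0 {
            self.out.push(self.acc as u8);
            self.acc >>= 8;
            self.bits = self.bits.saturating_sub(8);
        }
        self.out
    }
}
```

Because at most 31 bits are pending before each call and a code adds at most 16, the accumulator never exceeds 47 bits, so a single 32-bit flush per call is enough.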
pub fn encode_batch(&self, literals: &[u8]) -> Vec<u8>
Encode literals in batches for better throughput.
Processes 4 symbols at a time when possible.
pub fn serialize_weights(&self) -> Vec<u8>
Serialize weights in Zstd format (direct or FSE-compressed).
For num_symbols <= 128: Uses direct format
- header_byte = (num_symbols - 1) + 128
- Followed by ceil(num_symbols / 2) bytes of 4-bit weights
For num_symbols > 128: Uses FSE-compressed format
- header_byte = compressed_size (always less than 128)
- Followed by FSE table and compressed weights
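The direct format described above can be sketched as follows. `serialize_direct` is a hypothetical standalone function, and `weights` is assumed to be exactly the list of weights to store. The nibble order (first weight of each pair in the high 4 bits) follows the Zstandard format's direct representation; whether this crate handles the final weight implicitly is not stated here. The FSE-compressed path is not shown.

```rust
/// Pack up to 128 weights into the direct representation:
/// one header byte, then two 4-bit weights per byte.
fn serialize_direct(weights: &[u8]) -> Option<Vec<u8>> {
    let n = weights.len();
    // Direct format only covers 1..=128 weights, each fitting in 4 bits.
    if n == 0 || n > 128 || weights.iter().any(|&w| w > 15) {
        return None;
    }
    let mut out = Vec::with_capacity(1 + (n + 1) / 2);
    out.push((n - 1) as u8 + 128); // header_byte = (num_symbols - 1) + 128
    for pair in weights.chunks(2) {
        let hi = pair[0] << 4;
        let lo = pair.get(1).copied().unwrap_or(0); // pad an odd tail with 0
        out.push(hi | lo);
    }
    Some(out)
}
```

For the three weights [3, 2, 1] this yields the header byte 130 followed by 0x32 and 0x10, which is 1 + ceil(3 / 2) = 3 bytes in total.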
pub fn num_symbols(&self) -> usize
Returns the number of symbols that have codes assigned.
pub fn estimate_size(&self, literals: &[u8]) -> usize
Estimate the compressed size of the given literals.
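One plausible way to produce such an estimate, shown here purely as an assumption, is to sum the code length of every literal and round up to whole bytes. `estimate_bytes` and its `code_lengths` table are hypothetical; the crate's actual estimate may also account for headers or the serialized weight table, and its unit is not documented above.

```rust
/// Estimate the encoded payload size in bytes from per-symbol code lengths.
/// `code_lengths[b]` is the code length in bits for byte value `b`
/// (0 for symbols without a code).
fn estimate_bytes(code_lengths: &[u8; 256], literals: &[u8]) -> usize {
    let bits: usize = literals
        .iter()
        .map(|&b| code_lengths[b as usize] as usize)
        .sum();
    (bits + 7) / 8 // round up to whole bytes
}
```

With code lengths of 1, 2, and 3 bits for 'a', 'b', and 'c', the input "aabc" needs 7 bits and is estimated at 1 byte, while "aabcc" needs 10 bits and is estimated at 2 bytes.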