

haagenti_zstd::huffman

Struct HuffmanEncoder

pub struct HuffmanEncoder { /* private fields */ }


Optimized Huffman encoder for literal compression.

Implementations§

impl HuffmanEncoder

pub fn build(data: &[u8]) -> Option<Self>

Build a Huffman encoder from literal data.

Uses a SIMD-accelerated histogram when available. Returns None if the data cannot be compressed efficiently with Huffman coding.
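The histogram step mentioned above can be pictured with a plain scalar version. This is only a sketch: the helper names byte_histogram and distinct_symbols are hypothetical, and the crate's SIMD path and its criteria for returning None are not shown in this documentation.

```rust
/// Count how often each byte value occurs in `data` (scalar version).
/// Assumes no single byte value occurs more than u32::MAX times.
fn byte_histogram(data: &[u8]) -> [u32; 256] {
    let mut counts = [0u32; 256];
    for &b in data {
        counts[b as usize] += 1;
    }
    counts
}

/// Number of distinct byte values that occur at least once.
fn distinct_symbols(counts: &[u32; 256]) -> usize {
    counts.iter().filter(|&&c| c > 0).count()
}
```

A Huffman builder would typically derive code lengths from these counts, giving shorter codes to more frequent bytes.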

pub fn from_weights(weights: &[u8]) -> Option<Self>

Build a Huffman encoder from pre-defined weights.

This allows using custom Huffman tables instead of building from data. Useful when you have pre-trained weights from dictionary compression or want to reuse weights across multiple blocks.

§Parameters

weights: Array of 256 weights, one per byte value. A weight of 0 means the symbol is not present. A weight w > 0 gives code_length = max_bits + 1 - w, so a higher weight means a shorter code.

§Returns

Returns Some(encoder) if the weights are valid, None otherwise.

§Example

use haagenti_zstd::huffman::HuffmanEncoder;

// Define weights for symbols 'a' (97), 'b' (98), 'c' (99)
let mut weights = vec![0u8; 256];
weights[97] = 3;  // 'a' - highest weight (shortest code)
weights[98] = 2;  // 'b' - medium weight
weights[99] = 1;  // 'c' - lowest weight (longest code)

let encoder = HuffmanEncoder::from_weights(&weights).unwrap();
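The weight-to-length rule from the Parameters section can be checked by hand with a small helper. This is a sketch, not the crate's code: code_length is a hypothetical function, and the value of max_bits is passed in because this page does not state how the encoder derives it from the weights.

```rust
/// Code length in bits for `weight`, following
/// code_length = max_bits + 1 - weight.
/// Returns None for weight 0 (symbol absent) or a weight above `max_bits`.
fn code_length(weight: u8, max_bits: u8) -> Option<u8> {
    if weight == 0 || weight > max_bits {
        None
    } else {
        // Written as (max_bits - weight) + 1 so the u8 arithmetic cannot overflow.
        Some(max_bits - weight + 1)
    }
}
```

Assuming max_bits = 3 for the weights in the example above, 'a' (weight 3) would get a 1-bit code, 'b' (weight 2) a 2-bit code, and 'c' (weight 1) a 3-bit code.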

pub fn encode(&self, literals: &[u8]) -> Vec<u8>

Encode literals using optimized bit packing.

Uses a 64-bit accumulator for efficient byte-aligned writes. Chunked reverse processing and software prefetching keep the encoder cache-efficient despite the reverse-iteration requirement.

§Performance Optimizations

Processes in 64-byte cache-line chunks (reverse chunk order, forward within chunk)
Software prefetching brings next chunk into L1 cache ahead of time
64-bit accumulator with branchless 32-bit flushes
Unrolled inner loop for better ILP
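The accumulator idea from the list above can be illustrated in isolation. The sketch below is a simplified stand-in, not the encoder's implementation: BitWriter, write, and finish are hypothetical names, the flush uses an ordinary branch rather than a branchless one, and it omits the reverse symbol order, prefetching, and any stream terminator.

```rust
/// Minimal LSB-first bit writer with a 64-bit accumulator.
struct BitWriter {
    acc: u64,     // pending bits, least significant bit first
    bits: u32,    // number of valid bits in `acc` (kept below 32 between writes)
    out: Vec<u8>, // completed output bytes
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { acc: 0, bits: 0, out: Vec::new() }
    }

    /// Append the low `len` bits of `code` (`len` <= 32, higher bits must be zero).
    fn write(&mut self, code: u32, len: u32) {
        debug_assert!(len <= 32);
        debug_assert!(len == 32 || code >> len == 0);
        self.acc |= (code as u64) << self.bits;
        self.bits += len;
        // Flush 32 bits at once when enough have accumulated.
        if self.bits >= 32 {
            self.out.extend_from_slice(&(self.acc as u32).to_le_bytes());
            self.acc >>= 32;
            self.bits -= 32;
        }
    }

    /// Flush the remaining bits, zero-padding the last byte.
    fn finish(mut self) -> Vec<u8> {
        while self.bits > 0 {
            self.out.push(self.acc as u8);
            self.acc >>= 8;
            self.bits = self.bits.saturating_sub(8);
        }
        self.out
    }
}
```

Because at most 31 bits are pending before each write and a code is at most 32 bits wide, the accumulator never needs more than 63 bits, which is why a 64-bit register is enough for whole-word flushes.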

pub fn encode_batch(&self, literals: &[u8]) -> Vec<u8>

Encode literals in batches for better throughput.

Processes 4 symbols at a time when possible.
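The four-at-a-time pattern can be sketched with chunks_exact, shown here for summing code lengths rather than real bit packing. The function total_code_bits and the lengths table are hypothetical; how encode_batch actually packs each group is not described on this page.

```rust
/// Sum the code lengths of `literals`, four symbols per loop iteration.
/// `lengths[b]` is the code length in bits for byte `b` (0 if absent).
fn total_code_bits(lengths: &[u8; 256], literals: &[u8]) -> usize {
    let mut chunks = literals.chunks_exact(4);
    let mut bits = 0usize;
    // Main loop: each iteration handles exactly four symbols.
    for c in &mut chunks {
        bits += lengths[c[0] as usize] as usize
            + lengths[c[1] as usize] as usize
            + lengths[c[2] as usize] as usize
            + lengths[c[3] as usize] as usize;
    }
    // Tail: the zero to three symbols that did not fill a group.
    for &b in chunks.remainder() {
        bits += lengths[b as usize] as usize;
    }
    bits
}
```

Handling the remainder separately is what "when possible" amounts to in this sketch: inputs whose length is not a multiple of four fall back to one symbol at a time for the tail.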

pub fn serialize_weights(&self) -> Vec<u8>

Serialize weights in Zstd format (direct or FSE-compressed).

For num_symbols <= 128: Uses direct format

header_byte = (num_symbols - 1) + 128
Followed by ceil(num_symbols / 2) bytes of 4-bit weights

For num_symbols > 128: Uses FSE-compressed format

header_byte = compressed_size (always less than 128)
Followed by FSE table and compressed weights
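The direct format described above is simple enough to sketch by hand. The function serialize_direct below is hypothetical and takes the weights to transmit as a plain slice. The nibble order (first weight in the high 4 bits of each byte) follows the Zstandard format specification, and whether the crate counts or omits any trailing weight is not stated here.

```rust
/// Serialize up to 128 weights (each 0..=15) in the direct format:
/// one header byte, then two 4-bit weights per byte.
/// Returns None if the input does not fit the direct format.
fn serialize_direct(weights: &[u8]) -> Option<Vec<u8>> {
    let n = weights.len();
    if n == 0 || n > 128 || weights.iter().any(|&w| w > 15) {
        return None;
    }
    let mut out = Vec::with_capacity(1 + (n + 1) / 2);
    // header_byte = (num_symbols - 1) + 128, so it is always >= 128.
    out.push((n - 1) as u8 + 128);
    for pair in weights.chunks(2) {
        let hi = pair[0];
        // An odd count leaves the final low nibble as zero padding.
        let lo = if pair.len() == 2 { pair[1] } else { 0 };
        out.push((hi << 4) | lo);
    }
    Some(out)
}
```

For the three weights 3, 2, 1 this yields the header byte 130 followed by 0x32 and 0x10, that is, ceil(3 / 2) = 2 payload bytes.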

pub fn max_bits(&self) -> u8

Returns the maximum code length in bits.

pub fn num_symbols(&self) -> usize

Returns the number of symbols that have an assigned code.

pub fn estimate_size(&self, literals: &[u8]) -> usize

Estimate compressed size.

Trait Implementations§

impl Debug for HuffmanEncoder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl Freeze for HuffmanEncoder

impl RefUnwindSafe for HuffmanEncoder

impl Send for HuffmanEncoder

impl Sync for HuffmanEncoder

impl Unpin for HuffmanEncoder

impl UnwindSafe for HuffmanEncoder

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.