Struct HuffmanEncoder 

pub struct HuffmanEncoder { /* private fields */ }

Optimized Huffman encoder for literal compression.

Implementations§

impl HuffmanEncoder

pub fn build(data: &[u8]) -> Option<Self>

Build a Huffman encoder from literal data.

Uses a SIMD-accelerated histogram when available. Returns None if the data cannot be compressed efficiently with Huffman coding.
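
As a rough illustration of the histogram step, the following scalar sketch counts byte frequencies. The function name `byte_histogram` is hypothetical and not part of this crate; the crate's SIMD path and the criteria it uses to decide when to return None are not shown here.

```rust
/// Count how often each byte value occurs in `data`.
/// Scalar reference version; a SIMD variant would produce the same counts.
fn byte_histogram(data: &[u8]) -> [usize; 256] {
    let mut counts = [0usize; 256];
    for &byte in data {
        counts[byte as usize] += 1;
    }
    counts
}
```

Calling `byte_histogram(b"aab")` yields a count of 2 at index 97 ('a') and 1 at index 98 ('b'), with every other entry left at 0.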

pub fn from_weights(weights: &[u8]) -> Option<Self>

Build a Huffman encoder from pre-defined weights.

This lets you supply a custom Huffman table instead of deriving one from data. It is useful when you have pre-trained weights from dictionary compression or want to reuse the same weights across multiple blocks.

§Parameters
  • weights: Array of 256 weights (one per byte value). A weight of 0 means the symbol is not present. A weight w > 0 gives the symbol a code length of code_length = max_bits + 1 - w.
§Returns

Returns Some(encoder) if the weights are valid, None otherwise.

§Example
use haagenti_zstd::huffman::HuffmanEncoder;

// Define weights for symbols 'a' (97), 'b' (98), 'c' (99)
let mut weights = vec![0u8; 256];
weights[97] = 3;  // 'a' - highest weight (shortest code)
weights[98] = 2;  // 'b' - medium weight
weights[99] = 1;  // 'c' - lowest weight (longest code)

let encoder = HuffmanEncoder::from_weights(&weights).unwrap();
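
The weight rule from the Parameters section, code_length = max_bits + 1 - w, can be sketched as a standalone helper. The name `code_length` is hypothetical and not part of this crate, and rejecting weights larger than `max_bits` is an assumption of this sketch rather than documented behavior.

```rust
/// Convert a Huffman weight into a code length in bits using
/// `code_length = max_bits + 1 - w`.
/// Returns `None` for weight 0 (symbol absent) and, as an assumption of
/// this sketch, for weights above `max_bits` (which would give length 0).
fn code_length(weight: u8, max_bits: u8) -> Option<u8> {
    if weight == 0 || weight > max_bits {
        return None;
    }
    // Written as (max_bits - weight) + 1 so the arithmetic cannot overflow u8.
    Some(max_bits - weight + 1)
}
```

With a hypothetical `max_bits` of 3, weights 3, 2, and 1 map to code lengths 1, 2, and 3 bits, so a higher weight means a shorter code.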

pub fn encode(&self, literals: &[u8]) -> Vec<u8>

Encode literals using optimized bit packing.

Uses a 64-bit accumulator for efficient byte-aligned writes. The literals must be encoded in reverse order, so the encoder processes them in chunks (reverse chunk order) and uses software prefetching to stay cache-efficient.

§Performance Optimizations
  • Processes in 64-byte cache-line chunks (reverse chunk order, forward within chunk)
  • Software prefetching brings next chunk into L1 cache ahead of time
  • 64-bit accumulator with branchless 32-bit flushes
  • Unrolled inner loop for better ILP
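
To illustrate the accumulator idea in the list above, here is a minimal bit writer that collects bits in a 64-bit register and flushes 32 bits at a time. `BitWriter` is a hypothetical type, not this crate's implementation: it uses an ordinary branch for the flush and does not reproduce the Zstd bitstream layout, the reverse chunking, or the prefetching.

```rust
/// Minimal LSB-first bit writer with a 64-bit accumulator (illustrative only).
struct BitWriter {
    acc: u64,     // pending bits; the lowest bit was written first
    bits: u32,    // number of valid bits in `acc` (always < 32 between calls)
    out: Vec<u8>, // completed bytes
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { acc: 0, bits: 0, out: Vec::new() }
    }

    /// Append the low `len` bits of `code`; `len` must be at most 32.
    fn write(&mut self, code: u32, len: u32) {
        debug_assert!(len <= 32);
        let mask = (1u64 << len) - 1;
        self.acc |= (u64::from(code) & mask) << self.bits;
        self.bits += len;
        // Flush four whole bytes at once when at least 32 bits are pending.
        if self.bits >= 32 {
            self.out.extend_from_slice(&(self.acc as u32).to_le_bytes());
            self.acc >>= 32;
            self.bits -= 32;
        }
    }

    /// Flush the remaining bits, zero-padding the final byte.
    fn finish(mut self) -> Vec<u8> {
        while self.bits > 0 {
            self.out.push(self.acc as u8);
            self.acc >>= 8;
            self.bits = self.bits.saturating_sub(8);
        }
        self.out
    }
}
```

Because at most 31 bits are pending before each call and a code is at most 32 bits wide, the accumulator never holds more than 63 bits, so no bits are lost between flushes.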

pub fn encode_batch(&self, literals: &[u8]) -> Vec<u8>

Encode literals in batches for better throughput.

Processes 4 symbols at a time when possible.

pub fn serialize_weights(&self) -> Vec<u8>

Serialize weights in Zstd format (direct or FSE-compressed).

For num_symbols <= 128: Uses direct format

  • header_byte = (num_symbols - 1) + 128
  • Followed by ceil(num_symbols / 2) bytes of 4-bit weights

For num_symbols > 128: Uses FSE-compressed format

  • header_byte = compressed_size (always less than 128)
  • Followed by FSE table and compressed weights
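
The direct format described above can be sketched as follows. `serialize_direct` is a hypothetical helper, not this crate's API; it takes only the weights to be written (not the full 256-entry table) and places the first weight of each pair in the high nibble, which is the order given in the Zstandard format specification. The FSE-compressed path is not shown.

```rust
/// Pack up to 128 weights in the direct format:
/// one header byte `(n - 1) + 128`, then `ceil(n / 2)` bytes of 4-bit weights.
/// Returns `None` if there are no weights, more than 128 weights,
/// or a weight that does not fit in 4 bits.
fn serialize_direct(weights: &[u8]) -> Option<Vec<u8>> {
    let n = weights.len();
    if n == 0 || n > 128 || weights.iter().any(|&w| w > 15) {
        return None;
    }
    let mut out = Vec::with_capacity(1 + (n + 1) / 2);
    out.push((n - 1) as u8 + 128);
    for pair in weights.chunks(2) {
        let hi = pair[0] << 4;                      // first weight: top 4 bits
        let lo = pair.get(1).copied().unwrap_or(0); // second weight, or 0 padding
        out.push(hi | lo);
    }
    Some(out)
}
```

For the three weights 3, 2, 1 this produces the header byte 130 followed by 0x32 and 0x10.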

pub fn max_bits(&self) -> u8

Get maximum code length.

pub fn num_symbols(&self) -> usize

Get number of symbols with codes.

pub fn estimate_size(&self, literals: &[u8]) -> usize

Estimate compressed size.
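
One plausible way to form such an estimate is to sum the code lengths of all literals and round up to whole bytes. The sketch below assumes that approach; `estimate_bytes` and its `code_lengths` table are hypothetical, and the crate's own estimate may also account for headers or other overhead.

```rust
/// Estimate the encoded payload size in bytes by summing per-symbol code
/// lengths (in bits) and rounding up. Table headers are not included.
fn estimate_bytes(code_lengths: &[u8; 256], literals: &[u8]) -> usize {
    let total_bits: usize = literals
        .iter()
        .map(|&b| usize::from(code_lengths[b as usize]))
        .sum();
    (total_bits + 7) / 8
}
```

With code lengths of 1, 2, and 3 bits for 'a', 'b', and 'c', the input "aabc" needs 7 bits and is estimated at 1 byte.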

Trait Implementations§

impl Debug for HuffmanEncoder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.