Struct Compressor

pub struct Compressor { /* private fields */ }

A compressor that uses a symbol table to greedily compress strings.

The Compressor is the central component of FSST. You can build one either from an empty symbol table, adding symbols by hand through a CompressorBuilder, or by training it on a sample corpus with Compressor::train.

Example usage:

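use fsst::{Symbol, Compressor, CompressorBuilder};

// Build a compressor whose table holds one symbol: "hello",
// zero-padded to 8 bytes, with an explicit length of 5.
let compressor = {
    let mut builder = CompressorBuilder::new();
    builder.insert(Symbol::from_slice(&[b'h', b'e', b'l', b'l', b'o', 0, 0, 0]), 5);
    builder.build()
};

// The whole input matches that symbol, so it compresses to the single code 0.
let compressed = compressor.compress("hello".as_bytes());
assert_eq!(compressed, vec![0u8]);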
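
To make "greedily compress" concrete, here is a minimal std-only sketch of greedy longest-match encoding. It is a toy model, not the crate's implementation: `toy_compress`, the slice-of-slices table, and the choice of 255 as the escape byte are all assumptions made for illustration.

```rust
/// Assumed escape code: the byte that follows it is a literal input byte.
const ESCAPE: u8 = 255;

/// Toy greedy compressor (hypothetical helper, not part of the fsst crate).
/// `symbols[i]` is the byte string that code `i` stands for.
fn toy_compress(symbols: &[&[u8]], input: &[u8]) -> Vec<u8> {
    assert!(symbols.len() <= 255, "code 255 is reserved for the escape");
    let mut out = Vec::with_capacity(2 * input.len());
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        // Find the longest symbol that is a prefix of the remaining input.
        let mut best: Option<(usize, usize)> = None; // (code, symbol length)
        for (code, sym) in symbols.iter().enumerate() {
            let longer = best.map_or(true, |(_, len)| sym.len() > len);
            if !sym.is_empty() && longer && rest.starts_with(sym) {
                best = Some((code, sym.len()));
            }
        }
        match best {
            // A symbol matched: emit its one-byte code and skip the symbol.
            Some((code, len)) => {
                out.push(code as u8);
                pos += len;
            }
            // Nothing matched: emit the escape followed by the raw byte.
            None => {
                out.push(ESCAPE);
                out.push(rest[0]);
                pos += 1;
            }
        }
    }
    out
}
```

With a table of `["hello", "he"]`, the input `"hello"` becomes the single code 0, while `"hey"` becomes code 1 followed by an escaped `y`.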

Implementations§

impl Compressor

pub fn train(values: &Vec<&[u8]>) -> Self

Build and train a Compressor from a sample corpus of text.

This function implements the generational algorithm described in the FSST paper Section 4.3. Starting with an empty symbol table, it iteratively compresses the corpus, then attempts to merge symbols when doing so would yield better compression than leaving them unmerged. The resulting table will have at most 255 symbols (the 256th symbol is reserved for the escape code).
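
The generational idea can be sketched in plain Rust. This is a heavily simplified toy, not the crate's training code: `tokenize`, `next_generation`, `toy_train`, and the gain formula (occurrences times symbol length) are assumptions chosen for illustration, and sampling and escape costs are ignored.

```rust
use std::collections::HashMap;

/// Greedily split `input` into pieces: the longest matching symbol at each
/// position, or a single raw byte when no symbol matches.
fn tokenize<'a>(symbols: &[Vec<u8>], input: &'a [u8]) -> Vec<&'a [u8]> {
    let mut pieces = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let rest = &input[pos..];
        let len = symbols
            .iter()
            .filter(|s| !s.is_empty() && rest.starts_with(s))
            .map(|s| s.len())
            .max()
            .unwrap_or(1);
        pieces.push(&rest[..len]);
        pos += len;
    }
    pieces
}

/// One generation: count every piece and every concatenation of two adjacent
/// pieces (capped at 8 bytes), then keep the candidates with the highest gain.
fn next_generation(symbols: &[Vec<u8>], corpus: &[&[u8]], max_symbols: usize) -> Vec<Vec<u8>> {
    let mut counts: HashMap<Vec<u8>, usize> = HashMap::new();
    for line in corpus {
        let pieces = tokenize(symbols, line);
        for (i, piece) in pieces.iter().enumerate() {
            *counts.entry(piece.to_vec()).or_insert(0) += 1;
            if let Some(next) = pieces.get(i + 1) {
                let merged = [*piece, *next].concat();
                if merged.len() <= 8 {
                    *counts.entry(merged).or_insert(0) += 1;
                }
            }
        }
    }
    let mut ranked: Vec<(Vec<u8>, usize)> = counts
        .into_iter()
        .map(|(sym, n)| {
            let gain = n * sym.len();
            (sym, gain)
        })
        .collect();
    // Highest gain first; ties are broken by symbol bytes for determinism.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(max_symbols).map(|(sym, _)| sym).collect()
}

/// Start from an empty table and run a fixed number of generations,
/// keeping at most 255 symbols so that one code stays free for the escape.
fn toy_train(corpus: &[&[u8]], generations: usize) -> Vec<Vec<u8>> {
    let mut symbols: Vec<Vec<u8>> = Vec::new();
    for _ in 0..generations {
        symbols = next_generation(&symbols, corpus, 255);
    }
    symbols
}
```

On the one-line corpus `"abab"`, the first generation discovers `ab`, the second merges two `ab` pieces into `abab`, and later generations keep only `abab`.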

impl Compressor

The core structure of the FSST codec, holding a mapping between Symbols and Codes.

The symbol table is trained on a corpus of data in the form of a single byte array, building up a mapping of 1-byte “codes” to sequences of up to 8 plaintext bytes, or “symbols”.
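
Because a symbol is at most 8 bytes, it fits in a single 64-bit word alongside a separate length. The sketch below shows one way to pack and unpack such a symbol. The helper names and the little-endian, zero-padded layout are assumptions for illustration, not a statement about the crate's internal representation.

```rust
/// Pack 1 to 8 bytes into a u64 (little-endian, zero-padded) plus a length.
/// Hypothetical helper, not part of the fsst crate.
fn pack_symbol(bytes: &[u8]) -> (u64, u8) {
    assert!((1..=8).contains(&bytes.len()), "symbols are 1 to 8 bytes long");
    let mut padded = [0u8; 8];
    padded[..bytes.len()].copy_from_slice(bytes);
    (u64::from_le_bytes(padded), bytes.len() as u8)
}

/// Recover the original bytes from a packed word and its length.
fn unpack_symbol(word: u64, len: u8) -> Vec<u8> {
    word.to_le_bytes()[..len as usize].to_vec()
}
```

Keeping the length separate matters because zero padding alone cannot distinguish a short symbol from one that genuinely ends in zero bytes.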

pub unsafe fn compress_word(&self, word: u64, out_ptr: *mut u8) -> (usize, usize)

Using the symbol table, runs a single cycle of compression on an input word, writing the output into out_ptr.

§Returns

This function returns a tuple (advance_in, advance_out) that tells the caller how far to advance the input and output pointers.

advance_in is the number of bytes to advance the input pointer before the next call.

advance_out is the number of bytes to advance out_ptr before the next call.

§Safety

out_ptr must be non-null and must point to memory that is valid for writes.
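
The (advance_in, advance_out) contract suggests a driver loop that repeatedly loads a word, runs one cycle, and moves both cursors. The following is a safe, std-only sketch of that pattern. `toy_step` is a hypothetical stand-in with the same return shape, using a one-symbol table ("ab") and an assumed escape byte of 255. It is not the crate's compress_word.

```rust
/// Assumed escape code for this toy.
const ESCAPE: u8 = 255;

/// Hypothetical single compression cycle: returns (advance_in, advance_out).
/// The "table" holds one symbol, "ab", mapped to code 0.
fn toy_step(word: u64, out: &mut [u8]) -> (usize, usize) {
    let bytes = word.to_le_bytes();
    if bytes[0] == b'a' && bytes[1] == b'b' {
        out[0] = 0; // one code byte covers two input bytes
        (2, 1)
    } else {
        out[0] = ESCAPE; // escape, then the literal byte
        out[1] = bytes[0];
        (1, 2)
    }
}

/// Drive `toy_step` across the whole input, advancing both cursors by the
/// amounts each call returns.
fn toy_drive(input: &[u8]) -> Vec<u8> {
    // Worst case: every input byte becomes an escape plus a literal.
    let mut out = vec![0u8; 2 * input.len()];
    let (mut in_pos, mut out_pos) = (0usize, 0usize);
    while in_pos < input.len() {
        // Load up to 8 input bytes into a zero-padded little-endian word.
        let mut buf = [0u8; 8];
        let n = (input.len() - in_pos).min(8);
        buf[..n].copy_from_slice(&input[in_pos..in_pos + n]);
        let word = u64::from_le_bytes(buf);

        let (advance_in, advance_out) = toy_step(word, &mut out[out_pos..]);
        in_pos += advance_in;
        out_pos += advance_out;
    }
    out.truncate(out_pos);
    out
}
```

Sizing the output at twice the input length up front is what lets each step write without a bounds check in a raw-pointer version. In this toy the slice indexing still checks bounds.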

pub fn compress_bulk(&self, lines: &Vec<&[u8]>) -> Vec<Vec<u8>>

Compress many lines in bulk.

pub unsafe fn compress_into(&self, plaintext: &[u8], values: &mut Vec<u8>)

Compress a string, writing its result into a target buffer.

The target buffer is a byte vector that must have capacity large enough to hold the encoded data.

When this call returns, values will hold the compressed bytes and have its length set to the length of the compressed text.

use fsst::{Compressor, CompressorBuilder, Symbol};

let mut compressor = CompressorBuilder::new();
assert!(compressor.insert(Symbol::from_slice(b"aaaaaaaa"), 8));

let compressor = compressor.build();

let mut compressed_values = Vec::with_capacity(1_024);

// SAFETY: we have over-sized compressed_values.
unsafe {
    compressor.compress_into(b"aaaaaaaa", &mut compressed_values);
}

assert_eq!(compressed_values, vec![0u8]);

§Safety

It is up to the caller to ensure the provided buffer is large enough to hold all encoded data.

pub fn compress(&self, plaintext: &[u8]) -> Vec<u8>

Use the symbol table to compress the plaintext into a sequence of codes and escapes.

pub fn decompressor(&self) -> Decompressor<'_>

Access the decompressor that can be used to decompress strings emitted from this Compressor instance.
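
Decompression is the inverse lookup: each code expands to its symbol, and an escape passes the next byte through unchanged. The sketch below is a toy model under the same assumptions as a simple code-per-symbol table with 255 as the escape byte. `toy_decompress` is a hypothetical helper, not the crate's Decompressor.

```rust
/// Assumed escape code: the byte that follows it is a literal.
const ESCAPE: u8 = 255;

/// Toy decompressor: `symbols[i]` is the byte string for code `i`.
/// Returns None for malformed input (unknown code or dangling escape).
fn toy_decompress(symbols: &[&[u8]], compressed: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut iter = compressed.iter().copied();
    while let Some(code) = iter.next() {
        if code == ESCAPE {
            // The escape must be followed by exactly one literal byte.
            out.push(iter.next()?);
        } else {
            // Expand the code to the symbol bytes it stands for.
            out.extend_from_slice(symbols.get(code as usize)?);
        }
    }
    Some(out)
}
```

A decompressor only produces the original text when it uses the same table that produced the codes, which is why it is obtained from the Compressor instance itself.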

pub fn symbol_table(&self) -> &[Symbol]

Returns a readonly slice of the current symbol table.

The returned slice has length n_symbols, the number of symbols currently in the table.

pub fn symbol_lengths(&self) -> &[u8]

Returns a readonly slice where index i contains the length of the symbol represented by code i.

Each length is between 1 and 8 bytes, inclusive.

pub fn rebuild_from(symbols: impl AsRef<[Symbol]>, symbol_lens: impl AsRef<[u8]>) -> Self

Rebuild a compressor from an existing symbol table.

This will not attempt to optimize or re-order the codes.

Trait Implementations§

impl Clone for Compressor

fn clone(&self) -> Compressor

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
