pub struct Compressor { /* private fields */ }
A compressor that uses a symbol table to greedily compress strings.
The Compressor is the central component of FSST. You can create a compressor either by default (i.e. an empty compressor), or by training it on an input corpus of text.
Example usage:
use fsst::{Symbol, Compressor, CompressorBuilder};
let compressor = {
    let mut builder = CompressorBuilder::new();
    builder.insert(Symbol::from_slice(&[b'h', b'e', b'l', b'l', b'o', 0, 0, 0]), 5);
    builder.build()
};
let compressed = compressor.compress("hello".as_bytes());
assert_eq!(compressed, vec![0u8]);
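To make the greedy strategy concrete, here is a minimal standalone sketch that does not use the fsst crate. The function `toy_compress`, the slice-of-slices table layout, and the choice of 255 as the escape code are assumptions made for illustration; the crate's real implementation is considerably more optimized.

```rust
/// Assumed escape code: the byte after it is copied from the input verbatim.
const ESCAPE: u8 = 255;

/// Hypothetical toy compressor: code `i` stands for `symbols[i]`.
/// At each position it emits the code of the longest symbol that is a
/// prefix of the remaining input, or an escape pair if nothing matches.
fn toy_compress(symbols: &[&[u8]], plaintext: &[u8]) -> Vec<u8> {
    assert!(symbols.len() <= 255, "codes must fit below the escape code");
    let mut out = Vec::with_capacity(plaintext.len() * 2);
    let mut pos = 0;
    while pos < plaintext.len() {
        let rest = &plaintext[pos..];
        // Track the best match so far as (code, symbol length).
        let mut best: Option<(usize, usize)> = None;
        for (code, sym) in symbols.iter().enumerate() {
            let longer = best.map_or(true, |(_, len)| sym.len() > len);
            if !sym.is_empty() && longer && rest.starts_with(sym) {
                best = Some((code, sym.len()));
            }
        }
        match best {
            Some((code, len)) => {
                out.push(code as u8);
                pos += len;
            }
            None => {
                // No symbol matches: escape the next byte.
                out.push(ESCAPE);
                out.push(rest[0]);
                pos += 1;
            }
        }
    }
    out
}
```

With a table of `he` (code 0) and `hello` (code 1), this sketch compresses `hello` to `[1]` and `hey` to `[0, 255, b'y']`.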
Implementations§
impl Compressor
pub fn train(values: &Vec<&[u8]>) -> Self
Build and train a Compressor from a sample corpus of text.
This function implements the generational algorithm described in the FSST paper Section 4.3. Starting with an empty symbol table, it iteratively compresses the corpus, then attempts to merge symbols when doing so would yield better compression than leaving them unmerged. The resulting table will have at most 255 symbols (the 256th symbol is reserved for the escape code).
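The merge step can be illustrated with a heavily simplified sketch of one training generation. It is not the crate's implementation: `tokenize`, `next_generation`, and the gain formula (occurrence count times symbol length) are assumptions chosen for illustration, and sampling and other refinements are omitted.

```rust
use std::collections::HashMap;

/// Symbols hold at most 8 plaintext bytes.
const MAX_SYMBOL_LEN: usize = 8;

/// Split `text` greedily into the longest known symbols,
/// falling back to single bytes when no symbol matches.
fn tokenize<'a>(table: &[Vec<u8>], text: &'a [u8]) -> Vec<&'a [u8]> {
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        let mut len = 1;
        for sym in table {
            if sym.len() > len && rest.starts_with(sym) {
                len = sym.len();
            }
        }
        tokens.push(&rest[..len]);
        pos += len;
    }
    tokens
}

/// One hypothetical generation: count every token and every merged pair of
/// adjacent tokens, then keep the `max_symbols` candidates with highest gain.
fn next_generation(table: &[Vec<u8>], corpus: &[&[u8]], max_symbols: usize) -> Vec<Vec<u8>> {
    let mut counts: HashMap<Vec<u8>, usize> = HashMap::new();
    for text in corpus {
        let tokens = tokenize(table, text);
        for (i, tok) in tokens.iter().enumerate() {
            *counts.entry(tok.to_vec()).or_insert(0) += 1;
            if tok.len() < MAX_SYMBOL_LEN {
                if let Some(next) = tokens.get(i + 1) {
                    // Candidate symbol: this token merged with its successor.
                    let mut merged = tok.to_vec();
                    merged.extend_from_slice(next);
                    merged.truncate(MAX_SYMBOL_LEN);
                    *counts.entry(merged).or_insert(0) += 1;
                }
            }
        }
    }
    let mut candidates: Vec<(Vec<u8>, usize)> = counts.into_iter().collect();
    // Sort by descending gain; break ties by byte order for determinism.
    candidates.sort_by(|a, b| {
        let gain_a = a.1 * a.0.len();
        let gain_b = b.1 * b.0.len();
        gain_b.cmp(&gain_a).then_with(|| a.0.cmp(&b.0))
    });
    candidates.into_iter().take(max_symbols).map(|(s, _)| s).collect()
}
```

On the one-string corpus `abababab` with room for a single symbol, the first generation of this sketch selects `ab`, and a second generation started from that table selects `abab`.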
impl Compressor
The core structure of the FSST codec, holding a mapping between Symbols and Codes.
The symbol table is trained on a corpus of data in the form of a single byte array, building up a mapping of 1-byte “codes” to sequences of up to 8 plaintext bytes, or “symbols”.
pub unsafe fn compress_word(&self, word: u64, out_ptr: *mut u8) -> (usize, usize)
Using the symbol table, runs a single cycle of compression on an input word, writing the output into out_ptr.
§Returns
This function returns a tuple of (advance_in, advance_out), giving the number of bytes by which the caller should advance the input and output pointers.
advance_in is the number of bytes to advance the input pointer before the next call.
advance_out is the number of bytes to advance out_ptr before the next call.
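The advance protocol described above can be sketched as a driver loop in safe Rust. Everything here is hypothetical: `drive` and `escape_step` are illustrative names, the step callback writes into a slice instead of a raw pointer, and loading the word in little-endian order with zero padding is an assumption about the layout rather than a documented guarantee.

```rust
/// Hypothetical driver: repeatedly loads up to 8 input bytes into a word,
/// runs one compression step, and advances both cursors by the returned amounts.
fn drive<F>(input: &[u8], mut step: F) -> Vec<u8>
where
    F: FnMut(u64, &mut [u8]) -> (usize, usize),
{
    // Assumed worst case: every input byte turns into a 2-byte escape pair.
    let mut out = vec![0u8; input.len() * 2];
    let (mut in_pos, mut out_pos) = (0usize, 0usize);
    while in_pos < input.len() {
        let rest = &input[in_pos..];
        let n = rest.len().min(8);
        let mut buf = [0u8; 8];
        buf[..n].copy_from_slice(&rest[..n]);
        // Assumed layout: first input byte in the lowest-order byte of the word.
        let word = u64::from_le_bytes(buf);
        let (advance_in, advance_out) = step(word, &mut out[out_pos..]);
        assert!(advance_in > 0, "a step must consume at least one byte");
        in_pos += advance_in;
        out_pos += advance_out;
    }
    out.truncate(out_pos);
    out
}

/// Hypothetical step for an empty symbol table: escape every byte,
/// consuming 1 input byte and producing 2 output bytes.
fn escape_step(word: u64, out: &mut [u8]) -> (usize, usize) {
    out[0] = 255; // assumed escape code
    out[1] = word as u8; // lowest-order byte is the first input byte
    (1, 2)
}
```

With `escape_step`, `drive(b"ab", escape_step)` yields `[255, b'a', 255, b'b']`: each call consumes one input byte and emits two output bytes.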
§Safety
out_ptr must never be NULL or otherwise point to invalid memory.
pub unsafe fn compress_into(&self, plaintext: &[u8], values: &mut Vec<u8>)
Compress a string, writing its result into a target buffer.
The target buffer is a byte vector that must have capacity large enough to hold the encoded data.
When this call returns, values will hold the compressed bytes and have its length set to the length of the compressed text.
use fsst::{Compressor, CompressorBuilder, Symbol};
let mut compressor = CompressorBuilder::new();
assert!(compressor.insert(Symbol::from_slice(b"aaaaaaaa"), 8));
let compressor = compressor.build();
let mut compressed_values = Vec::with_capacity(1_024);
// SAFETY: we have over-sized compressed_values.
unsafe {
    compressor.compress_into(b"aaaaaaaa", &mut compressed_values);
}
assert_eq!(compressed_values, vec![0u8]);
§Safety
It is up to the caller to ensure the provided buffer is large enough to hold all encoded data.
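One conservative way to size the buffer is sketched below, assuming the worst case is an input in which every byte is escaped and therefore costs two output bytes. The helpers `worst_case_compressed_len` and `output_buffer_for` are hypothetical, and the crate may require additional slack beyond this bound, so treat it as a starting point rather than a guarantee.

```rust
/// Hypothetical upper bound on the encoded size, assuming each input byte
/// produces at most an escape code followed by the literal byte.
fn worst_case_compressed_len(plaintext_len: usize) -> usize {
    plaintext_len
        .checked_mul(2)
        .expect("plaintext too large to size an output buffer")
}

/// Allocate an empty vector whose capacity covers the assumed worst case.
fn output_buffer_for(plaintext: &[u8]) -> Vec<u8> {
    Vec::with_capacity(worst_case_compressed_len(plaintext.len()))
}
```

A buffer built this way for an 8-byte input has capacity of at least 16 bytes and length 0.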
pub fn compress(&self, plaintext: &[u8]) -> Vec<u8>
Use the symbol table to compress the plaintext into a sequence of codes and escapes.
pub fn decompressor(&self) -> Decompressor<'_>
Access the decompressor that can be used to decompress strings emitted from this Compressor instance.
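Decompression is the inverse table lookup, which the following standalone sketch illustrates. It does not use the crate's Decompressor; `toy_decompress`, the slice-of-slices table, and 255 as the escape code are assumptions made for illustration.

```rust
/// Assumed escape code: the byte after it is a literal plaintext byte.
const ESCAPE: u8 = 255;

/// Hypothetical toy decompressor: expands each code into `symbols[code]`
/// and copies the byte following an escape code verbatim.
/// Returns `None` for an unknown code or a truncated escape pair.
fn toy_decompress(symbols: &[&[u8]], compressed: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut codes = compressed.iter().copied();
    while let Some(code) = codes.next() {
        if code == ESCAPE {
            // The literal byte must follow the escape code.
            out.push(codes.next()?);
        } else {
            out.extend_from_slice(symbols.get(code as usize)?);
        }
    }
    Some(out)
}
```

With a table of `he` (code 0) and `hello` (code 1), the input `[0, 255, b'y']` expands to `hey`, while a lone trailing escape code is rejected with `None`.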
pub fn symbol_table(&self) -> &[Symbol]
Returns a read-only slice of the current symbol table.
The returned slice has length n_symbols.
pub fn symbol_lengths(&self) -> &[u8]
Returns a read-only slice where index i contains the length of the symbol represented by code i.
Values range from 1 to 8.
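A short sketch of why the lengths matter: when a symbol is stored zero-padded to 8 bytes, as in the `hello` example earlier on this page, the length says how many leading bytes are meaningful. The `[u8; 8]` representation and the `symbol_bytes` helper are hypothetical stand-ins, not the crate's Symbol type.

```rust
/// Hypothetical helper: pair each zero-padded 8-byte symbol with its length
/// and return only the meaningful leading bytes of each.
fn symbol_bytes<'a>(symbols: &'a [[u8; 8]], lengths: &[u8]) -> Vec<&'a [u8]> {
    symbols
        .iter()
        .zip(lengths)
        // Clamp defensively so a bad length can never index past 8 bytes.
        .map(|(sym, &len)| &sym[..usize::from(len).min(8)])
        .collect()
}
```

For a single padded symbol `[b'h', b'e', b'l', b'l', b'o', 0, 0, 0]` with length 5, the helper returns the 5-byte slice `hello`.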
Trait Implementations§
impl Clone for Compressor
fn clone(&self) -> Compressor
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.