Struct FrameDecoder

pub struct FrameDecoder { /* private fields */ }

Low level Zstandard decoder that can be used to decompress frames with fine control over when and how many bytes are decoded.

This decoder is able to decode frames only partially and gives control over how many bytes/blocks will be decoded at a time (so you don’t have to decode a 10GB file into memory all at once). It reads bytes as needed from a provided source and can be read from to collect partial results.

If you just want to read the whole frame through io::Read without manually calling FrameDecoder::decode_blocks, you can use the provided crate::decoding::StreamingDecoder, which wraps this FrameDecoder.

Workflow is as follows:

use structured_zstd::decoding::BlockDecodingStrategy;

use std::io::{Read, Write};

// no_std environments can use the crate's own Read/Write traits instead:
// use structured_zstd::io::{Read, Write};

fn decode_this(mut file: impl Read) {
    // Create a new decoder
    let mut frame_dec = structured_zstd::decoding::FrameDecoder::new();
    // Scratch buffer that read() copies into; it must be non-empty, or read() returns 0
    let mut result = vec![0u8; 1024];

    // Use reset or init to make the decoder ready to decode the frame from the io::Read
    frame_dec.reset(&mut file).unwrap();

    // Loop until the frame has been decoded completely
    while !frame_dec.is_finished() {
        // Decode roughly 1024 bytes' worth of blocks
        frame_dec.decode_blocks(&mut file, BlockDecodingStrategy::UptoBytes(1024)).unwrap();

        // Read from the decoder to collect bytes from the internal buffer
        let bytes_read = frame_dec.read(result.as_mut_slice()).unwrap();

        // Then do something with them
        do_something(&result[..bytes_read]);
    }

    // Drain the remaining bytes (including the retained window) once decoding is done
    while frame_dec.can_collect() > 0 {
        let x = frame_dec.read(result.as_mut_slice()).unwrap();

        do_something(&result[..x]);
    }
}

fn do_something(data: &[u8]) {
    std::io::stdout().write_all(data).unwrap();
}

Implementations

impl FrameDecoder

pub fn new() -> FrameDecoder

This creates a new decoder without allocating anything yet. init()/reset() allocate all needed buffers the first time this decoder is used; on later calls they only reset the existing buffers, without further allocation.

pub fn expect_dict_id(&mut self, expected: Option<u32>)

Available on crate feature lsm only.

Pin the expected Dictionary_ID for the next frame.

When expected is set, Self::init / Self::reset validate it against the parsed frame header BEFORE any block decode work runs. A mismatch returns crate::decoding::errors::FrameDecoderError::UnexpectedDictId before any block decode and before any output is produced. Scratch buffer allocation / reservation for the decode pipeline happens during frame-header parsing, which is already complete when this validation fires — the cost of scratch sizing is paid even on a mismatched header. The guarantee is “no block decode, no XXH64 init, no partial output”, not “zero allocation”.

Some(0) is treated as “no dictionary expected”: a frame whose header omits the optional Dictionary_ID field (flag value 0) passes the check; a frame that carries an explicit non-zero id fails.

None (default) disables the check.
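
The acceptance rule above can be modelled as a small pure function. This is a hypothetical sketch, not the crate's code: `check_dict_id` and `DictIdCheck` are illustrative names, the header's Dictionary_ID is represented as `Option<u32>` (None when the field is omitted), and treating an explicit id of 0 like an omitted field is an assumption the text does not state.

```rust
/// Hypothetical outcome of the pinned dictionary-ID check (not the crate's error type).
#[derive(Debug, PartialEq, Eq)]
pub enum DictIdCheck {
    Pass,
    Mismatch { expected: u32, found: Option<u32> },
}

/// `None` disables the check, `Some(0)` means "no dictionary expected",
/// and any other `Some(id)` requires the header to carry exactly that id.
/// `header_dict_id` is `None` when the frame header omits the Dictionary_ID field.
pub fn check_dict_id(expected: Option<u32>, header_dict_id: Option<u32>) -> DictIdCheck {
    let expected = match expected {
        Some(id) => id,
        None => return DictIdCheck::Pass, // check disabled
    };
    // Assumption: an omitted field and an explicit id of 0 are treated alike.
    let found_id = header_dict_id.unwrap_or(0);
    if found_id == expected {
        DictIdCheck::Pass
    } else {
        DictIdCheck::Mismatch { expected, found: header_dict_id }
    }
}
```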

Primary use case: post-AEAD-decrypt sanity check in wire-format consumers (e.g. lsm-tree’s encrypted block format pins the dict_id baked into the AAD against the inner zstd frame’s dict_id to defeat dict-substitution attacks).

NOT a replacement for AEAD authentication. NOT the same semantics as donor ZSTD_d_windowLogMax (which is a separate, ceiling-style limit).

pub fn expect_window_descriptor(&mut self, expected: Option<u8>)

Available on crate feature lsm only.

Pin the expected raw Window_Descriptor byte (RFC 8878 §3.1.1.1.2 layout: (exp << 3) | mantissa) for the next frame.

When expected is set, Self::init / Self::reset validate it against the parsed frame header BEFORE any block decode work runs. A mismatch returns crate::decoding::errors::FrameDecoderError::UnexpectedWindowDescriptor.

Single-segment frames omit the Window_Descriptor byte from the wire entirely. Setting an expectation while receiving a single-segment frame fails the check with found: None — there is no on-wire byte to match against, which is reported explicitly rather than silently passing.

None (default) disables the check.

Byte-exact equality, NOT a ceiling. Donor ZSTD_d_windowLogMax is a separate ceiling-style limit available through the C FFI surface; this method is for strict equality validation against a pinned expectation (e.g. lsm-tree’s wire format pins the window descriptor from the AAD to defeat decompression-bomb-swap attacks).
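
For reference, the sketch below turns a raw Window_Descriptor byte into a window size using the RFC 8878 formula (Exponent in the upper 5 bits, Mantissa in the lower 3) and models the byte-exact check, including the single-segment case. `window_size` and `check_window_descriptor` are hypothetical helpers, not part of this crate's API.

```rust
/// Window size in bytes for a raw Window_Descriptor byte, per RFC 8878 §3.1.1.1.2.
pub fn window_size(descriptor: u8) -> u64 {
    let exponent = u64::from(descriptor >> 3); // upper 5 bits
    let mantissa = u64::from(descriptor & 0b111); // lower 3 bits
    let window_log = 10 + exponent;
    let window_base = 1u64 << window_log;
    let window_add = (window_base / 8) * mantissa;
    window_base + window_add
}

/// Byte-exact equality check modelled on the text above (hypothetical helper).
/// `found` is `None` for single-segment frames, which carry no Window_Descriptor byte;
/// on failure the error carries whatever was found.
pub fn check_window_descriptor(expected: Option<u8>, found: Option<u8>) -> Result<(), Option<u8>> {
    match expected {
        None => Ok(()),                        // check disabled
        Some(e) if found == Some(e) => Ok(()), // exact match
        Some(_) => Err(found),                 // different byte, or no byte on the wire
    }
}
```

Note that a single-segment frame fails the check even though its effective window may be identical; that mirrors the "reported explicitly rather than silently passing" behaviour described above.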

pub fn set_magicless(&mut self, magicless: bool)

Enable or disable the magicless frame format (ZSTD_f_zstd1_magicless). When set to true, subsequent Self::init / Self::reset calls expect the frame header to begin directly with the frame-header descriptor, with no 4-byte magic number prefix. Defaults to false. This must match the encoder’s magicless setting; the format is unambiguous only when the caller knows it out-of-band.

Note: magicless mode also disables skippable-frame detection. The 0x184D2A50..=0x184D2A5F skippable-frame magic range is only recognised when the 4-byte magic prefix is consumed, so decode_all / init / reset will treat a skippable frame at the head of a magicless stream as a malformed frame header (bad descriptor / window-size error) instead of skipping it. Mixed-format streams that interleave skippable frames must be pre-split by the caller; set_magicless(true) is only safe when the entire stream is known to be magicless zstd frames.
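
Skippable frames go undetected in magicless mode because classification depends entirely on the 4-byte prefix. A minimal sketch of that classification, assuming the standard zstd magic 0xFD2FB528 (stored little-endian) and the skippable range quoted above; `FrameStart` and `classify_frame_start` are hypothetical names, not this crate's API.

```rust
const ZSTD_MAGIC: u32 = 0xFD2F_B528;
const SKIPPABLE_MIN: u32 = 0x184D_2A50;
const SKIPPABLE_MAX: u32 = 0x184D_2A5F;

#[derive(Debug, PartialEq, Eq)]
pub enum FrameStart {
    Zstd,            // standard magic; the frame header descriptor follows
    Skippable,       // one of the 16 skippable-frame magics
    Unknown,         // not a recognised magic number
    MagiclessHeader, // magicless mode: byte 0 is already the frame header descriptor
    TooShort,
}

pub fn classify_frame_start(input: &[u8], magicless: bool) -> FrameStart {
    if magicless {
        // No prefix is consumed, so a skippable magic would be parsed as a header.
        return if input.is_empty() { FrameStart::TooShort } else { FrameStart::MagiclessHeader };
    }
    if input.len() < 4 {
        return FrameStart::TooShort;
    }
    let magic = u32::from_le_bytes([input[0], input[1], input[2], input[3]]);
    if magic == ZSTD_MAGIC {
        FrameStart::Zstd
    } else if (SKIPPABLE_MIN..=SKIPPABLE_MAX).contains(&magic) {
        FrameStart::Skippable
    } else {
        FrameStart::Unknown
    }
}
```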

pub fn init(&mut self, source: impl Read) -> Result<(), FrameDecoderError>

init() allocates all needed buffers the first time this decoder is used; on later calls it only resets the existing buffers, without further allocation.

Note that all bytes currently in the decode buffer from any previous frame will be lost. Collect them first with collect()/collect_to_writer().

Equivalent to reset().

pub fn init_with_dict_handle( &mut self, source: impl Read, dict: &DictionaryHandle, ) -> Result<(), FrameDecoderError>

Initialize the decoder for a new frame using a pre-parsed dictionary handle.

If the frame header has a dictionary ID, this validates it against dict.id() and returns FrameDecoderError::DictIdMismatch on mismatch.

If the header omits the optional dictionary ID, this still applies the provided dictionary handle.

§Warning

This method always applies dict unless the frame header contains a non-matching dictionary ID. Callers must only use this API when they already know the frame was encoded with the provided dictionary, even if the frame header omits the dictionary ID or encodes an explicit dictionary ID of 0.

Passing a dictionary for a frame that was not encoded with it can silently corrupt the decoded output.
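
The decision this method makes about the handle can be summarised as a three-way match. This is a hypothetical model, not the crate's code: `DictDecision` and `resolve_dict` are illustrative, and treating an explicit header id of 0 like an omitted id follows the warning above.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum DictDecision {
    Apply,                                       // the handle's dictionary is used
    Mismatch { header_id: u32, handle_id: u32 }, // surfaces as DictIdMismatch
}

/// `header_dict_id` is `None` when the frame header omits the Dictionary_ID field.
pub fn resolve_dict(header_dict_id: Option<u32>, handle_id: u32) -> DictDecision {
    match header_dict_id {
        // No usable id in the header: the handle is applied unconditionally, which is
        // why a wrong dictionary here can silently corrupt the output.
        None | Some(0) => DictDecision::Apply,
        Some(id) if id == handle_id => DictDecision::Apply,
        Some(id) => DictDecision::Mismatch { header_id: id, handle_id },
    }
}
```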

pub fn reset(&mut self, source: impl Read) -> Result<(), FrameDecoderError>

reset() allocates all needed buffers the first time this decoder is used; on later calls it only resets the existing buffers, without further allocation.

Note that all bytes currently in the decode buffer from any previous frame will be lost. Collect them first with collect()/collect_to_writer().

Equivalent to init().

pub fn reset_with_dict_handle( &mut self, source: impl Read, dict: &DictionaryHandle, ) -> Result<(), FrameDecoderError>

Reset this decoder for a new frame using a pre-parsed dictionary handle.

If the frame header has a dictionary ID, this validates it against dict.id() and returns FrameDecoderError::DictIdMismatch on mismatch.

If the header omits the optional dictionary ID, this still applies the provided dictionary handle.

§Warning

This method always applies dict unless the frame header contains a non-matching dictionary ID. Callers must only use this API when they already know the frame was encoded with the provided dictionary, even if the frame header omits the dictionary ID or encodes an explicit dictionary ID of 0.

Passing a dictionary for a frame that was not encoded with it can silently corrupt the decoded output.

pub fn add_dict(&mut self, dict: Dictionary) -> Result<(), FrameDecoderError>

Add a dictionary that can be selected dynamically by frame dictionary ID.

Returns FrameDecoderError::DictAlreadyRegistered if the ID is already registered (either as owned or shared).
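
A registry that rejects duplicate IDs across owned and shared entries could look like the sketch below. The types are stand-ins chosen for illustration: `Registry`, `Entry`, and `AlreadyRegistered` are hypothetical, owned dictionaries are modelled as `Vec<u8>` and shared handles as `Arc<Vec<u8>>`; the crate's own Dictionary and DictionaryHandle may be implemented differently.

```rust
use std::collections::hash_map::{Entry as MapEntry, HashMap};
use std::sync::Arc;

pub enum Entry {
    Owned(Vec<u8>),
    Shared(Arc<Vec<u8>>),
}

#[derive(Debug, PartialEq, Eq)]
pub struct AlreadyRegistered(pub u32);

#[derive(Default)]
pub struct Registry {
    dicts: HashMap<u32, Entry>,
}

impl Registry {
    /// Inserts `entry` under `id`, refusing to overwrite an existing entry of either kind.
    pub fn add(&mut self, id: u32, entry: Entry) -> Result<(), AlreadyRegistered> {
        match self.dicts.entry(id) {
            MapEntry::Occupied(_) => Err(AlreadyRegistered(id)),
            MapEntry::Vacant(slot) => {
                slot.insert(entry);
                Ok(())
            }
        }
    }

    pub fn len(&self) -> usize {
        self.dicts.len()
    }
}
```

Using a single map keyed by ID is what makes the "owned or shared" duplicate check a single lookup in this sketch.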

pub fn add_dict_from_bytes( &mut self, raw_dictionary: &[u8], ) -> Result<(), FrameDecoderError>

Parse and add a serialized dictionary blob.

pub fn add_dict_handle( &mut self, dict: DictionaryHandle, ) -> Result<(), FrameDecoderError>

Available on target_has_atomic=ptr only.

Add a pre-parsed dictionary handle for reuse across decoders.

This API is available on targets with pointer-width atomics (target_has_atomic = "ptr").

Returns FrameDecoderError::DictAlreadyRegistered if the ID is already registered (either as owned or shared).

pub fn force_dict(&mut self, dict_id: u32) -> Result<(), FrameDecoderError>

pub fn content_size(&self) -> u64

Returns how many bytes the frame contains after decompression.

pub fn get_checksum_from_data(&self) -> Option<u32>

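Returns the checksum that was read from the data. It is only available after all bytes have been read. The checksum is stored as the last 4 bytes of a zstd frame and is present only when the frame header sets the Content_Checksum_flag.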

pub fn get_calculated_checksum(&self) -> Option<u32>

Available on crate feature hash only.

Returns the checksum that was calculated while decoding. The value is only meaningful after all decoded bytes have been collected/read from the FrameDecoder.
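
Both checksum accessors deal with the same 32-bit value: per RFC 8878, the content checksum is the low 32 bits of XXH64 (seed 0) over the decompressed content, stored little-endian as the last 4 bytes of the frame. A minimal sketch, assuming the caller already knows the frame has the checksum flag set; `checksum_from_frame`, `xxh64_to_frame_checksum`, and `checksum_matches` are hypothetical helpers, and computing XXH64 itself is out of scope.

```rust
/// Reads the trailing 4-byte content checksum of a complete frame (little-endian).
/// Only meaningful when the frame header's Content_Checksum_flag is set.
pub fn checksum_from_frame(frame: &[u8]) -> Option<u32> {
    let start = frame.len().checked_sub(4)?;
    Some(u32::from_le_bytes([
        frame[start],
        frame[start + 1],
        frame[start + 2],
        frame[start + 3],
    ]))
}

/// Truncates a full XXH64 digest to the 32 bits zstd stores in the frame.
pub fn xxh64_to_frame_checksum(digest: u64) -> u32 {
    digest as u32 // keeps the low 32 bits
}

/// Verification compares the stored value with the truncated digest.
pub fn checksum_matches(frame: &[u8], digest: u64) -> bool {
    checksum_from_frame(frame) == Some(xxh64_to_frame_checksum(digest))
}
```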

pub fn bytes_read_from_source(&self) -> u64

Counter for how many bytes have been consumed from the source while decoding the frame.

pub fn is_finished(&self) -> bool

Whether the current frame’s last block has been decoded yet. If this returns true, you can call collect()/collect_to_writer() to get all remaining content (read() also drains the buffer completely once this returns true).

pub fn blocks_decoded(&self) -> usize

Counter for how many blocks have already been decoded.

pub fn decode_blocks( &mut self, source: impl Read, strat: BlockDecodingStrategy, ) -> Result<bool, FrameDecoderError>

Decodes blocks from a reader. The FrameDecoder must have been initialized first (init()/reset()). The strategy controls how many blocks are decoded before the function returns, which matters if you want to manage memory consumption carefully. If you don’t care about that, choose BlockDecodingStrategy::All to decode all blocks of the frame into the buffer.
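
To make the strategy trade-off concrete, here is a toy model of when a decode_blocks-style call might stop. It is an illustration under stated assumptions, not the crate's logic: the hypothetical `Strategy` enum mirrors the `All` and `UptoBytes` variants mentioned above and adds an `UptoBlocks` variant for comparison, blocks are represented only by their decompressed sizes, and `UptoBytes(n)` is assumed to stop at the first block boundary where at least n bytes have been produced (hence "roughly").

```rust
#[derive(Clone, Copy)]
pub enum Strategy {
    All,
    UptoBlocks(usize),
    UptoBytes(usize),
}

/// Returns (blocks_decoded, bytes_produced) for one call, given the decompressed
/// sizes of the blocks remaining in the frame.
pub fn decode_step(block_sizes: &[usize], strat: Strategy) -> (usize, usize) {
    let mut blocks = 0;
    let mut bytes = 0;
    for &size in block_sizes {
        // Check the budget before decoding another block; a block is never split.
        let done = match strat {
            Strategy::All => false,
            Strategy::UptoBlocks(n) => blocks >= n,
            Strategy::UptoBytes(n) => bytes >= n,
        };
        if done {
            break;
        }
        blocks += 1;
        bytes += size;
    }
    (blocks, bytes)
}
```

With three 700-byte blocks, `UptoBytes(1024)` decodes two blocks (1400 bytes), which shows why the byte budget is only approximate.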

pub fn collect(&mut self) -> Option<Vec<u8>>

Collects decoded bytes while retaining window_size bytes as long as decoding is still in progress. Once decoding of the frame has finished (is_finished() == true), it collects all remaining bytes.

pub fn collect_to_writer(&mut self, w: impl Write) -> Result<usize, Error>

Collects decoded bytes into the writer while retaining window_size bytes as long as decoding is still in progress. Once decoding of the frame has finished (is_finished() == true), it collects all remaining bytes.

pub fn can_collect(&self) -> usize

How many bytes can currently be collected from the decode buffer. While decoding is in progress, this is lower than the actual decode buffer size because window_size bytes must be retained for decoding. Once decoding of the frame has finished (is_finished() == true), it reports all remaining bytes.
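
The retention rule shared by collect(), collect_to_writer(), can_collect(), and read() reduces to a saturating subtraction. A hypothetical sketch, where `collectable` is an illustrative name and `buffered` stands for the current decode-buffer length:

```rust
/// Bytes that may be handed to the caller right now.
/// While the frame is still decoding, the last `window_size` bytes stay buffered
/// because later match copies may reference them; once finished, everything is collectable.
pub fn collectable(buffered: usize, window_size: usize, finished: bool) -> usize {
    if finished {
        buffered
    } else {
        buffered.saturating_sub(window_size)
    }
}
```

This is why a streaming loop keeps draining after is_finished() turns true: the retained window only becomes collectable at that point.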

pub fn decode_from_to( &mut self, source: &[u8], target: &mut [u8], ) -> Result<(usize, usize), FrameDecoderError>

Decodes as many blocks as possible from the source slice and reads from the decode buffer into the target slice. The source slice may contain only part of a frame, but it must contain at least one full block to make progress.

Prefer decode_blocks if you have an io::Read available. This method exists for compatibility with other decompressors that serve an old-style C API.

Returns (read, written). If read == 0, the source did not contain a full block, and further calls with the same input will not make any progress.

No block can be larger than 128 KiB, so to be safe, use a source buffer of at least 128*1024 (max block content size) + 3 (block header size) + 18 (max frame header size) bytes.

You may call this function with an empty source after all bytes have been decoded. This is equivalent to calling decoder.read(&mut target).
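
The (read, written) contract lends itself to a simple driver loop. The sketch below is hypothetical: `drive` and the closure standing in for decode_from_to are illustrative, the 64-byte target is arbitrary, and the loop stops only when a call neither consumes input nor produces output, since repeating the call at that point cannot make progress.

```rust
/// 128 KiB max block content + 3-byte block header + 18-byte max frame header.
pub const SAFE_SOURCE_LEN: usize = 128 * 1024 + 3 + 18;

/// Repeatedly calls a decode_from_to-style `step` on the unconsumed part of `input`,
/// appending produced bytes to `sink`. Returns how many input bytes were consumed.
pub fn drive<F>(input: &[u8], sink: &mut Vec<u8>, mut step: F) -> usize
where
    F: FnMut(&[u8], &mut [u8]) -> (usize, usize),
{
    let mut pos = 0;
    let mut target = [0u8; 64];
    loop {
        let (read, written) = step(&input[pos..], &mut target[..]);
        sink.extend_from_slice(&target[..written]);
        pos += read;
        if read == 0 && written == 0 {
            // No full block left in `input` and nothing buffered to drain:
            // the caller must append more source bytes before calling again.
            return pos;
        }
    }
}
```

In a real caller, the unconsumed tail `input[pos..]` would be kept and topped up with fresh bytes before the next round.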

pub fn decode_all( &mut self, input: &[u8], output: &mut [u8], ) -> Result<usize, FrameDecoderError>

Decode multiple frames into the output slice.

input must contain an exact number of frames. Skippable frames are allowed and will be skipped during decode.

output must be large enough to hold the decompressed data. If you don’t know how large the output will be, use FrameDecoder::decode_blocks instead.

This calls FrameDecoder::init, and all bytes currently in the decoder will be lost.

Returns the number of bytes written to output.
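
Skipping a skippable frame only requires its 8-byte header: the 4-byte magic followed by a little-endian u32 Frame_Size giving the payload length (RFC 8878 §3.1.2). A hypothetical helper, not this crate's API, that computes how far a decode_all-style loop would advance past such a frame:

```rust
const SKIPPABLE_MIN: u32 = 0x184D_2A50;
const SKIPPABLE_MAX: u32 = 0x184D_2A5F;

/// If `input` starts with a complete skippable frame, returns its total length
/// (8-byte header plus payload). Returns `None` for any other prefix or if truncated.
pub fn skippable_frame_len(input: &[u8]) -> Option<usize> {
    if input.len() < 8 {
        return None;
    }
    let magic = u32::from_le_bytes([input[0], input[1], input[2], input[3]]);
    if !(SKIPPABLE_MIN..=SKIPPABLE_MAX).contains(&magic) {
        return None;
    }
    let payload = u32::from_le_bytes([input[4], input[5], input[6], input[7]]) as usize;
    let total = 8usize.checked_add(payload)?;
    // A declared payload that runs past the end of the input means a truncated frame.
    if total <= input.len() {
        Some(total)
    } else {
        None
    }
}
```

A multi-frame loop would then resume parsing at `input[total..]`.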

pub fn decode_all_with_dict_handle( &mut self, input: &[u8], output: &mut [u8], dict: &DictionaryHandle, ) -> Result<usize, FrameDecoderError>

Decode multiple frames into the output slice using a pre-parsed dictionary handle.

input must contain an exact number of frames. Skippable frames are allowed and will be skipped during decode.

output must be large enough to hold the decompressed data. If you don’t know how large the output will be, use FrameDecoder::decode_blocks instead.

This calls FrameDecoder::init_with_dict_handle, and all bytes currently in the decoder will be lost.

§Warning

Each decoded frame is initialized with dict, even when a frame header omits the optional dictionary ID. Callers must only use this API when they already know the input frames were encoded with the provided dictionary; otherwise decoded output can be silently corrupted.

pub fn decode_all_with_dict_bytes( &mut self, input: &[u8], output: &mut [u8], raw_dictionary: &[u8], ) -> Result<usize, FrameDecoderError>

Decode multiple frames into the output slice using a serialized dictionary.

§Warning

Each decoded frame is initialized with the parsed dictionary, even when a frame header omits the optional dictionary ID. Callers must only use this API when they already know the input frames were encoded with that dictionary; otherwise decoded output can be silently corrupted.

pub fn decode_all_to_vec( &mut self, input: &[u8], output: &mut Vec<u8>, ) -> Result<(), FrameDecoderError>

Decode multiple frames into the extra capacity of the output vector.

input must contain an exact number of frames.

output must have enough extra capacity to hold the decompressed data. This function will not reallocate or grow the vector. If you don’t know how large the output will be, use FrameDecoder::decode_blocks instead.

This calls FrameDecoder::init, and all bytes currently in the decoder will be lost.

The length of the output vector is updated to include the decompressed data. The length is not changed if an error occurs.
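
The "spare capacity only, length unchanged on error" contract can be illustrated with plain Vec operations. `append_within_capacity` and `TargetTooSmall` are hypothetical; the sketch appends already-decoded bytes instead of decoding, to isolate the capacity rule.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct TargetTooSmall {
    pub needed: usize,
    pub spare: usize,
}

/// Appends `data` only if it fits in the vector's existing spare capacity, so the
/// vector is never reallocated. On error the vector is left untouched.
pub fn append_within_capacity(out: &mut Vec<u8>, data: &[u8]) -> Result<(), TargetTooSmall> {
    let spare = out.capacity() - out.len();
    if data.len() > spare {
        return Err(TargetTooSmall { needed: data.len(), spare });
    }
    out.extend_from_slice(data); // fits in spare capacity, so no reallocation happens
    Ok(())
}
```

Callers therefore size the vector up front, for example with Vec::with_capacity(len + expected_size), where the expected size might come from content_size().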

pub fn decode_to_slice_trusted( &mut self, input: &[u8], output: &mut [u8], ) -> Result<usize, FrameDecoderError>

Decode a single zstd frame from input directly into output, bypassing the internal DecodeBuffer -> read() drain copy when the frame is eligible. Donor parity with the ZSTD_in_dst litBuffer placement strategy.

Eligibility requires all of:

frame_content_size is present in the header (> 0).
output.len() >= frame_content_size + WILDCOPY_OVERLENGTH (room for the SIMD wildcopy overshoot slack).
No active dictionary on self.state (dict_content is not carried into the stack-local DecodeBuffer this method builds).

content_checksum_flag is NOT a disqualifier: when set, the direct path hashes the decoded output[..content_size] once at the end of decode and propagates the digest into the persistent scratch’s hash so Self::get_calculated_checksum returns the right value.
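
The eligibility rules above amount to a single predicate. A hypothetical sketch: `direct_path_eligible` is not part of the crate, and since the value of WILDCOPY_OVERLENGTH is not stated here, it is taken as a parameter. The checksum flag is deliberately not an input, matching the note that it does not disqualify a frame.

```rust
/// `frame_content_size` is `None` when the header carries no Frame_Content_Size field.
pub fn direct_path_eligible(
    frame_content_size: Option<u64>,
    output_len: usize,
    has_active_dict: bool,
    wildcopy_overlength: usize,
) -> bool {
    let fcs = match frame_content_size {
        Some(n) if n > 0 => n,
        _ => return false, // missing or zero content size
    };
    if has_active_dict {
        return false; // dictionary content is not carried into the direct path
    }
    // Widen to u128 so fcs + slack cannot overflow on any platform.
    output_len as u128 >= u128::from(fcs) + wildcopy_overlength as u128
}
```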

Multi-segment frames are supported via a per-block DecodeBuffer::drop_to_window_size call that caps the visible buffer at window_size at block boundaries. The discarded bytes stay physically in the user slice (they’re the frame’s already-decoded output); only their BufferBackend::head visibility moves forward.

Note: drop_to_window_size runs only BETWEEN blocks, so within a single block buffer.len() can temporarily exceed window_size. DecodeBuffer::repeat validates match offsets against buffer.len() (not against window_size), so corrupted streams with offset > window_size but offset <= current buffer.len() are NOT rejected by this gate. Strict spec compliance for offsets in multi-segment frames would require an in-block offset bound that we don’t currently enforce on either the direct or the fallback path.

Non-eligible frames fall back transparently to the existing decode_blocks + read drain path.

input is expected to contain a single zstd frame. Bytes past the end of that frame are NOT validated and are silently ignored — this differs from Self::decode_all, which loops until input is fully consumed and will attempt to parse a second frame (or error) on trailing bytes. Multi-frame streams must use Self::decode_all.

On the direct path the literal pushes and sequence-execution match copies write straight into output, eliminating the FlatBuf-as-intermediate read() drain that dominates poorly-compressed L-7-class corpora (~28% of decode time on level_-7_fast/decodecorpus-z000033/rust_stream). Both single-segment and multi-segment frames take the direct path; multi-segment frames cap the visible buffer at window_size between blocks via DecodeBuffer::drop_to_window_size.

Frames that aren’t eligible (zero frame_content_size, active dictionary, undersized output) transparently fall back to the internal block-decode + read drain loop. The fallback is NOT Self::decode_all semantics: it decodes exactly one frame and returns; trailing bytes past the frame are silently ignored. Use Self::decode_all for multi-frame input or streams that may contain skippable frames.

input is expected to contain exactly ONE non-skippable zstd frame. Skippable frames are rejected with ReadFrameHeaderError::SkipFrame from init — this method does NOT skip them. Multi-frame input or input that might contain skippable frames must go through Self::decode_all, which iterates init and handles SkipFrame by advancing past the skippable payload.

§State observability after this call

On the direct path, decoded bytes are written into output via a stack-local DecodeBuffer<UserSliceBackend> that is dropped before this function returns. The persistent state.decoder_scratch.buffer stays empty. Consequently, after decode_to_slice_trusted returns:

Self::is_finished returns true,
Self::can_collect returns 0,
Self::read (the crate’s io::Read impl, which under feature = "std" is std::io::Read) reads 0 bytes,
Self::collect returns Some(Vec::new()),
Self::get_calculated_checksum returns the correct value when the frame had content_checksum_flag set — the direct path walks the output once at end of decode and propagates the digest into the persistent scratch’s hasher so this accessor reads the right state.

Callers must use the bytes from output[..n] (where n is the returned count); do not mix decode_to_slice_trusted with read/collect on the same FrameDecoder.

When the frame is NOT eligible (no FCS in the header, or output buffer too small for the WILDCOPY slack, or active dictionary), this method falls back to a single-frame decode_blocks + read drain loop, draining into the caller’s output slice. This is NOT decode_all: it processes only one frame (no trailing-frame iteration, no silent skippable-frame skip) and returns FrameDecoderError::TargetTooSmall if the decoded output does not fit in output.

§Panic / DoS surface

For trusted input only. On the direct path UserSliceBackend uses release-mode assert! for capacity checks across all three write entry points (extend, extend_and_fill, extend_from_within_unchecked). A malformed Compressed block whose payload expands past the declared frame_content_size (and beyond the WILDCOPY_OVERLENGTH slack the caller sized into output) will panic mid-block rather than returning a structured error. The per-block produced > content_size guard catches the overshoot AFTER the block, but cannot prevent the in-block writes from running first.

The trade-off is deliberate for this PR. Making the writes fallible requires extending the BufferBackend trait surface, touching every backend implementation, and propagating Result<_, _> through the entire sequence executor — a refactor too large to fold into the direct decode wiring without losing review tractability. The follow-up issue tracking that work (referenced below in “Fallible BufferBackend writes”) is a hard prerequisite before this entry point becomes safe to expose on untrusted streams.

Callers handling untrusted input must use Self::decode_all which routes through FlatBuf / RingBuffer. Those backends grow via Vec::reserve (succeeds or aborts on alloc failure — not error-returning), but the growable Vec capacity absorbs a malformed block’s overshoot inside the allocation; the frame-level checks then turn the size mismatch into FrameContentSizeMismatch instead of OOB writes into a fixed-size user slice. Fallible BufferBackend writes that would let decode_to_slice_trusted remain safe on adversarial input are tracked in issue #246.

Trait Implementations

impl Default for FrameDecoder

fn default() -> Self

Returns the “default value” for a type.

impl Read for FrameDecoder

Reads bytes from the decode buffer that are no longer needed. While the frame is not yet finished, this retains window_size bytes; otherwise it drains the buffer completely.
