# structured-zstd

Pure Rust zstd implementation — managed fork of [ruzstd](https://github.com/KillingSpark/zstd-rs).

[![CI](https://github.com/structured-world/structured-zstd/actions/workflows/ci.yml/badge.svg)](https://github.com/structured-world/structured-zstd/actions/workflows/ci.yml)
[![Crates.io](https://img.shields.io/crates/v/structured-zstd.svg)](https://crates.io/crates/structured-zstd)
[![docs.rs](https://docs.rs/structured-zstd/badge.svg)](https://docs.rs/structured-zstd)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

## Benchmarks Dashboard

Historical benchmark charts are published to GitHub Pages:

- [Performance dashboard](https://structured-world.github.io/structured-zstd/dev/bench/)
- [Latest relative payload](https://structured-world.github.io/structured-zstd/dev/bench/benchmark-relative.json)
- [Latest benchmark delta report](https://structured-world.github.io/structured-zstd/dev/bench/benchmark-delta.md)
- [Latest full benchmark report](https://structured-world.github.io/structured-zstd/dev/bench/benchmark-report.md)

Note: the root Pages URL may serve an empty page; the benchmark charts live under `/dev/bench/`.

## Managed Fork

This is a **maintained fork** of [KillingSpark/zstd-rs](https://github.com/KillingSpark/zstd-rs) (ruzstd) by [Structured World Foundation](https://sw.foundation). We maintain additional features and hardening for the [CoordiNode](https://github.com/structured-world/coordinode) database engine.

**Fork goals:**
- Dictionary compression improvements (critical for per-label trained dictionaries in LSM-tree)
- Performance parity with C zstd for decompression (currently 1.4-3.5x slower)
- Full numeric compression levels (0 = default, 1–22 plus negative ultra-fast, with C zstd-compatible level numbering/API; not exact strategy/ratio parity at every level)
- No FFI — pure `cargo build`, no cmake/system libraries (ADR-013 compliance)

**Upstream relationship:** We periodically sync with upstream but maintain an independent development trajectory focused on CoordiNode requirements.

## What is this

A pure Rust implementation of the Zstandard compression format, as defined in [RFC 8878](https://www.rfc-editor.org/rfc/rfc8878.pdf).

This crate contains a fully operational decompressor and a compressor that is usable but does not yet match the speed, ratio, or configurability of the original C library.

## Current Status

### Decompression

Complete RFC 8878 implementation. Performance: ~1.4-3.5x slower than C zstd depending on data compressibility.

### Compression

- [x] Uncompressed blocks
- [x] Fastest (roughly level 1)
- [x] Default (roughly level 3)
- [x] Better (roughly level 7)
- [x] Best (roughly level 11)
- [x] Numeric levels `0` (default), `1–22`, and negative ultra-fast levels via `CompressionLevel::from_level(n)` (C zstd-compatible numbering)
- [x] Checksums
- [x] Frame Content Size — `FrameCompressor` writes FCS automatically; `StreamingEncoder` requires `set_pledged_content_size()` before first write
- [x] Dictionary compression
- [x] Streaming encoder (`io::Write`)

### Compression Strategy Coverage

Implemented strategy/back-end coverage:
- Level 1: simple matcher (`Simple`)
- Levels 2-3: `Dfast`
- Level 4: row matcher (`Row`)
- Levels 5-22: hash-chain matcher (`HashChain`) with lazy/lazy2 style tuning

Not yet implemented as dedicated strategy families:
- `greedy`
- `btopt`
- `btultra`

Numeric levels that require one of these missing families currently map to the closest implemented matcher configuration.

### Dictionary Generation

When the `dict_builder` feature is enabled, the `dictionary` module can:
- build raw dictionaries with COVER (`create_raw_dict_from_source`)
- build raw dictionaries with FastCOVER (`create_fastcover_raw_dict_from_source`)
- finalize raw content into full zstd dictionary format (`finalize_raw_dict`)
- train+finalize in one pure-Rust flow (`create_fastcover_dict_from_source`)
- propagate I/O failures from dictionary-building APIs via `io::Result` return values

## Benchmarking

Performance tracking lives in [BENCHMARKS.md](BENCHMARKS.md). The suite compares `structured-zstd` against the C reference across small payloads, entropy extremes, a `100 MiB` large-stream scenario, repository corpus fixtures, and optional local Silesia corpora. CI benchmark runs are now published as a multi-target matrix (`x86_64-gnu`, `i686-gnu`, `x86_64-musl`) and expose a relative-first payload (`benchmark-relative.json`) for dashboard filtering by `target/stage/scenario/level/source`.

Benchmark report files are generated by `.github/scripts/run-benchmarks.sh` and are kept as ignored local/CI artifacts rather than tracked files in this repository.

## Usage

### Compression

```rust
use structured_zstd::encoding::{compress, compress_to_vec, CompressionLevel};

let data: &[u8] = b"hello world";
// Named level
let compressed = compress_to_vec(data, CompressionLevel::Fastest);
// Numeric level (C zstd compatible: 0 = default, 1-22, negative for ultra-fast)
let compressed = compress_to_vec(data, CompressionLevel::from_level(7));
```

```rust,no_run
use structured_zstd::encoding::{CompressionLevel, StreamingEncoder};
use std::io::Write;

let mut out = Vec::new();
let mut encoder = StreamingEncoder::new(&mut out, CompressionLevel::Fastest);
encoder.write_all(b"hello ")?;
encoder.write_all(b"world")?;
encoder.finish()?;
# Ok::<(), std::io::Error>(())
```

### Decompression

```rust,no_run
use structured_zstd::decoding::StreamingDecoder;
use structured_zstd::io::Read;

let compressed_data: Vec<u8> = vec![];
let mut source: &[u8] = &compressed_data;
let mut decoder = StreamingDecoder::new(&mut source).unwrap();

let mut result = Vec::new();
decoder.read_to_end(&mut result).unwrap();
```

### Dictionary-backed Decompression API

```rust,no_run
use structured_zstd::decoding::{DictionaryHandle, FrameDecoder, StreamingDecoder};
use structured_zstd::io::Read;

let compressed: Vec<u8> = vec![];
let dict_bytes: Vec<u8> = vec![];
let mut output = vec![0u8; 1024];

// Parse dictionary once, then reuse handle.
let handle = DictionaryHandle::decode_dict(&dict_bytes).unwrap();
let mut decoder = FrameDecoder::new();
let _written = decoder
    .decode_all_with_dict_handle(compressed.as_slice(), &mut output, &handle)
    .unwrap();

// Compatibility path: pass raw dictionary bytes directly.
let mut decoder = FrameDecoder::new();
let _written = decoder
    .decode_all_with_dict_bytes(compressed.as_slice(), &mut output, &dict_bytes)
    .unwrap();

// Streaming helpers exist for both handle- and bytes-based paths.
let mut source: &[u8] = &compressed;
let mut stream = StreamingDecoder::new_with_dictionary_handle(&mut source, &handle).unwrap();
let mut sink = Vec::new();
stream.read_to_end(&mut sink).unwrap();
```

## Support the Project

<div align="center">

![USDT TRC-20 Donation QR Code](https://raw.githubusercontent.com/structured-world/structured-zstd/main/assets/usdt-qr.svg)

USDT (TRC-20): `TFDsezHa1cBkoeZT5q2T49Wp66K8t2DmdA`

</div>

## License

Apache License 2.0

Contributions will be published under the same Apache 2.0 license.