
Crate lz4


Pure-Rust translation of the LZ4 compression library, exposing the same public API as the lz4-rs crate so it can be used as a drop-in replacement (use lz4::...).

The crate provides:

  • Encoder / EncoderBuilder and Decoder, the Write/Read streaming wrappers around the LZ4 frame format.
  • block — a safe block-mode API (compress / decompress) modelled on python-lz4, with optional uncompressed-size prefix.
  • liblz4 — error helpers and the re-exported low-level sys surface.
  • sys — the C-shaped LZ4 block, HC, frame, and streaming functions translated into Rust.

See the crate README for the full design rationale and parity status.

§lz4-pure-rs

Pure Rust LZ4 library with the same public API as lz4-rs.

Translated from upstream LZ4 commit 9da37b2eebf082bfab6e57c49be71cc41119a40d.

  • 2026-05-26: New audit. Minor semantic drift fixed. Performance improved, now about on par. RSS is higher and needs a future audit
  • 2026-05-16: docstrings added
  • 2026-04-25: On par with the speed of the original LZ4 on the generated benchmark corpus. Passes a broad fixture suite and matches upstream output on the tested corpus/API paths. More testing should, however, be done

§This is an LLM-mediated faithful (hopefully) translation, not the original code!

Most users should probably first see if the existing original code works for them, unless they have reason otherwise. The original source may have newer features and it has had more love in terms of fixing bugs. In fact, we aim to replicate bugs if they are present, for the sake of reproducibility! (but then we might have added a few more in the process)

There are however cases when you might prefer this Rust version. We generally agree with this manifesto but more specifically:

  • We have had many issues with ensuring that our software works using existing containers (Docker, Podman, Singularity). One size does not fit all and it eats our resources trying to keep up with every way of delivering software
  • Common package managers do not work well. It was great when we had a few Linux distributions with stable procedures, but now there are just too many ecosystems (Homebrew, Conda). Conda has an NP-complete resolver which does not scale. Homebrew is only moderately stable. And our Python dependencies still break. These can no longer be considered serious professional options. Meanwhile, Cargo allows multiple versions of a package to coexist, even within the same program(!)
  • The future is the web. We deploy software in the web browser, and until now that has meant JavaScript. This is a language where even the == operator is broken. TypeScript is one step up, but the game changer is the ability to compile Rust code to WebAssembly, enabling performance and sharing of code with the backend. Translating code to Rust enables new ways of deployment, and running code in the browser has particular benefits for science: researchers do not have deep pockets to run servers, so pushing compute to the user enables deployment that would otherwise be impossible
  • Old CLI-based utilities are bad for the environment(!). A large amount of compute resources are spent creating and communicating via small files, which we can bypass by using code as libraries. Even better, we can avoid frequent reloading of databases by hoisting this stage, with up to 100x speedups in some cases. Less compute means faster compute and less electricity wasted
  • LLM-mediated translations may actually be safer to use than the original code. This article shows that running the same code on different operating systems can give somewhat different answers. This is a gap that Rust+Cargo can reduce. Type-safe interfaces also reduce coding mistakes and the amount of error handling needed, compared with typical command-line scripting

But:

  • This approach should still be considered experimental. The LLM technology is immature and has sharp corners. But there are opportunities to reap, and the genie is not going back into the bottle. This translation is aimed as much at learning how to improve the technology as at getting feedback on the results.
  • Translations are not endorsed by the original authors unless otherwise noted. Do not send bug reports to the original developers. Use our GitHub issues page instead.
  • Treat the benchmarks on this page as reproducible spot checks, not universal guarantees. They are used to help evaluate the translation on the generated corpus. If you want improved performance, you generally have to use this code as a library, and use the additional tricks it offers. We generally accept performance losses in order to reduce our dependency issues
  • Check the original GitHub pages for information about the package. This README is kept sparse on purpose. It is not meant to be the primary source of information
  • If you are the author of the original code and wish to move to Rust, you can obtain ownership of this repository and crate. Until then, our commitment is to offer an as-faithful-as-possible translation of a snapshot of your code. If we find serious bugs, we will report them to you. Otherwise we will just replicate them, to ensure comparability across studies that claim to use package XYZ v.666. Think of this like a fancy Ubuntu .deb-package of your software - that is how we treat it

This blurb might be out of date. Go to this page for the latest information and further information about how we approach translation

§API

The crate currently implements the lz4-rs safe block, encoder, and decoder APIs on top of a pure Rust translation. It also exposes the C-shaped sys surface used by those APIs, including block compression/decompression, streaming state APIs, frame APIs, and LZ4HC entry points.
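As a rough illustration of the optional uncompressed-size prefix that the block API can prepend, the sketch below frames a payload with a 4-byte length header and parses it back. It uses only the standard library and does not call this crate. The helper names `prepend_size` and `split_size` are hypothetical, and the little-endian byte order is an assumption carried over from lz4-rs and python-lz4 rather than something this page states.

```rust
/// Hypothetical helper: prefix `payload` with `original_len` as a u32.
/// Little-endian order is an assumption (it mirrors lz4-rs / python-lz4).
fn prepend_size(original_len: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&original_len.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Hypothetical helper: split a prefixed buffer into (size, payload).
/// Returns None if the buffer is too short to hold the 4-byte prefix.
fn split_size(buf: &[u8]) -> Option<(u32, &[u8])> {
    if buf.len() < 4 {
        return None;
    }
    let (head, tail) = buf.split_at(4);
    let size = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    Some((size, tail))
}

fn main() {
    // The payload stands in for compressed bytes; nothing is compressed here.
    let framed = prepend_size(1024, b"compressed-bytes");
    let (size, payload) = split_size(&framed).expect("prefix present");
    println!("original size = {size}, payload = {} bytes", payload.len());
}
```

Storing the original size up front lets a decoder allocate its output buffer once, which is why block-mode callers that skip the prefix have to supply the uncompressed size themselves.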

§CLI

The optional cli feature builds a single lz4 binary using clap:

cargo run --features cli --bin lz4 -- -f input output.lz4
cargo run --features cli --bin lz4 -- -d output.lz4 restored

§Testing

Frame and block output is format-compatible with upstream LZ4 on the tested paths. The test suite includes byte fixtures generated from upstream C for fast compression, dictionary compression, frame compression, and HC compression, plus negative frame tests for malformed headers, checksums, content-size mismatches, oversized block headers, linked blocks, and skippable frames.
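To make the malformed-header and skippable-frame cases more concrete, here is a minimal std-only sketch of the first check a frame decoder performs: reading the 4-byte little-endian magic number. The magic values come from the LZ4 frame format specification; the `FrameKind` enum and `classify` function are hypothetical names for illustration and are not this crate's API or its test code.

```rust
/// Outcome of inspecting the first four bytes of a candidate LZ4 stream.
#[derive(Debug, PartialEq)]
enum FrameKind {
    Lz4Frame,
    Skippable,
    Unknown,
    Truncated,
}

/// Magic number of a regular LZ4 frame (stored little-endian on disk).
const LZ4_FRAME_MAGIC: u32 = 0x184D_2204;
/// Skippable frames use 0x184D2A50 through 0x184D2A5F.
const SKIPPABLE_MAGIC_BASE: u32 = 0x184D_2A50;

/// Hypothetical helper: classify a stream by its leading magic number.
fn classify(header: &[u8]) -> FrameKind {
    let first = match header.get(..4) {
        Some(bytes) => bytes,
        None => return FrameKind::Truncated,
    };
    let magic = u32::from_le_bytes([first[0], first[1], first[2], first[3]]);
    if magic == LZ4_FRAME_MAGIC {
        FrameKind::Lz4Frame
    } else if (magic & 0xFFFF_FFF0) == SKIPPABLE_MAGIC_BASE {
        FrameKind::Skippable
    } else {
        FrameKind::Unknown
    }
}

fn main() {
    let samples: [&[u8]; 4] = [
        &[0x04, 0x22, 0x4D, 0x18, 0x64],
        &[0x53, 0x2A, 0x4D, 0x18],
        b"abcd",
        &[0x04, 0x22],
    ];
    for sample in samples {
        println!("{:02X?} -> {:?}", sample, classify(sample));
    }
}
```

A real decoder goes on to validate the frame descriptor, header checksum, and block sizes; this sketch stops at the magic number.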

tools/lz4_perf_check.sh compares the release CLI against the installed system lz4 on generated random, zero-filled, source-like, JSON/log-like, FASTA-like, dictionary-heavy, binary-artifact, tar/many-small-file, and already-compressed samples. The default check byte-compares Rust and system compressed frames for default compression across every corpus input and for HC9-HC12 on source-repeat, then validates both implementations can decode each other’s output. Set LZ4_PURE_PARITY_SWEEP=1 to additionally byte-compare default plus levels 1 through 12 across the full corpus (Rust -l N, system -N). As of April 25, 2026, that sweep matched system lz4 1.9.4 for every tested corpus input and level.
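The byte-comparison step can be pictured with a small std-only sketch. This is not the logic of tools/lz4_perf_check.sh, whose internals are not shown here; `first_mismatch` is a hypothetical helper that reports where two compressed outputs diverge, which tends to be more useful for debugging than a plain equal/not-equal answer.

```rust
/// Hypothetical helper: offset of the first differing byte between two
/// buffers, or None if they are identical. A length difference counts as
/// a mismatch at the end of the shorter buffer.
fn first_mismatch(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b.iter()).position(|(x, y)| x != y) {
        Some(offset) => Some(offset),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

fn main() {
    // In-memory stand-ins for two compressed frames; in practice these
    // would be read from the Rust and system output files.
    let rust_out = [0x04u8, 0x22, 0x4D, 0x18, 0x64, 0x40];
    let system_out = [0x04u8, 0x22, 0x4D, 0x18, 0x64, 0x70];
    match first_mismatch(&rust_out, &system_out) {
        None => println!("outputs are byte-identical"),
        Some(offset) => println!("outputs differ at byte offset {offset}"),
    }
}
```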

§Speed vs original lz4 1.9.4 (3-run median)

Wall-clock from LZ4_PURE_PERF_RUNS=3 tools/lz4_perf_check.sh, run on May 26, 2026, after the const-generic HC/frame/fast-path specialization work. The comparison target is the installed original C implementation, lz4 1.9.4. Compressed sizes are byte-identical to system at every level shown. The speed column is system time / Rust time: values above 1.00x mean Rust was faster, values below 1.00x mean Rust was slower. Several entries are short enough that one scheduler tick changes the result, so treat single-digit percentage differences as parity.
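The speed column can be reproduced with a few lines of arithmetic. The sketch below takes the median of three runs and formats system time / Rust time in the style used in the tables. The function names are hypothetical, and the individual run times in `main` are made up for illustration; only the resulting medians (0.07 s and 0.06 s) correspond to a row in the compression table.

```rust
/// Median of a non-empty sample of wall-clock times in seconds.
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "median needs at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}

/// Speed ratio as defined above: system time / Rust time.
fn speed_ratio(system_s: f64, rust_s: f64) -> f64 {
    system_s / rust_s
}

/// Format a ratio the way the tables do, adding the inverse when slower.
fn describe(ratio: f64) -> String {
    let rounded = (ratio * 100.0).round() / 100.0;
    if rounded > 1.0 {
        format!("{rounded:.2}x faster")
    } else if rounded < 1.0 {
        format!("{rounded:.2}x, {:.2}x slower", 1.0 / ratio)
    } else {
        format!("{rounded:.2}x")
    }
}

fn main() {
    // Hypothetical per-run timings; the script's raw runs are not published.
    let rust_runs = [0.07, 0.07, 0.08];
    let system_runs = [0.06, 0.06, 0.07];
    let ratio = speed_ratio(median(&system_runs), median(&rust_runs));
    println!("{}", describe(ratio));
}
```

With timings reported to two decimals, a 0.01 s change on a 0.04 s run shifts the ratio by about 25%, which is why the short entries are best read as parity.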

Compression:

| Input | Size | Rust | System | Rust speed vs system |
| --- | --- | --- | --- | --- |
| random64 | 64 MiB | 0.24 s | 0.24 s | 1.00x |
| zeros64 | 64 MiB | 0.03 s | 0.05 s | 1.67x faster |
| source-repeat | 107 MiB | 0.39 s | 0.40 s | 1.03x faster |
| loglike | 22 MiB | 0.04 s | 0.04 s | 1.00x |
| fasta-like | 43 MiB | 0.04 s | 0.04 s | 1.00x |
| dictionary-heavy | 43 MiB | 0.07 s | 0.06 s | 0.86x, 1.17x slower |
| binary-artifact | 1.6 MiB | 0.01 s | 0.01 s | 1.00x |
| many-small.tar | 2.0 MiB | 0.01 s | 0.01 s | 1.00x |
| already-compressed | 29 MiB | 0.13 s | 0.13 s | 1.00x |
| HC level 9 | 107 MiB | 4.05 s | 4.30 s | 1.06x faster |
| HC level 10 | 107 MiB | 5.57 s | 5.50 s | 0.99x, 1.01x slower |
| HC level 11 | 107 MiB | 12.04 s | 12.26 s | 1.02x faster |
| HC level 12 | 107 MiB | 11.65 s | 12.21 s | 1.05x faster |

Decompression (decoding a system-generated .lz4):

| Input | Rust | System | Rust speed vs system |
| --- | --- | --- | --- |
| random64 | 0.08 s | 0.08 s | 1.00x |
| zeros64 | 0.07 s | 0.09 s | 1.29x faster |
| source-repeat | 0.16 s | 0.17 s | 1.06x faster |
| loglike | 0.03 s | 0.04 s | 1.33x faster |
| fasta-like | 0.05 s | 0.05 s | 1.00x |
| dictionary-heavy | 0.06 s | 0.06 s | 1.00x |
| binary-artifact | 0.01 s | 0.01 s | 1.00x |
| many-small.tar | 0.01 s | 0.01 s | 1.00x |
| already-compressed | 0.05 s | 0.05 s | 1.00x |

§Peak RSS vs original lz4 1.9.4

Peak resident set size from /usr/bin/time -f %M, single run on the same generated files. RSS is reported in MiB. This measures CLI process memory, not library-only allocator pressure, and is more sensitive to libc/kernel behavior than the timing table.
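For readers reproducing these numbers: GNU time's %M reports the maximum resident set size in KiB, so the readings need converting before they match the MiB values shown. The sketch below does that conversion and computes a relative delta. The formula (Rust - system) / system is an assumption about how the Δ column was derived, the function names are hypothetical, and the readings in `main` are invented for illustration.

```rust
/// Convert a KiB reading (as printed by `/usr/bin/time -f %M`) to MiB.
fn kib_to_mib(kib: u64) -> f64 {
    kib as f64 / 1024.0
}

/// Relative change of Rust RSS versus system RSS, in percent.
/// Assumed formula: (rust - system) / system * 100.
fn delta_percent(rust_kib: u64, system_kib: u64) -> f64 {
    (rust_kib as f64 - system_kib as f64) / system_kib as f64 * 100.0
}

fn main() {
    // Hypothetical readings, not taken from the tables.
    let rust_kib = 10_240;
    let system_kib = 8_192;
    println!(
        "Rust {:.1} MiB, system {:.1} MiB, delta {:+.0}%",
        kib_to_mib(rust_kib),
        kib_to_mib(system_kib),
        delta_percent(rust_kib, system_kib)
    );
}
```

Recomputing a delta from the rounded MiB figures can land a point or two away from the published percentage, presumably because the published deltas were derived from the unrounded readings.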

Compression RSS:

| Input | Rust | System | Δ |
| --- | --- | --- | --- |
| random64 | 10.6 MiB | 9.7 MiB | +10% |
| zeros64 | 6.6 MiB | 5.6 MiB | +17% |
| source-repeat | 10.9 MiB | 6.6 MiB | +67% |
| loglike | 8.8 MiB | 5.9 MiB | +47% |
| fasta-like | 9.7 MiB | 5.6 MiB | +72% |
| dictionary-heavy | 9.7 MiB | 5.9 MiB | +63% |
| binary-artifact | 6.2 MiB | 4.1 MiB | +52% |
| many-small.tar | 6.5 MiB | 3.4 MiB | +90% |
| already-compressed | 11.3 MiB | 9.7 MiB | +16% |
| HC level 9 | 10.6 MiB | 6.6 MiB | +62% |
| HC level 10 | 10.6 MiB | 6.6 MiB | +62% |
| HC level 11 | 10.0 MiB | 6.6 MiB | +52% |
| HC level 12 | 10.3 MiB | 6.6 MiB | +57% |

Decompression RSS (decoding a system-generated .lz4):

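| Input | Rust | System | Δ |
| --- | --- | --- | --- |
| random64 | 10.3 MiB | 1.6 MiB | +560% |
| zeros64 | 6.3 MiB | 5.6 MiB | +11% |
| source-repeat | 8.4 MiB | 6.9 MiB | +23% |
| loglike | 7.5 MiB | 6.3 MiB | +20% |
| fasta-like | 6.6 MiB | 5.6 MiB | +17% |
| dictionary-heavy | 7.5 MiB | 5.9 MiB | +26% |
| binary-artifact | 5.9 MiB | 4.1 MiB | +46% |
| many-small.tar | 4.4 MiB | 3.8 MiB | +17% |
| already-compressed | 10.3 MiB | 1.6 MiB | +560% |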

In short: Rust is at parity with or faster than the original lz4 1.9.4 on most speed checks in this corpus. The largest measured Rust wins are 1.67x faster compression on zeros64, 1.33x faster decompression on loglike, 1.29x faster decompression on zeros64, and 1.06x faster HC9 compression. The slowest Rust case here is dictionary-heavy compression at 0.86x system speed, or 1.17x slower. The main cost is memory: the Rust CLI currently uses more RSS than the original C CLI in every row, typically by a few MiB. The gap is most visible on the raw/incompressible decompression cases (random64 and already-compressed), where the system binary stays near 1.6 MiB while the Rust CLI peaks at 10.3 MiB.

§License

The license is the same as that of the LZ4 library: this crate is provided as open-source software under the BSD 2-Clause license.

Re-exports

pub use crate::liblz4::version;
pub use crate::liblz4::BlockMode;
pub use crate::liblz4::BlockSize;
pub use crate::liblz4::ContentChecksum;

Modules

block
This module provides access to the block mode functions of the lz4 C library. It somewhat resembles the python-lz4 API, but by using Rust’s Option type the function parameters have been slightly simplified. Like python-lz4, this module supports prepending the compressed buffer with a u32 value representing the size of the original, uncompressed data.
liblz4
Re-exports of the low-level crate::sys surface along with thin Rust error helpers used by the higher-level wrappers in this crate.
sys

Structs

Decoder
Streaming LZ4 frame decoder wrapping an inner Read source.
Encoder
Streaming LZ4 frame encoder wrapping an inner Write sink.
EncoderBuilder
Builder for Encoder that collects LZ4 frame preferences before constructing the streaming encoder.