blosc2-pure-rs
A pure Rust implementation of the Blosc2 high-performance compression library, providing both a CLI tool and a library API.
Blosc2 is a block-oriented compressor optimized for binary data such as numerical arrays, tensors, and structured formats. It applies a filter pipeline (shuffle, bitshuffle, delta) before compression to exploit data patterns, then compresses with one of several codecs.
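To make the filter idea concrete, here is a minimal sketch of a byte shuffle and its inverse. It is an illustration of the general technique only, not this crate's implementation: the function names `shuffle` and `unshuffle` are hypothetical, and the real filter works per block with SIMD dispatch.

```rust
/// Byte-shuffle: gather byte 0 of every element, then byte 1, and so on.
/// Trailing bytes that do not fill a whole element are copied unchanged.
/// (Hypothetical helper for illustration; not the crate's API.)
fn shuffle(src: &[u8], typesize: usize) -> Vec<u8> {
    assert!(typesize > 0, "typesize must be non-zero");
    let n = src.len() / typesize; // number of whole elements
    let mut dst = vec![0u8; src.len()];
    for i in 0..n {
        for b in 0..typesize {
            dst[b * n + i] = src[i * typesize + b];
        }
    }
    // Copy the leftover tail verbatim.
    dst[n * typesize..].copy_from_slice(&src[n * typesize..]);
    dst
}

/// Inverse of `shuffle`: scatter the byte planes back into elements.
fn unshuffle(src: &[u8], typesize: usize) -> Vec<u8> {
    assert!(typesize > 0, "typesize must be non-zero");
    let n = src.len() / typesize;
    let mut dst = vec![0u8; src.len()];
    for i in 0..n {
        for b in 0..typesize {
            dst[i * typesize + b] = src[b * n + i];
        }
    }
    dst[n * typesize..].copy_from_slice(&src[n * typesize..]);
    dst
}

fn main() {
    // Four little-endian u32 values that differ only in their lowest byte.
    let values: [u32; 4] = [1, 2, 3, 4];
    let raw: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();

    let shuffled = shuffle(&raw, 4);
    // The low bytes are now adjacent, followed by a run of zeros.
    assert_eq!(shuffled[..4].to_vec(), vec![1u8, 2, 3, 4]);
    assert!(shuffled[4..].iter().all(|&b| b == 0));

    // The transform is lossless.
    assert_eq!(unshuffle(&shuffled, 4), raw);
}
```

Grouping bytes of equal significance tends to produce long runs of similar values in numerical data, which is the kind of pattern the codecs can then exploit.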
The library is feature complete except for one edge case (get in touch if this is a problem). Speed is broadly comparable to the C implementation (benchmarks below).
- 2026-04-27: Speed is now broadly comparable to, or faster than, C-Blosc2 on the default benchmark workload.
- 2026-04-22: Ready for testing; passes the current battery of tests. Errors may still remain, so stay vigilant and report any you find
This is an LLM-mediated faithful (hopefully) translation, not the original code!
Most users should probably first see if the existing original code works for them, unless they have reason otherwise. The original source may have newer features and it has had more love in terms of fixing bugs. In fact, we aim to replicate bugs if they are present, for the sake of reproducibility! (but then we might have added a few more in the process)
There are however cases when you might prefer this Rust version. We generally agree with this manifesto but more specifically:
- We have had many issues ensuring that our software works with existing container systems (Docker, Podman, Singularity). One size does not fit all, and keeping up with every way of delivering software eats our resources
- Common package managers do not work well. It was great when we had a few Linux distributions with stable procedures, but now there are just too many ecosystems (Homebrew, Conda). Conda has an NP-complete resolver which does not scale. Homebrew is only moderately stable. And our dependencies in Python still break. These can no longer be considered serious professional options. Meanwhile, Cargo allows multiple versions of a package to coexist, even within the same program(!)
- The future is the web. We deploy software in the web browser, and until now that has meant JavaScript. This is a language where even the == operator is broken. TypeScript is one step up, but the game changer is the ability to compile Rust code into WebAssembly, which brings performance and lets code be shared with the backend. Translating code to Rust enables new ways of deployment, and running code in the browser has particular benefits for science: researchers do not have deep pockets to run servers, so pushing compute to the user enables deployment that would otherwise be impossible
- Old CLI-based utilities are bad for the environment(!). A large amount of compute is spent creating and communicating via small files, which we can bypass by using code as libraries. Even better, we can avoid frequent reloading of databases by hoisting this stage, with up to 100x speedups in some cases. Less compute means faster results and less electricity wasted
- LLM-mediated translations may actually be safer to use than the original code. This article shows that running the same code on different operating systems can give somewhat different answers. This is a gap that Rust+Cargo can reduce. Type-safe interfaces also reduce coding mistakes and make error handling more reliable than in typical command-line scripting
But:
- This approach should still be considered experimental. The LLM technology is immature and has sharp corners. But there are opportunities to reap, and the genie is not going back into the bottle. This translation is as much about learning how to improve the technology and gathering feedback on the results as it is about the code itself.
- Translations are not endorsed by the original authors unless otherwise noted. Do not send bug reports to the original developers. Use our GitHub issues page instead.
- Treat the benchmarks on this page as local measurements, not universal truths. They are used to evaluate the translation on one machine and compiler setup. If performance matters for your workload, benchmark your own data and call patterns.
- Check the original GitHub pages for information about the package. This README is kept sparse on purpose. It is not meant to be the primary source of information
- If you are the author of the original code and wish to move to Rust, you can obtain ownership of this repository and crate. Until then, our commitment is to offer an as-faithful-as-possible translation of a snapshot of your code. If we find serious bugs, we will report them to you. Otherwise we will just replicate them, to ensure comparability across studies that claim to use package XYZ v.666. Think of this like a fancy Ubuntu .deb-package of your software - that is how we treat it
This blurb might be out of date. Go to this page for the latest version and for more detail on how we approach translation
Features
- 5 codecs: BloscLZ (ported from C), LZ4, LZ4HC, Zlib, Zstd — all pure Rust
- 4 filters: Shuffle, Bitshuffle, Delta, Truncated Precision
- Frame format: Compatible with C-Blosc2 .b2frame files (read and write)
- Lazy frame reads: File-backed LazySchunk loads compressed chunks on demand
- VL-block chunks: Pure-Rust variable-length block chunks with split/block decompression
- Multi-threaded: Bounded per-call Rayon scheduling for block-level and super-chunk chunk-level work
- Zstd dictionaries: Per-chunk dictionary training with C/Rust-compatible dictionary chunks
- CLI: Compress and decompress files (optional cli feature)
- Library API: In-memory compression with Schunk container
Current Limitations
- B2ND metadata serialization supports up to 15 dimensions. 16-D arrays are extremely uncommon and are out of scope for now.
Installation
Package name on crates.io: blosc2-pure-rs
Library crate name in Rust code: blosc2_pure_rs
CLI binary name: blosc2 (enable the cli feature)
# Library dependency
# CLI tool
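The commands that belong under the two comments above are not preserved here. As a hedged sketch, assuming the package name and the cli feature listed above and standard Cargo subcommands, installation would typically look like this:

```shell
# Library dependency (assumed: adds the crates.io package to Cargo.toml)
cargo add blosc2-pure-rs

# CLI tool (assumed: builds the blosc2 binary with the optional cli feature)
cargo install blosc2-pure-rs --features cli
```

Check the crate's page on crates.io for the exact invocation and any additional feature flags.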
CLI Usage
Compress
Options:
- -c, --codec: Compression codec (blosclz, lz4, lz4hc, zlib, zstd). Default: blosclz
- -l, --clevel: Compression level (0-9). Default: 9
- -t, --typesize: Element type size in bytes. Default: 1
- -b, --blocksize: Explicit block size in bytes (0 = automatic). Default: 0
- --chunksize: Input bytes per frame chunk. Default: 4194304 (4 MiB).
- -s, --splitmode: Split mode (always, never, auto, forward). Default: forward
- -n, --nthreads: Number of threads. Default: 4
- -f, --filter: Filter (nofilter, shuffle, bitshuffle, delta, truncprec). Default: shuffle
- --filter-meta: Filter metadata byte. For truncprec, this is the retained precision in bits. Default: 0
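To illustrate what "retained precision in bits" means for truncprec, the sketch below zeroes the low-order mantissa bits of an f32. It assumes the filter metadata counts mantissa bits to keep, as in C-Blosc2's truncated-precision filter; the function name `truncate_precision_f32` is hypothetical, and the real filter also handles other element widths.

```rust
/// Keep only the `prec_bits` most significant mantissa bits of an f32 and
/// zero the rest. An f32 has 23 explicit mantissa bits.
/// (Hypothetical helper for illustration; not the crate's API.)
fn truncate_precision_f32(x: f32, prec_bits: u32) -> f32 {
    const MANTISSA_BITS: u32 = 23;
    assert!(prec_bits <= MANTISSA_BITS, "f32 has only 23 mantissa bits");
    let dropped = MANTISSA_BITS - prec_bits;
    // Mask with ones in the sign, exponent, and retained mantissa positions.
    let mask = !((1u32 << dropped) - 1);
    f32::from_bits(x.to_bits() & mask)
}

fn main() {
    // 1.75 is 1.11 in binary; keeping one mantissa bit leaves 1.1, i.e. 1.5.
    assert_eq!(truncate_precision_f32(1.75, 1), 1.5);
    // Keeping all 23 bits changes nothing.
    assert_eq!(truncate_precision_f32(1.75, 23), 1.75);

    // Truncated values have zeroed low bytes, which compress well after shuffle.
    let t = truncate_precision_f32(std::f32::consts::PI, 8);
    assert_eq!(t.to_bits() & 0x7FFF, 0);
    println!("pi kept to 8 mantissa bits: {t}");
}
```

The transform is lossy by design: the discarded bits cannot be recovered on decompression, so it only suits data whose low-order bits are noise.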
Chunk-size guidance: keep the default for general file compression unless you have workload-specific measurements showing a better setting.
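As a rough sketch of what the default chunk size implies, the example below splits a buffer into 4 MiB pieces. It assumes the input is cut into fixed-size chunks with a shorter final chunk; the constant and function names are hypothetical, and the CLI's actual splitting logic may differ in detail.

```rust
/// Default from the --chunksize option: 4 MiB of input per frame chunk.
const DEFAULT_CHUNKSIZE: usize = 4_194_304;

/// Number of chunks needed for `input_len` bytes (ceiling division).
/// (Hypothetical helper for illustration; not the crate's API.)
fn chunk_count(input_len: usize, chunksize: usize) -> usize {
    assert!(chunksize > 0, "chunksize must be non-zero");
    (input_len + chunksize - 1) / chunksize
}

fn main() {
    // A 10 MiB input, the same size as the benchmark buffer.
    let data = vec![0u8; 10 * 1024 * 1024];

    // `chunks` yields full-size pieces plus one shorter tail.
    let sizes: Vec<usize> = data.chunks(DEFAULT_CHUNKSIZE).map(|c| c.len()).collect();
    assert_eq!(sizes, vec![4_194_304, 4_194_304, 2_097_152]);
    assert_eq!(chunk_count(data.len(), DEFAULT_CHUNKSIZE), sizes.len());
}
```

Smaller chunks mean more chunks per file and finer-grained random access; larger chunks give the codec more context. Which matters more depends on the workload, hence the advice to measure before changing the default.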
Decompress
Verify roundtrip
Library Usage
Compress and decompress a buffer
use ;
use *;
let data: =
.flat_map
.collect;
let cparams = CParams ;
let chunk = compress.unwrap;
let restored = decompress.unwrap;
assert_eq!;
Reuse an output buffer for fast decompression
For hot decompression paths, especially when chunks are effectively stored rather than compressed, prefer the destination-buffer API so the caller owns the output allocation:
use ;
use *;
let data: =
.flat_map
.collect;
let cparams = CParams ;
let chunk = compress.unwrap;
let mut restored = vec!;
let written = decompress_into.unwrap;
assert_eq!;
assert_eq!;
let written = decompress_into_with_threads.unwrap;
assert_eq!;
assert_eq!;
Chunk metadata and item slicing
use ;
use *;
let data: =
.flat_map
.collect;
let chunk = compress
.unwrap;
let = cbuffer_sizes.unwrap;
assert_eq!;
assert_eq!;
assert!;
let items_10_to_19 = getitem.unwrap;
assert_eq!;
Multi-chunk container (Schunk)
use ;
use *;
use Schunk;
let cparams = CParams ;
let mut schunk = new;
// Append data in chunks
let data: =
.flat_map
.collect;
for chunk_start in .step_by
// Save to file
schunk.to_file.unwrap;
// Read back
let schunk2 = open.unwrap;
let restored = schunk2.decompress_chunk.unwrap;
let mut restored_into = vec!;
let written = schunk2.decompress_chunk_into.unwrap;
assert_eq!;
let compressed = schunk2.compressed_chunk.unwrap;
let view = schunk2.compressed_chunk_view.unwrap;
assert_eq!;
// Or keep chunks on disk and read only what is needed
let lazy = open_lazy.unwrap;
let tail = lazy.get_slice.unwrap;
In-memory frames and slices
use ;
use *;
use Schunk;
let mut schunk = new;
schunk.append_buffer.unwrap;
let frame = schunk.to_frame;
let mut from_memory = from_frame.unwrap;
let first_bytes = from_memory.get_slice.unwrap;
from_memory.set_slice.unwrap;
let all_data = from_memory.decompress_all.unwrap;
Blosc1-style wrappers
use ;
use *;
let data: =
.flat_map
.collect;
let mut compressed = vec!;
let csize = blosc1_compress.unwrap;
let mut restored = vec!;
let dsize = blosc1_decompress.unwrap;
assert_eq!;
assert_eq!;
Benchmarks
These are local measurements from April 27, 2026, not universal truths. They come from the
checked-in comparison examples and compare against
blosc2-rs, which wraps the original C-Blosc2 library.
The workload is the examples' default 10 MiB float32 signal-with-noise buffer at clevel=5
and typesize=4. Ratios are pure Rust speed divided by C-Blosc2 wrapper speed.
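For readers checking the tables, this small sketch reproduces how a ratio column follows from the two throughput figures in a row. The function name `speed_ratio` is hypothetical; the input numbers are copied from the Full Comparison table.

```rust
/// Pure Rust throughput divided by C-Blosc2 wrapper throughput.
/// Values above 1.0 mean the pure Rust build was faster in that run.
/// (Hypothetical helper for illustration.)
fn speed_ratio(pure_mb_s: f64, c_mb_s: f64) -> f64 {
    pure_mb_s / c_mb_s
}

fn main() {
    // BloscLZ, no filter, 1 thread: compress 4353.8 / 914.9 MB/s.
    let blosclz = speed_ratio(4353.8, 914.9);
    // Zstd, shuffle, 1 thread: compress 80.9 / 92.2 MB/s.
    let zstd = speed_ratio(80.9, 92.2);

    assert_eq!(format!("{blosclz:.2}x"), "4.76x");
    assert_eq!(format!("{zstd:.2}x"), "0.88x");
    println!("BloscLZ no filter: {blosclz:.2}x, Zstd shuffle: {zstd:.2}x");
}
```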
Full Comparison
| Case | Threads | Size pure/C | Compress pure/C (MB/s) | Compress ratio | Decompress pure/C (MB/s) | Decompress ratio |
|---|---|---|---|---|---|---|
| BloscLZ, no filter | 1 | 10485792 / 10486432 | 4353.8 / 914.9 | 4.76x | 10402.2 / 10879.7 | 0.96x |
| BloscLZ, shuffle | 1 | 8033115 / 8033115 | 900.0 / 602.1 | 1.49x | 4679.7 / 4596.5 | 1.02x |
| LZ4, shuffle | 1 | 7823630 / 7823630 | 651.3 / 509.8 | 1.28x | 2537.0 / 2546.5 | 1.00x |
| Zstd, shuffle | 1 | 7259575 / 7259575 | 80.9 / 92.2 | 0.88x | 1602.8 / 1574.3 | 1.02x |
| BloscLZ, no filter | 4 | 10485792 / 10486432 | 4452.3 / 2163.9 | 2.06x | 20600.3 / 28889.3 | 0.71x |
| BloscLZ, shuffle | 4 | 8033115 / 8033115 | 2209.0 / 2067.4 | 1.07x | 18190.4 / 16003.2 | 1.14x |
| LZ4, shuffle | 4 | 7823630 / 7823630 | 1656.0 / 1550.5 | 1.07x | 9084.4 / 4703.9 | 1.93x |
| Zstd, shuffle | 4 | 7259575 / 7259575 | 269.4 / 323.3 | 0.83x | 5914.2 / 6187.0 | 0.96x |
Focused BloscLZ Comparison
| Case | Size pure/C | Compress pure/C (MB/s) | Compress ratio | Decompress pure/C (MB/s) | Decompress ratio |
|---|---|---|---|---|---|
| BloscLZ, no filter | 10485792 / 10486432 | 4385.5 / 2424.2 | 1.81x | 10371.0 / 10247.0 | 1.01x |
| BloscLZ, shuffle | 8033115 / 8024160 | 975.6 / 845.3 | 1.15x | 4923.4 / 5082.2 | 0.97x |
| Unshuffle4 dispatch/scalar | n/a | n/a | n/a | 11131.3 / 11000.6 | 1.01x |
Current reading:
- Pure Rust is faster than C-Blosc2 on every BloscLZ and LZ4 compression row in this run, by 1.07x to 4.76x.
- Four-thread decompression is mixed: pure Rust is faster for BloscLZ shuffle and LZ4 shuffle, while C-Blosc2 is faster for no-filter BloscLZ and slightly faster for Zstd.
- Zstd compression remains the main weakness; profiling shows most time inside zstd-pure-rs's core lazy compressor rather than this crate's block orchestration.
- The no-filter BloscLZ row uses a whole-chunk memcpyed fast path when sampled blocks would otherwise be stored verbatim. This improves speed and size but means the compressed bytes are not byte-identical to C-Blosc2.
- All rows decode to identical bytes.
- For serious tuning, rerun individual cases with BLOSC2_COMPARE_ITERS=..., BLOSC2_COMPARE_CASE=..., and BLOSC2_COMPARE_THREADS=....
Codec Comparison
| Codec | Speed | Compression | Best for |
|---|---|---|---|
| BloscLZ | Fast | Moderate | General purpose |
| LZ4 | Fastest | Moderate | Speed-critical |
| LZ4HC | Slow | Good | High-compression LZ4 variant (pure Rust) |
| Zlib | Slow | Good | Compatibility with zlib/deflate users |
| Zstd | Moderate | Best | Storage-critical |
Building
# Use the miniz_oxide-backed fallback instead
For benchmarks, compile with native CPU optimizations:
RUSTFLAGS="-C target-cpu=native"
To reproduce the direct crates.io comparison against blosc2-rs:
The default zlib backend is flate2 with the zlib-rs backend. That keeps the default build
Rust-first and avoids adding native zlib or zlib-ng requirements. If you need the older
fallback for comparison or troubleshooting, build with
--no-default-features --features zlib-miniz instead. When zlib/deflate compatibility is not
required, prefer LZ4 for speed or Zstd for stronger compression.
Testing
The full test suite cross-checks against C-Blosc2 via FFI and requires the c-blosc2 source directory, cmake, and libclang:
License
BSD 3-Clause (same as the original C-Blosc2 license)