# lbzip2-rs
With the All-Father's wisdom: compression and corruption protection. With his eye on every bit.
Pure Rust parallel bzip2 decompressor with no C dependencies. It decompresses any bzip2 file, including pbzip2 concatenated streams, and can be used as a library (in-process, zero-copy) or as a CLI tool.
What makes this crate unique: it is 100% Rust (no C/FFI), runs in-process with zero-copy output, and scans for block boundaries in parallel. Splitting a chunk across N cores costs O(N) rather than O(n), where n is the raw byte count (e.g. 200 MB per chunk): each core scans only ~500 bytes forward from its split point, so with 16 cores the total scan is about 16 × 500 B ≈ 8 KB for a 200 MB chunk. A 4× oversplit lets rayon work-steal across 64 segments, so cores do not sit idle when block sizes are uneven.
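The boundary scan described above can be sketched as follows. This is a minimal, sequential illustration under stated assumptions, not the crate's code: `split_points`, `find_block_magic`, `block_starts` and the `max_scan_bytes` bound are hypothetical names. It relies on two facts about the bzip2 format: every block begins with the 48-bit magic `0x314159265359`, and blocks are not byte-aligned, so the magic has to be tested at every bit offset.

```rust
/// Every bzip2 block starts with this 48-bit magic (the BCD digits of pi).
const BLOCK_MAGIC: u64 = 0x3141_5926_5359;
const MAGIC_BITS: usize = 48;
const MAGIC_MASK: u64 = (1u64 << MAGIC_BITS) - 1;

/// Hypothetical helper: `cores * oversplit` evenly spaced byte offsets.
/// With 16 cores and a 4x oversplit this yields 64 split points.
fn split_points(len: usize, cores: usize, oversplit: usize) -> Vec<usize> {
    let parts = (cores * oversplit).max(1);
    (0..parts).map(|i| i * len / parts).collect()
}

/// Hypothetical helper: bit offset of the first block magic at or after
/// `start_bit`, scanning at most `max_scan_bytes` bytes. Bits are read
/// MSB first, and every bit position is tested because blocks are not
/// byte-aligned.
fn find_block_magic(data: &[u8], start_bit: usize, max_scan_bytes: usize) -> Option<usize> {
    let end_bit = data.len().min(start_bit / 8 + max_scan_bytes) * 8;
    let mut window: u64 = 0;
    let mut filled = 0;
    let mut bit = start_bit;
    while bit < end_bit {
        let b = (data[bit / 8] >> (7 - bit % 8)) & 1;
        window = ((window << 1) | u64::from(b)) & MAGIC_MASK;
        filled += 1;
        bit += 1;
        // `window` now holds the 48 bits ending just before `bit`.
        if filled >= MAGIC_BITS && window == BLOCK_MAGIC {
            return Some(bit - MAGIC_BITS);
        }
    }
    None
}

/// Scan from every split point (sequentially here; one task per point in a
/// parallel version) and keep the distinct block starts that were found.
fn block_starts(data: &[u8], cores: usize, oversplit: usize, max_scan_bytes: usize) -> Vec<usize> {
    let mut starts: Vec<usize> = split_points(data.len(), cores, oversplit)
        .into_iter()
        .filter_map(|p| find_block_magic(data, p * 8, max_scan_bytes))
        .collect();
    // Neighbouring split points can find the same block. The results come out
    // in non-decreasing order, so dropping consecutive duplicates is enough.
    starts.dedup();
    starts
}
```

Because each scan stops after `max_scan_bytes`, the total work grows with the number of split points rather than with the chunk size. In this sketch a split point with no magic inside its window is simply dropped; a real decoder also has to reject false positives, since the 48-bit pattern can occur by chance inside compressed data.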
Part of the znippy group of software, designed for fast zero-copy integration with osm-katana, the parallel OSM-to-GeoParquet pipeline.
## Performance
### Library (in-process, `liechtenstein.osm.bz2`, 5.2 MB → 60 MB, 71 blocks)
| Mode | Throughput | Speedup vs C libbz2 |
|---|---|---|
| C libbz2 (single-thread) | 107 MB/s | 1.0× |
| lbzip2-rs single-thread | 143 MB/s | 1.3× |
| lbzip2-rs parallel (12 threads) | 713 MB/s | 6.6× |
### CLI (`lbunzip2` vs C `lbzip2`)
| Test file | C lbzip2 | lbzip2-rs | Difference |
|---|---|---|---|
| Planet 1 GB slice (→ 9.86 GB) | 30.5 s (323 MB/s) | 30.3 s (325 MB/s) | 0.6% faster |
| Liechtenstein 3 MB (→ 60 MB) | 0.15 s | 0.22 s | slower (startup overhead) |
Measured on an 8-core / 16-thread machine with NVMe storage, writing output to /dev/null; times are 3-run averages.
### End-to-end: planet bz2 → PBF (osm-katana)
| Metric | Value |
|---|---|
| Input | 147 GB `planet-241021.osm.bz2` |
| Output | 68 GB `planet-241021.osm.pbf` |
| Time | 81 minutes |
| Elements | 10.5 billion |
| Throughput | 309 MB/s decompressed XML |
Full pipeline: bz2 decompress → VTD XML parse → PBF encode, 15 workers.
## Usage
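```rust
// NOTE: the import path and the call arguments below are assumed.
use lbzip2_rs::ChunkDecoder;

let data: &[u8] = /* compressed chunk including BZhN header */;
let decoder = ChunkDecoder::from_header(data)?;
// Returns segments separately: no giant memcpy
let segments = decoder.decode_chunk_segments(data)?;
for seg in &segments {
    // handle each decompressed segment here
}
```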
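When the decoded output goes straight to a file or pipe, the segments can be written one after another instead of being joined into one large buffer. The sketch below assumes only that each segment can be viewed as a byte slice; `write_segments` is a hypothetical helper, not part of the crate.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write every segment to `out` in order, without first
/// concatenating them, and return the total number of bytes written.
fn write_segments<W: Write, S: AsRef<[u8]>>(out: &mut W, segments: &[S]) -> io::Result<usize> {
    let mut total = 0;
    for seg in segments {
        let bytes = seg.as_ref();
        out.write_all(bytes)?;
        total += bytes.len();
    }
    out.flush()?;
    Ok(total)
}
```

Passing `&segments` works if `segments` is, for example, a `Vec<Vec<u8>>`; the actual return type of `decode_chunk_segments` may differ.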
Single-stream sequential API:

```rust
// NOTE: the function path and argument are assumed.
let output = lbzip2_rs::decompress(data)?;
```
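Both usage examples start from data that begins with a `BZhN` header. As a hedged illustration of what that header encodes (this is not the crate's `from_header`), the hypothetical `parse_bzh_header` below checks the `BZh` magic and returns the nominal block size: `N` is a digit from 1 to 9 selecting a block size of N × 100 000 bytes.

```rust
/// Hypothetical helper: validate a bzip2 stream header ("BZh" followed by a
/// digit '1'..='9') and return the nominal block size, N x 100 000 bytes.
fn parse_bzh_header(data: &[u8]) -> Option<usize> {
    match data {
        [b'B', b'Z', b'h', level @ b'1'..=b'9', ..] => {
            Some(usize::from(*level - b'0') * 100_000)
        }
        // Too short, wrong magic, or a level digit outside 1..=9.
        _ => None,
    }
}
```

In a pbzip2-style file each concatenated stream starts with its own `BZhN` header, so a check like this applies at every stream boundary, not only at offset 0.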
## License
MIT OR Apache-2.0, plus the original bzip2 license (BSD-style, Julian Seward) for the block-decode routines inspired by the C reference implementation. See LICENSE-BZIP2 for the full text.