packsimd
Note: This release includes only the Scalar and SSE4.1 backends; AVX2 and AVX-512 implementations are planned for a future release. On x86_64 CPUs that support SSE4.1, the SSE4.1 backend is selected automatically at runtime; otherwise the scalar backend is used.
High-performance BP128 compression for u32 integer arrays with SIMD acceleration, zero-allocation APIs, and deterministic encoding.
Overview
packsimd compresses integer arrays by packing each block of 128 values using the minimum bit width required. It automatically detects and uses the best available SIMD backend at runtime.
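To make the per-block idea concrete, here is a minimal scalar sketch of packing and unpacking a single 128-value block. It is not packsimd's code. The function names (`block_bit_width`, `pack_block`, `unpack_block`) are hypothetical, the bit width is assumed to come from the block's largest value, and values are laid out consecutively, least-significant bit first. SIMD BP128 implementations commonly interleave values across four 32-bit lanes instead, and the layout packsimd actually uses is not described here.

```rust
/// Number of values in one BP128 block.
const BLOCK_LEN: usize = 128;

/// Smallest bit width that can hold every value in the block
/// (0 when all values are zero).
fn block_bit_width(block: &[u32; BLOCK_LEN]) -> u32 {
    let max = block.iter().copied().max().unwrap_or(0);
    32 - max.leading_zeros()
}

/// Packs 128 values at `width` bits each into `width * 4` u32 words
/// (128 * width bits / 32 bits per word), least-significant bit first.
/// Precondition: every value fits in `width` bits.
fn pack_block(block: &[u32; BLOCK_LEN], width: u32) -> Vec<u32> {
    let mut out = vec![0u32; width as usize * BLOCK_LEN / 32];
    if width == 0 {
        return out; // an all-zero block needs no payload
    }
    for (i, &v) in block.iter().enumerate() {
        let bit = i * width as usize; // absolute bit offset of value i
        let (word, shift) = (bit / 32, (bit % 32) as u32);
        out[word] |= v << shift;
        // A value that straddles a word boundary spills its high bits.
        if shift + width > 32 {
            out[word + 1] |= v >> (32 - shift);
        }
    }
    out
}

/// Reverses `pack_block`.
fn unpack_block(packed: &[u32], width: u32) -> [u32; BLOCK_LEN] {
    let mut out = [0u32; BLOCK_LEN];
    if width == 0 {
        return out;
    }
    let mask = if width == 32 { u32::MAX } else { (1u32 << width) - 1 };
    for (i, slot) in out.iter_mut().enumerate() {
        let bit = i * width as usize;
        let (word, shift) = (bit / 32, (bit % 32) as u32);
        let mut v = packed[word] >> shift;
        // Pull the high bits back from the next word if the value straddled.
        if shift + width > 32 {
            v |= packed[word + 1] << (32 - shift);
        }
        *slot = v & mask;
    }
    out
}
```

For example, a block whose largest value is 889 needs 10 bits per value, so it shrinks from 512 bytes to 40 words (160 bytes), about 31% of its original size before any per-block metadata.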
| Use Case | Example | Typical Ratio (compressed / original size) |
|---|---|---|
| Database Indexing | Posting lists, doc IDs | 20-40% |
| Search Systems | Inverted indices | 20-40% |
| Time Series | Timestamp deltas | 30-50% |
| Network Protocols | Integer data transfer | Varies |
| Columnar Storage | Integer columns | 20-40% |
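The timestamp and posting-list rows compress well because sorted sequences have small gaps. Assuming the caller computes those gaps before compressing (whether packsimd does any delta coding itself is not stated here), a sketch of the transform with hypothetical helper names looks like this:

```rust
/// Replaces each value with its gap from the previous value; the first
/// value is kept as-is. Wrapping subtraction means unsorted input still
/// round-trips instead of panicking on underflow.
fn delta_encode(values: &[u32]) -> Vec<u32> {
    let mut prev = 0u32;
    values
        .iter()
        .map(|&v| {
            let gap = v.wrapping_sub(prev);
            prev = v;
            gap
        })
        .collect()
}

/// Inverse of `delta_encode`: a running (wrapping) prefix sum.
fn delta_decode(gaps: &[u32]) -> Vec<u32> {
    let mut acc = 0u32;
    gaps.iter()
        .map(|&g| {
            acc = acc.wrapping_add(g);
            acc
        })
        .collect()
}

/// Bits needed for the largest value (0 for empty or all-zero input).
fn max_bit_width(values: &[u32]) -> u32 {
    32 - values.iter().copied().max().unwrap_or(0).leading_zeros()
}
```

The first element still carries the full value, so a real scheme would typically store it separately. Excluding it, timestamps spaced 15 apart starting at 1,700,000,000 need 4 bits per value instead of 31.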
Features
- BP128 Algorithm: variable bit-width packing, 128 values per block
- SIMD Acceleration: SSE4.1 on x86_64 with automatic runtime detection
- Scalar Fallback: reference implementation for non-SIMD targets
- Zero-Allocation API: `compress_into` / `decompress_into` work with pre-allocated buffers
- Fast Header Inspection: `decompressed_len` reads the decompressed size without decompressing
- Deterministic Output: the same input always produces identical compressed bytes
- No Dependencies: zero runtime dependencies
- No Panics: all error conditions return `Result`
- Extensively Tested: property-based testing (proptest), fuzz targets, 128+ tests
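The zero-allocation and deterministic-output features can be illustrated together. The sketch below assumes a made-up layout (a little-endian `u32` value count, then per block one width byte and `width * 16` bytes of packed bits); packsimd's real format and its `compress_into` / `max_compressed_size` signatures may differ, and the names `worst_case_size` and `encode_into` are hypothetical. The caller sizes one buffer from the worst-case bound and can reuse it across calls; zeroing each packed region keeps the output identical no matter what the buffer held before.

```rust
const BLOCK_LEN: usize = 128;

/// Error returned when the caller's buffer is smaller than the worst-case bound.
#[derive(Debug, PartialEq)]
struct BufferTooSmall {
    needed: usize,
}

/// Upper bound on encoded size: 4-byte count, then for every block one width
/// byte plus 128 values at the full 32 bits (512 bytes).
fn worst_case_size(n: usize) -> usize {
    4 + n.div_ceil(BLOCK_LEN) * (1 + BLOCK_LEN * 4)
}

/// Encodes `values` into the caller's buffer without allocating and returns
/// the number of bytes written. Hypothetical layout: u32 LE count, then per
/// block a width byte and `width * 16` bytes of LSB-first packed bits.
/// Assumes `values.len()` fits in a u32.
fn encode_into(values: &[u32], out: &mut [u8]) -> Result<usize, BufferTooSmall> {
    let needed = worst_case_size(values.len());
    if out.len() < needed {
        return Err(BufferTooSmall { needed });
    }
    out[..4].copy_from_slice(&(values.len() as u32).to_le_bytes());
    let mut pos = 4;
    for block in values.chunks(BLOCK_LEN) {
        // Minimum width that fits the block's largest value.
        let max = block.iter().copied().max().unwrap_or(0);
        let width = 32 - max.leading_zeros() as usize;
        out[pos] = width as u8;
        pos += 1;
        // A partial final block still occupies a full 128-slot region.
        let data = &mut out[pos..pos + width * 16];
        // Clear stale bytes from earlier calls so the output is deterministic.
        data.fill(0);
        for (i, &v) in block.iter().enumerate() {
            for b in 0..width {
                if (v >> b) & 1 == 1 {
                    let bit = i * width + b;
                    data[bit / 8] |= 1u8 << (bit % 8);
                }
            }
        }
        pos += width * 16;
    }
    Ok(pos)
}
```

Requiring the full worst-case capacity up front is a simplification; an API could instead fail only when the actual output would not fit.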
Installation

    [dependencies]
    packsimd = "0.1"

Quick Start
use ;
Zero-Allocation Path
use ;
Documentation
For complete API reference and usage examples, see USAGE.md.
Architecture

    ┌─────────────────────────────────────────────────────┐
    │                     Public API                      │
    │   compress / compress_into / max_compressed_size    │
    │   decompress / decompress_into / decompressed_len   │
    └──────────────────────┬──────────────────────────────┘
                           │
              ┌────────────┴────────────┐
              │        Dispatch         │
              │  Runtime CPU detection  │
              │    OnceLock caching     │
              └────────────┬────────────┘
                           │
           ┌───────────────┼───────────────┐
           │               │               │
      ┌────┴────┐    ┌─────┴─────┐    ┌────┴────┐
      │ Scalar  │    │  SSE4.1   │    │  AVX2   │
      │ Backend │    │  Backend  │    │(planned)│
      │         │    │           │    │         │
      │Reference│    │  128-bit  │    │ 256-bit │
      │  impl   │    │   SIMD    │    │  SIMD   │
      └─────────┘    └───────────┘    └─────────┘

| Component | Responsibility |
|---|---|
| compress | Bit width calculation, header writing, block packing |
| decompress | Header parsing, validation, block unpacking |
| bitwidth | required_bit_width, block size calculations |
| dispatch | Runtime SIMD backend selection and caching |
| simd/scalar | Reference scalar implementation (all bit widths) |
| simd/sse | SSE4.1-accelerated kernels (x86_64 only) |
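The Dispatch layer's two jobs, runtime CPU detection and `OnceLock` caching, can be sketched with the standard library alone. This is an illustrative sketch, not packsimd's dispatch code: `Backend`, `backend`, and `max_value` are hypothetical names, and the SSE4.1 kernel computes only a maximum (the input to a bit-width calculation) using `_mm_max_epu32`, an SSE4.1 instruction.

```rust
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Scalar,
    Sse41,
}

/// Detects the best backend on first use and caches it for the process.
fn backend() -> Backend {
    static BACKEND: OnceLock<Backend> = OnceLock::new();
    *BACKEND.get_or_init(detect)
}

fn detect() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            return Backend::Sse41;
        }
    }
    Backend::Scalar
}

/// SSE4.1 kernel: unsigned 32-bit max over four lanes at a time.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn max_sse41(values: &[u32]) -> u32 {
    use std::arch::x86_64::*;
    let mut acc = _mm_setzero_si128();
    let chunks = values.chunks_exact(4);
    let tail = chunks.remainder();
    for c in chunks {
        let v = _mm_loadu_si128(c.as_ptr() as *const __m128i);
        acc = _mm_max_epu32(acc, v);
    }
    // Reduce the four lanes, then fold in any leftover values.
    let mut lanes = [0u32; 4];
    _mm_storeu_si128(lanes.as_mut_ptr() as *mut __m128i, acc);
    lanes.iter().chain(tail).copied().max().unwrap_or(0)
}

/// Routes to the cached backend; scalar code is the fallback everywhere.
fn max_value(values: &[u32]) -> u32 {
    match backend() {
        // SAFETY: `Sse41` is only chosen after runtime detection succeeded.
        #[cfg(target_arch = "x86_64")]
        Backend::Sse41 => unsafe { max_sse41(values) },
        _ => values.iter().copied().max().unwrap_or(0),
    }
}
```

Detection runs on the first call only; later calls read the cached value. The `unsafe` call is sound because the SSE4.1 arm is reachable only after `is_x86_feature_detected!` has confirmed support.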
Performance
Benchmarked on x86_64 (SSE4.1) with LTO and opt-level=3.
Compression Ratios
| Data Pattern | Ratio (compressed / original size) | Compress | Decompress |
|---|---|---|---|
| Sequential (0-999) | 23.65% | 5.4 GiB/s | 1.4 GiB/s |
| Constant (all same) | 18.97% | 2.8 GiB/s | 649 MiB/s |
| Random (full entropy) | 100.22% | 23.5 GiB/s | 24.0 GiB/s |
Throughput at Scale (1M values)
| Bit Width | Compress | Decompress |
|---|---|---|
| 1-bit | 13.7 GiB/s | 11.1 GiB/s |
| 8-bit | 8.9 GiB/s | 16.0 GiB/s |
| 16-bit | 7.1–7.5 GiB/s | 13.8–14.2 GiB/s |
| 32-bit | 7.9–8.3 GiB/s | 9.3–9.8 GiB/s |
Compared with the scalar backend, SSE4.1 unpacks 1.7×–12.6× faster across all bit widths; for packing, the scalar path is competitive at most widths.
Run benchmarks with `cargo bench`.
Security
- No Panics: all error conditions return `Result`
- Input Validation: the header, bit widths, and buffer sizes are verified before use
- OOM Protection: a maximum of 1 billion decompressed values
- No Undefined Behavior: unsafe blocks are documented with their invariants and covered by fuzz testing
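The validation bullets describe checking every size field before trusting it. A minimal sketch follows, using a made-up layout as a stand-in for packsimd's real format (a little-endian `u32` value count, then per block one width byte and `width * 16` bytes of packed data). It walks the headers without allocating and rejects oversized counts, widths above 32, truncation, and trailing bytes. `validate` and `ValidationError` are hypothetical names; only the 1 billion cap comes from the list above.

```rust
const BLOCK_LEN: usize = 128;
const MAX_VALUES: usize = 1_000_000_000; // cap stated in the Security list

#[derive(Debug, PartialEq)]
enum ValidationError {
    Truncated,
    TooManyValues,
    BadBitWidth(u8),
    TrailingBytes,
}

/// Checks every count, width, and length field before any output is
/// allocated. Returns the declared value count on success.
fn validate(bytes: &[u8]) -> Result<usize, ValidationError> {
    let header = bytes.get(..4).ok_or(ValidationError::Truncated)?;
    let n = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
    if n > MAX_VALUES {
        return Err(ValidationError::TooManyValues);
    }
    let mut pos = 4;
    for _ in 0..n.div_ceil(BLOCK_LEN) {
        let width = *bytes.get(pos).ok_or(ValidationError::Truncated)?;
        if width > 32 {
            return Err(ValidationError::BadBitWidth(width));
        }
        // 128 values * width bits / 8 = width * 16 bytes of packed data.
        let end = pos + 1 + width as usize * 16;
        if end > bytes.len() {
            return Err(ValidationError::Truncated);
        }
        pos = end;
    }
    if pos != bytes.len() {
        return Err(ValidationError::TrailingBytes);
    }
    Ok(n)
}
```

Because each loop iteration either consumes input bytes or returns an error, a forged header claiming a huge count fails fast on truncation instead of driving a large allocation.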
Examples
See the `examples/` directory.
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Run tests: `cargo test --all-targets`
- Run clippy: `cargo clippy --all-targets -- -D warnings`
- Run benchmarks: `cargo bench`
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Development Setup

    # Run tests
    cargo test
    # Run doc tests
    cargo test --doc
    # Generate documentation
    cargo doc

Roadmap
| Feature | Status |
|---|---|
| Scalar implementation | Done |
| SSE4.1 backend | Done |
| AVX2 backend | Planned |
| AVX-512 backend | Planned |
License
MIT License - See LICENSE file for details.