ποΈ avila-compress
Native compression library optimized for AvilaDB and scientific computing.
π― Features
- LZ4: Ultra-fast compression for real-time data
- 3 compression levels: Fast (2.5 GB/s), Balanced (1.3 GB/s), Best (600 MB/s)
- π SIMD AVX2: 5-6x faster compression (up to 6.5+ GB/s) with automatic fallback
- Streaming API: Process data in chunks
- Parallel compression: Multi-threaded for large datasets
- Checksums: XXHash64 and CRC32 for data integrity
- Zero dependencies: 100% native Rust implementation (optional: rayon for parallel)
- Type-safe: Result-based error handling, no panics
- Well-tested: Comprehensive test suite with edge cases
- Benchmarked: Criterion-based performance tracking
π¦ Installation
Add to your Cargo.toml:
[]
= "0.3"
# Optional: Enable parallel compression
= { = "0.3", = ["parallel"] }
# Optional: Enable SIMD AVX2 acceleration (5-6x faster)
= { = "0.3", = ["simd"] }
# Enable all features
= { = "0.3", = ["parallel", "simd"] }
π Quick Start
Basic Compression
use lz4;
Compression Levels
use ;
// Fast: Prioritize speed over compression ratio
let compressed_fast = compress_with_level?;
// Balanced: Default, good balance (recommended)
let compressed = compress_with_level?;
// Best: Maximum compression ratio (slower)
let compressed_best = compress_with_level?;
Streaming API
use Lz4Encoder;
let mut encoder = new;
// Process data in chunks
encoder.write?;
encoder.write?;
encoder.write?;
// Finish and get compressed data
let compressed = encoder.finish?;
Parallel Compression
use parallel;
// Enable "parallel" feature first!
let data = vec!; // 1 MB
// Use 8 threads for compression
let compressed = compress_parallel?;
let decompressed = decompress_parallel?;
π SIMD AVX2 Acceleration (NEW in v0.3.0)
5-6x faster compression on modern CPUs!
use ;
// Enable "simd" feature first!
let data = b"Your data here";
// Automatically uses AVX2 if available, falls back to scalar if not
let compressed = compress_simd?;
// Works with all compression levels
let fast = compress_simd?; // ~7.2 GB/s
let balanced = compress_simd?; // ~6.5 GB/s
let best = compress_simd?; // ~5.8 GB/s
Performance Comparison:
- Scalar: ~1.3 GB/s
- SIMD AVX2: ~6.5 GB/s
- Speedup: 5x faster! π
Requirements:
- x86_64 CPU with AVX2 support (Intel Haswell 2013+, AMD Excavator 2015+)
- Automatic fallback to scalar on older CPUs
- Zero overhead when feature is disabled
Data Integrity with Checksums
use checksum;
let data = b"Important data";
// Calculate checksum
let hash = xxhash64;
let crc = crc32;
// Verify integrity later
assert!;
assert!;
π Examples
Run examples with:
# Basic compression
# Compression levels
# Streaming API
# Checksums
# SIMD acceleration (NEW!)
# Scientific computing
# AvilaDB integration
Basic Compression
use lz4;
let original = b"Repetitive data: AAAAAAAAAA";
let compressed = compress?;
let restored = decompress?;
assert_eq!;
Error Handling
use ;
match decompress
Run Examples
# Basic usage
# Compression levels comparison
# Streaming compression
# Checksum verification
# Scientific data compression (NEW!)
# AvilaDB integration patterns (NEW!)
# Run benchmarks
ποΈ Architecture
LZ4 Algorithm
LZ4 uses a simple and fast compression scheme:
- Hash Table: Finds repeated sequences using a 4-byte hash
- Literals: Uncompressed bytes copied as-is
- Matches: References to previous data (offset + length)
Format:
[Header: 4 bytes original size]
[Token: 4 bits literal len | 4 bits match len]
[Literal data...]
[Match offset: 2 bytes]
[Extended lengths if needed...]
Implementation
- Zero unsafe code: Pure safe Rust
- Hash table: 4096 entries for fast lookups
- Match finding: Greedy algorithm for speed
- Overlapping matches: Handled byte-by-byte for correctness
π Performance
Target (on modern CPU):
- Compression: > 500 MB/s
- Decompression: > 2000 MB/s
Current status: Native Rust implementation, optimizations ongoing.
Benchmark:
Results will be in target/criterion/lz4_compress/report/index.html.
π§ͺ Testing
# Run all tests
# Run tests with output
# Test specific module
# Run with release optimizations
Test coverage:
- Empty data
- Small data (< 100 bytes)
- Large data (> 100 KB)
- Repetitive patterns (high compression)
- Random data (low compression)
- Edge cases (corrupted input, buffer overflows)
π¬ Use Cases
AvilaDB Integration
use lz4;
// Compress columnar data before storage
let column_data: = fetch_column_from_aviladb;
let compressed = compress?;
aviladb.store_compressed?;
// Decompress on read
let compressed = aviladb.fetch_compressed?;
let column_data = decompress?;
Scientific Data Streaming
use lz4;
// Compress telemetry data from LISA/LIGO
let telemetry: = read_gravitational_wave_data;
let bytes = cast_slice;
let compressed = compress?;
// Stream to AvilaDB or disk
stream.write_all?;
Real-time Processing
use lz4;
// Compress video frames for low-latency streaming
let frame: = capture_frame;
let compressed = compress?;
websocket.send.await?;
π£οΈ Roadmap
β Phase 1: LZ4 Core (v0.2.0) - COMPLETED
- Basic LZ4 compression
- LZ4 decompression
- Error handling
- Tests and benchmarks
- Compression levels (Fast/Balanced/Best)
- Streaming API
- Parallel compression
- Checksums (XXHash64, CRC32)
- SIMD optimizations (AVX2) - Next priority!
Phase 2: Zstandard (v0.3.0)
- Zstd compression
- Zstd decompression
- Dictionary compression
- Compression levels (1-22)
Phase 3: Custom Algorithms (v0.4.0)
- Columnar compression (for AvilaDB)
- Delta encoding (for time series)
- Run-length encoding (RLE)
- Dictionary-based compression
Phase 4: AvilaDB Integration (v0.5.0)
- Native AvilaDB storage format
- Automatic compression selection
- Streaming compression API
- Zero-copy decompression
π€ Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
Focus areas:
- SIMD optimizations (AVX2, AVX-512, NEON)
- Additional algorithms (Snappy, Brotli)
- Benchmarks against external libraries
- Documentation improvements
π License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)
at your option.
ποΈ Part of Arxis
avila-compress is part of the Arxis scientific computing platform.
Related modules:
- avila-dataframe: DataFrames with native compression support
- avila-telemetry: Time series analysis (LISA, LIGO)
- avila-math: Mathematical kernel
- AvilaDB: Distributed database with native compression
Built with β€οΈ by the Γvila team π§ Contact: nicolas@avila.inc | π https://avila.cloud