gdelta
A fast delta compression algorithm for similar data chunks, implemented in pure Rust.
Overview
gdelta is a Rust implementation of the GDelta algorithm by Haoliang Tan. It provides efficient delta encoding for similar data chunks (typically 4KB - 64KB) commonly found in deduplication systems.
Key Features:
- 🚀 Fast delta encoding and decoding (1 GB/s encode, 5-8 GB/s decode)
- 🔒 Memory-safe implementation in pure Rust
- 📦 Simple, ergonomic API
- ✨ No unsafe code
- 🧪 Thoroughly tested
Performance:
- Encoding: 850-1000 MiB/s (~1 GB/s)
- Decoding: 5-8 GB/s (5-8x faster than encoding)
- Faster than Xdelta, Zdelta, Ddelta, and Edelta
- Optimized for inter-chunk redundancy removal
- Best used with general compression (e.g., ZSTD) for additional compression
Benchmarked on typical hardware with similar data chunks
Installation
Add this to your Cargo.toml:
[]
= "0.1"
Usage
use ;
// Create base data and modified version
let base_data = b"Hello, World! This is some base data.";
let new_data = b"Hello, Rust! This is some modified data.";
// Encode the delta
let delta = encode?;
println!;
// Decode to recover the new data
let recovered = decode?;
assert_eq!;
How It Works
GDelta uses:
- GEAR Rolling Hash - Fast fingerprinting for chunk boundaries
- Variable-Length Integer Encoding - Efficient space utilization
- Copy/Literal Instructions - Minimal delta representation
- Prefix/Suffix Matching - Optimized for common data patterns
The algorithm identifies matching regions between base and new data, then encodes only the differences as a series of copy and literal instructions.
Algorithm Parameters
The implementation uses optimized default parameters:
- Chunk size: 300 KB
- Word size: 8 bytes
- Base sample rate: 3
- Features: Skip optimization, reverse matching
These parameters are tuned for typical deduplication workloads.
Comparison with Other Delta Algorithms
| Feature | GDelta | Xdelta | Zdelta |
|---|---|---|---|
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡ |
| Memory | Low | Medium | High |
| Compression | Good* | Excellent | Good |
*Use with ZSTD/LZ4 for best compression ratio
Use Cases
- Deduplication Systems - Compress similar chunks
- Backup Software - Incremental backups
- File Synchronization - Minimize transfer size
- Version Control - Efficient diff storage
Credits
This is a Rust implementation of the GDelta algorithm by Haoliang Tan.
Original repositories/resources:
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.