# gdelta
[](https://crates.io/crates/gdelta)
[](https://docs.rs/gdelta)
[](https://opensource.org/licenses/MIT)
A fast delta compression algorithm for similar data chunks, implemented in pure Rust.
## Overview
`gdelta` is a Rust implementation of the GDelta algorithm by Haoliang Tan. It provides efficient delta encoding for similar data chunks (typically 4KB - 64KB) commonly found in deduplication systems.
**Key Features:**
- 🚀 Fast delta encoding and decoding (1 GB/s encode, 5-8 GB/s decode)
- 🔒 Memory-safe implementation in pure Rust
- 📦 Simple, ergonomic API
- ✨ No unsafe code
- 🧪 Thoroughly tested
**Performance:**
- **Encoding**: 850-1000 MiB/s (~1 GB/s)
- **Decoding**: 5-8 GB/s (5-8x faster than encoding)
- Faster than Xdelta, Zdelta, Ddelta, and Edelta
- Optimized for inter-chunk redundancy removal
- Best used with general compression (e.g., ZSTD) for additional compression
*Benchmarked on typical hardware with similar data chunks*
## Installation
Add this to your `Cargo.toml`:
```toml
[dependencies]
gdelta = "0.1"
```
## Usage
```rust
use gdelta::{encode, decode};
// Create base data and modified version
let base_data = b"Hello, World! This is some base data.";
let new_data = b"Hello, Rust! This is some modified data.";
// Encode the delta
let delta = encode(new_data, base_data)?;
println!("Delta size: {} bytes", delta.len());
// Decode to recover the new data
let recovered = decode(&delta, base_data)?;
assert_eq!(recovered, new_data);
```
## How It Works
GDelta uses:
1. **GEAR Rolling Hash** - Fast fingerprinting for chunk boundaries
2. **Variable-Length Integer Encoding** - Efficient space utilization
3. **Copy/Literal Instructions** - Minimal delta representation
4. **Prefix/Suffix Matching** - Optimized for common data patterns
The algorithm identifies matching regions between base and new data, then encodes only the differences as a series of copy and literal instructions.
## Algorithm Parameters
The implementation uses optimized default parameters:
- Chunk size: 300 KB
- Word size: 8 bytes
- Base sample rate: 3
- Features: Skip optimization, reverse matching
These parameters are tuned for typical deduplication workloads.
## Comparison with Other Delta Algorithms
| Speed | ⚡⚡⚡ | ⚡⚡ | ⚡ |
| Memory | Low | Medium | High |
| Compression | Good* | Excellent | Good |
*Use with ZSTD/LZ4 for best compression ratio
## Use Cases
- **Deduplication Systems** - Compress similar chunks
- **Backup Software** - Incremental backups
- **File Synchronization** - Minimize transfer size
- **Version Control** - Efficient diff storage
## Credits
This is a Rust implementation of the GDelta algorithm by **Haoliang Tan**.
Original repositories/resources:
- [GDelta Paper](https://ieeexplore.ieee.org/abstract/document/9229609/)
- [Original GDelta (C++)](https://github.com/apple-ouyang/gdelta)
- [GDelta with ZSTD (C++)](https://github.com/AnsonHooL/Gdelta)
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.