gdelta
A fast delta compression algorithm for similar data chunks, implemented in pure Rust.
Overview
gdelta is a Rust implementation of the GDelta algorithm by Haoliang Tan. It provides efficient delta encoding for
similar data chunks (typically 4KB - 64KB) commonly found in deduplication systems.
Key Features:
- Fast delta encoding and decoding with optional SIMD optimization
- Memory-safe implementation in pure Rust
- Simple, ergonomic API with full-featured CLI tool
- No unsafe code
- Thoroughly tested with comprehensive benchmarking suite
Performance:
- Encoding: 370-1,080 MiB/s depending on data characteristics
  - Small chunks (16KB): up to 1.08 GiB/s
  - Large data (256KB+): 370-400 MiB/s sustained
- Decoding: 2.0-10.6 GiB/s (5-28x faster than encoding)
  - Average: 4.1 GiB/s across diverse workloads
- Compression: 63% space saved on average (raw delta)
  - With zstd: 70% space saved, 258 MiB/s
  - With lz4: 66% space saved, 350 MiB/s
- Fastest in class: Among open-source delta algorithms, gdelta offers the best speed-to-compression balance
Benchmarked on an AMD Ryzen 7 7800X3D (8 cores / 16 threads) running Fedora Linux 42. See PERFORMANCE.md for detailed analysis.
Installation
As a Library
Add this to your Cargo.toml:
```toml
[dependencies]
gdelta = "0.2"
```
As a CLI Tool
Install using cargo:

```bash
cargo install gdelta
```

Or build from source:

```bash
cargo build --release
# Binary at: target/release/gdelta
```
Usage
Library API
```rust
use gdelta::{decode, encode};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create base data and modified version
    let base_data = b"Hello, World! This is some base data.";
    let new_data = b"Hello, Rust! This is some modified data.";

    // Encode the delta (check the crate docs for the exact signatures)
    let delta = encode(base_data, new_data)?;
    println!("Delta size: {} bytes", delta.len());

    // Decode to recover the new data
    let recovered = decode(base_data, &delta)?;
    assert_eq!(recovered, new_data);

    Ok(())
}
```
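The zstd/lz4 figures above come from layering a general-purpose compressor over the raw delta, which can also be done in library code. The sketch below assumes the `zstd` crate as an extra dependency and reuses the (assumed) `encode`/`decode` calls from the example above; level 3 is an arbitrary choice for illustration.

```rust
use gdelta::{decode, encode};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let base_data = b"Hello, World! This is some base data.";
    let new_data = b"Hello, Rust! This is some modified data.";

    // Raw gdelta delta, then a zstd pass over the delta bytes.
    let delta = encode(base_data, new_data)?;
    let compressed = zstd::encode_all(&delta[..], 3)?;

    // Reverse the layers: decompress first, then apply the delta.
    let delta_back = zstd::decode_all(&compressed[..])?;
    let recovered = decode(base_data, &delta_back)?;
    assert_eq!(recovered, new_data);

    Ok(())
}
```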
CLI Tool
Use `gdelta help` to see the most up-to-date options and descriptions.
The CLI provides a simple interface for creating and applying delta patches:
Create a delta patch:
```bash
# Basic usage
# With compression (recommended)
# With verification
```
Apply a delta patch:
```bash
# Auto-detects compression format
# Force specific format (if magic bytes conflict)
```
Options:
- `-c, --compress <FORMAT>` - Compression: none, zstd, lz4 (default: none)
- `-v, --verify` - Verify delta after creation (encode only)
- `-y, --yes` - Skip memory warning prompts
- `-f, --force` - Overwrite existing files
- `-q, --quiet` - Suppress output except errors
Example workflow:
```bash
# Create compressed delta
# Later, apply the update
# Verify the result
```
Memory Management:
The CLI monitors memory usage and warns when operations might use >80% of available RAM. This is important because gdelta loads entire files into memory.
```bash
# For large files, the tool will prompt before proceeding
```
Use `-y` to skip prompts in automated scripts.
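Because the entire base file, new file, and delta are resident at once, library users handling very large inputs may want a similar pre-flight check. Below is a minimal sketch of one way to do it, using the `sysinfo` crate; it illustrates the idea and is not the CLI's actual implementation.

```rust
use sysinfo::System;

/// Illustrative only: true if `required_bytes` fits within 80% of the
/// RAM currently available (sysinfo 0.30+ reports memory in bytes).
fn fits_in_memory(required_bytes: u64) -> bool {
    let mut sys = System::new();
    sys.refresh_memory();
    required_bytes <= sys.available_memory() / 10 * 8
}

fn main() {
    // Rough upper bound: base + new + delta buffers, ~512 MiB each.
    let estimated = 3 * 512 * 1024 * 1024u64;
    if !fits_in_memory(estimated) {
        eprintln!("warning: this operation may use >80% of available RAM");
    }
}
```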
How It Works
GDelta uses:
- GEAR Rolling Hash - Fast fingerprinting for chunk boundaries
- Variable-Length Integer Encoding - Efficient space utilization
- Copy/Literal Instructions - Minimal delta representation
- Prefix/Suffix Matching - Optimized for common data patterns
The algorithm identifies matching regions between base and new data, then encodes only the differences as a series of copy and literal instructions.
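To make the copy/literal idea concrete, here is a sketch that applies a tiny hand-written instruction stream to a base buffer. The enum is purely illustrative; gdelta's real delta is a compact byte stream that stores these instructions with variable-length integers.

```rust
/// Illustrative delta instructions (not gdelta's wire format).
enum Instruction {
    /// Copy `len` bytes starting at `offset` in the base buffer.
    Copy { offset: usize, len: usize },
    /// Emit these bytes verbatim (data with no match in the base).
    Literal(Vec<u8>),
}

fn apply(base: &[u8], delta: &[Instruction]) -> Vec<u8> {
    let mut out = Vec::new();
    for instr in delta {
        match instr {
            Instruction::Copy { offset, len } => {
                out.extend_from_slice(&base[*offset..*offset + *len])
            }
            Instruction::Literal(bytes) => out.extend_from_slice(bytes),
        }
    }
    out
}

fn main() {
    let base = b"Hello, World! This is some base data.";
    let delta = vec![
        Instruction::Copy { offset: 0, len: 7 },   // "Hello, "
        Instruction::Literal(b"Rust".to_vec()),
        Instruction::Copy { offset: 12, len: 15 }, // "! This is some "
        Instruction::Literal(b"modified data.".to_vec()),
    ];
    assert_eq!(apply(base, &delta), b"Hello, Rust! This is some modified data.".to_vec());
}
```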
Algorithm Parameters
The implementation uses optimized default parameters:
- Chunk size: 300 KB
- Word size: 8 bytes
- Base sample rate: 3
- Features: Skip optimization, reverse matching
These parameters are tuned for typical deduplication workloads.
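For intuition on how fingerprinting and sampling interact, the sketch below runs a GEAR-style rolling hash over a buffer and records a match candidate whenever the fingerprint satisfies a sampling mask. The table, mask, and constants are stand-ins, not gdelta's actual values.

```rust
/// Stand-in GEAR table: 256 pseudo-random u64s from a simple mixer.
/// gdelta ships its own fixed table; this one is only for illustration.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state = 0x9E37_79B9_7F4A_7C15u64;
    for entry in table.iter_mut() {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        *entry = state;
    }
    table
}

fn main() {
    let table = gear_table();
    let data = b"Hello, World! This is some base data.";

    // Sampling: keep roughly one candidate per 2^sample_bits positions
    // (a stand-in for the "base sample rate" parameter above).
    let sample_bits = 3;
    let mask = (1u64 << sample_bits) - 1;

    let mut hash = 0u64;
    for (pos, &byte) in data.iter().enumerate() {
        // One GEAR step: older bytes age out as the hash is shifted left.
        hash = (hash << 1).wrapping_add(table[byte as usize]);
        if hash & mask == 0 {
            println!("match candidate at offset {pos}: {hash:#018x}");
        }
    }
}
```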
Comparison with Other Algorithms
Performance Comparison (from comprehensive benchmarks):
| Algorithm | Encoding Speed | Space Saved | Memory | Use Case |
|---|---|---|---|---|
| gdelta | 397 MB/s | 63% | Low | Best all-around speed |
| gdelta+zstd | 258 MB/s | 70% | Low | Balanced speed/compression |
| gdelta+lz4 | 350 MB/s | 66% | Low | Fast with compression |
| xpatch | 291 MB/s | 75% | Low | Automatic algorithm selection |
| qbsdiff | 22 MB/s | 84% | Medium | Maximum compression |
| xdelta3 | 45 MB/s | 81% | Medium | Failed verification ⚠️ |
See PERFORMANCE.md for detailed benchmarks and methodology.
Key Takeaways:
- Fastest: gdelta and gdelta+lz4 for high-speed applications
- Best Compression: qbsdiff if speed is not critical
- Best Balance: gdelta+zstd for most production use cases
- Decoding: gdelta is 2-5x faster than the alternatives
Use Cases
- Deduplication Systems - Compress similar chunks
- Backup Software - Incremental backups
- File Synchronization - Minimize transfer size
- Version Control - Efficient diff storage
- Binary Patching - Software updates and distributions
- Cloud Storage - Reduce storage costs
- Database Replication - Minimize bandwidth
Development
Running Tests
```bash
# Run unit tests
# Run integration tests
# Run CLI test suite
# Run simple benchmarks (quick verification)
# Run comprehensive benchmarks (15-30 min)
# Run comprehensive benchmarks with custom filters
BENCH_FORMATS=json,csv BENCH_ALGOS=gdelta,xpatch
```
Benchmark Modes
The comprehensive benchmark supports two modes:
```bash
# Quick mode (default): smaller sample size, faster
BENCH_MODE=quick
# Full mode: larger sample size, more accurate
BENCH_MODE=full
```
Results are saved to `target/benchmark_report_<timestamp>.md` and `.json`.
Credits
This is a Rust implementation of the GDelta algorithm by Haoliang Tan.
Original repositories/resources:
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Built with ❤️ by ImGajeed76