walrust
Lightweight SQLite replication to S3/Tigris in Rust.
Walrust continuously replicates SQLite databases to S3-compatible storage, ensuring minimal data loss on server crashes, power failures, or disk corruption. It is similar to Litestream, but with an emphasis on a small memory footprint and ease of configuration.
v0.4.0: Module split, periodic validation, cache cleanup, 346 tests passing.
Installation
CLI (Rust)
Python Package
Then use from Python:
- Create an instance
- Snapshot a database
- List backed-up databases
- Restore a database
Quick Start
- Watch databases and sync to S3
- Sync to a Tigris endpoint
- Enable auto-compaction after each snapshot
- Take an immediate snapshot
- List backed-up databases
- Restore a database
- Clean up old snapshots (dry-run by default; `--force` to actually delete)
Acknowledgments
Walrust wouldn't exist without Litestream and the work of Ben Johnson. Litestream was the first place I saw WAL-based SQLite replication to cloud storage, and walrust uses the same LTX file format for efficient compaction and replication.
How It Works
    Local:                          S3 (LTX format):
    app.db                          /app/00000001-00000001.ltx  (snapshot)
    app.db-wal  ────────────────►   /app/00000002-00000010.ltx  (incremental)
      (polling)                     /app/manifest.json
- Watch - Poll WAL files for changes at configurable interval
- Sync - Upload new WAL frames as LTX files to S3
- Snapshot - Periodic full database snapshots (configurable interval)
- Restore - Download snapshot + apply incremental LTX files
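The watch/sync/snapshot cycle above can be sketched in Python. Names like `read_new_wal_frames` and `upload_incremental` are illustrative stand-ins, not walrust's actual API:

```python
import time

def watch_tick(db, store, state, snapshot_interval=3600, now=None):
    """One iteration of the watch/sync/snapshot cycle (illustrative sketch)."""
    now = time.time() if now is None else now
    frames = db.read_new_wal_frames()          # Watch: poll the -wal file
    if frames:
        store.upload_incremental(frames)       # Sync: new frames -> incremental LTX
    if now - state["last_snapshot"] >= snapshot_interval:
        store.upload_snapshot(db.full_copy())  # Snapshot: periodic full image
        state["last_snapshot"] = now
    return state
```

Restore then runs the process in reverse: download the latest snapshot, then apply the incremental LTX files in TXID order.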
Commands
walrust watch
Watch databases and continuously sync WAL changes.
Options cover checkpointing (to prevent unbounded WAL growth), validation, compaction, and retention.
walrust snapshot
Take an immediate snapshot.
walrust restore
Restore a database from S3.
walrust compact
Clean up old snapshots using retention policy (GFS rotation).
Examples:
- Preview what would be deleted (dry-run)
- Actually delete old snapshots (`--force`)
- Keep more hourly snapshots
walrust list
List backed up databases.
walrust explain
Preview configuration before running watch mode - see what walrust will do without starting it.
Displays:
- Database list and S3 destination
- Snapshot schedule and retention policy (GFS)
- Validation intervals
- Webhook notifications
- Estimated monthly storage costs (Tigris: $0.02/GB, S3: $0.023/GB)
Example output:
    Configuration Summary
    =====================

    Databases:
      - /path/to/app.db → s3://my-bucket/backups/app

    Retention Policy (GFS):
      Hourly: 24 snapshots (last 24 hours)
      Daily:  7 snapshots (last week)
      Total:  55 snapshots per database

    Estimated Storage Costs:
      Per 1GB database:  ~$1.10/month (Tigris)
      Per 10GB database: ~$11/month
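The cost estimate is retained snapshots × database size × price per GB-month: the default GFS policy retains 55 snapshots per database, so a 1 GB database on Tigris costs about 55 × 1 GB × $0.02 ≈ $1.10/month. A sketch of that arithmetic:

```python
def monthly_storage_cost(db_gb, snapshots=55, price_per_gb=0.02):
    """Estimated cost: retained snapshots x database size x storage price/GB-month."""
    return snapshots * db_gb * price_per_gb
```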
walrust verify
Verify backup integrity without doing a full restore - fast integrity checking.
Checks:
- Snapshot existence (critical - prevents incomplete backups)
- File existence (all manifest entries have S3 objects)
- Header validity (LTX headers parse correctly)
- Checksums (SHA256 verification)
- TXID continuity (no gaps in transaction sequence)
Exit codes:
- `0` - All checks passed
- `1` - Issues found (warnings)
- `2` - Critical errors (no snapshot, major gaps)
Example output:
    Verifying backup: mydb in s3://my-bucket/backups...

    Snapshot: Found generation 1 (TXID 1-1, 4096 bytes)
    Incremental files: 15 files
      OK 0000000000000002-0000000000000005.ltx (4 TXIDs, 12KB)
      OK 0000000000000006-0000000000000010.ltx (5 TXIDs, 16KB)

    Verified: 17/17 files (28.0 KB total)
    Continuity: No gaps detected (TXID 1-100)
    All checks passed - backup integrity verified

    Exit code: 0 (success)
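The continuity check amounts to verifying that consecutive LTX TXID ranges leave no gaps; a minimal sketch (not walrust's implementation):

```python
def txid_gaps(ranges):
    """Find gaps between sorted (start_txid, end_txid) LTX ranges.

    Returns a list of missing (start, end) spans; [] means continuous."""
    ordered = sorted(ranges)
    gaps = []
    for (_, e1), (s2, _) in zip(ordered, ordered[1:]):
        if s2 > e1 + 1:
            gaps.append((e1 + 1, s2 - 1))
    return gaps
```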
walrust replicate
Create a read replica that polls S3 for updates.
walrust pragma
Output SQLite PRAGMA settings for optimal walrust compatibility.
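walrust's actual output is not reproduced here, but the settings below are the pragmas commonly recommended for WAL-based replication tools such as Litestream; a Python sketch of applying them:

```python
import sqlite3

def apply_replication_pragmas(path):
    """Apply pragmas commonly recommended for WAL-based replication.

    Illustrative only; run `walrust pragma` for the tool's actual output."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")    # WAL mode is required for WAL shipping
    conn.execute("PRAGMA busy_timeout = 5000")   # wait on locks instead of failing
    conn.execute("PRAGMA synchronous = NORMAL")  # safe under WAL, fewer fsyncs
    return conn
```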
Exit Codes
Walrust uses structured exit codes for scripting and automation:
| Code | Name | Description |
|---|---|---|
| 0 | Success | Operation completed successfully |
| 1 | General | Unknown or uncategorized error |
| 2 | Config | Configuration error (invalid config file, missing CLI args) |
| 3 | Database | Database error (file not found, WAL corruption, SQLite issues) |
| 4 | S3 | S3 error (network, authentication, bucket access) |
| 5 | Integrity | Integrity error (checksum mismatch, LTX verification failed) |
| 6 | Restore | Restore error (no snapshot found, PITR unavailable) |
Example usage in scripts:
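For instance, a wrapper can branch on the codes in the table above. This Python sketch (helper names are mine, not part of walrust) maps codes to their documented meanings:

```python
import subprocess

# Exit codes documented in the table above.
EXIT_MESSAGES = {
    0: "success",
    1: "general error",
    2: "configuration error",
    3: "database error",
    4: "S3 error",
    5: "integrity error",
    6: "restore error",
}

def describe_exit(code):
    """Map a walrust exit code to its documented meaning."""
    return EXIT_MESSAGES.get(code, "unknown exit code")

def run_walrust(args):
    """Run walrust and return (exit_code, meaning), e.g. run_walrust(["verify", "mydb"])."""
    proc = subprocess.run(["walrust", *args])
    return proc.returncode, describe_exit(proc.returncode)
```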
Environment Variables
- `AWS_ACCESS_KEY_ID` - AWS/Tigris access key
- `AWS_SECRET_ACCESS_KEY` - AWS/Tigris secret key
- `AWS_ENDPOINT_URL_S3` - S3 endpoint (for Tigris/MinIO)
- `AWS_REGION` - AWS region (default: us-east-1)
Configuration File
Create `walrust.toml` in your project directory:
- Replication target: `s3://my-bucket/backups`, endpoint `https://fly.storage.tigris.dev`
- Sync settings:
  - Snapshot every hour (3600 s)
  - Batch WAL syncs every 1 s
  - Checkpoint every 60 s, but only once the WAL reaches 1000 pages (~4 MB)
  - Emergency truncate at 121359 pages (~500 MB)
  - Backup validation every 24 h (86400 s; 0 = disabled)
  - Snapshot triggers: after 1000 WAL frames, after 5 min (300 s) of changes, or after 60 s of no activity
  - Auto-compaction: enabled, every 3600 s
- Retention (GFS): 24 hourly, 7 daily, 12 weekly, 12 monthly snapshots
- Retry (for transient S3 failures): 5 attempts, 100 ms initial backoff, 30 s (30000 ms) backoff cap; circuit breaker enabled, opening after 10 failures, with a 60 s (60000 ms) cooldown before half-open
- Webhook notifications for failure events: URL `https://example.com/walrust-webhook`; events `sync_failed`, `auth_failure`, `corruption_detected`, `circuit_breaker_open`; optional HMAC secret for the `X-Walrust-Signature` header
- Databases:
  - `/data/app.db`, named "production"
  - `/data/analytics.db`, with per-database overrides: checkpoint every 30 s (more frequent), emergency threshold lowered to 50000 pages, validate hourly (3600 s)
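The retry settings describe exponential backoff with a cap: the delay doubles from the initial value until it hits the maximum. A sketch of that schedule (function name is illustrative):

```python
def backoff_delay_ms(attempt, initial_ms=100, max_ms=30000):
    """Exponential backoff delay for a 0-based retry attempt, capped at max_ms."""
    return min(initial_ms * (2 ** attempt), max_ms)
```

With the defaults above, the delays run 100 ms, 200 ms, 400 ms, ... up to the 30 s cap.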
Then run `walrust watch` from that directory.
S3 Layout (LTX Format)
    s3://bucket/prefix/
    ├── dbname/
    │   ├── 00000001-00000001.ltx   # Snapshot (TXID 1)
    │   ├── 00000002-00000010.ltx   # Incremental (TXID 2-10)
    │   ├── 00000011-00000050.ltx   # Incremental (TXID 11-50)
    │   └── manifest.json           # Index of LTX files
    └── otherdb/
        └── ...
Data Integrity
SHA256 Verification
Every snapshot includes an SHA256 checksum stored in S3 object metadata (x-amz-meta-sha256). During restore, checksums are automatically verified:
✓ Checksum stored during snapshot
✓ Verified automatically on restore
✓ Fail-fast on corruption detection
✓ Works with existing backups (optional)
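Restore-time verification amounts to hashing the restored file and comparing against the value stored in the object metadata; a sketch (not walrust's code):

```python
import hashlib

def file_sha256(path):
    """Hex SHA256 of a file, streamed in 64 KB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_checksum(path, expected_hex):
    """Fail fast if the restored file does not match the stored checksum."""
    actual = file_sha256(path)
    if actual != expected_hex:
        raise ValueError(f"checksum mismatch: {actual} != {expected_hex}")
```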
Snapshot Compaction
Walrust uses Grandfather/Father/Son (GFS) rotation to manage snapshot retention:
| Tier | Default | Description |
|---|---|---|
| Hourly | 24 | Snapshots from last 24 hours |
| Daily | 7 | One per day for last week |
| Weekly | 12 | One per week for last 12 weeks |
| Monthly | 12 | One per month beyond 12 weeks |
Safety guarantees:
- Always keeps latest snapshot
- Minimum 2 snapshots retained
- Dry-run by default (--force required to delete)
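GFS selection can be modeled as bucketing snapshots by hour, day, ISO week, and month, keeping the newest snapshot per bucket up to each tier's limit, and always keeping the latest. A simplified sketch, not walrust's implementation:

```python
from datetime import datetime

TIERS = [  # (bucket key function, buckets to keep)
    (lambda t: (t.year, t.month, t.day, t.hour), 24),  # hourly
    (lambda t: (t.year, t.month, t.day), 7),           # daily
    (lambda t: t.isocalendar()[:2], 12),               # weekly (ISO year, week)
    (lambda t: (t.year, t.month), 12),                 # monthly
]

def gfs_keep(timestamps):
    """Pick which snapshot timestamps to keep under GFS rotation."""
    keep = set()
    for key_fn, limit in TIERS:
        buckets = {}
        for ts in sorted(timestamps, reverse=True):  # newest first
            key = key_fn(ts)
            if key not in buckets and len(buckets) < limit:
                buckets[key] = ts  # newest snapshot in this bucket
        keep.update(buckets.values())
    if timestamps:
        keep.add(max(timestamps))  # safety guarantee: always keep the latest
    return keep
```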
Auto-compaction modes:
- After each snapshot
- On an interval (e.g. every hour)
Multi-Database Scalability
| Databases | Litestream | Walrust | Reduction |
|---|---|---|---|
| 1 | 36 MB | 19 MB | 47% |
| 10 | 55 MB | 19 MB | 65% |
| 100 | 160 MB | 20 MB | 88% |
Measured with 100KB databases on macOS, syncing to Tigris S3. See bench/BENCHMARK_FRAMEWORK.md for methodology.
Walrust's memory usage remains ~19-20 MB regardless of database count.
Testing
Test suite includes:
- ✅ Byte-for-byte data integrity (snapshot → restore → verify)
- ✅ SHA256 checksum storage and verification
- ✅ Multi-database concurrent snapshots
- ✅ WAL file format parsing
- ✅ S3 operations
- ✅ Retry logic with exponential backoff
- ✅ Chaos testing with fault injection (walrust-dst)
- ✅ Property-based testing (7 properties, 100+ cases each)
- ✅ Core invariants (transaction recovery, WAL batching, snapshot atomicity)
- ✅ Continuous chaos testing with MTBF tracking
Run tests with `make test` (S3/Tigris credentials are injected via Soup).
Benchmarking
Walrust includes a comprehensive benchmark framework for measuring data loss prevention and replication lag vs litestream.
Quick start:
- Simple 2-database test
- Scalability matrix: 4 DB counts × 3 write rates = 24 runs
What it measures:
- Data loss detection (all committed writes in S3?)
- Replication lag (P50/P95/P99 sync latency)
- Resource usage (CPU/memory under load)
- Throughput (writes/sec achieved)
Architecture:
- DatabaseWriter threads write to SQLite at controlled rates
- walrust/litestream sync to S3 in background
- Restore from S3 and compare expected vs actual writes
- Report data loss and sync latency percentiles
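A DatabaseWriter can be modeled as a loop that commits rows at a controlled rate, so every committed write can be accounted for after restore. A sketch based on the description above, not the framework's actual code:

```python
import sqlite3
import time

def benchmark_writer(path, rate_hz, total_rows):
    """Write rows to a WAL-mode SQLite database at roughly rate_hz rows/sec."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS writes (id INTEGER PRIMARY KEY, ts REAL)"
    )
    for _ in range(total_rows):
        conn.execute("INSERT INTO writes (ts) VALUES (?)", (time.time(),))
        conn.commit()  # each committed row must be recoverable from S3
        time.sleep(1.0 / rate_hz)
    conn.close()
```

After restoring from S3, comparing the restored row count against `total_rows` reveals any data loss.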
See bench/BENCHMARK_FRAMEWORK.md for full documentation.
Use with Tenement/Slum
Back up tenant SQLite databases in your tenement deployment with a single walrust process.
Memory usage remains low when watching many databases (see Multi-Database Scalability above).
Documentation
- Docs Site - Full documentation
- ROADMAP.md - Planned features and direction
- bench/BENCHMARK_FRAMEWORK.md - Benchmark methodology and results
License
Apache 2.0