walrust 0.5.1

Lightweight SQLite replication to S3/Tigris in Rust.

Walrust continuously replicates SQLite databases to S3-compatible storage, minimizing data loss from server crashes, power failures, or disk corruption. It is similar to Litestream, but with an emphasis on a small memory footprint and ease of configuration.

v0.4.0: Module split, periodic validation, cache cleanup, 346 tests passing.

Installation

CLI (Rust)

cargo install walrust

Python Package

pip install walrust

Then use from Python:

from walrust import Walrust

# Create instance
ws = Walrust("s3://my-bucket", endpoint="https://fly.storage.tigris.dev")

# Snapshot a database
ws.snapshot("/path/to/app.db")

# List backed up databases
dbs = ws.list()

# Restore a database
ws.restore("app", "/path/to/restored.db")

Quick Start

# Watch databases and sync to S3
walrust watch db1.db db2.db -b s3://my-bucket/backups

# With Tigris endpoint
walrust watch app.db -b s3://my-bucket --endpoint https://fly.storage.tigris.dev

# With auto-compaction after each snapshot
walrust watch app.db -b s3://my-bucket --compact-after-snapshot

# Take immediate snapshot
walrust snapshot app.db -b s3://my-bucket

# List backed up databases
walrust list -b s3://my-bucket

# Restore database
walrust restore mydb -o restored.db -b s3://my-bucket

# Clean up old snapshots (dry-run)
walrust compact mydb -b s3://my-bucket

# Actually delete old snapshots
walrust compact mydb -b s3://my-bucket --force

Acknowledgments

Walrust wouldn't exist without Litestream and the work of Ben Johnson. Litestream was the first place I saw WAL-based SQLite replication to cloud storage, and walrust uses the same LTX file format for efficient compaction and replication.

How It Works

Local:                          S3 (LTX format):
app.db                          /app/00000001-00000001.ltx  (snapshot)
app.db-wal  ────────────────►   /app/00000002-00000010.ltx  (incremental)
           (polling)            /app/manifest.json
  1. Watch - Poll WAL files for changes at configurable interval
  2. Sync - Upload new WAL frames as LTX files to S3
  3. Snapshot - Periodic full database snapshots (configurable interval)
  4. Restore - Download snapshot + apply incremental LTX files
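
The restore step (4) can be sketched as: fetch the snapshot, then apply incremental LTX files in ascending TXID order. A minimal sketch assuming a single snapshot at the start of the sequence, as in the diagram; the helper names are illustrative, not walrust's internals:

```python
# Hypothetical sketch of restore ordering, following the TXID file-name
# annotations above. Not walrust's actual code.

def plan_restore(manifest_files):
    """Order LTX files: snapshot first, then incrementals by starting TXID."""
    def txid_range(name):
        # "00000002-00000010.ltx" -> (2, 10)
        lo, hi = name.removesuffix(".ltx").split("-")
        return int(lo), int(hi)

    ordered = sorted(manifest_files, key=txid_range)
    snapshot, *incrementals = ordered
    return snapshot, incrementals

snap, incs = plan_restore([
    "00000011-00000050.ltx",
    "00000001-00000001.ltx",
    "00000002-00000010.ltx",
])
# snap == "00000001-00000001.ltx"; incs are applied in ascending TXID order
```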

Commands

walrust watch

Watch databases and continuously sync WAL changes.

walrust watch <DATABASES>... -b <BUCKET> [OPTIONS]

Options:
  --snapshot-interval <SECS>       Snapshot interval (default: 3600)
  --wal-sync-interval <SECS>       WAL sync batching interval (default: 1)
  --endpoint <URL>                 S3 endpoint (for Tigris/MinIO)

  # Checkpointing (prevent unbounded WAL growth)
  --checkpoint-interval <SECS>     Checkpoint interval (default: 60)
  --min-checkpoint-pages <N>       Min pages before checkpoint (default: 1000, ~4MB)
  --wal-truncate-threshold <N>     Emergency truncate threshold (default: 121359, ~500MB)

  # Validation
  --validation-interval <SECS>     Backup validation interval (default: 0, disabled)

  # Compaction
  --compact-after-snapshot         Run compaction after each snapshot
  --compact-interval <SECS>        Compaction interval in seconds (0 = disabled)

  # Retention
  --retain-hourly <N>              Hourly snapshots to keep (default: 24)
  --retain-daily <N>               Daily snapshots to keep (default: 7)
  --retain-weekly <N>              Weekly snapshots to keep (default: 12)
  --retain-monthly <N>             Monthly snapshots to keep (default: 12)

walrust snapshot

Take an immediate snapshot.

walrust snapshot <DATABASE> -b <BUCKET>

walrust restore

Restore a database from S3.

walrust restore <NAME> -o <OUTPUT> -b <BUCKET>

Options:
  --point-in-time <ISO8601>  Restore to specific time
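
Point-in-time restore can be pictured as filtering the LTX sequence by timestamp before applying it. A sketch assuming each manifest entry carries an upload timestamp (`ts` is an invented field name here, not walrust's real manifest schema):

```python
from datetime import datetime, timezone

def select_for_pitr(entries, target):
    """Keep the snapshot plus incrementals uploaded at or before `target`.

    `entries` is a list of (name, ts) pairs in upload order; the `ts`
    field is an assumption for illustration only.
    """
    return [name for name, ts in entries if ts <= target]

entries = [
    ("00000001-00000001.ltx", datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)),
    ("00000002-00000010.ltx", datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc)),
    ("00000011-00000050.ltx", datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)),
]
target = datetime(2024, 1, 1, 1, 30, tzinfo=timezone.utc)
kept = select_for_pitr(entries, target)
# changes uploaded after 01:30 UTC are excluded from the restore
```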

walrust compact

Clean up old snapshots using retention policy (GFS rotation).

walrust compact <NAME> -b <BUCKET> [OPTIONS]

Options:
  --hourly <N>    Hourly snapshots to keep (default: 24)
  --daily <N>     Daily snapshots to keep (default: 7)
  --weekly <N>    Weekly snapshots to keep (default: 12)
  --monthly <N>   Monthly snapshots to keep (default: 12)
  --force         Actually delete files (default: dry-run only)

Example:

# Preview what would be deleted
walrust compact mydb -b s3://my-bucket

# Actually delete old snapshots
walrust compact mydb -b s3://my-bucket --force

# Keep more hourly snapshots
walrust compact mydb -b s3://my-bucket --hourly 48 --force

walrust list

List backed up databases.

walrust list -b <BUCKET>

walrust explain

Preview configuration before running watch mode - see what walrust will do without starting it.

walrust explain [--config <CONFIG>]

Displays:

  • Database list and S3 destination
  • Snapshot schedule and retention policy (GFS)
  • Validation intervals
  • Webhook notifications
  • Estimated monthly storage costs (Tigris: $0.02/GB, S3: $0.023/GB)

Example output:

Configuration Summary
=====================

Databases:
  - /path/to/app.db → s3://my-bucket/backups/app

Retention Policy (GFS):
  Hourly: 24 snapshots (last 24 hours)
  Daily: 7 snapshots (last week)
  Total: 55 snapshots per database

Estimated Storage Costs:
  Per 1GB database: ~$1.10/month (Tigris)
  Per 10GB database: ~$11/month
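
The estimates above follow from the default retention totals (24 + 7 + 12 + 12 = 55 snapshots), treated as a worst case in which every retained snapshot is stored at full database size. A sketch of that arithmetic; incremental LTX files keep real usage below this bound:

```python
def estimate_monthly_cost(db_gb, price_per_gb=0.02, snapshots=24 + 7 + 12 + 12):
    """Worst-case monthly cost: every retained snapshot at full size.

    price_per_gb defaults to the Tigris rate quoted above ($0.02/GB-month).
    """
    return snapshots * db_gb * price_per_gb

print(round(estimate_monthly_cost(1), 2))   # 1.1, matching ~$1.10/month above
print(round(estimate_monthly_cost(10), 2))  # 11.0
```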

walrust verify

Quickly verify backup integrity without performing a full restore.

walrust verify <NAME> -b <BUCKET> [OPTIONS]

Options:
  --endpoint <URL>  S3 endpoint

Checks:

  • Snapshot existence (critical - prevents incomplete backups)
  • File existence (all manifest entries have S3 objects)
  • Header validity (LTX headers parse correctly)
  • Checksums (SHA256 verification)
  • TXID continuity (no gaps in transaction sequence)
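
The continuity check can be sketched as: sort the TXID ranges taken from the file names and confirm each range starts exactly one past the previous range's end. A hypothetical re-implementation for illustration, not walrust's code:

```python
def find_txid_gaps(ranges):
    """Return (expected_start, actual_start) pairs where the TXID sequence breaks.

    `ranges` are (min_txid, max_txid) pairs parsed from LTX file names.
    """
    gaps = []
    ordered = sorted(ranges)
    for (_, prev_hi), (lo, _) in zip(ordered, ordered[1:]):
        if lo != prev_hi + 1:
            gaps.append((prev_hi + 1, lo))
    return gaps

assert find_txid_gaps([(1, 1), (2, 10), (11, 50)]) == []
# A missing file covering TXIDs 11-20 shows up as a gap:
assert find_txid_gaps([(1, 1), (2, 10), (21, 50)]) == [(11, 21)]
```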

Exit codes:

  • 0 = All checks passed
  • 1 = Issues found (warnings)
  • 2 = Critical errors (no snapshot, major gaps)

Example output:

Verifying backup: mydb in s3://my-bucket/backups...

Snapshot: Found generation 1 (TXID 1-1, 4096 bytes)

Incremental files: 15 files
  OK 0000000000000002-0000000000000005.ltx (4 TXIDs, 12KB)
  OK 0000000000000006-0000000000000010.ltx (5 TXIDs, 16KB)

Verified: 17/17 files (28.0 KB total)
Continuity: No gaps detected (TXID 1-100)

All checks passed - backup integrity verified
Exit code: 0 (success)

walrust replicate

Create a read replica that polls S3 for updates.

walrust replicate <SOURCE> --local <PATH> [OPTIONS]

Options:
  --interval <DURATION>  Poll interval (default: 5s)
  --endpoint <URL>       S3 endpoint

walrust pragma

Output SQLite PRAGMA settings for optimal walrust compatibility.

walrust pragma [OPTIONS]

Options:
  -o, --output <FILE>    Write to SQL file
  --comments <bool>      Include explanatory comments (default: true)

Exit Codes

Walrust uses structured exit codes for scripting and automation:

Code  Name       Description
----  ---------  ------------------------------------------------------------
0     Success    Operation completed successfully
1     General    Unknown or uncategorized error
2     Config     Configuration error (invalid config file, missing CLI args)
3     Database   Database error (file not found, WAL corruption, SQLite issues)
4     S3         S3 error (network, authentication, bucket access)
5     Integrity  Integrity error (checksum mismatch, LTX verification failed)
6     Restore    Restore error (no snapshot found, PITR unavailable)

Example usage in scripts:

walrust verify mydb -b s3://bucket
rc=$?
case $rc in
  0) echo "Verification passed" ;;
  5) echo "Integrity error - backup may be corrupted" ;;
  4) echo "S3 error - check credentials/connectivity" ;;
  *) echo "Other error: $rc" ;;
esac

Environment Variables

  • AWS_ACCESS_KEY_ID - AWS/Tigris access key
  • AWS_SECRET_ACCESS_KEY - AWS/Tigris secret key
  • AWS_ENDPOINT_URL_S3 - S3 endpoint (for Tigris/MinIO)
  • AWS_REGION - AWS region (default: us-east-1)

Configuration File

Create walrust.toml in your project directory:

[s3]
bucket = "s3://my-bucket/backups"
endpoint = "https://fly.storage.tigris.dev"

[sync]
snapshot_interval = 3600        # Snapshot every hour
wal_sync_interval = 1           # Batch WAL syncs every 1 second
checkpoint_interval = 60        # Checkpoint every 60 seconds
min_checkpoint_page_count = 1000  # Only checkpoint if WAL >= 1000 pages (~4MB)
wal_truncate_threshold_pages = 121359  # Emergency truncate at 500MB
validation_interval = 86400     # Backup validation every 24 hours (0 = disabled)

max_changes = 1000              # Snapshot after 1000 WAL frames
max_interval = 300              # Snapshot after 5 min of changes
on_idle = 60                    # Snapshot after 60 sec of no activity

compact_after_snapshot = true
compact_interval = 3600

[retention]
hourly = 24
daily = 7
weekly = 12
monthly = 12

# Retry configuration for transient S3 failures
[retry]
max_retries = 5                 # Number of retry attempts
base_delay_ms = 100             # Initial backoff delay
max_delay_ms = 30000            # Maximum backoff cap (30s)
circuit_breaker_enabled = true  # Enable circuit breaker
circuit_breaker_threshold = 10  # Failures before circuit opens
circuit_breaker_cooldown_ms = 60000  # Cooldown before half-open (1 min)

# Webhook notifications for failure events
[[webhooks]]
url = "https://example.com/walrust-webhook"
events = ["sync_failed", "auth_failure", "corruption_detected", "circuit_breaker_open"]
secret = "optional-hmac-secret"  # For X-Walrust-Signature header

[[databases]]
path = "/data/app.db"
prefix = "production"

[[databases]]
path = "/data/analytics.db"
checkpoint_interval = 30        # Override: checkpoint more frequently
wal_truncate_threshold_pages = 50000  # Override: lower emergency threshold
validation_interval = 3600      # Override: validate hourly for this DB

Then run:

walrust watch  # Auto-discovers walrust.toml
# or
walrust watch --config custom.toml
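
The [retry] settings translate to exponential backoff with a cap. A sketch of the delay schedule the defaults above would produce; doubling per attempt and the absence of jitter are assumptions about the policy, not confirmed behavior:

```python
def backoff_delays(max_retries=5, base_delay_ms=100, max_delay_ms=30_000):
    """Delay before each retry: base * 2**attempt, capped at max_delay_ms."""
    return [min(base_delay_ms * 2 ** attempt, max_delay_ms)
            for attempt in range(max_retries)]

print(backoff_delays())  # [100, 200, 400, 800, 1600]
# With more retries, the cap kicks in and delays flatten at 30s.
```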

S3 Layout (LTX Format)

s3://bucket/prefix/
├── dbname/
│   ├── 00000001-00000001.ltx     # Snapshot (TXID 1)
│   ├── 00000002-00000010.ltx     # Incremental (TXID 2-10)
│   ├── 00000011-00000050.ltx     # Incremental (TXID 11-50)
│   └── manifest.json             # Index of LTX files
└── otherdb/
    └── ...
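
Under this layout, listing backed-up databases amounts to grouping object keys by their first path component. A sketch of the idea, not walrust's actual implementation:

```python
def list_databases(keys):
    """Derive database names from S3 object keys under the backup prefix."""
    return sorted({key.split("/", 1)[0] for key in keys if "/" in key})

keys = [
    "dbname/00000001-00000001.ltx",
    "dbname/manifest.json",
    "otherdb/00000001-00000001.ltx",
]
print(list_databases(keys))  # ['dbname', 'otherdb']
```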

Data Integrity

SHA256 Verification

Every snapshot includes an SHA256 checksum stored in S3 object metadata (x-amz-meta-sha256). During restore, checksums are automatically verified:

✓ Checksum stored during snapshot
✓ Verified automatically on restore
✓ Fail-fast on corruption detection
✓ Works with existing backups (optional)
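
The restore-time check can be sketched with Python's hashlib: stream the restored file through SHA256 and compare against the digest from object metadata. The metadata fetch is out of scope here; `verify_restore` is an illustrative helper, not walrust's API:

```python
import hashlib

def sha256_hex(path):
    """Stream a file through SHA-256 without loading it fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(path, stored_digest):
    """Fail fast if restored bytes do not match the stored checksum."""
    actual = sha256_hex(path)
    if actual != stored_digest:
        raise ValueError(f"checksum mismatch: {actual} != {stored_digest}")
```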

Snapshot Compaction

Walrust uses Grandfather/Father/Son (GFS) rotation to manage snapshot retention:

Tier     Default  Description
-------  -------  --------------------------------
Hourly   24       Snapshots from the last 24 hours
Daily    7        One per day for the last week
Weekly   12       One per week for the last 12 weeks
Monthly  12       One per month beyond 12 weeks
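
A GFS pass can be sketched as: bucket snapshot timestamps by hour, day, week, and month, then keep the newest snapshot in each of the most recent N buckets per tier. A simplified sketch that does not reproduce walrust's exact rules (e.g. its minimum-2-snapshots guarantee):

```python
from datetime import datetime, timedelta

def gfs_keep(timestamps, hourly=24, daily=7, weekly=12, monthly=12):
    """Pick which snapshot timestamps to retain under GFS rotation."""
    tiers = [
        (hourly, lambda t: (t.year, t.month, t.day, t.hour)),
        (daily, lambda t: (t.year, t.month, t.day)),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),  # (year, ISO week)
        (monthly, lambda t: (t.year, t.month)),
    ]
    keep = set()
    for count, bucket_key in tiers:
        buckets = {}
        for t in sorted(timestamps):
            buckets[bucket_key(t)] = t  # newer snapshot wins each bucket
        for _, t in sorted(buckets.items(), reverse=True)[:count]:
            keep.add(t)
    return keep

# One snapshot per hour over three days:
snaps = [datetime(2024, 6, 1, 12) - timedelta(hours=h) for h in range(72)]
kept = gfs_keep(snaps)
# the newest snapshot is always retained; older hours thin out
```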

Safety guarantees:

  • Always keeps latest snapshot
  • Minimum 2 snapshots retained
  • Dry-run by default (--force required to delete)

Auto-compaction modes:

# After each snapshot
walrust watch app.db -b s3://bucket --compact-after-snapshot

# On interval (every hour)
walrust watch app.db -b s3://bucket --compact-interval 3600

Multi-Database Scalability

Databases  Litestream  Walrust  Reduction
---------  ----------  -------  ---------
1          36 MB       19 MB    47%
10         55 MB       19 MB    65%
100        160 MB      20 MB    88%

Measured with 100KB databases on macOS, syncing to Tigris S3. See bench/BENCHMARK_FRAMEWORK.md for methodology.

Walrust's memory usage remains ~19-20 MB regardless of database count.

Testing

Test suite includes:

  • ✅ Byte-for-byte data integrity (snapshot → restore → verify)
  • ✅ SHA256 checksum storage and verification
  • ✅ Multi-database concurrent snapshots
  • ✅ WAL file format parsing
  • ✅ S3 operations
  • ✅ Retry logic with exponential backoff
  • ✅ Chaos testing with fault injection (walrust-dst)
  • ✅ Property-based testing (7 properties, 100+ cases each)
  • ✅ Core invariants (transaction recovery, WAL batching, snapshot atomicity)
  • ✅ Continuous chaos testing with MTBF tracking

Run tests: make test (injects S3/Tigris credentials via Soup)

Benchmarking

Walrust includes a benchmark framework for measuring data-loss prevention and replication lag against Litestream.

Quick start:

# Simple 2-database test
uv run python bench/benchmark.py --config bench/configs/quick.yml

# Scalability matrix: 4 DB counts × 3 write rates, for both tools = 24 runs
uv run python bench/benchmark.py --config bench/configs/scalability-matrix.yml

What it measures:

  • Data loss detection (all committed writes in S3?)
  • Replication lag (P50/P95/P99 sync latency)
  • Resource usage (CPU/memory under load)
  • Throughput (writes/sec achieved)

Architecture:

  • DatabaseWriter threads write to SQLite at controlled rates
  • walrust/litestream sync to S3 in background
  • Restore from S3 and compare expected vs actual writes
  • Report data loss and sync latency percentiles
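
The P50/P95/P99 figures can be computed over recorded per-write sync latencies with a nearest-rank percentile. A minimal sketch; the framework's exact method may differ:

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer percent p (1-100)."""
    ordered = sorted(samples)
    rank = max(1, -(-p * len(ordered) // 100))  # ceiling division, no floats
    return ordered[rank - 1]

lags_ms = list(range(1, 101))  # pretend sync lags: 1..100 ms
print(percentile(lags_ms, 50))  # 50
print(percentile(lags_ms, 95))  # 95
print(percentile(lags_ms, 99))  # 99
```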

See bench/BENCHMARK_FRAMEWORK.md for full documentation.

Use with Tenement/Slum

Back up tenant SQLite databases with a single walrust process:

# In your tenement deployment
walrust watch \
  /var/lib/ourfam/romneys/app.db \
  /var/lib/ourfam/smiths/app.db \
  /var/lib/ourfam/jones/app.db \
  -b s3://backups/ourfam \
  --endpoint https://fly.storage.tigris.dev

Memory usage remains low when watching many databases (see Multi-Database Scalability above).

License

Apache 2.0