# Snapshot Guarantees
This guide covers what d-engine guarantees about snapshot behavior, known limitations,
and configuration recommendations for production deployments.
## Guarantees
**Write operations are never blocked by snapshot generation.**
Snapshot creation runs in a background `tokio::spawn` task, isolating compression I/O
from the Raft event loop. Writes continue uninterrupted while a snapshot is being
compressed and written to disk.
**Committed data is never lost during interrupted transfers.**
If a snapshot transfer is interrupted mid-way (network drop, leader crash, receiver restart),
the stale temporary file is truncated on the next attempt — not appended to — and the
transfer restarts cleanly. The leader retries automatically until the follower catches up.
**Snapshot files are written atomically.**
The receiver assembles chunks into a temporary file (`temp-snapshot.part.tar.gz`) and
performs an atomic rename to the final path only after all chunks pass checksum validation.
A reader never observes a partially written snapshot file.
## Limitations
**P99 write latency may increase during snapshot generation.**
Compressing the state machine to disk is CPU-bound. Under high write throughput,
expect a transient P99 spike of 5–20ms during the compression window.
**Interrupted transfers restart from the beginning.**
d-engine does not support resuming a partial snapshot transfer. If a transfer is
interrupted after transferring 90% of a large snapshot, the next attempt retransfers
from chunk 0. For snapshots exceeding ~500 MB, ensure stable network conditions.
**No cross-datacenter snapshot optimization.**
Differential or incremental snapshot transfer is out of scope. Each transfer is a
full snapshot.
## Operational Boundaries
### When snapshots trigger
A snapshot is triggered when the number of unapplied log entries exceeds
`max_log_entries_before_snapshot`. After the snapshot is created, log entries older
than `retained_log_entries` before the snapshot index are purged.
Any follower whose `next_index` falls below the purge boundary will receive a full
snapshot instead of log entries.
### Chunk timeout
`receive_chunk_timeout_in_sec` (default: 30s) applies per-chunk on the receiver side.
For slow networks or chunks larger than the default 1 KB, increase this value:
```toml
[raft.snapshot]
receive_chunk_timeout_in_sec = 60
```
If this timeout fires, the receiver aborts the current transfer and the leader retries.
### Snapshot retention
`cleanup_retain_count` (default: 2) controls how many past snapshot files are kept on
disk after a new one is created. Keep at least 2 for rollback and debugging headroom.
## Configuration Reference
| `max_log_entries_before_snapshot` | 1000 | Log entries before snapshot triggers |
| `retained_log_entries` | 1 | Log entries to retain after snapshot |
| `chunk_size` | 1024 (1 KB) | Size of each transfer chunk in bytes |
| `receive_chunk_timeout_in_sec` | 30 | Per-chunk receive timeout on follower |
| `transfer_timeout_in_sec` | 600 | Overall transfer timeout |
| `max_bandwidth_mbps` | 1 | Transfer bandwidth cap (Mbps) |
| `cleanup_retain_count` | 2 | Number of past snapshot files to keep |
## Further Reading
- Consistency model: [`consistency_tuning`](crate::docs::server_guide::consistency_tuning)