# Fault Tolerance & Recovery
Handling failures and recovering workflows automatically.
- [Automatic Failure Recovery](./automatic-recovery.md) - Automatic retry and resource adjustment
- [Configurable Failure Handlers](./failure-handlers.md) - Per-job retry logic based on exit codes
- [AI-Assisted Recovery](./ai-assisted-recovery.md) - Intelligent error classification with AI agents
- [Job Checkpointing](./checkpointing.md) - Saving and restoring job state