LLM Orchestrator State
State persistence and recovery for the LLM workflow orchestrator.
Features
- Database Backends: PostgreSQL (production) and SQLite (development/testing)
- Connection Pooling: Configurable minimum and maximum number of pooled connections
- Automatic Checkpointing: Create checkpoints after each step for crash recovery
- Transaction Support: Atomic state updates with rollback capability
- Workflow Recovery: Resume workflows from last checkpoint after crashes
- Automatic Cleanup: Retain last N checkpoints per workflow (configurable)
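The transaction guarantee (an update is applied completely or not at all) can be illustrated with an in-memory sketch. It stages changes on a copy and commits only when the closure succeeds. `InMemoryStore` and `transaction` are hypothetical names, not this crate's API. A database-backed store would presumably rely on the database's own transactions, such as those sqlx opens with `Pool::begin`.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store mapping workflow ids to status strings.
#[derive(Default)]
pub struct InMemoryStore {
    states: HashMap<String, String>,
}

impl InMemoryStore {
    /// Runs `f` against a staged copy of the state. The copy replaces the
    /// live state only if `f` returns `Ok`; on `Err` the copy is dropped,
    /// which behaves like a rollback.
    pub fn transaction<F, E>(&mut self, f: F) -> Result<(), E>
    where
        F: FnOnce(&mut HashMap<String, String>) -> Result<(), E>,
    {
        let mut staged = self.states.clone();
        f(&mut staged)?; // early return leaves `self.states` untouched
        self.states = staged; // commit
        Ok(())
    }

    pub fn status(&self, workflow_id: &str) -> Option<&str> {
        self.states.get(workflow_id).map(String::as_str)
    }
}
```

Cloning the whole map makes every transaction O(n), which is acceptable only for illustration.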
Installation
Add to your Cargo.toml:
[dependencies]
= "0.1"
sqlx = { version = "0.8", features = ["runtime-tokio-rustls", "postgres"] } # For PostgreSQL
# OR
sqlx = { version = "0.8", features = ["runtime-tokio-rustls", "sqlite"] } # For SQLite
Usage
PostgreSQL (Production)
use ;
use serde_json::json;
async
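The save/load round trip that the stores provide can be sketched with an in-memory stand-in. `WorkflowState`, `StateStore`, `MemoryStore`, `save_workflow_state` and `StoreError` are hypothetical. The trait is synchronous for brevity, whereas the crate's stores are async (`load_workflow_state` is awaited), and the real `id` is a UUID rather than a `u64`.

```rust
use std::collections::HashMap;

/// Simplified mirror of a row in the workflow states table.
#[derive(Clone, Debug, PartialEq)]
pub struct WorkflowState {
    pub id: u64, // UUID in the real schema
    pub workflow_id: String,
    pub workflow_name: String,
    pub status: String,
    pub context: String, // JSON text, as stored in the `context` column
}

#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound(u64),
}

/// Hypothetical synchronous store interface.
pub trait StateStore {
    fn save_workflow_state(&mut self, state: &WorkflowState) -> Result<(), StoreError>;
    fn load_workflow_state(&self, id: u64) -> Result<WorkflowState, StoreError>;
}

#[derive(Default)]
pub struct MemoryStore {
    rows: HashMap<u64, WorkflowState>,
}

impl StateStore for MemoryStore {
    fn save_workflow_state(&mut self, state: &WorkflowState) -> Result<(), StoreError> {
        // Upsert: saving again with the same id overwrites the row.
        self.rows.insert(state.id, state.clone());
        Ok(())
    }

    fn load_workflow_state(&self, id: u64) -> Result<WorkflowState, StoreError> {
        self.rows.get(&id).cloned().ok_or(StoreError::NotFound(id))
    }
}
```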
SQLite (Development/Testing)
use ;
async
Recovery After Crash
use ;
async
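Resuming after a crash amounts to picking the most recent checkpoint and restarting at the step after it. This is a minimal sketch, assuming a workflow is an ordered list of step ids and that the latest checkpoint is the one with the greatest timestamp. `Checkpoint` and `resume_point` are hypothetical names, not the crate's API.

```rust
/// Simplified checkpoint mirroring the checkpoints table.
#[derive(Clone, Debug)]
pub struct Checkpoint {
    pub step_id: String,
    pub timestamp: u64,   // seconds since epoch; TIMESTAMPTZ in the real schema
    pub snapshot: String, // JSON snapshot of the workflow context
}

/// Returns the index in `steps` at which execution should resume, plus the
/// snapshot to restore. Without a usable checkpoint, start over at step 0.
pub fn resume_point<'a>(steps: &[&str], checkpoints: &'a [Checkpoint]) -> (usize, Option<&'a str>) {
    match checkpoints.iter().max_by_key(|c| c.timestamp) {
        Some(latest) => match steps.iter().position(|s| *s == latest.step_id) {
            // The checkpointed step finished, so resume at the next one.
            Some(i) => (i + 1, Some(latest.snapshot.as_str())),
            // Checkpoint refers to an unknown step: restart from scratch.
            None => (0, None),
        },
        None => (0, None),
    }
}
```

A returned index equal to `steps.len()` means the last step had already completed; how to handle that case is left to the caller.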
Cleanup Old Data
use ;
use ;
async
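The "retain last N checkpoints per workflow" policy can be sketched as grouping checkpoints by `workflow_state_id` and keeping the newest `keep` entries of each group. `Checkpoint` and `retain_last_n` are hypothetical. Against a real database the same policy would more likely be a single `DELETE` statement.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Checkpoint {
    pub workflow_state_id: u32,
    pub step_id: String,
    pub timestamp: u64,
}

/// Keeps only the `keep` most recent checkpoints per workflow and returns
/// how many checkpoints were removed.
pub fn retain_last_n(checkpoints: &mut Vec<Checkpoint>, keep: usize) -> usize {
    // Group checkpoints by workflow.
    let mut by_workflow: HashMap<u32, Vec<Checkpoint>> = HashMap::new();
    for cp in checkpoints.drain(..) {
        by_workflow.entry(cp.workflow_state_id).or_default().push(cp);
    }
    let mut removed = 0;
    for (_, mut group) in by_workflow {
        // Newest first, then drop everything past `keep`.
        group.sort_by(|a, b| b.timestamp.cmp(&a.timestamp));
        removed += group.len().saturating_sub(keep);
        group.truncate(keep);
        checkpoints.extend(group);
    }
    // HashMap iteration order is arbitrary, so restore a stable order.
    checkpoints.sort_by_key(|c| (c.workflow_state_id, c.timestamp));
    removed
}
```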
Database Schema
Workflow States Table
(
id UUID PRIMARY KEY,
workflow_id VARCHAR(255) NOT NULL,
workflow_name VARCHAR(255) NOT NULL,
status VARCHAR(50) NOT NULL, -- pending, running, paused, completed, failed
user_id VARCHAR(255),
started_at TIMESTAMP WITH TIME ZONE NOT NULL,
updated_at TIMESTAMP WITH TIME ZONE NOT NULL,
completed_at TIMESTAMP WITH TIME ZONE,
context TEXT NOT NULL, -- JSON
error TEXT
);
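The `status` column holds one of five strings. Here is a sketch of mapping them to a Rust enum, assuming the lowercase values listed in the column comment are stored verbatim. `WorkflowStatus` is a hypothetical name.

```rust
use std::str::FromStr;

/// The five values documented for the `status` column.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum WorkflowStatus {
    Pending,
    Running,
    Paused,
    Completed,
    Failed,
}

impl WorkflowStatus {
    /// String written to the VARCHAR(50) `status` column.
    pub fn as_str(self) -> &'static str {
        match self {
            WorkflowStatus::Pending => "pending",
            WorkflowStatus::Running => "running",
            WorkflowStatus::Paused => "paused",
            WorkflowStatus::Completed => "completed",
            WorkflowStatus::Failed => "failed",
        }
    }
}

impl FromStr for WorkflowStatus {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "pending" => Ok(WorkflowStatus::Pending),
            "running" => Ok(WorkflowStatus::Running),
            "paused" => Ok(WorkflowStatus::Paused),
            "completed" => Ok(WorkflowStatus::Completed),
            "failed" => Ok(WorkflowStatus::Failed),
            other => Err(format!("unknown workflow status: {other}")),
        }
    }
}
```

Parsing through `FromStr` rejects unexpected values instead of passing them along, which keeps the column and the enum in sync.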
Step States Table
(
workflow_state_id UUID NOT NULL,
step_id VARCHAR(255) NOT NULL,
status VARCHAR(50) NOT NULL,
started_at TIMESTAMP WITH TIME ZONE,
completed_at TIMESTAMP WITH TIME ZONE,
outputs TEXT, -- JSON
error TEXT,
retry_count INTEGER DEFAULT 0,
PRIMARY KEY (workflow_state_id, step_id)
);
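The composite primary key `(workflow_state_id, step_id)` allows at most one row per step in each workflow run, so a retry updates the existing row rather than adding a new one. The sketch below assumes that `retry_count` is incremented each time a step is started again. `StepTable`, `StepState` and `start_attempt` are hypothetical.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct StepState {
    pub status: String,
    pub retry_count: i32, // INTEGER DEFAULT 0 in the schema
}

#[derive(Default)]
pub struct StepTable {
    // Keyed like PRIMARY KEY (workflow_state_id, step_id).
    rows: HashMap<(u64, String), StepState>,
}

impl StepTable {
    /// Marks a step as running. The first attempt inserts a row with
    /// retry_count 0; every later attempt increments retry_count.
    pub fn start_attempt(&mut self, workflow_state_id: u64, step_id: &str) -> i32 {
        let row = self
            .rows
            .entry((workflow_state_id, step_id.to_string()))
            .and_modify(|r| r.retry_count += 1)
            .or_insert_with(|| StepState { status: "running".to_string(), retry_count: 0 });
        row.status = "running".to_string();
        row.retry_count
    }

    pub fn get(&self, workflow_state_id: u64, step_id: &str) -> Option<&StepState> {
        self.rows.get(&(workflow_state_id, step_id.to_string()))
    }
}
```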
Checkpoints Table
(
id UUID PRIMARY KEY,
workflow_state_id UUID NOT NULL,
step_id VARCHAR(255) NOT NULL,
timestamp TIMESTAMP WITH TIME ZONE NOT NULL,
snapshot TEXT NOT NULL -- JSON
);
Performance Characteristics
PostgreSQL
- State Save Latency: < 50ms (P99) with connection pooling
- State Load Latency: < 30ms (P99)
- Concurrent Workflows: 10,000+ active workflows
- Connection Pool: 5-20 connections (configurable)
- Checkpoint Overhead: < 100ms per checkpoint
SQLite
- State Save Latency: < 20ms (P99) for file-based, < 5ms for in-memory
- State Load Latency: < 10ms (P99)
- Concurrent Workflows: 1,000+ (limited by SQLite's single-writer lock)
- Best For: Development, testing, single-node deployments
Configuration
PostgreSQL Connection String Format
postgresql://[user[:password]@][host][:port][/database][?param1=value1&...]
Example with SSL:
postgresql://user:pass@localhost:5432/workflows?sslmode=require
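To show how each part of the connection string maps to a field, here is a deliberately minimal parser for the format above. `PgUrl` and `parse_pg_url` are hypothetical. The parser skips percent-decoding, IPv6 hosts and multi-host URLs, so in practice the database driver should be left to parse the URL.

```rust
#[derive(Debug, PartialEq)]
pub struct PgUrl {
    pub user: Option<String>,
    pub password: Option<String>,
    pub host: Option<String>,
    pub port: Option<u16>,
    pub database: Option<String>,
    pub params: Vec<(String, String)>,
}

pub fn parse_pg_url(url: &str) -> Result<PgUrl, String> {
    let rest = url.strip_prefix("postgresql://").ok_or("expected postgresql:// scheme")?;
    // Split off the query string, then the database path.
    let (rest, query) = rest.split_once('?').unwrap_or((rest, ""));
    let (authority, database) = match rest.split_once('/') {
        Some((a, db)) if !db.is_empty() => (a, Some(db.to_string())),
        Some((a, _)) => (a, None),
        None => (rest, None),
    };
    // Optional user[:password] before the last '@'.
    let (userinfo, hostport) = match authority.rsplit_once('@') {
        Some((u, h)) => (Some(u), h),
        None => (None, authority),
    };
    let (user, password) = match userinfo {
        Some(u) => match u.split_once(':') {
            Some((name, pw)) => (Some(name.to_string()), Some(pw.to_string())),
            None => (Some(u.to_string()), None),
        },
        None => (None, None),
    };
    let (host, port) = match hostport.split_once(':') {
        Some((h, p)) => (h, Some(p.parse::<u16>().map_err(|e| format!("bad port: {e}"))?)),
        None => (hostport, None),
    };
    // key=value pairs separated by '&'; malformed pairs are ignored.
    let params: Vec<(String, String)> = query
        .split('&')
        .filter_map(|kv| kv.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    let host = (!host.is_empty()).then(|| host.to_string());
    Ok(PgUrl { user, password, host, port, database, params })
}
```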
SQLite Path Format
./relative/path/to/file.db
/absolute/path/to/file.db
:memory: # In-memory database
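A store has to treat `:memory:` differently from a file path. This is a minimal sketch of that distinction. `SqliteTarget` and `classify_sqlite_path` are hypothetical names.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
pub enum SqliteTarget {
    /// Database lives only in RAM and disappears when closed.
    InMemory,
    /// Relative or absolute path to a database file.
    File(PathBuf),
}

pub fn classify_sqlite_path(path: &str) -> SqliteTarget {
    if path == ":memory:" {
        SqliteTarget::InMemory
    } else {
        SqliteTarget::File(PathBuf::from(path))
    }
}
```

In SQLite, every `:memory:` connection opens its own separate database. With a connection pool, the pooled connections therefore do not share data unless shared-cache mode is used.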
Testing
Run unit tests:
Run integration tests with PostgreSQL:
TEST_DATABASE_URL=postgresql://localhost/test_workflows
Thread Safety
All state store implementations:
- Are Send + Sync, so they can be shared safely across threads
- Are thread-safe for concurrent reads and writes
- Use internal locking or connection pooling
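The guarantees above can be illustrated with a store handle that wraps its state in `Arc<Mutex<...>>`. Cloning the handle shares one map, and the mutex serializes writers. This sketches the pattern rather than the crate's implementation. `SharedStore`, `save_concurrently` and `assert_send_sync` are hypothetical, and a pooled database store would typically rely on its pool rather than a single mutex.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical store handle; clones share the same underlying map.
#[derive(Clone, Default)]
pub struct SharedStore {
    inner: Arc<Mutex<HashMap<String, String>>>,
}

impl SharedStore {
    pub fn save(&self, workflow_id: &str, status: &str) {
        // The lock is held only for the insert, so writers wait briefly.
        self.inner.lock().unwrap().insert(workflow_id.to_string(), status.to_string());
    }

    pub fn len(&self) -> usize {
        self.inner.lock().unwrap().len()
    }
}

/// Compiles only if `T` can be sent to and shared between threads.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Spawns `n` threads that each save a distinct workflow.
pub fn save_concurrently(store: &SharedStore, n: usize) {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            let store = store.clone();
            thread::spawn(move || store.save(&format!("wf-{i}"), "running"))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```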
Error Handling
use ;
match store.load_workflow_state.await
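A common pattern is to treat a missing workflow state as "start fresh" while propagating every other failure. The sketch below uses a hypothetical `StateError` enum, because the crate's actual error type and variants are not shown here. It also uses a hypothetical `status_or_pending` helper that takes the result of a load call.

```rust
use std::fmt;

/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq)]
pub enum StateError {
    NotFound(String),
    Database(String),
}

impl fmt::Display for StateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StateError::NotFound(id) => write!(f, "workflow state {id} not found"),
            StateError::Database(msg) => write!(f, "database error: {msg}"),
        }
    }
}

impl std::error::Error for StateError {}

/// Falls back to "pending" when no state exists; other errors propagate.
pub fn status_or_pending(result: Result<String, StateError>) -> Result<String, StateError> {
    match result {
        Ok(status) => Ok(status),
        Err(StateError::NotFound(id)) => {
            eprintln!("no saved state for {id}; starting fresh");
            Ok("pending".to_string())
        }
        Err(other) => Err(other),
    }
}
```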
License
MIT OR Apache-2.0