Cruster
A Rust framework for building distributed, stateful systems with durable workflows.
Status: Under active development. API may change.
Features
- Stateless Entities -- Addressable RPC handlers with automatic sharding and routing. State is your responsibility (use your database directly).
- Durable Workflows -- Long-running orchestrations that survive crashes. Activities are journaled and replay automatically.
- Activity Groups -- Reusable bundles of transactional activities, composable across workflows.
- RPC Groups -- Reusable bundles of RPC methods, composable across entities.
- Singletons -- Cluster-wide unique tasks with automatic failover.
- Cron Scheduling -- Distributed cron jobs running as singletons.
- gRPC Transport -- Inter-node communication with connection pooling and streaming.
Installation
Prerequisites
Protocol Buffers compiler is required:
# Debian/Ubuntu
# macOS
Cargo
[]
= "version"
Quick Start
Defining an Entity
Entities are stateless RPC handlers. You manage state yourself via your database.
use *;
use ;
use PgPool;
Using the Entity
use EntityId;
// Register entity and get a typed client (CounterClient is generated by the macro)
let counter = Counter
.register
.await?;
// Call methods -- entity is created on first access, routed by shard
let entity_id = new;
let value = counter.increment.await?;
assert_eq!;
let value = counter.get.await?;
assert_eq!;
Core Concepts
Entity Method Types
| Attribute | Purpose | Delivery |
|---|---|---|
#[rpc] |
Best-effort queries/reads | At-most-once |
#[rpc(persisted)] |
Durable writes/mutations | At-least-once |
Both use &self -- entities are stateless. State lives in your database.
Entity Configuration
Visibility Modifiers
Durable Workflows
Workflows are standalone durable orchestration constructs backed by hidden entities. They have an execute entry point and #[activity] side effects. Activities are journaled -- on crash recovery, completed activities return their cached results.
Defining a Workflow
use *;
use ;
;
Using a Workflow
// Register and get typed client (OrderWorkflowClient is generated)
let client = OrderWorkflow.register.await?;
// Execute synchronously -- blocks until workflow completes
let result = client.execute.await?;
// Start asynchronously -- returns execution ID immediately
let exec_id = client.start.await?;
// Poll for result later
let result: = client.poll.await?;
// Idempotency keys
let result = client.with_key.execute.await?; // hashed
let result = client.with_key_raw.execute.await?; // used as-is
self.tx -- Transactional Activities
Inside #[activity] methods, self.tx is an ActivityTx that implements sqlx::Executor. All SQL writes through self.tx are committed atomically with the workflow journal entry. If the activity fails, both the journal and your SQL writes roll back together.
async
Activity Retries
async
async
Durable Timers
Activity Groups
Reusable bundles of transactional activities that can be composed into multiple workflows.
Defining Activity Groups
;
;
Composing into a Workflow
;
Registration with Activity Groups
let client = OrderWorkflow
.register
.await?;
RPC Groups
Reusable bundles of RPC methods that can be composed into multiple entities.
Defining an RPC Group
Composing into an Entity
Registration with RPC Groups
// Pass RPC group instances during registration
let client = User
.register
.await?;
// Generated client includes both entity methods and RPC group methods
client.update_email.await?;
client.log_action.await?;
client.get_audit_log.await?;
Singletons
Cluster-wide unique tasks. If the owning node dies, another takes over.
use ;
register_singleton.await?;
If a singleton does not call ctx.cancellation(), it is force-cancelled on shutdown.
Builder API
use singleton;
singleton
.register
.await?;
Cron Jobs
Distributed cron jobs that run as cluster singletons:
use ;
let cron = new
.with_calculate_next_from_previous
.with_skip_if_older_than;
cron.register.await?;
Client Methods
The generated typed clients provide ergonomic methods. The underlying EntityClient supports:
| Method | Delivery | Use Case |
|---|---|---|
send(id, tag, req) |
Best-effort, await reply | Reads |
send_persisted(id, tag, req, uninterruptible) |
At-least-once, await reply | Writes |
notify(id, tag, req) |
Best-effort, fire-and-forget | Events |
notify_persisted(id, tag, req) |
At-least-once, fire-and-forget | Durable events |
send_at(id, tag, req, deliver_at) |
Scheduled delivery | Timers |
notify_at(id, tag, req, deliver_at) |
Scheduled fire-and-forget | Scheduled events |
send_stream(id, tag, req) |
Streaming response | Large result sets |
Production Deployment
Cluster Setup
use ShardingConfig;
use ClusterMetrics;
use ShardingImpl;
use EtcdRunnerStorage;
use SqlMessageStorage;
use SqlWorkflowStorage;
use SqlWorkflowEngine;
use ;
// 1. Storage backends
let pool = new.max_connections.connect.await?;
let message_storage = new;
message_storage.migrate.await?;
let state_storage = new;
let workflow_engine = new;
// 2. Runner discovery via etcd
let etcd_client = connect.await?;
let runner_storage = new;
// 3. gRPC transport
let grpc_runners = new;
let runner_health = new;
// 4. Configure and create
let config = new;
let sharding = new_with_engines?;
// 5. Register entities and workflows
let counter = Counter .register.await?;
let workflow = OrderWorkflow.register.await?;
// 6. Start background loops + gRPC server
sharding.start.await?;
let grpc_server = new;
builder
.add_service
.serve
.await?;
Infrastructure Requirements
| Component | Purpose |
|---|---|
| PostgreSQL | Message storage, workflow journals, activity transactions |
| etcd | Runner discovery, shard locks, health monitoring |
Configuration Reference
| Option | Default | Description |
|---|---|---|
shards_per_group |
300 | Shards per shard group |
entity_max_idle_time |
60s | Idle timeout before eviction |
entity_mailbox_capacity |
100 | Per-entity message queue size |
entity_max_concurrent_requests |
1 | Concurrent requests per entity (0 = unbounded) |
storage_poll_interval |
500ms | Message storage polling frequency |
storage_message_max_retries |
10 | Max delivery attempts before dead-letter |
runner_lock_ttl |
30s | Runner lock TTL in etcd |
send_retry_count |
3 | Retries on routing failures during rebalancing |
singleton_crash_backoff_base |
1s | Base backoff for singleton crash recovery |
detachment_enabled |
false | Enable automatic detachment on storage errors |
Architecture
┌─────────────────────────────────────────────────────────┐
│ Clients │
└─────────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ Sharding Layer │
│ Rendezvous hashing + request routing │
└─────────────────────────┬───────────────────────────────┘
│
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Runner 1 │ │ Runner 2 │ │ Runner 3 │
│ Shards │ │ Shards │ │ Shards │
│ 0-99 │ │ 100-199 │ │ 200-299 │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└───────────────┴───────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌───────────┐ ┌───────────┐
│ PostgreSQL│ │ etcd │
│ - Messages│ │ - Runners │
│ - Journals│ │ - Locks │
│ - Tx data │ │ - Health │
└───────────┘ └───────────┘
Examples
See examples/ for a complete working example:
- cluster-tests -- End-to-end test suite covering entities, workflows, RPC groups, activity groups, singletons, and timers.
License
MIT OR Apache-2.0