Forge Orchestration
Rust-Native Orchestration Platform for Distributed Workloads
A lightweight, high-performance orchestration crate for Rust, designed to manage distributed workloads at hyper-scale with minimal overhead. It integrates with HashiCorp Nomad for scheduling and provides Mixture of Experts (MoE) routing, predictive autoscaling, and QUIC-based networking.
Features
| Feature | Description |
|---|---|
| Job Management | Define jobs with task groups, drivers, resources, health checks |
| MoE Routing | Intelligent request routing with hash-based, load-aware, and round-robin strategies |
| Autoscaling | Threshold-based and target-utilization policies with hysteresis |
| Nomad Integration | Submit, scale, and stop jobs via HashiCorp Nomad HTTP API |
| Storage | Pluggable state backends (MemoryStore, FileStore) |
| Networking | QUIC transport (quinn) and HTTP server (axum) |
| Metrics | Prometheus-compatible metrics export |
| SDK | Embedded SDK for workloads: lifecycle, port allocation, heartbeats |
Installation
[dependencies]
forge-orchestration = "0.1.0"
tokio = { version = "1", features = ["full"] }
Quick Start
Control Plane
use ;
async
Workload SDK
The SDK is included in the main crate under forge_orchestration::sdk:
use ;
async
Architecture
[User App] --> [Forge SDK] (ready(), allocate(), shutdown())
|
v
[Forge Control Plane]
- Tokio Runtime (async loops)
- Rayon (parallel alloc)
- Raft (consensus)
- State: RocksDB (local) + etcd (distributed)
- MoE Router (gating to experts)
|
v
[Nomad Scheduler] (jobs: containers/binaries)
|
v
[Workers/Nodes]
- QUIC/TLS Networking
- Prometheus Metrics
API Reference
Modules
| Module | Description |
|---|---|
| job | Job, Task, TaskGroup, Driver definitions |
| moe | MoERouter trait, DefaultMoERouter, LoadAwareMoERouter, RoundRobinMoERouter |
| autoscaler | Autoscaler, AutoscalerConfig, ScalingPolicy trait |
| nomad | NomadClient for HashiCorp Nomad API |
| storage | StateStore trait, MemoryStore, FileStore |
| networking | HttpServer, QuicTransport |
| metrics | ForgeMetrics, MetricsExporter, MetricsHook trait |
| sdk | Workload SDK: ready(), allocate_port(), graceful_shutdown(), ForgeClient |
MoE Routing
Built-in routers:
- DefaultMoERouter: Hash-based consistent routing
- LoadAwareMoERouter: Routes to least-loaded expert with affinity
- RoundRobinMoERouter: Sequential distribution
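The three strategies can be illustrated with a small standard-library sketch. This is not the crate's implementation: the function and type names below (`route_by_hash`, `RoundRobin`, `route_least_loaded`) are hypothetical, the hash variant uses plain modulo hashing rather than a consistent-hash ring, and the load-aware variant ignores affinity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hash-based routing (hypothetical helper): the same key maps to the same
/// expert index as long as `num_experts` does not change.
fn route_by_hash(key: &str, num_experts: usize) -> Option<usize> {
    if num_experts == 0 {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    Some((hasher.finish() % num_experts as u64) as usize)
}

/// Round-robin routing (hypothetical type): hands out indices in sequence.
struct RoundRobin {
    next: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { next: AtomicUsize::new(0) }
    }

    fn route(&self, num_experts: usize) -> Option<usize> {
        if num_experts == 0 {
            return None;
        }
        // fetch_add returns the previous value, so the first call yields 0.
        Some(self.next.fetch_add(1, Ordering::Relaxed) % num_experts)
    }
}

/// Load-aware routing (hypothetical helper): picks the expert with the lowest
/// reported load; ties go to the lowest index.
fn route_least_loaded(loads: &[f64]) -> Option<usize> {
    loads
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}
```

Hash routing keeps related requests on one expert, round-robin spreads load evenly without any state about the experts, and least-loaded needs a load signal but adapts to uneven work.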
Custom router:
use ;
use async_trait;
;
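The shape of a custom router can be sketched without the crate. The trait below, `SketchRouter`, is a synchronous stand-in invented for illustration; the real MoERouter trait is async (it is used with async_trait) and its exact signature is not shown here. `PrefixRouter` and its fields are likewise hypothetical.

```rust
/// Hypothetical stand-in for a router trait; not the crate's `MoERouter`.
trait SketchRouter {
    /// Returns the index of the expert that should handle `input`,
    /// or `None` when there are no experts.
    fn route(&self, input: &str, num_experts: usize) -> Option<usize>;
}

/// Pins inputs that start with `prefix` to one expert and spreads
/// everything else using a simple byte-sum fallback.
struct PrefixRouter {
    prefix: String,
    pinned_expert: usize,
}

impl SketchRouter for PrefixRouter {
    fn route(&self, input: &str, num_experts: usize) -> Option<usize> {
        if num_experts == 0 {
            return None;
        }
        // Honor the pin only if the pinned expert actually exists.
        if input.starts_with(&self.prefix) && self.pinned_expert < num_experts {
            return Some(self.pinned_expert);
        }
        let byte_sum: usize = input.bytes().map(usize::from).sum();
        Some(byte_sum % num_experts)
    }
}

/// Callers can depend on the trait object rather than a concrete router.
fn dispatch(router: &dyn SketchRouter, input: &str, num_experts: usize) -> Option<usize> {
    router.route(input, num_experts)
}
```

Keeping the routing decision behind a trait lets the control plane swap strategies without changing call sites, which is presumably why the crate exposes a router trait at all.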
Autoscaling
use AutoscalerConfig;
let config = default
.upscale_threshold
.downscale_threshold
.hysteresis_secs
.bounds;
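How the four settings above might interact can be shown with a small, self-contained sketch. It is an assumption about threshold-based scaling with hysteresis in general, not the crate's Autoscaler: the `SketchAutoscaler` type, its fields, and the `Decision` enum are hypothetical, and any numeric values used with it are illustrative.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    ScaleUp,
    ScaleDown,
    Hold,
}

/// Hypothetical threshold autoscaler with a cooldown (hysteresis) window.
struct SketchAutoscaler {
    upscale_threshold: f64,
    downscale_threshold: f64,
    hysteresis: Duration,
    min_instances: u32,
    max_instances: u32,
    last_change: Option<Instant>,
}

impl SketchAutoscaler {
    /// Decides what to do for the given utilization (0.0 to 1.0) and
    /// current instance count. `now` is passed in to keep the logic testable.
    fn decide(&mut self, utilization: f64, current: u32, now: Instant) -> Decision {
        // Hysteresis: refuse to act again until the cooldown has elapsed.
        if let Some(last) = self.last_change {
            if now.duration_since(last) < self.hysteresis {
                return Decision::Hold;
            }
        }
        let decision = if utilization > self.upscale_threshold && current < self.max_instances {
            Decision::ScaleUp
        } else if utilization < self.downscale_threshold && current > self.min_instances {
            Decision::ScaleDown
        } else {
            Decision::Hold
        };
        // Only real scaling actions restart the cooldown.
        if decision != Decision::Hold {
            self.last_change = Some(now);
        }
        decision
    }
}
```

The gap between the two thresholds plus the cooldown is what prevents flapping: a brief spike triggers at most one action per hysteresis window, and the bounds cap how far repeated actions can go.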
Storage
use ;
let memory = new;
let file = open?;
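The idea of pluggable state backends can be sketched with a small trait and two implementations. None of this is the crate's StateStore API: `SketchStore`, `InMemory`, and `OnDisk` are hypothetical names, the byte-oriented `put`/`get` methods are an assumption, and the file-backed variant assumes keys are plain file names without path separators.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical minimal key-value store interface.
trait SketchStore {
    fn put(&mut self, key: &str, value: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
}

/// Keeps everything in a HashMap; contents are lost when the process exits.
#[derive(Default)]
struct InMemory {
    entries: HashMap<String, Vec<u8>>,
}

impl SketchStore for InMemory {
    fn put(&mut self, key: &str, value: &[u8]) -> io::Result<()> {
        self.entries.insert(key.to_string(), value.to_vec());
        Ok(())
    }

    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.entries.get(key).cloned())
    }
}

/// Stores one file per key inside `dir`.
struct OnDisk {
    dir: PathBuf,
}

impl OnDisk {
    fn open(dir: impl Into<PathBuf>) -> io::Result<Self> {
        let dir = dir.into();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }
}

impl SketchStore for OnDisk {
    fn put(&mut self, key: &str, value: &[u8]) -> io::Result<()> {
        fs::write(self.dir.join(key), value)
    }

    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.dir.join(key)) {
            Ok(bytes) => Ok(Some(bytes)),
            // A missing file means the key was never written.
            Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(err) => Err(err),
        }
    }
}
```

Code written against the trait works unchanged with either backend, so tests can use the in-memory variant while deployments use a persistent one.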
Metrics
use ForgeMetrics;
let metrics = new?;
metrics.record_job_submitted;
metrics.record_scale_event;
let text = metrics.gather_text?;
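For readers unfamiliar with what "Prometheus-compatible" text looks like, here is a minimal counter registry that renders the text exposition format. It is a sketch, not ForgeMetrics: the `SketchMetrics` type and the metric names passed to it are hypothetical, and it covers only counters.

```rust
use std::collections::BTreeMap;

/// Hypothetical counter registry; a BTreeMap keeps output sorted by name.
#[derive(Default)]
struct SketchMetrics {
    counters: BTreeMap<&'static str, u64>,
}

impl SketchMetrics {
    /// Adds one to the named counter, creating it at zero if needed.
    fn increment(&mut self, name: &'static str) {
        *self.counters.entry(name).or_insert(0) += 1;
    }

    /// Renders all counters in the Prometheus text exposition format:
    /// a `# TYPE` line followed by a `name value` sample line.
    fn gather_text(&self) -> String {
        let mut out = String::new();
        for (name, value) in &self.counters {
            out.push_str(&format!("# TYPE {} counter\n", name));
            out.push_str(&format!("{} {}\n", name, value));
        }
        out
    }
}
```

A scrape endpoint would serve the string returned by `gather_text` as the response body; Prometheus parses it line by line.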
SDK Functions
| Function | Description |
|---|---|
| sdk::ready() | Signal readiness to orchestrator |
| sdk::allocate_port(range) | Allocate an available port from range |
| sdk::release_port(port) | Release an allocated port |
| sdk::graceful_shutdown() | Install SIGTERM/SIGINT handlers |
| sdk::shutdown_signal() | Async wait for shutdown signal |
| sdk::ForgeClient | HTTP client for Forge API |
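One plausible way to allocate a port from a range is to try binding each candidate until one succeeds. The sketch below shows that technique with the standard library only; `allocate_port_sketch` is a hypothetical name, it is not how sdk::allocate_port is known to work, and binding to 127.0.0.1 is an assumption made to keep the example local.

```rust
use std::net::TcpListener;
use std::ops::RangeInclusive;

/// Tries each port in `range` in order and returns the first one that can
/// be bound, together with the listener that holds it.
fn allocate_port_sketch(range: RangeInclusive<u16>) -> Option<(u16, TcpListener)> {
    for port in range {
        // A successful bind proves the port is free right now, and keeping
        // the listener alive stops other processes from taking it.
        if let Ok(listener) = TcpListener::bind(("127.0.0.1", port)) {
            return Some((port, listener));
        }
    }
    None
}
```

Returning the listener along with the port avoids the race where a port is reported free and then taken before the workload binds it. In this sketch, releasing the port is simply dropping the listener.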
Environment Variables
| Variable | Description |
|---|---|
| FORGE_API | Forge API endpoint for SDK |
| FORGE_ALLOC_ID | Allocation ID (set by orchestrator) |
| FORGE_TASK_NAME | Task name (set by orchestrator) |
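A workload can read these variables with the standard library alone. The sketch below is illustrative: the `WorkloadEnv` struct and `load_env` function are hypothetical, and treating FORGE_API as required while the other two are optional is an assumption, since the table does not say which variables are mandatory.

```rust
/// Hypothetical container for the variables listed in the table above.
#[derive(Debug, PartialEq)]
struct WorkloadEnv {
    api: String,
    alloc_id: Option<String>,
    task_name: Option<String>,
}

/// Builds a `WorkloadEnv` from any lookup function, which keeps the parsing
/// logic testable without touching the real process environment.
fn load_env(lookup: impl Fn(&str) -> Option<String>) -> Result<WorkloadEnv, String> {
    // Assumption: the API endpoint is required, the other values are optional.
    let api = lookup("FORGE_API").ok_or_else(|| "FORGE_API is not set".to_string())?;
    Ok(WorkloadEnv {
        api,
        alloc_id: lookup("FORGE_ALLOC_ID"),
        task_name: lookup("FORGE_TASK_NAME"),
    })
}

/// Convenience wrapper that reads from the actual process environment.
fn load_env_from_process() -> Result<WorkloadEnv, String> {
    load_env(|name| std::env::var(name).ok())
}
```

Outside the orchestrator, FORGE_ALLOC_ID and FORGE_TASK_NAME will normally be absent, so local runs should expect `None` for both.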
Builder Configuration
use ForgeBuilder;
new
.with_nomad_api
.with_nomad_token
.with_store_path
.with_node_name
.with_datacenter
.with_autoscaler
.with_metrics
.build?
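The chained calls above follow the consuming-builder pattern, where each method takes and returns the builder and the final build step returns a Result. The sketch below shows that pattern in isolation. It is not ForgeBuilder: `SketchBuilder` and `SketchConfig` are hypothetical, only a subset of the methods is mirrored, the argument types are assumptions, and the rule that the Nomad address and node name are required is invented for the example.

```rust
/// Hypothetical validated configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
struct SketchConfig {
    nomad_api: String,
    node_name: String,
    datacenter: Option<String>,
    metrics_enabled: bool,
}

/// Hypothetical builder; every field starts unset.
#[derive(Default)]
struct SketchBuilder {
    nomad_api: Option<String>,
    node_name: Option<String>,
    datacenter: Option<String>,
    metrics_enabled: bool,
}

impl SketchBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn with_nomad_api(mut self, url: &str) -> Self {
        self.nomad_api = Some(url.to_string());
        self
    }

    fn with_node_name(mut self, name: &str) -> Self {
        self.node_name = Some(name.to_string());
        self
    }

    fn with_datacenter(mut self, datacenter: &str) -> Self {
        self.datacenter = Some(datacenter.to_string());
        self
    }

    fn with_metrics(mut self, enabled: bool) -> Self {
        self.metrics_enabled = enabled;
        self
    }

    /// Validation happens once, at the end of the chain.
    fn build(self) -> Result<SketchConfig, String> {
        let nomad_api = self
            .nomad_api
            .ok_or_else(|| "nomad_api is required".to_string())?;
        let node_name = self
            .node_name
            .ok_or_else(|| "node_name is required".to_string())?;
        Ok(SketchConfig {
            nomad_api,
            node_name,
            datacenter: self.datacenter,
            metrics_enabled: self.metrics_enabled,
        })
    }
}
```

Deferring validation to the build step is what makes the trailing `?` meaningful: an incomplete chain fails at one well-defined point instead of producing a half-configured value.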
License
Apache 2.0