Forge Orchestration
Rust-Native Orchestration Platform for Distributed Workloads
A high-performance orchestration platform for Rust, designed to manage distributed workloads at hyper-scale. In the benchmarks below, scheduling throughput is 14x to 2,014x that of the Kubernetes baseline, depending on cluster size and scheduling mode, with bin-packing to keep resource utilization high.
Performance Benchmarks
| Scale | K8s Baseline | Forge Standard | Forge Optimized | Forge Batch |
|---|---|---|---|---|
| 100 nodes | 500/sec | 40,013/sec (80x) | 27,642/sec (55x) | 1,007,271/sec (2014x) |
| 500 nodes | 467/sec | 10,991/sec (24x) | 17,381/sec (37x) | 236,674/sec (506x) |
| 1000 nodes | 427/sec | 6,097/sec (14x) | 13,483/sec (32x) | 122,331/sec (287x) |
| 5000 nodes | 100/sec | 2,453/sec (25x) | 8,599/sec (86x) | 24,889/sec (249x) |
K8s baseline from Kubernetes Scheduling Framework documentation
Key Performance Innovations
- Lock-free parallel scoring with Rayon for concurrent node evaluation
- Pre-computed score caches for O(1) node lookups
- Integer-only scoring - no floating point in hot paths
- Batch scheduling for amortized overhead (up to 1M decisions/sec)
- First-Fit Decreasing bin-packing for near-optimal utilization (FFD is an approximation heuristic, not an exact solver)
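To make the bin-packing bullet concrete, here is a minimal, std-only sketch of First-Fit Decreasing over integer resource units. It is not Forge's scheduler: the `Workload` type, the `first_fit_decreasing` function, and the single uniform `node_capacity` are assumptions made for illustration.

```rust
/// Hypothetical workload description; `demand` is in integer units
/// (for example millicores), so no floating point is involved.
#[derive(Debug, Clone, PartialEq)]
pub struct Workload {
    pub name: String,
    pub demand: u64,
}

/// Packs workloads onto identical nodes using First-Fit Decreasing.
/// Returns, per node, the names of the workloads placed on it, or `None`
/// if some workload is larger than a whole node.
pub fn first_fit_decreasing(
    workloads: &[Workload],
    node_capacity: u64,
) -> Option<Vec<Vec<String>>> {
    // "Decreasing": consider the largest demands first (stable sort).
    let mut sorted: Vec<&Workload> = workloads.iter().collect();
    sorted.sort_by(|a, b| b.demand.cmp(&a.demand));

    let mut free: Vec<u64> = Vec::new(); // remaining capacity per open node
    let mut placement: Vec<Vec<String>> = Vec::new();

    for w in sorted {
        if w.demand > node_capacity {
            return None;
        }
        // "First-Fit": take the first open node with enough room.
        match free.iter().position(|&f| f >= w.demand) {
            Some(i) => {
                free[i] -= w.demand;
                placement[i].push(w.name.clone());
            }
            None => {
                // No open node fits, so open a new one.
                free.push(node_capacity - w.demand);
                placement.push(vec![w.name.clone()]);
            }
        }
    }
    Some(placement)
}
```

With demands 5, 7, 3, and 5 and a node capacity of 10, this sketch uses two nodes. FFD is a heuristic, so it can use more nodes than an exact solver on adversarial inputs.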
Features
| Feature | Description |
|---|---|
| High-Performance Scheduler | 14-2014x faster than the K8s baseline, with bin-packing, spread, and GPU-locality algorithms |
| Control Plane | Kubernetes-style API server with admission controllers and watch streams |
| Multi-Region Federation | Geo-aware routing, cross-region replication, latency-based failover |
| MoE Routing | Intelligent request routing with learned, GPU-aware, and version-aware strategies |
| Autoscaling | Threshold-based and target-utilization policies with hysteresis |
| Resilience | Circuit breakers, exponential backoff retry, graceful degradation |
| Game Server SDK | UDP/TCP port allocation, session management, spot instance handling |
| AI/ML Inference | Request batching, SSE streaming for LLM tokens |
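The Resilience row mentions exponential backoff retry. As a rough illustration of that pattern (not Forge's implementation), the sketch below doubles the delay on every failed attempt up to a cap. The names `backoff_delay_ms` and `retry_with_backoff` and all parameters are hypothetical, and jitter is left out for brevity.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max_ms`.
pub fn backoff_delay_ms(attempt: u32, base_ms: u64, max_ms: u64) -> u64 {
    // checked_shl returns None once the shift would exceed 63 bits.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_ms.saturating_mul(factor).min(max_ms)
}

/// Calls `op` until it succeeds or `max_attempts` calls have been made,
/// sleeping with exponential backoff between failures.
/// `op` is always called at least once.
pub fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_ms: u64,
    max_ms: u64,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                let delay = backoff_delay_ms(attempt, base_ms, max_ms);
                sleep(Duration::from_millis(delay));
                attempt += 1;
            }
        }
    }
}
```

A production version would normally add random jitter so that many clients do not retry in lockstep.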
Installation
[dependencies]
forge-orchestration = "0.4.0"
tokio = { version = "1", features = ["full"] }
Quick Start
Control Plane
use ;
async
Workload SDK
The SDK is included in the main crate under forge_orchestration::sdk:
use ;
async
Architecture
[User App] --> [Forge SDK] (ready(), allocate(), shutdown())
|
v
[Forge Control Plane]
- Tokio Runtime (async loops)
- Rayon (parallel alloc)
- Raft (consensus)
- State: RocksDB (local) + etcd (distributed)
- MoE Router (gating to experts)
|
v
[Nomad Scheduler] (jobs: containers/binaries)
|
v
[Workers/Nodes]
- QUIC/TLS Networking
- Prometheus Metrics
API Reference
Modules
| Module | Description |
|---|---|
| job | Job, Task, TaskGroup, Driver definitions |
| moe | MoERouter trait, DefaultMoERouter, LoadAwareMoERouter, RoundRobinMoERouter |
| autoscaler | Autoscaler, AutoscalerConfig, ScalingPolicy trait |
| nomad | NomadClient for HashiCorp Nomad API |
| storage | StateStore trait, MemoryStore, FileStore |
| networking | HttpServer, QuicTransport |
| metrics | ForgeMetrics, MetricsExporter, MetricsHook trait |
| sdk | Workload SDK: ready(), allocate_port(), graceful_shutdown(), ForgeClient |
MoE Routing
Built-in routers:
- DefaultMoERouter: Hash-based consistent routing
- LoadAwareMoERouter: Routes to least-loaded expert with affinity
- RoundRobinMoERouter: Sequential distribution
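The three strategies can be sketched in a few lines of std-only Rust. This is an illustration of the ideas, not the crate's `MoERouter` API: the function names, the `RoundRobin` type, and the use of expert indices are all assumptions. The hash variant uses plain modulo hashing, which is deterministic for a fixed expert count but is not a consistent-hash ring, and the least-loaded variant leaves out affinity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash-based routing: the same key maps to the same expert index
/// as long as `expert_count` does not change.
pub fn route_by_hash(key: &str, expert_count: usize) -> Option<usize> {
    if expert_count == 0 {
        return None;
    }
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    Some((hasher.finish() % expert_count as u64) as usize)
}

/// Load-aware routing: pick the expert with the smallest current load.
/// Ties go to the lowest index. Affinity is not modelled here.
pub fn route_least_loaded(loads: &[u32]) -> Option<usize> {
    loads
        .iter()
        .enumerate()
        .min_by_key(|&(_, &load)| load)
        .map(|(index, _)| index)
}

/// Round-robin routing: hand out expert indices in sequence.
pub struct RoundRobin {
    next: usize,
}

impl RoundRobin {
    pub fn new() -> Self {
        Self { next: 0 }
    }

    pub fn route(&mut self, expert_count: usize) -> Option<usize> {
        if expert_count == 0 {
            return None;
        }
        let index = self.next % expert_count;
        self.next = self.next.wrapping_add(1);
        Some(index)
    }
}
```

Hash routing keeps a given key on one expert, which helps cache locality; least-loaded routing trades that stickiness for balance; round-robin needs no load information at all.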
Custom router:
use ;
use async_trait::async_trait;
;
Autoscaling
use forge_orchestration::autoscaler::AutoscalerConfig;
let config = AutoscalerConfig::default()
    .upscale_threshold(…)
    .downscale_threshold(…)
    .hysteresis_secs(…)
    .bounds(…);
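To show how thresholds, hysteresis, and bounds interact, here is a small self-contained sketch of a threshold policy with a cooldown window. It is not the crate's `Autoscaler`: the `ScalerSketch` and `Decision` types, the one-replica step size, and every numeric value used with it are assumptions; only the field names echo the builder methods above.

```rust
/// Outcome of one evaluation; the payload is the new replica count.
#[derive(Debug, PartialEq)]
pub enum Decision {
    ScaleUp(u32),
    ScaleDown(u32),
    Hold,
}

/// Hypothetical threshold autoscaler with a hysteresis (cooldown) window.
pub struct ScalerSketch {
    pub upscale_threshold: f64,
    pub downscale_threshold: f64,
    pub hysteresis_secs: u64,
    pub min_replicas: u32,
    pub max_replicas: u32,
    last_scale_at: Option<u64>,
}

impl ScalerSketch {
    pub fn new(up: f64, down: f64, hysteresis_secs: u64, min: u32, max: u32) -> Self {
        Self {
            upscale_threshold: up,
            downscale_threshold: down,
            hysteresis_secs,
            min_replicas: min,
            max_replicas: max,
            last_scale_at: None,
        }
    }

    /// `utilization` is a fraction in 0.0..=1.0; `now_secs` is any monotonic clock.
    pub fn decide(&mut self, utilization: f64, replicas: u32, now_secs: u64) -> Decision {
        // Hysteresis: ignore signals until the cooldown since the last change has passed.
        if let Some(last) = self.last_scale_at {
            if now_secs.saturating_sub(last) < self.hysteresis_secs {
                return Decision::Hold;
            }
        }
        // Thresholds pick the direction; bounds clamp the replica count.
        let decision = if utilization > self.upscale_threshold && replicas < self.max_replicas {
            Decision::ScaleUp(replicas + 1)
        } else if utilization < self.downscale_threshold && replicas > self.min_replicas {
            Decision::ScaleDown(replicas - 1)
        } else {
            Decision::Hold
        };
        if decision != Decision::Hold {
            self.last_scale_at = Some(now_secs);
        }
        decision
    }
}
```

The gap between the two thresholds together with the cooldown is what prevents flapping when utilization hovers near a single cut-off.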
Storage
use forge_orchestration::storage::{FileStore, MemoryStore};
let memory = MemoryStore::new();
let file = FileStore::open(…)?;
Metrics
use forge_orchestration::metrics::ForgeMetrics;
let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted(…);
metrics.record_scale_event(…);
let text = metrics.gather_text()?;
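For readers unfamiliar with what a text gather step typically produces, the sketch below renders counters in the Prometheus text exposition format. It is an assumption that `gather_text` emits this format (the architecture lists Prometheus metrics), and `CounterRegistry` and the metric names used with it are hypothetical stand-ins, not part of the crate.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical in-memory counter registry.
#[derive(Default)]
pub struct CounterRegistry {
    counters: BTreeMap<String, u64>,
}

impl CounterRegistry {
    /// Increments the named counter, creating it at zero if needed.
    pub fn inc(&mut self, name: &str) {
        *self.counters.entry(name.to_string()).or_insert(0) += 1;
    }

    /// Renders all counters, sorted by name, as Prometheus text exposition lines.
    pub fn gather_text(&self) -> String {
        let mut out = String::new();
        for (name, value) in &self.counters {
            // Writing into a String cannot fail, so the results are ignored.
            let _ = writeln!(out, "# TYPE {} counter", name);
            let _ = writeln!(out, "{} {}", name, value);
        }
        out
    }
}
```

A scrape endpoint would serve this string as the response body for the metrics path.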
SDK Functions
| Function | Description |
|---|---|
| sdk::ready() | Signal readiness to orchestrator |
| sdk::allocate_port(range) | Allocate an available port from range |
| sdk::release_port(port) | Release an allocated port |
| sdk::graceful_shutdown() | Install SIGTERM/SIGINT handlers |
| sdk::shutdown_signal() | Async wait for shutdown signal |
| sdk::ForgeClient | HTTP client for Forge API |
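The allocate/release pair can be pictured as a pool over a port range. The sketch below is an in-memory model only, written to illustrate the contract; `PortPool` and its methods are hypothetical, it is not how `sdk::allocate_port` is implemented, and it does not check whether the operating system actually has the port free.

```rust
use std::collections::BTreeSet;
use std::ops::RangeInclusive;

/// Hypothetical pool that hands out ports from a fixed inclusive range.
pub struct PortPool {
    range: RangeInclusive<u16>,
    in_use: BTreeSet<u16>,
}

impl PortPool {
    pub fn new(range: RangeInclusive<u16>) -> Self {
        Self {
            range,
            in_use: BTreeSet::new(),
        }
    }

    /// Returns the lowest port in the range that is not currently allocated,
    /// or `None` when the range is exhausted.
    pub fn allocate(&mut self) -> Option<u16> {
        let port = self.range.clone().find(|p| !self.in_use.contains(p))?;
        self.in_use.insert(port);
        Some(port)
    }

    /// Returns the port to the pool; `false` if it was not allocated.
    pub fn release(&mut self, port: u16) -> bool {
        self.in_use.remove(&port)
    }
}
```

A game server would typically allocate one port per session and release it when the session ends, so the range bounds the number of concurrent sessions per node.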
Environment Variables
| Variable | Description |
|---|---|
| FORGE_API | Forge API endpoint for SDK |
| FORGE_ALLOC_ID | Allocation ID (set by orchestrator) |
| FORGE_TASK_NAME | Task name (set by orchestrator) |
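A workload that wants to read these variables itself could do so along the following lines. This is a sketch, not SDK code: `WorkloadEnv`, `load_env`, and `load_from_process_env` are hypothetical names, and the fallback URL in `DEFAULT_API` is an invented placeholder, since the source does not state a default endpoint.

```rust
/// Hypothetical view of the orchestrator-provided environment.
#[derive(Debug, PartialEq)]
pub struct WorkloadEnv {
    pub api: String,
    pub alloc_id: Option<String>,
    pub task_name: Option<String>,
}

/// Placeholder fallback endpoint; an assumption, not a documented default.
pub const DEFAULT_API: &str = "http://127.0.0.1:8080";

/// Builds the config from any key lookup, which keeps the logic testable
/// without touching the real process environment.
pub fn load_env(lookup: impl Fn(&str) -> Option<String>) -> WorkloadEnv {
    WorkloadEnv {
        api: lookup("FORGE_API").unwrap_or_else(|| DEFAULT_API.to_string()),
        // These two are set by the orchestrator, so they may be absent
        // when the binary is run by hand.
        alloc_id: lookup("FORGE_ALLOC_ID"),
        task_name: lookup("FORGE_TASK_NAME"),
    }
}

/// Convenience wrapper that reads the real process environment.
pub fn load_from_process_env() -> WorkloadEnv {
    load_env(|key| std::env::var(key).ok())
}
```

Treating the allocation ID and task name as optional lets the same binary start outside the orchestrator, for example during local development.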
Builder Configuration
use forge_orchestration::ForgeBuilder;
ForgeBuilder::new()
    .with_nomad_api(…)
    .with_nomad_token(…)
    .with_store_path(…)
    .with_node_name(…)
    .with_datacenter(…)
    .with_autoscaler(…)
    .with_metrics(…)
    .build()?
License
Apache 2.0