# Forge Orchestration
**Rust-Native Orchestration Platform for Distributed Workloads**
[](https://crates.io/crates/forge-orchestration)
[](https://docs.rs/forge-orchestration)
[](LICENSE)
[](https://www.rust-lang.org)
A high-performance orchestration platform for Rust, designed to manage distributed workloads at hyper-scale. **10-2000x faster scheduling than Kubernetes** with intelligent bin-packing for optimal resource utilization.
## Performance Benchmarks
| 100 nodes | 500/sec | 40,013/sec (**80x**) | 27,642/sec (**55x**) | 1,007,271/sec (**2014x**) |
| 500 nodes | 467/sec | 10,991/sec (**24x**) | 17,381/sec (**37x**) | 236,674/sec (**506x**) |
| 1000 nodes | 427/sec | 6,097/sec (**14x**) | 13,483/sec (**32x**) | 122,331/sec (**287x**) |
| 5000 nodes | 100/sec | 2,453/sec (**25x**) | 8,599/sec (**86x**) | 24,889/sec (**249x**) |
*K8s baseline from [Kubernetes Scheduling Framework documentation](https://kubernetes.io/blog/2019/04/25/scheduling-framework/)*
### Key Performance Innovations
- **Lock-free parallel scoring** with Rayon for concurrent node evaluation
- **Pre-computed score caches** for O(1) node lookups
- **Integer-only scoring** - no floating point in hot paths
- **Batch scheduling** for amortized overhead (up to 1M decisions/sec)
- **First-Fit Decreasing bin-packing** for optimal utilization
## Features
| **High-Performance Scheduler** | 10-2000x faster than K8s with bin-packing, spread, GPU-locality algorithms |
| **Control Plane** | Kubernetes-style API server with admission controllers and watch streams |
| **Multi-Region Federation** | Geo-aware routing, cross-region replication, latency-based failover |
| **MoE Routing** | Intelligent request routing with learned, GPU-aware, and version-aware strategies |
| **Autoscaling** | Threshold-based and target-utilization policies with hysteresis |
| **Resilience** | Circuit breakers, exponential backoff retry, graceful degradation |
| **Game Server SDK** | UDP/TCP port allocation, session management, spot instance handling |
| **AI/ML Inference** | Request batching, SSE streaming for LLM tokens |
## Installation
```toml
[dependencies]
forge-orchestration = "0.4.0"
tokio = { version = "1", features = ["full"] }
```
## Quick Start
### Control Plane
```rust
use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};
#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
// Build the orchestrator
let forge = ForgeBuilder::new()
.with_autoscaler(AutoscalerConfig::default()
.upscale_threshold(0.8)
.downscale_threshold(0.3))
.build()?;
// Define and submit a job
let job = Job::new("my-service")
.with_group("api", Task::new("server")
.driver(Driver::Exec)
.command("/usr/bin/server")
.args(vec!["--port", "8080"])
.resources(500, 256));
forge.submit_job(job).await?;
// Run the control plane
forge.run().await?;
Ok(())
}
```
### Workload SDK
The SDK is included in the main crate under `forge_orchestration::sdk`:
```rust
use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};
#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
// Signal readiness to orchestrator
ready()?;
// Allocate a port dynamically
let port = allocate_port(8000..9000)?;
println!("Listening on port {}", port);
// Install graceful shutdown handlers
graceful_shutdown();
// ... your server logic ...
// Wait for shutdown signal
shutdown_signal().await;
Ok(())
}
```
## Architecture
```
[User App] --> [Forge SDK] (ready(), allocate(), shutdown())
|
v
[Forge Control Plane]
- Tokio Runtime (async loops)
- Rayon (parallel alloc)
- Raft (consensus)
- State: RocksDB (local) + etcd (distributed)
- MoE Router (gating to experts)
|
v
[Nomad Scheduler] (jobs: containers/binaries)
|
v
[Workers/Nodes]
- QUIC/TLS Networking
- Prometheus Metrics
```
## API Reference
### Modules
| Module | Description |
|--------|-------------|
| `job` | `Job`, `Task`, `TaskGroup`, `Driver` definitions |
| `moe` | `MoERouter` trait, `DefaultMoERouter`, `LoadAwareMoERouter`, `RoundRobinMoERouter` |
| `autoscaler` | `Autoscaler`, `AutoscalerConfig`, `ScalingPolicy` trait |
| `nomad` | `NomadClient` for HashiCorp Nomad API |
| `storage` | `StateStore` trait, `MemoryStore`, `FileStore` |
| `networking` | `HttpServer`, `QuicTransport` |
| `metrics` | `ForgeMetrics`, `MetricsExporter`, `MetricsHook` trait |
| `sdk` | Workload SDK: `ready()`, `allocate_port()`, `graceful_shutdown()`, `ForgeClient` |
### MoE Routing
Built-in routers:
- **`DefaultMoERouter`**: Hash-based consistent routing
- **`LoadAwareMoERouter`**: Routes to least-loaded expert with affinity
- **`RoundRobinMoERouter`**: Sequential distribution
Custom router:
```rust
use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;
struct MyRouter;
#[async_trait]
impl MoERouter for MyRouter {
async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
RouteResult::new(input.len() % num_experts)
}
fn name(&self) -> &str { "my-router" }
}
```
### Autoscaling
```rust
use forge_orchestration::AutoscalerConfig;
let config = AutoscalerConfig::default()
.upscale_threshold(0.8)
.downscale_threshold(0.3)
.hysteresis_secs(300)
.bounds(1, 100);
```
### Storage
```rust
use forge_orchestration::storage::{MemoryStore, FileStore};
let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;
```
### Metrics
```rust
use forge_orchestration::ForgeMetrics;
let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;
```
### SDK Functions
| `sdk::ready()` | Signal readiness to orchestrator |
| `sdk::allocate_port(range)` | Allocate an available port from range |
| `sdk::release_port(port)` | Release an allocated port |
| `sdk::graceful_shutdown()` | Install SIGTERM/SIGINT handlers |
| `sdk::shutdown_signal()` | Async wait for shutdown signal |
| `sdk::ForgeClient` | HTTP client for Forge API |
## Environment Variables
| `FORGE_API` | Forge API endpoint for SDK |
| `FORGE_ALLOC_ID` | Allocation ID (set by orchestrator) |
| `FORGE_TASK_NAME` | Task name (set by orchestrator) |
## Builder Configuration
```rust
use forge_orchestration::ForgeBuilder;
ForgeBuilder::new()
.with_nomad_api("http://localhost:4646")
.with_nomad_token("secret-token")
.with_store_path("/var/lib/forge/state.json")
.with_node_name("forge-1")
.with_datacenter("dc1")
.with_autoscaler(AutoscalerConfig::default())
.with_metrics(true)
.build()?
```
## License
Apache 2.0