# Forge Orchestration
**Rust-Native Orchestration Platform for Distributed Workloads**
[](https://crates.io/crates/forge-orchestration)
[](https://docs.rs/forge-orchestration)
[](LICENSE)
[](https://www.rust-lang.org)
A lightweight, high-performance orchestration crate for Rust, designed to manage distributed workloads at hyper-scale with minimal overhead. Integrates with HashiCorp Nomad for scheduling while providing Mixture of Experts (MoE) routing, predictive autoscaling, and QUIC-based networking.
## Features
| **Job Management** | Define jobs with task groups, drivers, resources, health checks |
| **MoE Routing** | Intelligent request routing with hash-based, load-aware, and round-robin strategies |
| **Autoscaling** | Threshold-based and target-utilization policies with hysteresis |
| **Nomad Integration** | Submit, scale, and stop jobs via HashiCorp Nomad HTTP API |
| **Storage** | Pluggable state backends (MemoryStore, FileStore) |
| **Networking** | QUIC transport (quinn) and HTTP server (axum) |
| **Metrics** | Prometheus-compatible metrics export |
| **SDK** | Embedded SDK for workloads: lifecycle, port allocation, heartbeats |
## Installation
```toml
[dependencies]
forge-orchestration = "0.1.0"
tokio = { version = "1", features = ["full"] }
```
## Quick Start
### Control Plane
```rust
use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};
#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
// Build the orchestrator
let forge = ForgeBuilder::new()
.with_autoscaler(AutoscalerConfig::default()
.upscale_threshold(0.8)
.downscale_threshold(0.3))
.build()?;
// Define and submit a job
let job = Job::new("my-service")
.with_group("api", Task::new("server")
.driver(Driver::Exec)
.command("/usr/bin/server")
.args(vec!["--port", "8080"])
.resources(500, 256));
forge.submit_job(job).await?;
// Run the control plane
forge.run().await?;
Ok(())
}
```
### Workload SDK
The SDK is included in the main crate under `forge_orchestration::sdk`:
```rust
use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};
#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
// Signal readiness to orchestrator
ready()?;
// Allocate a port dynamically
let port = allocate_port(8000..9000)?;
println!("Listening on port {}", port);
// Install graceful shutdown handlers
graceful_shutdown();
// ... your server logic ...
// Wait for shutdown signal
shutdown_signal().await;
Ok(())
}
```
## Architecture
```
[User App] --> [Forge SDK] (ready(), allocate(), shutdown())
|
v
[Forge Control Plane]
- Tokio Runtime (async loops)
- Rayon (parallel alloc)
- Raft (consensus)
- State: RocksDB (local) + etcd (distributed)
- MoE Router (gating to experts)
|
v
[Nomad Scheduler] (jobs: containers/binaries)
|
v
[Workers/Nodes]
- QUIC/TLS Networking
- Prometheus Metrics
```
## API Reference
### Modules
| Module | Description |
|--------|-------------|
| `job` | `Job`, `Task`, `TaskGroup`, `Driver` definitions |
| `moe` | `MoERouter` trait, `DefaultMoERouter`, `LoadAwareMoERouter`, `RoundRobinMoERouter` |
| `autoscaler` | `Autoscaler`, `AutoscalerConfig`, `ScalingPolicy` trait |
| `nomad` | `NomadClient` for HashiCorp Nomad API |
| `storage` | `StateStore` trait, `MemoryStore`, `FileStore` |
| `networking` | `HttpServer`, `QuicTransport` |
| `metrics` | `ForgeMetrics`, `MetricsExporter`, `MetricsHook` trait |
| `sdk` | Workload SDK: `ready()`, `allocate_port()`, `graceful_shutdown()`, `ForgeClient` |
### MoE Routing
Built-in routers:
- **`DefaultMoERouter`**: Hash-based consistent routing
- **`LoadAwareMoERouter`**: Routes to least-loaded expert with affinity
- **`RoundRobinMoERouter`**: Sequential distribution
Custom router:
```rust
use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;
struct MyRouter;
#[async_trait]
impl MoERouter for MyRouter {
async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
RouteResult::new(input.len() % num_experts)
}
fn name(&self) -> &str { "my-router" }
}
```
### Autoscaling
```rust
use forge_orchestration::AutoscalerConfig;
let config = AutoscalerConfig::default()
.upscale_threshold(0.8)
.downscale_threshold(0.3)
.hysteresis_secs(300)
.bounds(1, 100);
```
### Storage
```rust
use forge_orchestration::storage::{MemoryStore, FileStore};
let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;
```
### Metrics
```rust
use forge_orchestration::ForgeMetrics;
let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;
```
### SDK Functions
| `sdk::ready()` | Signal readiness to orchestrator |
| `sdk::allocate_port(range)` | Allocate an available port from range |
| `sdk::release_port(port)` | Release an allocated port |
| `sdk::graceful_shutdown()` | Install SIGTERM/SIGINT handlers |
| `sdk::shutdown_signal()` | Async wait for shutdown signal |
| `sdk::ForgeClient` | HTTP client for Forge API |
## Environment Variables
| `FORGE_API` | Forge API endpoint for SDK |
| `FORGE_ALLOC_ID` | Allocation ID (set by orchestrator) |
| `FORGE_TASK_NAME` | Task name (set by orchestrator) |
## Builder Configuration
```rust
use forge_orchestration::ForgeBuilder;
ForgeBuilder::new()
.with_nomad_api("http://localhost:4646")
.with_nomad_token("secret-token")
.with_store_path("/var/lib/forge/state.json")
.with_node_name("forge-1")
.with_datacenter("dc1")
.with_autoscaler(AutoscalerConfig::default())
.with_metrics(true)
.build()?
```
## License
Apache 2.0