
Forge Orchestration

Rust-Native Orchestration Platform for Distributed Workloads


A lightweight, high-performance orchestration crate for Rust, designed to manage distributed workloads at hyper-scale with minimal overhead. Integrates with HashiCorp Nomad for scheduling while providing Mixture of Experts (MoE) routing, predictive autoscaling, and QUIC-based networking.

Features

  • Job Management: Define jobs with task groups, drivers, resources, and health checks
  • MoE Routing: Intelligent request routing with hash-based, load-aware, and round-robin strategies
  • Autoscaling: Threshold-based and target-utilization policies with hysteresis
  • Nomad Integration: Submit, scale, and stop jobs via the HashiCorp Nomad HTTP API
  • Storage: Pluggable state backends (MemoryStore, FileStore)
  • Networking: QUIC transport (quinn) and HTTP server (axum)
  • Metrics: Prometheus-compatible metrics export
  • SDK: Embedded workload SDK covering lifecycle, port allocation, and heartbeats

Installation

[dependencies]
forge-orchestration = "0.2.0"
tokio = { version = "1", features = ["full"] }

Quick Start

Control Plane

use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Build the orchestrator
    let forge = ForgeBuilder::new()
        .with_autoscaler(AutoscalerConfig::default()
            .upscale_threshold(0.8)
            .downscale_threshold(0.3))
        .build()?;

    // Define and submit a job
    let job = Job::new("my-service")
        .with_group("api", Task::new("server")
            .driver(Driver::Exec)
            .command("/usr/bin/server")
            .args(vec!["--port", "8080"])
            .resources(500, 256)); // CPU and memory limits (likely MHz and MB, following Nomad's convention)

    forge.submit_job(job).await?;

    // Run the control plane
    forge.run().await?;
    Ok(())
}

Workload SDK

The SDK is included in the main crate under forge_orchestration::sdk:

use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Signal readiness to orchestrator
    ready()?;

    // Allocate a port dynamically
    let port = allocate_port(8000..9000)?;
    println!("Listening on port {}", port);

    // Install graceful shutdown handlers
    graceful_shutdown();

    // ... your server logic ...

    // Wait for shutdown signal
    shutdown_signal().await;
    Ok(())
}

Architecture

[User App] --> [Forge SDK] (ready(), allocate_port(), shutdown_signal())
              |
              v
[Forge Control Plane]
  - Tokio Runtime (async loops)
  - Rayon (parallel alloc)
  - Raft (consensus)
  - State: RocksDB (local) + etcd (distributed)
  - MoE Router (gating to experts)
  |
  v
[Nomad Scheduler] (jobs: containers/binaries)
  |
  v
[Workers/Nodes]
  - QUIC/TLS Networking
  - Prometheus Metrics

API Reference

Modules

  • job: Job, Task, TaskGroup, Driver definitions
  • moe: MoERouter trait, DefaultMoERouter, LoadAwareMoERouter, RoundRobinMoERouter
  • autoscaler: Autoscaler, AutoscalerConfig, ScalingPolicy trait
  • nomad: NomadClient for the HashiCorp Nomad API
  • storage: StateStore trait, MemoryStore, FileStore
  • networking: HttpServer, QuicTransport
  • metrics: ForgeMetrics, MetricsExporter, MetricsHook trait
  • sdk: Workload SDK with ready(), allocate_port(), graceful_shutdown(), and ForgeClient

MoE Routing

Built-in routers:

  • DefaultMoERouter: Hash-based consistent routing
  • LoadAwareMoERouter: Routes to least-loaded expert with affinity
  • RoundRobinMoERouter: Sequential distribution

Custom router:

use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;

struct MyRouter;

#[async_trait]
impl MoERouter for MyRouter {
    async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
        RouteResult::new(input.len() % num_experts)
    }
    fn name(&self) -> &str { "my-router" }
}
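
To exercise a router directly, call route() through the MoERouter trait. A minimal sketch continuing from the MyRouter definition above (the request keys are made up, and the RouteResult contents are not inspected):

#[tokio::main]
async fn main() {
    let router = MyRouter;

    // Choose an expert index among 4 experts for each request key.
    for key in ["alpha", "beta", "gamma"] {
        let _choice = router.route(key, 4).await;
    }
    println!("routing via {}", router.name());
}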

Autoscaling

use forge_orchestration::AutoscalerConfig;

let config = AutoscalerConfig::default()
    .upscale_threshold(0.8)
    .downscale_threshold(0.3)
    .hysteresis_secs(300)
    .bounds(1, 100);
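
The configured policy is handed to the control plane through the same with_autoscaler hook shown in the Quick Start; a minimal sketch (the thresholds and bounds above are illustrative values):

use forge_orchestration::ForgeBuilder;

// Attach the policy to the control plane. The hysteresis window keeps the
// autoscaler from flapping between scale-up and scale-down decisions.
let forge = ForgeBuilder::new()
    .with_autoscaler(config)
    .build()?;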

Storage

use forge_orchestration::storage::{MemoryStore, FileStore};

let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;
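
Both backends plug in through the StateStore trait, so the backend can be chosen at startup. A minimal sketch, assuming StateStore is object-safe (its methods are not documented in this README):

use forge_orchestration::storage::{FileStore, MemoryStore, StateStore};

// Use the persistent file-backed store when a path is configured,
// otherwise fall back to the in-memory store (e.g. in tests).
fn open_store(path: Option<&str>) -> forge_orchestration::Result<Box<dyn StateStore>> {
    Ok(match path {
        Some(p) => Box::new(FileStore::open(p)?),
        None => Box::new(MemoryStore::new()),
    })
}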

Metrics

use forge_orchestration::ForgeMetrics;

let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;
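
Since the exporter is Prometheus-compatible, the output of gather_text() can be served from a scrape endpoint. A minimal sketch that wires it into axum 0.7 by hand (the port and route are arbitrary, and the crate's own HttpServer may already expose an equivalent endpoint):

use std::sync::Arc;

use axum::{routing::get, Router};
use forge_orchestration::ForgeMetrics;

#[tokio::main]
async fn main() {
    let metrics = Arc::new(ForgeMetrics::new().expect("metrics init"));

    // Serve the Prometheus text exposition format at /metrics.
    let app = Router::new().route(
        "/metrics",
        get(move || {
            let metrics = Arc::clone(&metrics);
            async move { metrics.gather_text().unwrap_or_default() }
        }),
    );

    let listener = tokio::net::TcpListener::bind("0.0.0.0:9090")
        .await
        .expect("bind metrics port");
    axum::serve(listener, app).await.expect("serve metrics");
}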

SDK Functions

  • sdk::ready(): Signal readiness to the orchestrator
  • sdk::allocate_port(range): Allocate an available port from the given range
  • sdk::release_port(port): Release a previously allocated port
  • sdk::graceful_shutdown(): Install SIGTERM/SIGINT handlers
  • sdk::shutdown_signal(): Asynchronously wait for the shutdown signal
  • sdk::ForgeClient: HTTP client for the Forge API
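
Ports obtained from allocate_port() can be handed back when the workload exits; a minimal sketch combining the functions above (release_port() is assumed to return the crate's Result like its allocate counterpart):

use forge_orchestration::sdk::{allocate_port, graceful_shutdown, ready, release_port, shutdown_signal};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    graceful_shutdown();

    // Grab a free port, then tell the orchestrator we are ready to serve.
    let port = allocate_port(8000..9000)?;
    ready()?;

    // ... serve on `port` until SIGTERM/SIGINT arrives ...
    shutdown_signal().await;

    // Hand the port back so other tasks on this node can reuse it.
    release_port(port)?;
    Ok(())
}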

Environment Variables

  • FORGE_API: Forge API endpoint used by the SDK
  • FORGE_ALLOC_ID: Allocation ID (set by the orchestrator)
  • FORGE_TASK_NAME: Task name (set by the orchestrator)
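
Workloads can read the orchestrator-provided variables directly through std::env; a minimal sketch (the fallback values are illustrative):

use std::env;

fn main() {
    // Set by the orchestrator for each allocation; absent when running standalone.
    let alloc_id = env::var("FORGE_ALLOC_ID").unwrap_or_else(|_| "local".into());
    let task_name = env::var("FORGE_TASK_NAME").unwrap_or_else(|_| "dev".into());
    println!("allocation {alloc_id}, task {task_name}");
}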

Builder Configuration

use forge_orchestration::{AutoscalerConfig, ForgeBuilder};

let forge = ForgeBuilder::new()
    .with_nomad_api("http://localhost:4646")
    .with_nomad_token("secret-token")
    .with_store_path("/var/lib/forge/state.json")
    .with_node_name("forge-1")
    .with_datacenter("dc1")
    .with_autoscaler(AutoscalerConfig::default())
    .with_metrics(true)
    .build()?;

License

Apache 2.0