
Forge Orchestration

Rust-Native Orchestration Platform for Distributed Workloads


A lightweight, high-performance orchestration crate for Rust, designed to manage distributed workloads at hyper-scale with minimal overhead. Integrates with HashiCorp Nomad for scheduling while providing Mixture of Experts (MoE) routing, predictive autoscaling, and QUIC-based networking.

Features

  • Job Management: Define jobs with task groups, drivers, resources, and health checks
  • MoE Routing: Intelligent request routing with hash-based, load-aware, and round-robin strategies
  • Autoscaling: Threshold-based and target-utilization policies with hysteresis
  • Nomad Integration: Submit, scale, and stop jobs via the HashiCorp Nomad HTTP API
  • Storage: Pluggable state backends (MemoryStore, FileStore)
  • Networking: QUIC transport (quinn) and HTTP server (axum)
  • Metrics: Prometheus-compatible metrics export
  • SDK: Embedded workload SDK covering lifecycle, port allocation, and heartbeats

Installation

[dependencies]
forge-orchestration = "0.2.0"
tokio = { version = "1", features = ["full"] }

Quick Start

Control Plane

use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Build the orchestrator
    let forge = ForgeBuilder::new()
        .with_autoscaler(AutoscalerConfig::default()
            .upscale_threshold(0.8)
            .downscale_threshold(0.3))
        .build()?;

    // Define and submit a job
    let job = Job::new("my-service")
        .with_group("api", Task::new("server")
            .driver(Driver::Exec)
            .command("/usr/bin/server")
            .args(vec!["--port", "8080"])
            .resources(500, 256)); // CPU and memory limits (likely MHz and MB, following Nomad's convention)

    forge.submit_job(job).await?;

    // Run the control plane
    forge.run().await?;
    Ok(())
}

Workload SDK

The SDK is included in the main crate under forge_orchestration::sdk:

use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    // Signal readiness to orchestrator
    ready()?;

    // Allocate a port dynamically
    let port = allocate_port(8000..9000)?;
    println!("Listening on port {}", port);

    // Install graceful shutdown handlers
    graceful_shutdown();

    // ... your server logic ...

    // Wait for shutdown signal
    shutdown_signal().await;
    Ok(())
}

Architecture

[User App] --> [Forge SDK] (ready(), allocate_port(), shutdown_signal())
              |
              v
[Forge Control Plane]
  - Tokio Runtime (async loops)
  - Rayon (parallel alloc)
  - Raft (consensus)
  - State: RocksDB (local) + etcd (distributed)
  - MoE Router (gating to experts)
  |
  v
[Nomad Scheduler] (jobs: containers/binaries)
  |
  v
[Workers/Nodes]
  - QUIC/TLS Networking
  - Prometheus Metrics

API Reference

Modules

  • job: Job, Task, TaskGroup, Driver definitions
  • moe: MoERouter trait, DefaultMoERouter, LoadAwareMoERouter, RoundRobinMoERouter
  • autoscaler: Autoscaler, AutoscalerConfig, ScalingPolicy trait
  • nomad: NomadClient for the HashiCorp Nomad API
  • storage: StateStore trait, MemoryStore, FileStore
  • networking: HttpServer, QuicTransport
  • metrics: ForgeMetrics, MetricsExporter, MetricsHook trait
  • sdk: Workload SDK with ready(), allocate_port(), graceful_shutdown(), and ForgeClient

MoE Routing

Built-in routers:

  • DefaultMoERouter: Hash-based consistent routing
  • LoadAwareMoERouter: Routes to least-loaded expert with affinity
  • RoundRobinMoERouter: Sequential distribution

Custom router:

use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;

struct MyRouter;

#[async_trait]
impl MoERouter for MyRouter {
    async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
        RouteResult::new(input.len() % num_experts)
    }
    fn name(&self) -> &str { "my-router" }
}
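
To exercise a router directly, call route() through the MoERouter trait. A minimal sketch continuing from the MyRouter definition above (the request keys are made up, and the RouteResult contents are not inspected):

#[tokio::main]
async fn main() {
    let router = MyRouter;

    // Choose an expert index among 4 experts for each request key.
    for key in ["alpha", "beta", "gamma"] {
        let _choice = router.route(key, 4).await;
    }
    println!("routing via {}", router.name());
}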

Autoscaling

use forge_orchestration::AutoscalerConfig;

let config = AutoscalerConfig::default()
    .upscale_threshold(0.8)
    .downscale_threshold(0.3)
    .hysteresis_secs(300)
    .bounds(1, 100);
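
The configured policy is handed to the control plane through the same with_autoscaler hook shown in the Quick Start; a minimal sketch (the thresholds and bounds above are illustrative values):

use forge_orchestration::ForgeBuilder;

// Attach the policy to the control plane. The hysteresis window keeps the
// autoscaler from flapping between scale-up and scale-down decisions.
let forge = ForgeBuilder::new()
    .with_autoscaler(config)
    .build()?;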

Storage

use forge_orchestration::storage::{MemoryStore, FileStore};

let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;
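
Both backends plug in through the StateStore trait, so the backend can be chosen at startup. A minimal sketch, assuming StateStore is object-safe (its methods are not documented in this README):

use forge_orchestration::storage::{FileStore, MemoryStore, StateStore};

// Use the persistent file-backed store when a path is configured,
// otherwise fall back to the in-memory store (e.g. in tests).
fn open_store(path: Option<&str>) -> forge_orchestration::Result<Box<dyn StateStore>> {
    Ok(match path {
        Some(p) => Box::new(FileStore::open(p)?),
        None => Box::new(MemoryStore::new()),
    })
}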

Metrics

use forge_orchestration::ForgeMetrics;

let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;
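
Since the exporter is Prometheus-compatible, the output of gather_text() can be served from a scrape endpoint. A minimal sketch that wires it into axum 0.7 by hand (the port and route are arbitrary, and the crate's own HttpServer may already expose an equivalent endpoint):

use std::sync::Arc;

use axum::{routing::get, Router};
use forge_orchestration::ForgeMetrics;

#[tokio::main]
async fn main() {
    let metrics = Arc::new(ForgeMetrics::new().expect("metrics init"));

    // Serve the Prometheus text exposition format at /metrics.
    let app = Router::new().route(
        "/metrics",
        get(move || {
            let metrics = Arc::clone(&metrics);
            async move { metrics.gather_text().unwrap_or_default() }
        }),
    );

    let listener = tokio::net::TcpListener::bind("0.0.0.0:9090")
        .await
        .expect("bind metrics port");
    axum::serve(listener, app).await.expect("serve metrics");
}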

SDK Functions

  • sdk::ready(): Signal readiness to the orchestrator
  • sdk::allocate_port(range): Allocate an available port from the given range
  • sdk::release_port(port): Release a previously allocated port
  • sdk::graceful_shutdown(): Install SIGTERM/SIGINT handlers
  • sdk::shutdown_signal(): Asynchronously wait for the shutdown signal
  • sdk::ForgeClient: HTTP client for the Forge API
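
Ports obtained from allocate_port() can be handed back when the workload exits; a minimal sketch combining the functions above (release_port() is assumed to return the crate's Result like its allocate counterpart):

use forge_orchestration::sdk::{allocate_port, graceful_shutdown, ready, release_port, shutdown_signal};

#[tokio::main]
async fn main() -> forge_orchestration::Result<()> {
    graceful_shutdown();

    // Grab a free port, then tell the orchestrator we are ready to serve.
    let port = allocate_port(8000..9000)?;
    ready()?;

    // ... serve on `port` until SIGTERM/SIGINT arrives ...
    shutdown_signal().await;

    // Hand the port back so other tasks on this node can reuse it.
    release_port(port)?;
    Ok(())
}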

Environment Variables

  • FORGE_API: Forge API endpoint used by the SDK
  • FORGE_ALLOC_ID: Allocation ID (set by the orchestrator)
  • FORGE_TASK_NAME: Task name (set by the orchestrator)
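
Workloads can read the orchestrator-provided variables directly through std::env; a minimal sketch (the fallback values are illustrative):

use std::env;

fn main() {
    // Set by the orchestrator for each allocation; absent when running standalone.
    let alloc_id = env::var("FORGE_ALLOC_ID").unwrap_or_else(|_| "local".into());
    let task_name = env::var("FORGE_TASK_NAME").unwrap_or_else(|_| "dev".into());
    println!("allocation {alloc_id}, task {task_name}");
}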

Builder Configuration

use forge_orchestration::{AutoscalerConfig, ForgeBuilder};

let forge = ForgeBuilder::new()
    .with_nomad_api("http://localhost:4646")
    .with_nomad_token("secret-token")
    .with_store_path("/var/lib/forge/state.json")
    .with_node_name("forge-1")
    .with_datacenter("dc1")
    .with_autoscaler(AutoscalerConfig::default())
    .with_metrics(true)
    .build()?;

License

Apache 2.0