forge-orchestration 0.2.0

Rust-native orchestration platform for distributed workloads with MoE routing, autoscaling, and Nomad integration
Documentation
# Forge Orchestration


**Rust-Native Orchestration Platform for Distributed Workloads**

[![Crates.io](https://img.shields.io/crates/v/forge-orchestration.svg)](https://crates.io/crates/forge-orchestration)
[![Documentation](https://docs.rs/forge-orchestration/badge.svg)](https://docs.rs/forge-orchestration)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)
[![Rust](https://img.shields.io/badge/rust-1.75%2B-orange.svg)](https://www.rust-lang.org)

A lightweight, high-performance orchestration crate for Rust, designed to manage distributed workloads at hyper-scale with minimal overhead. Integrates with HashiCorp Nomad for scheduling while providing Mixture of Experts (MoE) routing, predictive autoscaling, and QUIC-based networking.

## Features


| Feature | Description |
|---------|-------------|
| **Job Management** | Define jobs with task groups, drivers, resources, health checks |
| **MoE Routing** | Intelligent request routing with hash-based, load-aware, and round-robin strategies |
| **Autoscaling** | Threshold-based and target-utilization policies with hysteresis |
| **Nomad Integration** | Submit, scale, and stop jobs via HashiCorp Nomad HTTP API |
| **Storage** | Pluggable state backends (MemoryStore, FileStore) |
| **Networking** | QUIC transport (quinn) and HTTP server (axum) |
| **Metrics** | Prometheus-compatible metrics export |
| **SDK** | Embedded SDK for workloads: lifecycle, port allocation, heartbeats |

## Installation


```toml
[dependencies]
forge-orchestration = "0.1.0"
tokio = { version = "1", features = ["full"] }
```

## Quick Start


### Control Plane


```rust
use forge_orchestration::{ForgeBuilder, AutoscalerConfig, Job, Task, Driver};

#[tokio::main]

async fn main() -> forge_orchestration::Result<()> {
    // Build the orchestrator
    let forge = ForgeBuilder::new()
        .with_autoscaler(AutoscalerConfig::default()
            .upscale_threshold(0.8)
            .downscale_threshold(0.3))
        .build()?;

    // Define and submit a job
    let job = Job::new("my-service")
        .with_group("api", Task::new("server")
            .driver(Driver::Exec)
            .command("/usr/bin/server")
            .args(vec!["--port", "8080"])
            .resources(500, 256));

    forge.submit_job(job).await?;

    // Run the control plane
    forge.run().await?;
    Ok(())
}
```

### Workload SDK


The SDK is included in the main crate under `forge_orchestration::sdk`:

```rust
use forge_orchestration::sdk::{ready, allocate_port, graceful_shutdown, shutdown_signal};

#[tokio::main]

async fn main() -> forge_orchestration::Result<()> {
    // Signal readiness to orchestrator
    ready()?;

    // Allocate a port dynamically
    let port = allocate_port(8000..9000)?;
    println!("Listening on port {}", port);

    // Install graceful shutdown handlers
    graceful_shutdown();

    // ... your server logic ...

    // Wait for shutdown signal
    shutdown_signal().await;
    Ok(())
}
```

## Architecture


```
[User App] --> [Forge SDK] (ready(), allocate(), shutdown())
              |
              v
[Forge Control Plane]
  - Tokio Runtime (async loops)
  - Rayon (parallel alloc)
  - Raft (consensus)
  - State: RocksDB (local) + etcd (distributed)
  - MoE Router (gating to experts)
  |
  v
[Nomad Scheduler] jobs: containers/binaries
  |
  v
[Workers/Nodes]
  - QUIC/TLS Networking
  - Prometheus Metrics
```

## API Reference

### Modules

| Module | Description |
|--------|-------------|
| `job` | `Job`, `Task`, `TaskGroup`, `Driver` definitions |
| `moe` | `MoERouter` trait, `DefaultMoERouter`, `LoadAwareMoERouter`, `RoundRobinMoERouter` |
| `autoscaler` | `Autoscaler`, `AutoscalerConfig`, `ScalingPolicy` trait |
| `nomad` | `NomadClient` for HashiCorp Nomad API |
| `storage` | `StateStore` trait, `MemoryStore`, `FileStore` |
| `networking` | `HttpServer`, `QuicTransport` |
| `metrics` | `ForgeMetrics`, `MetricsExporter`, `MetricsHook` trait |
| `sdk` | Workload SDK: `ready()`, `allocate_port()`, `graceful_shutdown()`, `ForgeClient` |

### MoE Routing

Built-in routers:
- **`DefaultMoERouter`**: Hash-based consistent routing
- **`LoadAwareMoERouter`**: Routes to least-loaded expert with affinity
- **`RoundRobinMoERouter`**: Sequential distribution

Custom router:

```rust
use forge_orchestration::moe::{MoERouter, RouteResult};
use async_trait::async_trait;

struct MyRouter;

#[async_trait]
impl MoERouter for MyRouter {
    async fn route(&self, input: &str, num_experts: usize) -> RouteResult {
        RouteResult::new(input.len() % num_experts)
    }
    fn name(&self) -> &str { "my-router" }
}
```

### Autoscaling


```rust
use forge_orchestration::AutoscalerConfig;

let config = AutoscalerConfig::default()
    .upscale_threshold(0.8)
    .downscale_threshold(0.3)
    .hysteresis_secs(300)
    .bounds(1, 100);
```

### Storage


```rust
use forge_orchestration::storage::{MemoryStore, FileStore};

let memory = MemoryStore::new();
let file = FileStore::open("/var/lib/forge/state.json")?;
```

### Metrics


```rust
use forge_orchestration::ForgeMetrics;

let metrics = ForgeMetrics::new()?;
metrics.record_job_submitted();
metrics.record_scale_event("my-job", "up");
let text = metrics.gather_text()?;
```

### SDK Functions


| Function | Description |
|----------|-------------|
| `sdk::ready()` | Signal readiness to orchestrator |
| `sdk::allocate_port(range)` | Allocate an available port from range |
| `sdk::release_port(port)` | Release an allocated port |
| `sdk::graceful_shutdown()` | Install SIGTERM/SIGINT handlers |
| `sdk::shutdown_signal()` | Async wait for shutdown signal |
| `sdk::ForgeClient` | HTTP client for Forge API |

## Environment Variables


| Variable | Description |
|----------|-------------|
| `FORGE_API` | Forge API endpoint for SDK |
| `FORGE_ALLOC_ID` | Allocation ID (set by orchestrator) |
| `FORGE_TASK_NAME` | Task name (set by orchestrator) |

## Builder Configuration


```rust
use forge_orchestration::ForgeBuilder;

ForgeBuilder::new()
    .with_nomad_api("http://localhost:4646")
    .with_nomad_token("secret-token")
    .with_store_path("/var/lib/forge/state.json")
    .with_node_name("forge-1")
    .with_datacenter("dc1")
    .with_autoscaler(AutoscalerConfig::default())
    .with_metrics(true)
    .build()?
```

## License


Apache 2.0