gradatum-engine
Rust supervisor for llama-server inference processes — transparent OpenAI-compatible reverse proxy with restart-on-failure.
Status: Alpha (v0.3.x), public, Apache-2.0. The API is not stable until v1.0. Part of gradatum, a memory backbone for AI agents. · github · gradatum.org
Overview
gradatum-engine manages one or more llama-server child processes, acting as a supervisor
and transparent HTTP proxy. It does not load models itself — it spawns an external
llama-server binary and forwards requests to it, preserving the full OpenAI-compatible
interface including streaming, vision (mmproj), sampling parameters, and slot IDs.
Architecture in v0.3.x:
- Spawn: launches llama-server via tokio::process::Command (never via shell).
- Wait-ready: polls GET /health on the child process until it returns 200.
- Transparent reverse proxy: forwards request bodies verbatim to the child; passes through SSE streams, slot_id, sampling fields, tool call parameters, and vision inputs without modification.
- Supervise: bounded restart-on-failure with a configurable retry limit; shuts down gracefully on SIGTERM.
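The wait-ready and supervise steps above can be illustrated with a minimal, std-only sketch. It is not the crate's actual implementation: the names `wait_ready`, `supervise`, `Exit`, and `Supervised` are hypothetical, and plain closures stand in for the real `GET /health` probe and for the tokio-based child process so that the control flow can be read and tested in isolation.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Outcome of one run of the child process (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Exit {
    /// The child stopped because shutdown was requested (e.g. SIGTERM).
    Shutdown,
    /// The child crashed or exited unexpectedly.
    Failed,
}

/// Final result of the supervision loop (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Supervised {
    CleanShutdown,
    /// The retry limit was exhausted; carries the number of restarts performed.
    GaveUp { restarts: u32 },
}

/// Calls `probe` up to `max_attempts` times, sleeping `interval` between
/// attempts, and returns true as soon as the probe reports ready.
/// Assumption: a real probe would issue GET /health and check for status 200.
pub fn wait_ready<P: FnMut() -> bool>(mut probe: P, max_attempts: u32, interval: Duration) -> bool {
    for attempt in 0..max_attempts {
        if probe() {
            return true;
        }
        // No need to sleep after the final failed attempt.
        if attempt + 1 < max_attempts {
            sleep(interval);
        }
    }
    false
}

/// Runs `run_once` repeatedly. A clean shutdown ends the loop; a failure
/// triggers a restart until `max_restarts` restarts have been used up.
pub fn supervise<R: FnMut() -> Exit>(mut run_once: R, max_restarts: u32) -> Supervised {
    let mut restarts = 0;
    loop {
        match run_once() {
            Exit::Shutdown => return Supervised::CleanShutdown,
            Exit::Failed if restarts < max_restarts => restarts += 1,
            Exit::Failed => return Supervised::GaveUp { restarts },
        }
    }
}
```

In this sketch, `max_restarts` counts restarts after the initial launch, so a limit of 2 allows at most three runs in total. Whether the crate counts the first launch against its retry limit is not stated in this README.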
Supports multi-model deployments (one engine instance per model, each on its own port). Bind address is fail-closed: only binds to configured LAN addresses, never open to all interfaces by default.
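As a rough illustration of what a fail-closed bind check could look like, the sketch below validates a candidate socket address against an explicit allow-list and rejects the wildcard addresses `0.0.0.0` and `::`. The function `check_bind` and the `BindError` type are hypothetical and not part of the crate. The sketch also assumes the wildcard is always refused, whereas the README only says it is never the default.

```rust
use std::net::{IpAddr, SocketAddr};

/// Reasons a bind address is refused (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum BindError {
    /// The string is not a valid `ip:port` socket address.
    Unparseable,
    /// The address is 0.0.0.0 or ::, i.e. all interfaces.
    Wildcard,
    /// The address is valid but not in the configured allow-list.
    NotAllowed,
}

/// Accepts `candidate` only if it parses, is not a wildcard address,
/// and its IP appears in `allowed`. Anything else is rejected (fail-closed).
pub fn check_bind(candidate: &str, allowed: &[IpAddr]) -> Result<SocketAddr, BindError> {
    let addr: SocketAddr = candidate.parse().map_err(|_| BindError::Unparseable)?;
    if addr.ip().is_unspecified() {
        return Err(BindError::Wildcard);
    }
    if !allowed.contains(&addr.ip()) {
        return Err(BindError::NotAllowed);
    }
    Ok(addr)
}
```

The point of the design is that an empty or misconfigured allow-list results in no listener at all rather than an exposed one.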
Usage
Feature Flags
| Feature | Description |
|---|---|
| serve (default) | Compile the Axum HTTP server and llama-server supervisor |
Anti-cycle invariant
gradatum-engine may depend on gradatum-core and gradatum-dto.
gradatum-core and gradatum-dto must never depend on gradatum-engine.
License
Apache-2.0