gradatum-engine

Rust supervisor for llama-server inference processes — transparent OpenAI-compatible reverse proxy with restart-on-failure.

Status: Alpha (v0.3.x), public, licensed under Apache-2.0. The API is not stable before v1.0. Part of gradatum, the memory backbone for AI agents (gradatum.org).

Overview

gradatum-engine manages one or more llama-server child processes, acting as a supervisor and transparent HTTP proxy. It does not load models itself — it spawns an external llama-server binary and forwards requests to it, preserving the full OpenAI-compatible interface including streaming, vision (mmproj), sampling parameters, and slot IDs.

Architecture in v0.3.x:

Spawn — launches llama-server via tokio::process::Command (never via shell).
Wait-ready — polls GET /health on the child process until it returns 200.
Transparent reverse proxy — forwards request bodies verbatim to the child; passes through SSE streams, slot_id, sampling fields, tool call parameters, and vision inputs without modification.
Supervise — restarts the child on failure, up to a configurable retry limit; shuts down gracefully on SIGTERM.
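The spawn, wait-ready, and supervise stages can be sketched with the standard library alone. This is a minimal blocking sketch under stated assumptions, not the engine's code. The engine uses `tokio::process::Command` and async I/O; this sketch swaps in `std::process::Command` and a raw `TcpStream` so it runs without dependencies. The helper names (`llama_command`, `status_is_200`, `wait_ready`, `supervise`), the 100 ms poll interval, and the linear backoff are hypothetical. The `-m`, `--host`, and `--port` flags are assumed to match the llama-server build in use. The reverse proxy and SIGTERM handling are not shown.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;
use std::process::Command;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Builds the child command from an explicit argument vector. No shell is
/// involved, so a model path containing spaces or `;` stays one argument.
fn llama_command(binary: &str, model: &str, host: &str, port: u16) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg("-m")
        .arg(model)
        .arg("--host")
        .arg(host)
        .arg("--port")
        .arg(port.to_string());
    cmd
}

/// Returns true if the response starts with an HTTP 200 status line.
fn status_is_200(stream: &mut TcpStream) -> bool {
    let mut head = Vec::new();
    let mut buf = [0u8; 64];
    // "HTTP/1.1 200" is 12 bytes; read until we have that much or the peer stops.
    while head.len() < 12 {
        match stream.read(&mut buf) {
            Ok(0) | Err(_) => break,
            Ok(n) => head.extend_from_slice(&buf[..n]),
        }
    }
    head.len() >= 12 && head.starts_with(b"HTTP/") && &head[9..12] == b"200"
}

/// Polls `GET /health` on `addr` (e.g. "127.0.0.1:8081") until it answers 200
/// or `timeout` elapses.
fn wait_ready(addr: &str, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while Instant::now() < deadline {
        if let Ok(mut stream) = TcpStream::connect(addr) {
            // Bound each read so a silent child cannot hang the poller.
            let _ = stream.set_read_timeout(Some(Duration::from_millis(500)));
            let req = format!("GET /health HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
            if stream.write_all(req.as_bytes()).is_ok() && status_is_200(&mut stream) {
                return true;
            }
        }
        sleep(Duration::from_millis(100));
    }
    false
}

/// Bounded restart-on-failure: calls `run_once` until it returns Ok or the
/// restart budget is spent. Returns how many restarts were needed.
fn supervise<F>(max_restarts: u32, mut run_once: F) -> Result<u32, String>
where
    F: FnMut() -> Result<(), String>,
{
    let mut restarts = 0;
    loop {
        match run_once() {
            Ok(()) => return Ok(restarts),
            Err(e) if restarts >= max_restarts => {
                return Err(format!("giving up after {restarts} restarts: {e}"));
            }
            Err(_) => {
                restarts += 1;
                // Linear backoff so a crash-looping child does not spin the CPU.
                sleep(Duration::from_millis(50 * u64::from(restarts)));
            }
        }
    }
}
```

In a fuller supervisor, `run_once` would spawn `llama_command(...)`, call `wait_ready`, then block on the child's exit status and map a non-zero exit to `Err`, so each crash consumes one unit of the restart budget.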

Supports multi-model deployments: one engine instance per model, each on its own port. The bind address is fail-closed: the engine binds only to the configured LAN addresses and is never exposed on all interfaces by default.
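One way to make a bind address fail-closed is to validate it before binding and refuse anything that would listen on every interface. A minimal sketch, assuming the configured address arrives as a `host:port` string and that "LAN" means RFC 1918 IPv4 ranges, IPv6 unique-local addresses (fc00::/7), or loopback. `validate_bind` is a hypothetical name, and the engine's real rules may differ.

```rust
use std::net::{IpAddr, SocketAddr};

/// Hypothetical fail-closed check: accept loopback or private-LAN addresses only.
fn validate_bind(addr: &str) -> Result<SocketAddr, String> {
    let sock = addr
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid bind address {addr:?}: {e}"))?;
    // 0.0.0.0 and [::] would expose the proxy on every interface.
    if sock.ip().is_unspecified() {
        return Err(format!("refusing to bind {sock}: all interfaces"));
    }
    let allowed = match sock.ip() {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_private(),
        // Loopback, or unique-local fc00::/7 (first 7 bits are 1111110).
        IpAddr::V6(v6) => v6.is_loopback() || (v6.segments()[0] & 0xfe00) == 0xfc00,
    };
    if allowed {
        Ok(sock)
    } else {
        Err(format!("refusing to bind {sock}: not a loopback or LAN address"))
    }
}
```

Rejecting by default and accepting only a known-safe set means a typo or an empty config field produces a startup error rather than a publicly reachable proxy.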

Usage

gradatum-engine --config /etc/gradatum/engine-curator.toml

Feature Flags

Feature	Description
`serve` (default)	Compile the Axum HTTP server and llama-server supervisor
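In Cargo, a default feature like this is typically declared under `[features]` with `default = ["serve"]`, and the code behind it is gated with `#[cfg(feature = "serve")]`. Building with `cargo build --no-default-features` then leaves that code out. A minimal sketch of the gating side, with hypothetical names (`server`, `describe`, `build_mode`); the crate's actual module layout is not documented here.

```rust
// Code that needs Axum and the supervisor is compiled only with the `serve` feature.
#[cfg(feature = "serve")]
mod server {
    pub fn describe() -> &'static str {
        // Hypothetical: a real crate would build the router and spawn llama-server here.
        "HTTP server and supervisor compiled in"
    }
}

#[cfg(feature = "serve")]
pub fn build_mode() -> &'static str {
    server::describe()
}

// Without `serve`, the crate still builds, minus the server.
#[cfg(not(feature = "serve"))]
pub fn build_mode() -> &'static str {
    "library-only build (no `serve` feature)"
}
```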

Anti-cycle invariant

gradatum-engine may depend on gradatum-core and gradatum-dto. gradatum-core and gradatum-dto must never depend on gradatum-engine.
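The invariant covers transitive edges as well as direct ones: core depending on some crate X that depends on engine closes a cycle just as surely as core depending on engine directly. Cargo already refuses to build a cycle of normal dependencies, so an explicit check mainly states the layering rule in code and can also catch dev-dependency edges, which Cargo does allow to form cycles. A hypothetical check over `(crate, dependency)` edges follows. In practice the edge list would come from `cargo metadata`; `check_anti_cycle`, `reaches`, and the `some-helper` crate in the usage are illustrative names.

```rust
use std::collections::{HashMap, HashSet};

/// True if `target` is reachable from `from` through dependency edges (direct or transitive).
fn reaches<'a>(graph: &HashMap<&'a str, Vec<&'a str>>, from: &'a str, target: &str) -> bool {
    let mut stack = vec![from];
    let mut seen = HashSet::new();
    while let Some(node) = stack.pop() {
        if node == target {
            return true;
        }
        // Visit each crate once, so a cycle elsewhere in the graph cannot loop forever.
        if seen.insert(node) {
            if let Some(deps) = graph.get(node) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    false
}

/// Hypothetical check: neither lower-layer crate may reach gradatum-engine.
fn check_anti_cycle<'a>(edges: &[(&'a str, &'a str)]) -> Result<(), String> {
    let mut graph: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        graph.entry(from).or_default().push(to);
    }
    for lower in ["gradatum-core", "gradatum-dto"] {
        if reaches(&graph, lower, "gradatum-engine") {
            return Err(format!("{lower} must not depend on gradatum-engine"));
        }
    }
    Ok(())
}
```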

License