gradatum-engine 0.3.6

Managed model runtime — axum OpenAI-compat server supervising a llama-server subprocess (PIVOT v2).
# gradatum-engine

> Rust supervisor for llama-server inference processes — transparent OpenAI-compatible reverse proxy with restart-on-failure.

**Status**: Alpha (v0.3.x), public, Apache-2.0. The API is not stable and may change before v1.0.
Part of **[gradatum](https://crates.io/crates/gradatum)** — memory backbone for AI agents. · [github](https://github.com/gradatum/gradatum) · [gradatum.org](https://gradatum.org)

## Overview

`gradatum-engine` manages one or more `llama-server` child processes, acting as a supervisor
and transparent HTTP proxy. It does not load models itself — it spawns an external
`llama-server` binary and forwards requests to it, preserving the full OpenAI-compatible
interface including streaming, vision (mmproj), sampling parameters, and slot IDs.

Architecture in v0.3.x:

1. **Spawn** — launches `llama-server` via `tokio::process::Command` (never via shell).
2. **Wait-ready** — polls `GET /health` on the child process until it returns 200.
3. **Transparent reverse proxy** — forwards request bodies verbatim to the child; passes
   through SSE streams, `slot_id`, sampling fields, tool call parameters, and vision
   inputs without modification.
4. **Supervise** — bounded restart-on-failure with configurable retry limit; shuts down
   gracefully on SIGTERM.
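
The wait-ready and supervise steps above can be sketched roughly as follows. This is a minimal, synchronous, std-only illustration and not the crate's actual code, which is built on `tokio`. The names `ChildExit`, `Supervised`, `supervise`, `is_ready`, `probe_health`, and `wait_ready` are hypothetical, and the spawn step is abstracted behind a closure so the restart bound can be read in isolation.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpStream};
use std::time::{Duration, Instant};

/// How one run of the child process ended (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChildExit {
    Clean,
    Failed,
}

/// Final result of the supervision loop (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Supervised {
    Stopped { restarts: u32 },
    GaveUp { restarts: u32 },
}

/// Runs `run_child` until it exits cleanly or `max_restarts` restarts are used up.
pub fn supervise<F: FnMut() -> ChildExit>(max_restarts: u32, mut run_child: F) -> Supervised {
    let mut restarts = 0;
    loop {
        match run_child() {
            ChildExit::Clean => return Supervised::Stopped { restarts },
            ChildExit::Failed if restarts < max_restarts => restarts += 1,
            ChildExit::Failed => return Supervised::GaveUp { restarts },
        }
    }
}

/// True if `response` starts with an HTTP status line carrying code 200.
pub fn is_ready(response: &str) -> bool {
    let mut parts = response.lines().next().unwrap_or("").split_whitespace();
    matches!((parts.next(), parts.next()), (Some(v), Some("200")) if v.starts_with("HTTP/"))
}

/// Sends one `GET /health` request to `addr` and checks for a 200 status.
pub fn probe_health(addr: SocketAddr, timeout: Duration) -> bool {
    let Ok(mut stream) = TcpStream::connect_timeout(&addr, timeout) else {
        return false;
    };
    let _ = stream.set_read_timeout(Some(timeout));
    let request = format!("GET /health HTTP/1.1\r\nHost: {addr}\r\nConnection: close\r\n\r\n");
    if stream.write_all(request.as_bytes()).is_err() {
        return false;
    }
    // The status line fits comfortably in the first few bytes of the response.
    let mut buf = [0u8; 64];
    match stream.read(&mut buf) {
        Ok(n) => is_ready(&String::from_utf8_lossy(&buf[..n])),
        Err(_) => false,
    }
}

/// Polls the health endpoint every `interval` until it is ready or `deadline` elapses.
pub fn wait_ready(addr: SocketAddr, deadline: Duration, interval: Duration) -> bool {
    let start = Instant::now();
    while start.elapsed() < deadline {
        if probe_health(addr, interval) {
            return true;
        }
        std::thread::sleep(interval);
    }
    false
}
```

In this sketch a retry limit of 2 allows up to three runs in total: the initial launch plus two restarts. Whether the real engine counts attempts the same way, or adds a delay between restarts, is not stated here.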

Multi-model deployments are supported by running one engine instance per model, each on
its own port. Binding is fail-closed: the engine binds only to the configured LAN
addresses and never listens on all interfaces by default.
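
One way to picture a fail-closed bind check is sketched below. It is an illustration under stated assumptions rather than the engine's actual validation: `BindError` and `check_bind` are hypothetical names, and the allow-list is assumed to be a plain slice of IP addresses taken from configuration.

```rust
use std::net::{IpAddr, SocketAddr};

/// Reasons a bind address is refused (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum BindError {
    Unparseable,
    Wildcard,
    NotAllowed,
}

/// Accepts `addr` only if it parses, is not a wildcard, and is on the allow-list.
pub fn check_bind(addr: &str, allowed: &[IpAddr]) -> Result<SocketAddr, BindError> {
    let sock: SocketAddr = addr.parse().map_err(|_| BindError::Unparseable)?;
    // 0.0.0.0 and :: mean "all interfaces"; refuse them outright.
    if sock.ip().is_unspecified() {
        return Err(BindError::Wildcard);
    }
    // Anything not explicitly configured is refused.
    if !allowed.contains(&sock.ip()) {
        return Err(BindError::NotAllowed);
    }
    Ok(sock)
}
```

The point of the design is that every failure path rejects: an unparseable, wildcard, or unlisted address yields an error, so a misconfiguration cannot silently expose the server on all interfaces.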

## Usage

```bash
gradatum-engine --config /etc/gradatum/engine-curator.toml
```

## Feature Flags

| Feature | Description |
|---|---|
| `serve` (default) | Compile the Axum HTTP server and llama-server supervisor |

## Anti-cycle invariant

`gradatum-engine` may depend on `gradatum-core` and `gradatum-dto`.
`gradatum-core` and `gradatum-dto` must never depend on `gradatum-engine`.

## License

Apache-2.0