# inference-core
Foundation types for the rakka-inference workspace. Zero actor / GPU / HTTP dependencies — pure types that every other layer plugs into.
## What it gives you
| Type | Purpose |
|---|---|
| `ModelRunner` | The trait every backend implements (local GPU and remote network). |
| `Deployment` | The shared declarative surface — what runs, where, with what limits. |
| `ExecuteBatch` / `RunHandle` | Request input and streaming output handle. |
| `TokenChunk` / `Tokens` | Per-chunk and aggregate output, with usage and cost. |
| `InferenceError` | Typed error surface (rate-limited, circuit-open, content-filtered, context-length, …). |
| `RuntimeKind` / `TransportKind` / `ProviderKind` | Backend identity used by placement, observability, and dispatcher choice. |
| `RateLimits` / `RetryPolicy` / `Timeouts` / `CircuitBreakerConfig` | Per-deployment policy primitives. |
| `Secret<T>` / `SecretString` | Typed credentials — won't `Debug`, won't `Display`. |
| `infer_runtime(model)` | Default backend selection from a model name (`gpt-*` → OpenAI, etc.); sketched below the table. |
## Why depend on this directly
If you're building a third-party runtime, this is the only crate you need. Implement `ModelRunner` and you slot into the rest of the workspace — gateway, supervision, distributed rate limiting, circuit breaking, streaming — for free.
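A minimal sketch of what a third-party backend could look like. Only the type names come from the table above; the method signature, the `RunHandle` field, and the stand-in `ExecuteBatch` / `TokenChunk` / `InferenceError` structs are assumptions made to keep the example self-contained, so check the crate docs for the real shapes.

```rust
// Sketch only: stand-in types so the example compiles on its own. The crate's
// real ModelRunner, ExecuteBatch, RunHandle, TokenChunk and InferenceError differ.
use async_trait::async_trait;
use futures::stream::{self, BoxStream};

struct ExecuteBatch {
    prompt: String,
}

struct TokenChunk {
    text: String,
}

#[derive(Debug)]
struct InferenceError(String);

// Streaming output handle: a boxed stream of token chunks (or errors).
struct RunHandle {
    stream: BoxStream<'static, Result<TokenChunk, InferenceError>>,
}

#[async_trait]
trait ModelRunner: Send + Sync {
    async fn execute(&self, batch: ExecuteBatch) -> Result<RunHandle, InferenceError>;
}

// A toy backend that echoes the prompt back as a single chunk.
struct EchoRunner;

#[async_trait]
impl ModelRunner for EchoRunner {
    async fn execute(&self, batch: ExecuteBatch) -> Result<RunHandle, InferenceError> {
        let chunks = vec![Ok::<TokenChunk, InferenceError>(TokenChunk {
            text: batch.prompt,
        })];
        Ok(RunHandle {
            stream: Box::pin(stream::iter(chunks)),
        })
    }
}
```

Note that the only trait-level dependencies a backend like this pulls in are `async-trait` and `futures`, which matches the dependency budget below.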
## Dependency budget
inference-core depends only on:
- `serde` / `serde_json` / `thiserror` / `bytes` — wire types
- `secrecy` — typed secrets
- `async-trait` — for the `ModelRunner` trait (documented exception)
- `futures` — `BoxStream` for `RunHandle`
- `chrono` (no-default-features, `clock` only) — for `Tokens` timestamps
- `url` — for endpoint validation
No `tokio`, no `rakka`, no `rakka-accel`, no `pyo3`, no `reqwest`. This is what makes `cargo build -p inference --features remote-only` produce a binary with zero GPU deps — the foundation layer simply doesn't carry any.