orion-core
Agent harness for local LLM inference. Backend-agnostic — bring your own model runtime (llama.cpp, MLX, cloud APIs, anything).
orion-core handles the conversation loop so you don't have to: context management, token budgets, streaming events, chat formatting, and an automatic tool-execution loop (the agent parses tool calls, runs your tools, feeds the results back, and repeats until the model gives a final answer — see tools).
How It Works
)
)
)
) ()
)
)
You implement one trait (LlmBackend) for your inference engine. orion-core handles everything above it.
Quick Start
use Arc;
use ;
use mpsc;
// 1. Implement the backend trait for your engine (see `backend` below)
let backend: = new;
// 2. Create an agent
let mut agent = new;
// 3. You supply the event channel; the agent streams events into it
// while generation runs, then returns when the turn is done.
let = ;
// Consume events concurrently — forward them to your UI.
let consumer = spawn;
agent.prompt.await?;
consumer.await?;
A complete, runnable version lives in
examples/mock_backend.rs— try it withcargo run --example mock_backend.
Don't want to manage the channel yourself?
agent.prompt_stream(text, backend)creates it for you and hands back(receiver, future)— drive the future (e.g. withtokio::join!) while you drain the receiver.
For a real over-the-wire backend,
examples/openai_backend.rsimplementsLlmBackendagainst a streaming OpenAI-compatible/v1/completionsendpoint (OpenAI, llama.cppserver, vLLM, LM Studio, Ollama). Run it withcargo run --example openai_backend --features openai-example.
Modules
agent — The Orchestrator
The Agent struct is the main entry point. It owns the conversation state and drives the prompt → LLM → response loop.
use ;
let mut agent = new;
// Change settings on the fly
agent.set_system_prompt;
agent.set_inference_params;
// Conversation management
agent.clear; // Reset conversation
agent.replace_messages; // Restore a saved conversation
// Abort a running generation
agent.abort;
backend — Bring Your Own LLM
Implement the LlmBackend trait to plug in any inference engine:
use ;
use AtomicBool;
use Arc;
The backend runs on a blocking thread — no async required. orion-core handles the async orchestration.
messages — Conversation Data
Messages support five roles covering the full agent lifecycle:
use Message;
// Standard conversation
let sys = system;
let user = user;
let asst = assistant;
// Tool interaction
let result = tool_result;
Roles: System, User, Assistant, ToolCall, ToolResult
Every message has an id, timestamp, and optional token_count (populated after tokenization). Assistant messages can carry tool_calls; tool result messages carry a tool_result.
events — Real-Time UI Updates
The agent emits events as it processes. Subscribe to these for building reactive UIs.
Event sequence for a simple prompt:
Event sequence with tool calls:
All event types:
| Event | When | Key Data |
|---|---|---|
AgentStart |
Processing begins | — |
AgentEnd |
All done | All new messages |
TurnStart |
New LLM call begins | — |
TurnEnd |
LLM call + tools done | Assistant message, tool results |
MessageStart |
Any message added | Full message |
MessageDelta |
Each streamed token | delta, tokens_generated, tokens_per_sec |
MessageEnd |
Message complete | Full message |
ToolExecStart |
Tool begins running | Tool name, args |
ToolExecUpdate |
Tool streams progress | Partial output |
ToolExecEnd |
Tool finished | Result, is_error |
ContextBudget |
After context prep | Tokens used/max, messages included/pruned |
Warning |
Non-fatal issue | Warning text |
Error |
Fatal error | Error text |
context — Token Budget Management
Handles the hard problem of fitting a conversation into a fixed-size context window.
What it does:
- Prunes old messages when the conversation exceeds the token budget (sliding window — keeps system prompt + most recent messages)
- Formats the surviving messages into a ChatML prompt string
- Reports how many tokens are used and how many messages were pruned
use ChatMLTemplate;
use ;
let token_counter = ;
// Prune to fit the budget *and* format with a chat template in one step.
let prepared = prepare_context?;
// `prepared.prompt` is the formatted string to feed your backend.
println!;
The agent calls this automatically before each LLM call. You don't need to call it directly unless you want custom control.
Prune strategies (ContextConfig::prune_strategy):
SlidingWindow(default) — drop the oldest turns first to fit the budget.Summarize— before pruning, the agent folds the oldest overflowing turns into a single pinned summary message (one extra backend call), so their gist survives instead of being dropped. Prior summaries are consolidated, so exactly one accumulates. Best-effort: if summarization fails it falls back to the sliding window.
Pinned messages. Any Message with pinned == true always survives pruning,
regardless of budget or strategy. Build one with Message::user(id, text).pinned(),
or toggle an existing message via agent.set_pinned(message_id, true). Pruning is
turn-aware, so a pinned message keeps its whole turn (no orphaned pairs).
template — Chat Prompt Formats
Each model family wants its prompt wrapped a certain way. orion-core ships a
ChatTemplate for the common ones and picks the right one automatically.
Supported families: ChatML (default), Llama 3, Llama 2, Mistral / Mixtral, Gemma / Gemma 2, Phi-3, DeepSeek (LLM chat), Command-R / Command-R+, Alpaca, and Vicuna.
detect_template(gguf_template)— inspects a GGUF metadata template string and returns the matching impl (falling back to ChatML when nothing matches).template_from_name(name)— resolves a manual-override name (with common aliases, e.g.llama-2,phi-3,cohere) to a template, orNonefor an unimplemented family so the caller can fall back to auto-detection.Agent::with_template(config, template)/agent.set_template(template)— set or swap the template at runtime.
Every template also implements the per-message and per-system formatting hooks
the context pipeline needs for accurate token-budget accounting, and advertises
tools through the same tool_call convention (see below).
tools — Give the Model Abilities
Agent::prompt drives the full cycle automatically: it injects your tool
schemas into the system prompt, parses the model's tool calls out of its reply,
runs the matching tool, appends the result to the conversation, and loops back
to the model until it returns a tool-free answer (bounded by
AgentConfig::max_tool_iterations, default 8). Each step emits
ToolExecStart / ToolExecUpdate / ToolExecEnd events.
Tool-call convention. Templates advertise — and [parse_tool_calls] reads —
a fenced JSON block:
```tool_call
{"name": "read_file", "arguments": {"path": "Cargo.toml"}}
```
A JSON array invokes several tools in one turn. Parsing is lenient: a ```json
fence or a whole-message bare JSON object carrying both name and arguments
also count, so smaller models still trigger tools when they drift from the exact
format. Register tools with agent.set_tools(vec![Box::new(MyTool)]); with no
tools registered, parsing is skipped entirely and replies pass through verbatim.
Define tools the model can call. Each tool has a name, description, JSON Schema for parameters, and an async execute function.
use ;
use ToolUpdateCallback;
use async_trait;
;
// Register with the agent
agent.set_tools;
Tool schemas are automatically injected into the system prompt when formatting context, and Agent::prompt runs the full tool call → execute → feed result → LLM responds cycle for you (see the section intro above).
Opting out. The Tool trait and the execution loop live behind the default
tools feature, which pulls in async-trait.
Minimal consumers that only need plain chat can drop it:
= { = "0.2", = false }
Tool-call parsing (parse_tool_calls, ParsedToolCall) and ToolSchema stay
available either way — only the Tool trait, ToolOutput, ToolUpdateCallback,
and Agent::set_tools require the feature.
The code snippets in this README are mirrored by compile-checked doctests on the corresponding API items, so they can't silently drift from the real signatures. Run them with
cargo test --doc.
error — Error Types
use ;
// Error variants
Backend // LLM backend issues
Context // Context pipeline issues
Tool // Tool execution issues
Agent // Agent logic issues
Aborted // User cancelled
All errors are serializable (implements Serialize) for easy transport over IPC.
Architecture
)
)
)
)
)
)
)
)
)
)
Events flow upward through an unbounded channel (tokio::sync::mpsc). Your UI or application layer subscribes to AgentEvents and reacts in real time.
Stability
orion-core follows SemVer. While 0.x, a minor bump may
carry breaking changes and a patch bump is additive/fixes only. CoreError and
AgentEvent are #[non_exhaustive] — match them with a wildcard arm so new
variants don't break your build. The MSRV is Rust 1.85 (raised only in a
minor release), and that guarantee covers the default feature set; optional
example features like openai-example may need a newer toolchain. See
CONTRIBUTING.md for the full policy.
License
MIT © Kumar Anirudha. See LICENSE.