orion-core: Agent harness for local LLM inference.
Provides the agent loop, context pipeline, tool execution, and event system for building AI chat interfaces on top of local model backends (llama.cpp, MLX, etc.).
§Architecture
User prompt
→ Agent.prompt()
→ Context pipeline (prune pairs + template format)
→ LlmBackend.generate() (streaming tokens)
→ Tool execution loop (parse calls → run tools → feed results back)
→ AgentEvent stream → UI
The crate is backend-agnostic. Implement backend::LlmBackend for your inference engine and the agent handles the rest.
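The tool execution step in the diagram above (parse calls → run tools → feed results back) can be illustrated with a small standalone sketch. It does not use the orion-core API: the `CALL <name> <argument>` syntax, the `ToolFn` type, and the `parse_call`, `run_loop`, and `mock_generate` functions are all hypothetical names invented for this illustration, and the crate's real call format and parser (tools::parse_tool_calls) may work differently.

```rust
use std::collections::HashMap;

/// Hypothetical tool signature for this sketch: string argument in, string result out.
type ToolFn = fn(&str) -> String;

/// Hypothetical call syntax: a reply of the form `CALL <name> <argument>`.
/// Returns `None` when the reply is a plain answer.
fn parse_call(text: &str) -> Option<(&str, &str)> {
    let rest = text.trim().strip_prefix("CALL ")?;
    rest.split_once(' ')
}

/// Generate, parse, run the tool, and feed the result back into the context.
/// Stops at the first reply without a tool call, or after `max_rounds` rounds,
/// in which case the last raw reply is returned.
fn run_loop(
    generate: impl Fn(&str) -> String,
    tools: &HashMap<&str, ToolFn>,
    prompt: &str,
    max_rounds: usize,
) -> String {
    let mut context = prompt.to_string();
    let mut output = String::new();
    for _ in 0..max_rounds {
        output = generate(&context);
        let Some((name, arg)) = parse_call(&output) else {
            break; // plain answer: nothing left to execute
        };
        let result = match tools.get(name) {
            Some(tool) => tool(arg),
            None => format!("error: unknown tool `{name}`"),
        };
        // Append the call and its result so the next round can see them.
        context.push_str(&format!("\n{output}\nRESULT {result}"));
    }
    output
}

/// Example tool: upper-cases its argument.
fn upper(arg: &str) -> String {
    arg.to_uppercase()
}

/// Stand-in for a model: asks for the tool once, then answers using the result.
fn mock_generate(context: &str) -> String {
    match context.rsplit_once("RESULT ") {
        Some((_, result)) => format!("The tool returned {result}"),
        None => "CALL upper hello".to_string(),
    }
}

fn main() {
    let mut tools: HashMap<&str, ToolFn> = HashMap::new();
    tools.insert("upper", upper);
    let answer = run_loop(mock_generate, &tools, "Shout hello", 4);
    assert_eq!(answer, "The tool returned HELLO");
    println!("{answer}");
}
```

The round limit is a design choice of this sketch: it keeps a model that requests tools on every turn from looping forever.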
§Example
Implement LlmBackend for your engine, then drive the agent. The mock
backend below streams a canned reply so the whole loop runs end to end (a
complete version lives in examples/mock_backend.rs):
use std::sync::Arc;
use std::sync::atomic::AtomicBool;
use orion_core::{
    Agent, AgentConfig, AgentEvent, CoreResult, GenerationResult,
    InferenceParams, LlmBackend, TokenCallback,
};
use tokio::sync::mpsc;

struct MockBackend;

impl LlmBackend for MockBackend {
    fn generate(
        &self,
        _prompt: &str,
        _params: &InferenceParams,
        _abort: Arc<AtomicBool>,
        mut on_token: TokenCallback,
    ) -> CoreResult<GenerationResult> {
        on_token("Hi!", 1, 10.0);
        Ok(GenerationResult {
            text: "Hi!".into(),
            tokens_generated: 1,
            prompt_tokens: 0,
            tokens_per_sec: 10.0,
            time_to_first_token_ms: 1.0,
            generation_time_ms: 1.0,
        })
    }

    fn tokenize_count(&self, text: &str) -> CoreResult<u32> {
        Ok(text.split_whitespace().count() as u32)
    }

    fn is_ready(&self) -> bool { true }
}

let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(async {
    let mut agent = Agent::new(AgentConfig::default());
    let backend: Arc<dyn LlmBackend> = Arc::new(MockBackend);
    let (tx, mut rx) = mpsc::unbounded_channel::<AgentEvent>();

    // Consume events concurrently while the agent generates.
    let consumer = tokio::spawn(async move {
        let mut reply = String::new();
        while let Some(event) = rx.recv().await {
            if let AgentEvent::MessageDelta { delta, .. } = event {
                reply.push_str(&delta);
            }
        }
        reply
    });

    agent.prompt("Hello", backend, tx).await.unwrap();
    assert_eq!(consumer.await.unwrap(), "Hi!");
});
Re-exports§
pub use agent::Agent;
pub use agent::AgentConfig;
pub use backend::GenerationResult;
pub use backend::InferenceParams;
pub use backend::LlmBackend;
pub use backend::TokenCallback;
pub use context::plan_prune;
pub use context::ContextConfig;
pub use context::PreparedContext;
pub use context::PrunePlan;
pub use context::PruneStrategy;
pub use error::CoreError;
pub use error::CoreResult;
pub use events::AgentEvent;
pub use messages::Message;
pub use messages::Role;
pub use messages::ToolCall;
pub use messages::ToolResult;
pub use template::detect_template;
pub use template::template_from_name;
pub use template::AlpacaTemplate;
pub use template::ChatMLTemplate;
pub use template::ChatTemplate;
pub use template::CommandRTemplate;
pub use template::DeepSeekTemplate;
pub use template::GemmaTemplate;
pub use template::Llama2Template;
pub use template::Llama3Template;
pub use template::MistralTemplate;
pub use template::Phi3Template;
pub use template::VicunaTemplate;
pub use tools::parse_tool_calls;
pub use tools::ParsedToolCall;
pub use tools::ToolSchema;
pub use tools::Tool;
pub use tools::ToolOutput;
pub use tools::ToolUpdateCallback;
Modules§
- agent: The Agent orchestrator and its configuration.
- backend: The LlmBackend trait and inference parameter/result types.
- context: Context-window management: pruning, token budgeting, and prompt formatting.
- error: Error and result types for the crate.
- events: The AgentEvent stream emitted while the agent runs.
- messages: Conversation data types: Message, Role, and tool call/result records.
- template: Chat prompt templates for the supported model families.
- tools: The Tool trait (feature tools), tool schemas, and tool-call parsing.
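
The context module's pruning and token budgeting can be pictured with a rough standalone sketch of pair pruning: drop the oldest user/assistant pair until the history fits a token budget. Everything here is an assumption made for illustration. `Msg`, `msg`, `count_tokens`, `total_tokens`, and `prune_pairs` are hypothetical names, the whitespace token count only mirrors the mock backend's tokenize_count, and the crate's own plan_prune, PrunePlan, and PruneStrategy may use a different policy.

```rust
/// Hypothetical message type for this sketch (not the crate's `Message`).
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    role: &'static str,
    text: String,
}

/// Convenience constructor for the sketch.
fn msg(role: &'static str, text: &str) -> Msg {
    Msg { role, text: text.to_string() }
}

/// Stand-in token counter: whitespace-separated words.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Total token count of a history under the stand-in counter.
fn total_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| count_tokens(&m.text)).sum()
}

/// Drop the oldest user/assistant pair until the history fits `budget`.
/// Assumes index 0 is a system message followed by alternating
/// user/assistant turns. The system message and the latest turn are
/// always kept, so the result can still exceed the budget.
fn prune_pairs(mut history: Vec<Msg>, budget: usize) -> Vec<Msg> {
    while total_tokens(&history) > budget && history.len() > 3 {
        history.drain(1..3); // remove the oldest pair after the system message
    }
    history
}

fn main() {
    let history = vec![
        msg("system", "You are helpful"),      // 3 tokens
        msg("user", "first question here"),    // 3 tokens
        msg("assistant", "first answer here"), // 3 tokens
        msg("user", "second question"),        // 2 tokens
    ];
    // 11 tokens in total; a budget of 6 forces the oldest pair out.
    let pruned = prune_pairs(history, 6);
    assert_eq!(pruned.len(), 2);
    for m in &pruned {
        println!("{}: {}", m.role, m.text);
    }
}
```

Removing whole pairs rather than single messages keeps the remaining history alternating between user and assistant turns, which most chat templates expect.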