π₯ Helios Engine - LLM Agent Framework
Helios Engine is a powerful and flexible Rust framework for building LLM-powered agents with tool support, streaming chat capabilities, and easy configuration management. Create intelligent agents that can interact with users, call tools, and maintain conversation context - with both online and offline local model support.
Features
- Agent System: Create multiple agents with different personalities and capabilities
- Tool Registry: Extensible tool system for adding custom functionality
- Chat Management: Built-in conversation history and session management
- Session Memory: Track agent state and metadata across conversations
- File Management Tools: Built-in tools for searching, reading, writing, and editing files
- Streaming Support: Real-time response streaming for both remote and local models
- Local Model Support: Run local models offline using llama.cpp with HuggingFace integration
- LLM Support: Compatible with OpenAI API, any OpenAI-compatible API, and local models
- Async/Await: Built on Tokio for high-performance async operations
- Type-Safe: Leverages Rust's type system for safe and reliable code
- Extensible: Easy to add custom tools and extend functionality
- Thinking Tags: Automatic detection and display of model reasoning process
- Dual Mode Support: Auto, online (remote API), and offline (local) modes
- Clean Output: Suppresses verbose debugging in offline mode for clean user experience
- CLI & Library: Use as both a command-line tool and a Rust library crate
Table of Contents
- Installation
- Quick Start
- CLI Usage
- Configuration
- Local Inference Setup
- Architecture
- Usage Examples
- Creating Custom Tools
- API Documentation
- Project Structure
- Examples
- Contributing
- License
Installation
Helios Engine can be used both as a command-line tool and as a library crate in your Rust projects.
As a CLI Tool (Recommended for Quick Start)
Install globally using Cargo (once published):
Then use anywhere:
# Initialize configuration
# Start interactive chat (default command)
# or explicitly
# Ask a quick question
# Get help
# NEW: Use offline mode with local models (no internet required)
# Use online mode (forces remote API usage)
# Auto mode (uses local if configured, otherwise remote)
# Verbose logging for debugging
# Custom system prompt
# One-off question with custom config
As a Library Crate
Add Helios-Engine to your Cargo.toml:
[]
= "0.2.5"
= { = "1.35", = ["full"] }
Or use a local path during development:
[]
= { = "../helios" }
= { = "1.35", = ["full"] }
Build from Source
# Install locally
Quick Start
Using as a Library Crate
The simplest way to use Helios Engine is to call LLM models directly:
use ;
use LLMConfig;
async
For detailed examples of using Helios Engine as a crate, see Using as a Crate Guide
Using Offline Mode with Local Models
Run models locally without internet connection:
use ;
use LocalConfig;
async
Note: First run downloads the model. Subsequent runs use the cached model.
Using with Agent System
For more advanced use cases with tools and persistent conversation:
1. Configure Your LLM
Create a config.toml file (supports both remote and local):
[]
= "gpt-3.5-turbo"
= "https://api.openai.com/v1"
= "your-api-key-here"
= 0.7
= 2048
# Optional: Add local configuration for offline mode
[]
= "unsloth/Qwen3-0.6B-GGUF"
= "Qwen3-0.6B-Q4_K_M.gguf"
= 0.7
= 2048
2. Create Your First Agent
use ;
async
3. Run the Interactive Demo
CLI Usage
Helios Engine provides a powerful command-line interface with multiple modes and options:
Interactive Chat Mode
Start an interactive chat session:
# Default chat session
# With custom system prompt
# With custom max iterations for tool calls
# With verbose logging for debugging
One-off Questions
Ask a single question without interactive mode:
# Ask a single question
# Ask with custom config file
Configuration Management
Initialize and manage configuration:
# Create a new configuration file
# Create config in custom location
Mode Selection
Choose between different operation modes:
# Auto mode (uses local if configured, otherwise remote API)
# Online mode (forces remote API usage)
# Offline mode (uses local models only)
Interactive Commands
During an interactive session, use these commands:
exitorquit- Exit the chat sessionclear- Clear conversation historyhistory- Show conversation historyhelp- Show help message
Configuration
Helios Engine uses TOML for configuration. You can configure either remote API access or local model inference with the dual LLMProviderType system.
Remote API Configuration (Default)
[]
# The model name (e.g., gpt-3.5-turbo, gpt-4, claude-3, etc.)
= "gpt-3.5-turbo"
# Base URL for the API (OpenAI or compatible)
= "https://api.openai.com/v1"
# Your API key
= "your-api-key-here"
# Temperature for response generation (0.0 - 2.0)
= 0.7
# Maximum tokens in response
= 2048
Local Model Configuration (Offline Mode with llama.cpp)
[]
# Remote config still needed for auto mode fallback
= "gpt-3.5-turbo"
= "https://api.openai.com/v1"
= "your-api-key-here"
= 0.7
= 2048
# Local model configuration for offline mode
[]
# HuggingFace repository and model file
= "unsloth/Qwen3-0.6B-GGUF"
= "Qwen3-0.6B-Q4_K_M.gguf"
# Local model settings
= 0.7
= 2048
Auto Mode Configuration (Remote + Local)
For maximum flexibility, configure both remote and local models to enable auto mode:
[]
= "gpt-3.5-turbo"
= "https://api.openai.com/v1"
= "your-api-key-here"
= 0.7
= 2048
# Local model as fallback
[]
= "unsloth/Qwen3-0.6B-GGUF"
= "Qwen3-0.6B-Q4_K_M.gguf"
= 0.7
= 2048
Supported LLM Providers
Helios Engine supports both remote APIs and local model inference:
Remote APIs (Online Mode)
Helios Engine works with any OpenAI-compatible API:
- OpenAI:
https://api.openai.com/v1 - Azure OpenAI:
https://your-resource.openai.azure.com/openai/deployments/your-deployment - Local Models (LM Studio):
http://localhost:1234/v1 - Ollama with OpenAI compatibility:
http://localhost:11434/v1 - Any OpenAI-compatible API
Local Models (Offline Mode)
Run models locally using llama.cpp without internet connection:
- GGUF Models: Compatible with all GGUF format models from HuggingFace
- Automatic Download: Models are downloaded automatically from HuggingFace
- GPU Acceleration: Uses GPU if available (via llama.cpp)
- Clean Output: Suppresses verbose debugging for clean user experience
- Popular Models: Works with Qwen, Llama, Mistral, and other GGUF models
Supported Model Sources:
- HuggingFace Hub repositories
- Local GGUF files
- Automatic model caching
Local Inference Setup
Helios Engine supports running large language models locally using llama.cpp through the LLMProviderType system, providing privacy, offline capability, and no API costs.
Prerequisites
- HuggingFace Account: Sign up at huggingface.co (free)
- HuggingFace CLI: Install the CLI tool:
Setting Up Local Models
-
Find a GGUF Model: Browse HuggingFace Models for compatible models
-
Update Configuration: Add local model config to your
config.toml:[] = "unsloth/Qwen3-0.6B-GGUF" = "Qwen3-0.6B-Q4_K_M.gguf" = 0.7 = 2048 -
Run in Offline Mode:
# First run downloads the model # Subsequent runs use cached model
Recommended Models
| Model | Size | Use Case | Repository |
|---|---|---|---|
| Qwen3-0.6B | ~400MB | Fast, good quality | unsloth/Qwen3-0.6B-GGUF |
| Llama-3.2-1B | ~700MB | Balanced performance | unsloth/Llama-3.2-1B-Instruct-GGUF |
| Mistral-7B | ~4GB | High quality | TheBloke/Mistral-7B-Instruct-v0.1-GGUF |
Performance & Features
- GPU Acceleration: Models automatically use GPU if available via llama.cpp's n_gpu_layers parameter
- Model Caching: Downloaded models are cached locally (~/.cache/huggingface)
- Memory Usage: Larger models need more RAM/VRAM
- First Run: Initial model download may take time depending on connection
- Clean Output Mode: Suppresses verbose debugging from llama.cpp for clean user experience
Streaming Support with Local Models
Local models now support real-time token-by-token streaming just like remote models! The LLMClient automatically handles streaming for both remote and local models through the same unified API, providing a consistent experience.
Architecture
System Overview
graph TB
User[User] -->|Input| Agent[Agent]
Agent -->|Messages| LLM[LLM Client]
Agent -->|Tool Calls| Registry[Tool Registry]
Registry -->|Execute| Tools[Tools]
Tools -->|Results| Agent
LLM -->|Response| Agent
Agent -->|Output| User
Config[Config TOML] -->|Load| Agent
style Agent fill:#4CAF50
style LLM fill:#2196F3
style Registry fill:#FF9800
style Tools fill:#9C27B0
Component Architecture
classDiagram
class Agent {
+name: String
+llm_client: LLMClient
+tool_registry: ToolRegistry
+chat_session: ChatSession
+chat(message) ChatMessage
+register_tool(tool) void
+clear_history() void
}
class LLMClient {
+provider: LLMProvider
+provider_type: LLMProviderType
+chat(messages, tools) ChatMessage
+chat_stream(messages, tools, callback) ChatMessage
+generate(request) LLMResponse
}
class ToolRegistry {
+tools: HashMap
+register(tool) void
+execute(name, args) ToolResult
+get_definitions() Vec
}
class Tool {
<<interface>>
+name() String
+description() String
+parameters() HashMap
+execute(args) ToolResult
}
class ChatSession {
+messages: Vec
+system_prompt: Option
+add_message(msg) void
+clear() void
}
class Config {
+llm: LLMConfig
+from_file(path) Config
+save(path) void
}
Agent --> LLMClient
Agent --> ToolRegistry
Agent --> ChatSession
Agent --> Config
ToolRegistry --> Tool
Tool <|-- CalculatorTool
Tool <|-- EchoTool
Tool <|-- CustomTool
Agent Execution Flow
sequenceDiagram
participant User
participant Agent
participant LLM
participant ToolRegistry
participant Tool
User->>Agent: Send Message
Agent->>Agent: Add to Chat History
loop Until No Tool Calls
Agent->>LLM: Send Messages + Tool Definitions
LLM->>Agent: Response (with/without tool calls)
alt Has Tool Calls
Agent->>ToolRegistry: Execute Tool
ToolRegistry->>Tool: Call with Arguments
Tool->>ToolRegistry: Return Result
ToolRegistry->>Agent: Tool Result
Agent->>Agent: Add Tool Result to History
else No Tool Calls
Agent->>User: Return Final Response
end
end
Tool Execution Pipeline
flowchart LR
A[User Request] --> B{LLM Decision}
B -->|Need Tool| C[Get Tool Definition]
C --> D[Parse Arguments]
D --> E[Execute Tool]
E --> F[Format Result]
F --> G[Add to Context]
G --> B
B -->|No Tool Needed| H[Return Response]
H --> I[User]
style B fill:#FFD700
style E fill:#4CAF50
style H fill:#2196F3
Usage Examples
Basic Chat
use ;
async
Agent with Built-in Tools
use ;
async
Multiple Agents
use ;
async
Agent with File Tools and Session Memory
Agents can use file management tools and track session state:
use ;
async
Streaming Chat (Direct LLM Usage)
Use streaming to receive responses in real-time:
use ;
use LLMConfig;
use Write;
async
Creating Custom Tools
Implement the Tool trait to create custom tools:
use async_trait;
use ;
use Value;
use HashMap;
;
// Use your custom tool
async
API Documentation
Core Types
Agent
The main agent struct that manages conversation and tool execution.
Methods:
builder(name)- Create a new agent builderchat(message)- Send a message and get a responseregister_tool(tool)- Add a tool to the agentclear_history()- Clear conversation historyset_system_prompt(prompt)- Set the system promptset_max_iterations(max)- Set maximum tool call iterationsset_memory(key, value)- Set a memory value for the agentget_memory(key)- Get a memory valueremove_memory(key)- Remove a memory valueclear_memory()- Clear all agent memory (preserves session metadata)get_session_summary()- Get a summary of the current sessionincrement_counter(key)- Increment a counter in memoryincrement_tasks_completed()- Increment the tasks_completed counter
Config
Configuration management for LLM settings.
Methods:
from_file(path)- Load config from TOML filedefault()- Create default configurationsave(path)- Save config to file
LLMClient
Client for interacting with LLM providers (remote or local).
Methods:
new(provider_type)- Create client with LLMProviderType (Remote or Local)chat(messages, tools)- Send messages and get responsechat_stream(messages, tools, callback)- Send messages and stream response with callback functiongenerate(request)- Low-level generation method
LLMProviderType
Enumeration for different LLM provider types.
Variants:
Remote(LLMConfig)- For remote API providers (OpenAI, Azure, etc.)Local(LocalConfig)- For local llama.cpp models
ToolRegistry
Manages and executes tools.
Methods:
new()- Create empty registryregister(tool)- Register a new toolexecute(name, args)- Execute a tool by nameget_definitions()- Get all tool definitionslist_tools()- List registered tool names
ChatSession
Manages conversation history and session metadata.
Methods:
new()- Create new sessionwith_system_prompt(prompt)- Set system promptadd_message(message)- Add message to historyadd_user_message(content)- Add a user messageadd_assistant_message(content)- Add an assistant messageget_messages()- Get all messagesclear()- Clear all messagesset_metadata(key, value)- Set session metadataget_metadata(key)- Get session metadataremove_metadata(key)- Remove session metadataget_summary()- Get a summary of the session
Built-in Tools
CalculatorTool
Performs basic arithmetic operations.
Parameters:
expression(string, required): Mathematical expression to evaluate
Example:
agent.tool;
EchoTool
Echoes back a message.
Parameters:
message(string, required): Message to echo
Example:
agent.tool;
FileSearchTool
Search for files by name pattern or search for content within files.
Parameters:
path(string, optional): Directory path to search in (default: current directory)pattern(string, optional): File name pattern with wildcards (e.g.,*.rs)content(string, optional): Text content to search for within filesmax_results(number, optional): Maximum number of results (default: 50)
Example:
agent.tool;
FileReadTool
Read the contents of a file with optional line range selection.
Parameters:
path(string, required): File path to readstart_line(number, optional): Starting line number (1-indexed)end_line(number, optional): Ending line number (1-indexed)
Example:
agent.tool;
FileWriteTool
Write content to a file (creates new or overwrites existing).
Parameters:
path(string, required): File path to write tocontent(string, required): Content to write
Example:
agent.tool;
FileEditTool
Edit a file by replacing specific text (find and replace).
Parameters:
path(string, required): File path to editfind(string, required): Text to findreplace(string, required): Replacement text
Example:
agent.tool;
MemoryDBTool
In-memory key-value database for caching data during conversations.
Parameters:
operation(string, required): Operation to perform:set,get,delete,list,clear,existskey(string, optional): Key for set, get, delete, exists operationsvalue(string, optional): Value for set operation
Supported Operations:
set- Store a key-value pairget- Retrieve a value by keydelete- Remove a key-value pairlist- List all stored itemsclear- Clear all dataexists- Check if a key exists
Example:
agent.tool;
Usage in conversation:
// Agent can now cache data
agent.chat.await?;
agent.chat.await?; // Agent retrieves from DB
QdrantRAGTool
RAG (Retrieval-Augmented Generation) tool with Qdrant vector database for semantic search and document retrieval.
Parameters:
operation(string, required): Operation:add_document,search,delete,cleartext(string, optional): Document text or search querydoc_id(string, optional): Document ID for delete operationlimit(number, optional): Number of search results (default: 5)metadata(object, optional): Additional metadata for documents
Supported Operations:
add_document- Embed and store a documentsearch- Semantic search with vector similaritydelete- Remove a document by IDclear- Clear all documents from collection
Example:
let rag_tool = new;
agent.tool;
Prerequisites:
- Qdrant running:
docker run -p 6333:6333 qdrant/qdrant - OpenAI API key for embeddings
Project Structure
helios/
βββ Cargo.toml # Project configuration
βββ README.md # This file
βββ config.example.toml # Example configuration
βββ .gitignore # Git ignore rules
β
βββ src/
β βββ lib.rs # Library entry point
β βββ main.rs # Binary entry point (interactive demo)
β βββ agent.rs # Agent implementation
β βββ llm.rs # LLM client and provider
β βββ tools.rs # Tool system and built-in tools
β βββ chat.rs # Chat message and session types
β βββ config.rs # Configuration management
β βββ error.rs # Error types
β
βββ docs/
β βββ API.md # API reference
β βββ QUICKSTART.md # Quick start guide
β βββ TUTORIAL.md # Detailed tutorial
β βββ USING_AS_CRATE.md # Using Helios as a library
β
βββ examples/
βββ basic_chat.rs # Simple chat example
βββ agent_with_tools.rs # Tool usage example
βββ agent_with_file_tools.rs # File management tools example
βββ agent_with_memory_db.rs # Memory database tool example
βββ custom_tool.rs # Custom tool implementation
βββ multiple_agents.rs # Multiple agents example
βββ direct_llm_usage.rs # Direct LLM client usage
βββ streaming_chat.rs # Streaming responses example
βββ local_streaming.rs # Local model streaming example
βββ complete_demo.rs # Complete feature demonstration
Module Overview
helios-engine/
β
βββ agent - Agent system and builder pattern
βββ llm - LLM client and API communication
βββ οΈ tools - Tool registry and implementations
βββ chat - Chat messages and session management
βββ config - TOML configuration loading/saving
βββ error - Error types and Result alias
Examples
Run the included examples:
# Basic chat example
# Agent with built-in tools (Calculator, Echo)
# Agent with file management tools
# Agent with in-memory database tool
# Custom tool implementation
# Multiple agents with different personalities
# Direct LLM usage without agents
# Streaming chat with remote models
# Local model streaming example
# Complete demo with all features
Testing
Run tests:
Run with logging:
RUST_LOG=debug
π Advanced Features
Custom LLM Providers
Implement the LLMProvider trait for custom backends:
use async_trait;
use ;
;
Tool Chaining
Agents automatically chain tool calls:
// The agent can use multiple tools in sequence
let response = agent.chat.await?;
Thinking Tags Display
Helios Engine automatically detects and displays thinking tags from LLM responses:
- The CLI displays thinking tags with visual indicators:
π [Thinking...] - Streaming responses show thinking tags in real-time
- Supports both
<thinking>and<think>tag formats - In offline mode, thinking tags are processed and removed from final output
Conversation Context
Maintain conversation history:
let mut agent = builder
.config
.system_prompt
.build
.await?;
let response1 = agent.chat.await?;
let response2 = agent.chat.await?; // Agent remembers: "Alice"
println!;
println!;
Clean Output Mode
In offline mode, Helios Engine suppresses all verbose debugging output from llama.cpp:
- No model loading messages
- No layer information display
- No verbose internal operations
- Clean, user-focused experience during local inference
Session Memory & Metadata
Track agent state and conversation metadata across interactions:
// Set agent memory (namespaced under "agent:" prefix)
agent.set_memory;
agent.set_memory;
// Get memory values
if let Some = agent.get_memory
// Increment counters
agent.increment_tasks_completed;
agent.increment_counter;
// Get session summary
println!;
// Clear only agent memory (preserves general session metadata)
agent.clear_memory;
Session metadata in ChatSession:
let mut session = new;
// Set general session metadata
session.set_metadata;
session.set_metadata;
// Retrieve metadata
if let Some = session.get_metadata
// Get session summary
println!;
File Management Tools
Built-in tools for file operations:
use ;
let mut agent = builder
.config
.tool // Search files by name or content
.tool // Read file contents
.tool // Write/create files
.tool // Find and replace in files
.build
.await?;
// Agent can now search, read, write, and edit files
let response = agent.chat.await?;
println!;
In-Memory Database Tool
Cache and retrieve data during agent conversations:
use ;
let mut agent = builder
.config
.system_prompt
.tool
.build
.await?;
// Store data
agent.chat.await?;
// Agent automatically uses the database to remember
agent.chat.await?;
// Response: "Your favorite color is blue"
// Cache expensive computations
agent.chat.await?;
agent.chat.await?;
// List all cached data
let response = agent.chat.await?;
println!;
Shared Database Between Agents:
use ;
use HashMap;
// Create a shared database
let shared_db = new;
// Multiple agents sharing the same database
let mut agent1 = builder
.config
.tool
.build
.await?;
let mut agent2 = builder
.config
.tool
.build
.await?;
// Data stored by agent1 is accessible to agent2
agent1.chat.await?;
agent2.chat.await?; // Gets "in_progress"
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Clone the repository:
- Build the project:
- Run tests:
- Format code:
- Check for issues:
License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with β€οΈ in Rust
β οΈ β οΈ HERE BE DRAGONS β οΈ β οΈ
π₯ ABANDON ALL HOPE, YE WHO ENTER HERE π₯
Greetings, Foolish Mortal
What lies before you is not codeβit is a CURSE.
A labyrinth of logic so twisted, so arcane, that it defies comprehension itself.
β‘ What Holds This Monstrosity Together
- π©Ή Duct tape (metaphorical and spiritual)
- π Prayers whispered at 3 AM
- π Stack Overflow answers from 2009
- π± Pure, unfiltered desperation
- π The tears of junior developers
- π² Luck (mostly luck)
π The Legend
Once, two beings understood this code:
β‘ God and Me β‘
Now... I have forgotten.
Only God remains.
And I'm not sure He's still watching.