🔥 Helios Engine - LLM Agent Framework
Helios Engine is a powerful and flexible Rust framework for building LLM-powered agents with tool support, streaming chat capabilities, and easy configuration management. Create intelligent agents that can interact with users, call tools, and maintain conversation context - with both online and offline local model support.
Features
- Agent System: Create multiple agents with different personalities and capabilities
- Tool Registry: Extensible tool system for adding custom functionality
- Chat Management: Built-in conversation history and session management
- Streaming Support: Real-time response streaming with thinking tag detection
- Local Model Support: Run local models offline using llama.cpp with HuggingFace integration
- LLM Support: Compatible with OpenAI API, any OpenAI-compatible API, and local models
- Async/Await: Built on Tokio for high-performance async operations
- Type-Safe: Leverages Rust's type system for safe and reliable code
- Extensible: Easy to add custom tools and extend functionality
- Thinking Tags: Automatic detection and display of model reasoning process
- Dual Mode Support: Auto, online (remote API), and offline (local) modes
- Clean Output: Suppresses verbose debugging in offline mode for clean user experience
- CLI & Library: Use as both a command-line tool and a Rust library crate
Table of Contents
- Installation
- Quick Start
- CLI Usage
- Configuration
- Local Inference Setup
- Architecture
- Usage Examples
- Creating Custom Tools
- API Documentation
- Project Structure
- Examples
- Contributing
- License
Installation
Helios Engine can be used both as a command-line tool and as a library crate in your Rust projects.
As a CLI Tool (Recommended for Quick Start)
Install globally using Cargo (once published):

```bash
cargo install helios-engine
```
Then use it anywhere. The invocations below are representative; run `helios-engine --help` for the exact commands and flags:

```bash
# Initialize configuration
helios-engine init

# Start interactive chat (default command)
helios-engine
# or explicitly
helios-engine chat

# Ask a quick question
helios-engine ask "What is the capital of France?"

# Get help
helios-engine --help

# NEW: Use offline mode with local models (no internet required)
helios-engine --offline chat

# Use online mode (forces remote API usage)
helios-engine --online chat

# Auto mode (uses local if configured, otherwise remote)
helios-engine chat

# Verbose logging for debugging
helios-engine chat --verbose

# Custom system prompt
helios-engine chat --system-prompt "You are a helpful coding assistant."

# One-off question with custom config
helios-engine ask "Summarize the config format" --config ./my-config.toml
```
As a Library Crate
Add Helios-Engine to your Cargo.toml:
```toml
[dependencies]
helios-engine = "0.1.9"
tokio = { version = "1.35", features = ["full"] }
```
Or use a local path during development:
```toml
[dependencies]
helios-engine = { path = "../helios" }
tokio = { version = "1.35", features = ["full"] }
```
Build from Source
```bash
git clone <repository-url>
cd helios
cargo build --release

# Install locally
cargo install --path .
```
Quick Start
Using as a Library Crate
The simplest way to use Helios Engine is to call LLM models directly:
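A minimal sketch of direct client usage, based on the types listed under API Documentation (`Config`, `LLMClient`, `LLMProviderType`, `ChatMessage`, `LLMConfig`). Exact module paths, constructors, and field names (for example `ChatMessage::user` and `response.content`) are assumptions; check the crate docs for the precise signatures.

```rust
use helios_engine::{ChatMessage, Config, LLMClient, LLMProviderType};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the [llm] settings from config.toml (see Configuration below).
    let config = Config::from_file("config.toml")?;

    // Build a client for the remote (online) provider.
    let client = LLMClient::new(LLMProviderType::Remote(config.llm));

    // Send one user message; the empty slice means no tools are offered.
    let response = client
        .chat(&[ChatMessage::user("Hello! What can you do?")], &[])
        .await?;
    println!("{}", response.content);

    Ok(())
}
```

Because `LLMClient::new` takes an `LLMProviderType`, the same call works against a local model without changing the rest of the code (see the next section).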
For detailed examples of using Helios Engine as a crate, see the Using as a Crate guide (docs/USING_AS_CRATE.md).
Using Offline Mode with Local Models
Run models locally without internet connection:
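A sketch of the offline path, assuming `Config` exposes the `[local]` section as an optional `local` field holding a `LocalConfig`; as above, the exact field and method names are assumptions.

```rust
use helios_engine::{ChatMessage, Config, LLMClient, LLMProviderType};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The [local] section of config.toml selects the GGUF model to run.
    let config = Config::from_file("config.toml")?;
    let local = config.local.expect("add a [local] section for offline mode");

    // A Local provider runs the model through llama.cpp; the first call
    // downloads the GGUF file from HuggingFace and caches it.
    let client = LLMClient::new(LLMProviderType::Local(local));

    let response = client
        .chat(&[ChatMessage::user("Explain borrowing in Rust in one sentence.")], &[])
        .await?;
    println!("{}", response.content);

    Ok(())
}
```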
Note: First run downloads the model. Subsequent runs use the cached model.
Using with Agent System
For more advanced use cases with tools and persistent conversation:
1. Configure Your LLM
Create a config.toml file (supports both remote and local; config.example.toml in the repository shows the authoritative key names):

```toml
[llm]
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "your-api-key-here"
temperature = 0.7
max_tokens = 2048

# Optional: Add local configuration for offline mode
[local]
huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
model_file = "Qwen3-0.6B-Q4_K_M.gguf"
temperature = 0.7
max_tokens = 2048
```
2. Create Your First Agent
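A sketch of building an agent from the configuration above. The `Agent::builder(name)` entry point is documented under API Documentation below; the `.config(...)`, `.system_prompt(...)`, and `.build()` steps shown here are assumed builder methods.

```rust
use helios_engine::{Agent, Config};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;

    // Assemble the agent (builder method names are illustrative).
    let mut agent = Agent::builder("Assistant")
        .config(config)
        .system_prompt("You are a helpful assistant.")
        .build()
        .await?;

    let response = agent.chat("What can you help me with?").await?;
    println!("{}", response.content);

    Ok(())
}
```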
3. Run the Interactive Demo

```bash
cargo run
```
CLI Usage
Helios Engine provides a powerful command-line interface with multiple modes and options (the commands below are representative; run `helios-engine --help` for the exact flags):
Interactive Chat Mode
Start an interactive chat session:
```bash
# Default chat session
helios-engine chat

# With custom system prompt
helios-engine chat --system-prompt "You are a concise assistant."

# With custom max iterations for tool calls
helios-engine chat --max-iterations 10

# With verbose logging for debugging
helios-engine chat --verbose
```
One-off Questions
Ask a single question without interactive mode:
```bash
# Ask a single question
helios-engine ask "What is the capital of France?"

# Ask with custom config file
helios-engine ask "What is the capital of France?" --config ./my-config.toml
```
Configuration Management
Initialize and manage configuration:
```bash
# Create a new configuration file
helios-engine init

# Create config in custom location
helios-engine init --config ./my-config.toml
```
Mode Selection
Choose between different operation modes:
```bash
# Auto mode (uses local if configured, otherwise remote API)
helios-engine chat

# Online mode (forces remote API usage)
helios-engine --online chat

# Offline mode (uses local models only)
helios-engine --offline chat
```
Interactive Commands
During an interactive session, use these commands:
- `exit` or `quit` - Exit the chat session
- `clear` - Clear conversation history
- `history` - Show conversation history
- `help` - Show help message
Configuration
Helios Engine uses TOML for configuration. You can configure remote API access, local model inference, or both; at runtime the LLMProviderType enum selects between the two.
Remote API Configuration (Default)
```toml
[llm]
# The model name (e.g., gpt-3.5-turbo, gpt-4, claude-3, etc.)
model_name = "gpt-3.5-turbo"

# Base URL for the API (OpenAI or compatible)
base_url = "https://api.openai.com/v1"

# Your API key
api_key = "your-api-key-here"

# Temperature for response generation (0.0 - 2.0)
temperature = 0.7

# Maximum tokens in response
max_tokens = 2048
```
Local Model Configuration (Offline Mode with llama.cpp)
```toml
[llm]
# Remote config is still needed as the auto mode fallback
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "your-api-key-here"
temperature = 0.7
max_tokens = 2048

# Local model configuration for offline mode
[local]
# HuggingFace repository and model file
huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
model_file = "Qwen3-0.6B-Q4_K_M.gguf"

# Local model settings
temperature = 0.7
max_tokens = 2048
```
Auto Mode Configuration (Remote + Local)
For maximum flexibility, configure both remote and local models to enable auto mode:
```toml
[llm]
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "your-api-key-here"
temperature = 0.7
max_tokens = 2048

# Local model, used when available in auto mode
[local]
huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
model_file = "Qwen3-0.6B-Q4_K_M.gguf"
temperature = 0.7
max_tokens = 2048
```
Supported LLM Providers
Helios Engine supports both remote APIs and local model inference:
Remote APIs (Online Mode)
Helios Engine works with any OpenAI-compatible API:
- OpenAI: `https://api.openai.com/v1`
- Azure OpenAI: `https://your-resource.openai.azure.com/openai/deployments/your-deployment`
- Local models (LM Studio): `http://localhost:1234/v1`
- Ollama with OpenAI compatibility: `http://localhost:11434/v1`
- Any other OpenAI-compatible API
Local Models (Offline Mode)
Run models locally using llama.cpp without internet connection:
- GGUF Models: Compatible with all GGUF format models from HuggingFace
- Automatic Download: Models are downloaded automatically from HuggingFace
- GPU Acceleration: Uses GPU if available (via llama.cpp)
- Clean Output: Suppresses verbose debugging for clean user experience
- Popular Models: Works with Qwen, Llama, Mistral, and other GGUF models
Supported Model Sources:
- HuggingFace Hub repositories
- Local GGUF files
- Automatic model caching
Local Inference Setup
Helios Engine supports running large language models locally using llama.cpp through the LLMProviderType system, providing privacy, offline capability, and no API costs.
Prerequisites
- HuggingFace Account: Sign up at huggingface.co (free)
- HuggingFace CLI: Install the CLI tool:

  ```bash
  pip install -U "huggingface_hub[cli]"
  ```
Setting Up Local Models
1. Find a GGUF Model: Browse HuggingFace Models for compatible models.

2. Update Configuration: Add local model config to your config.toml:

   ```toml
   [local]
   huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
   model_file = "Qwen3-0.6B-Q4_K_M.gguf"
   temperature = 0.7
   max_tokens = 2048
   ```

3. Run in Offline Mode:

   ```bash
   # First run downloads the model; subsequent runs use the cached copy
   helios-engine --offline chat
   ```
Recommended Models
| Model | Size | Use Case | Repository |
|---|---|---|---|
| Qwen3-0.6B | ~400MB | Fast, good quality | unsloth/Qwen3-0.6B-GGUF |
| Llama-3.2-1B | ~700MB | Balanced performance | unsloth/Llama-3.2-1B-Instruct-GGUF |
| Mistral-7B | ~4GB | High quality | TheBloke/Mistral-7B-Instruct-v0.1-GGUF |
Performance & Features
- GPU Acceleration: Models automatically use GPU if available via llama.cpp's n_gpu_layers parameter
- Model Caching: Downloaded models are cached locally (~/.cache/huggingface)
- Memory Usage: Larger models need more RAM/VRAM
- First Run: Initial model download may take time depending on connection
- Clean Output Mode: Suppresses verbose debugging from llama.cpp for clean user experience
Streaming Support with Local Models
While streaming is available for remote models, local models currently provide full responses. The LLMClient automatically handles both streaming (remote) and non-streaming (local) modes consistently through the same API.
Architecture
System Overview
```mermaid
graph TB
User[User] -->|Input| Agent[Agent]
Agent -->|Messages| LLM[LLM Client]
Agent -->|Tool Calls| Registry[Tool Registry]
Registry -->|Execute| Tools[Tools]
Tools -->|Results| Agent
LLM -->|Response| Agent
Agent -->|Output| User
Config[Config TOML] -->|Load| Agent
style Agent fill:#4CAF50
style LLM fill:#2196F3
style Registry fill:#FF9800
style Tools fill:#9C27B0
```
Component Architecture
```mermaid
classDiagram
class Agent {
+name: String
+llm_client: LLMClient
+tool_registry: ToolRegistry
+chat_session: ChatSession
+chat(message) ChatMessage
+register_tool(tool) void
+clear_history() void
}
class LLMClient {
+provider: LLMProvider
+provider_type: LLMProviderType
+chat(messages, tools) ChatMessage
+chat_stream(messages, tools, callback) ChatMessage
+generate(request) LLMResponse
}
class ToolRegistry {
+tools: HashMap
+register(tool) void
+execute(name, args) ToolResult
+get_definitions() Vec
}
class Tool {
<<interface>>
+name() String
+description() String
+parameters() HashMap
+execute(args) ToolResult
}
class ChatSession {
+messages: Vec
+system_prompt: Option
+add_message(msg) void
+clear() void
}
class Config {
+llm: LLMConfig
+from_file(path) Config
+save(path) void
}
Agent --> LLMClient
Agent --> ToolRegistry
Agent --> ChatSession
Agent --> Config
ToolRegistry --> Tool
Tool <|-- CalculatorTool
Tool <|-- EchoTool
Tool <|-- CustomTool
```
Agent Execution Flow
```mermaid
sequenceDiagram
participant User
participant Agent
participant LLM
participant ToolRegistry
participant Tool
User->>Agent: Send Message
Agent->>Agent: Add to Chat History
loop Until No Tool Calls
Agent->>LLM: Send Messages + Tool Definitions
LLM->>Agent: Response (with/without tool calls)
alt Has Tool Calls
Agent->>ToolRegistry: Execute Tool
ToolRegistry->>Tool: Call with Arguments
Tool->>ToolRegistry: Return Result
ToolRegistry->>Agent: Tool Result
Agent->>Agent: Add Tool Result to History
else No Tool Calls
Agent->>User: Return Final Response
end
end
```
Tool Execution Pipeline
```mermaid
flowchart LR
A[User Request] --> B{LLM Decision}
B -->|Need Tool| C[Get Tool Definition]
C --> D[Parse Arguments]
D --> E[Execute Tool]
E --> F[Format Result]
F --> G[Add to Context]
G --> B
B -->|No Tool Needed| H[Return Response]
H --> I[User]
style B fill:#FFD700
style E fill:#4CAF50
style H fill:#2196F3
```
Usage Examples
Basic Chat
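A sketch of a plain, tool-free conversation; as in Quick Start, the builder method names are assumptions.

```rust
use helios_engine::{Agent, Config};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;
    let mut agent = Agent::builder("ChatBot").config(config).build().await?;

    // The agent keeps conversation history between calls.
    let reply = agent.chat("Hi! Please keep your answers short.").await?;
    println!("{}", reply.content);

    let reply = agent.chat("What did I just ask you to do?").await?;
    println!("{}", reply.content);

    Ok(())
}
```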
Agent with Built-in Tools
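A sketch that registers the built-in `CalculatorTool` and `EchoTool`; whether tools are passed boxed (`Box::new(...)`) is an assumption.

```rust
use helios_engine::{Agent, CalculatorTool, Config, EchoTool};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;
    let mut agent = Agent::builder("Assistant").config(config).build().await?;

    // Register the built-in tools so the model can call them.
    agent.register_tool(Box::new(CalculatorTool));
    agent.register_tool(Box::new(EchoTool));

    // The agent decides when a tool is needed, runs it, and folds the
    // result back into the conversation before answering.
    let response = agent.chat("What is 15 * 23?").await?;
    println!("{}", response.content);

    Ok(())
}
```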
Multiple Agents
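A sketch with two agents that differ only in their system prompts (assumes `Config` is cloneable):

```rust
use helios_engine::{Agent, Config};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;

    // Two agents with different personalities, sharing the same LLM settings.
    let mut translator = Agent::builder("Translator")
        .config(config.clone())
        .system_prompt("You translate English text into French.")
        .build()
        .await?;
    let mut reviewer = Agent::builder("Reviewer")
        .config(config)
        .system_prompt("You review Rust code and point out problems.")
        .build()
        .await?;

    let fr = translator.chat("Good morning, how are you?").await?;
    println!("Translator: {}", fr.content);

    let review = reviewer.chat("fn main() { let x = 5; }").await?;
    println!("Reviewer: {}", review.content);

    Ok(())
}
```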
Streaming Chat (Direct LLM Usage)
Use streaming to receive responses in real-time:
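A sketch of `chat_stream`, which the API Documentation describes as taking messages, tools, and a callback; the exact callback signature used here is an assumption.

```rust
use std::io::{self, Write};

use helios_engine::{ChatMessage, Config, LLMClient, LLMProviderType};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;
    let client = LLMClient::new(LLMProviderType::Remote(config.llm));

    // chat_stream invokes the callback for each chunk as it arrives and
    // still returns the assembled message once the stream finishes.
    let full = client
        .chat_stream(
            &[ChatMessage::user("Write a haiku about Rust.")],
            &[],
            |chunk: &str| {
                print!("{chunk}");
                io::stdout().flush().ok();
            },
        )
        .await?;

    println!("\n---\nComplete response: {}", full.content);
    Ok(())
}
```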
Creating Custom Tools
Implement the Tool trait to create custom tools:
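A sketch of a custom tool. The `Tool` trait's methods (`name`, `description`, `parameters`, `execute`) come from the API Documentation below; the exact signatures, the `ToolResult::success` constructor, and the hypothetical `WordCountTool` are assumptions.

```rust
use std::collections::HashMap;

use async_trait::async_trait;
use helios_engine::{Agent, Config, Tool, ToolResult};
use serde_json::{json, Value};

/// A toy tool that counts the words in a piece of text.
struct WordCountTool;

#[async_trait]
impl Tool for WordCountTool {
    fn name(&self) -> String {
        "word_count".to_string()
    }

    fn description(&self) -> String {
        "Counts the words in the provided text.".to_string()
    }

    fn parameters(&self) -> HashMap<String, Value> {
        // One required string parameter called `text`.
        HashMap::from([(
            "text".to_string(),
            json!({ "type": "string", "description": "Text to count" }),
        )])
    }

    async fn execute(&self, args: HashMap<String, Value>) -> ToolResult {
        let text = args.get("text").and_then(Value::as_str).unwrap_or_default();
        // ToolResult::success is an assumed constructor for a successful result.
        ToolResult::success(text.split_whitespace().count().to_string())
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;
    let mut agent = Agent::builder("Assistant").config(config).build().await?;

    // Use your custom tool.
    agent.register_tool(Box::new(WordCountTool));
    let reply = agent
        .chat("How many words are in 'the quick brown fox jumps'?")
        .await?;
    println!("{}", reply.content);

    Ok(())
}
```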
API Documentation
Core Types
Agent
The main agent struct that manages conversation and tool execution.
Methods:
- `builder(name)` - Create a new agent builder
- `chat(message)` - Send a message and get a response
- `register_tool(tool)` - Add a tool to the agent
- `clear_history()` - Clear conversation history
- `set_system_prompt(prompt)` - Set the system prompt
- `set_max_iterations(max)` - Set maximum tool call iterations
Config
Configuration management for LLM settings.
Methods:
- `from_file(path)` - Load config from TOML file
- `default()` - Create default configuration
- `save(path)` - Save config to file
LLMClient
Client for interacting with LLM providers (remote or local).
Methods:
- `new(provider_type)` - Create client with LLMProviderType (Remote or Local)
- `chat(messages, tools)` - Send messages and get response
- `chat_stream(messages, tools, callback)` - Send messages and stream response with a callback function
- `generate(request)` - Low-level generation method
LLMProviderType
Enumeration for different LLM provider types.
Variants:
- `Remote(LLMConfig)` - For remote API providers (OpenAI, Azure, etc.)
- `Local(LocalConfig)` - For local llama.cpp models
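For example, a small helper that picks a variant from the loaded configuration (a sketch assuming `Config` exposes `llm` and an optional `local` field):

```rust
use helios_engine::{Config, LLMClient, LLMProviderType};

/// Prefer the local model when a [local] section is configured (auto mode),
/// otherwise fall back to the remote API.
fn build_client(config: Config) -> LLMClient {
    match config.local {
        Some(local) => LLMClient::new(LLMProviderType::Local(local)),
        None => LLMClient::new(LLMProviderType::Remote(config.llm)),
    }
}
```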
ToolRegistry
Manages and executes tools.
Methods:
- `new()` - Create empty registry
- `register(tool)` - Register a new tool
- `execute(name, args)` - Execute a tool by name
- `get_definitions()` - Get all tool definitions
- `list_tools()` - List registered tool names
ChatSession
Manages conversation history.
Methods:
- `new()` - Create new session
- `with_system_prompt(prompt)` - Set system prompt
- `add_message(message)` - Add message to history
- `clear()` - Clear all messages
Built-in Tools
CalculatorTool
Performs basic arithmetic operations.
Parameters:
- `expression` (string, required): Mathematical expression
Example:
```rust
agent.register_tool(Box::new(CalculatorTool));
```
EchoTool
Echoes back a message.
Parameters:
- `message` (string, required): Message to echo
Example:
```rust
agent.register_tool(Box::new(EchoTool));
```
Project Structure
```
helios/
├── Cargo.toml              # Project configuration
├── README.md               # This file
├── config.example.toml     # Example configuration
├── .gitignore              # Git ignore rules
│
├── src/
│   ├── lib.rs              # Library entry point
│   ├── main.rs             # Binary entry point (interactive demo)
│   ├── agent.rs            # Agent implementation
│   ├── llm.rs              # LLM client and provider
│   ├── tools.rs            # Tool system and built-in tools
│   ├── chat.rs             # Chat message and session types
│   ├── config.rs           # Configuration management
│   └── error.rs            # Error types
│
├── docs/
│   ├── API.md              # API reference
│   ├── QUICKSTART.md       # Quick start guide
│   ├── TUTORIAL.md         # Detailed tutorial
│   └── USING_AS_CRATE.md   # Using Helios as a library
│
└── examples/
    ├── basic_chat.rs       # Simple chat example
    ├── agent_with_tools.rs # Tool usage example
    ├── custom_tool.rs      # Custom tool implementation
    ├── multiple_agents.rs  # Multiple agents example
    └── direct_llm_usage.rs # Direct LLM client usage
```
Module Overview
```
helios-engine/
│
├── agent   - Agent system and builder pattern
├── llm     - LLM client and API communication
├── tools   - Tool registry and implementations
├── chat    - Chat messages and session management
├── config  - TOML configuration loading/saving
└── error   - Error types and Result alias
```
Examples
Run the included examples:
```bash
# Basic chat
cargo run --example basic_chat

# Agent with tools
cargo run --example agent_with_tools

# Custom tool
cargo run --example custom_tool

# Multiple agents
cargo run --example multiple_agents
```
Testing
Run tests:

```bash
cargo test
```

Run with logging:

```bash
RUST_LOG=debug cargo test
```
🔍 Advanced Features
Custom LLM Providers
Implement the LLMProvider trait for custom backends:
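The trait definition is not reproduced here; as a rough sketch, assuming its core method mirrors `LLMClient::chat(messages, tools)`, a mock backend might look like this (`ToolDefinition`, `helios_engine::Result`, and `ChatMessage::assistant` are assumptions):

```rust
use async_trait::async_trait;
use helios_engine::{ChatMessage, LLMProvider, ToolDefinition};

/// A backend that answers every request with a canned message,
/// handy as a stand-in provider in tests.
struct MockProvider;

#[async_trait]
impl LLMProvider for MockProvider {
    async fn chat(
        &self,
        _messages: &[ChatMessage],
        _tools: &[ToolDefinition],
    ) -> helios_engine::Result<ChatMessage> {
        Ok(ChatMessage::assistant("Hello from the mock backend!"))
    }
}
```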
Tool Chaining
Agents automatically chain tool calls:
```rust
// The agent can use multiple tools in sequence before answering.
let response = agent.chat("Calculate 15 * 23, then echo the result.").await?;
```
Thinking Tags Display
Helios Engine automatically detects and displays thinking tags from LLM responses:
- The CLI displays thinking tags with a visual indicator: 💭 [Thinking...]
- Streaming responses show thinking tags in real-time
- Supports both `<thinking>` and `<think>` tag formats
- In offline mode, thinking tags are processed and removed from the final output
Conversation Context
Maintain conversation history:
```rust
let mut agent = Agent::builder("Assistant").config(config).build().await?;

agent.chat("My name is Alice.").await?;
agent.chat("What's my name?").await?; // Agent remembers: "Alice"
```
Clean Output Mode
In offline mode, Helios Engine suppresses all verbose debugging output from llama.cpp:
- No model loading messages
- No layer information display
- No verbose internal operations
- Clean, user-focused experience during local inference
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Clone the repository: `git clone <repository-url> && cd helios`
- Build the project: `cargo build`
- Run tests: `cargo test`
- Format code: `cargo fmt`
- Check for issues: `cargo clippy`
License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ in Rust