# helios-engine 0.5.5

A powerful and flexible Rust framework for building LLM-powered agents with tool support, both locally and online.
# Configuration Guide

Helios Engine supports flexible configuration for both remote API access and local model inference through its dual `LLMProviderType` system. This guide covers all configuration options and setup scenarios.

## Quick Start Configuration

### Basic Remote API Setup

Create a `config.toml` file in your project root:

```toml
[llm]
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "sk-your-api-key-here"
temperature = 0.7
max_tokens = 2048
```
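
Once the file is in place, it can be loaded and handed to an agent. Here is a minimal sketch that reuses `Config::from_file` and `Agent::builder`, both shown in more detail later in this guide:

```rust
use helios_engine::{Config, Agent};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the [llm] section from config.toml in the project root.
    let config = Config::from_file("config.toml")?;

    // Build an agent backed by the configured remote API.
    let _agent = Agent::builder("QuickStartAgent")
        .config(config)
        .build()
        .await?;

    // The agent is now ready for prompts and tool calls (see the Usage Guide).
    Ok(())
}
```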

### Programmatic Configuration

Create configuration in code:

```rust
use helios_engine::config::LLMConfig;

let config = LLMConfig {
    model_name: "gpt-3.5-turbo".to_string(),
    base_url: "https://api.openai.com/v1".to_string(),
    api_key: std::env::var("OPENAI_API_KEY").unwrap(),
    temperature: 0.7,
    max_tokens: 2048,
};
```

## Supported LLM Providers

### Remote APIs (Online Mode)

Helios Engine works with any OpenAI-compatible API:

#### OpenAI
```toml
[llm]
model_name = "gpt-4"  # or gpt-3.5-turbo, gpt-4-turbo, etc.
base_url = "https://api.openai.com/v1"
api_key = "sk-your-openai-api-key"
temperature = 0.7
max_tokens = 2048
```

#### Azure OpenAI
```toml
[llm]
model_name = "gpt-35-turbo"  # Azure deployment name
base_url = "https://your-resource.openai.azure.com/openai/deployments/your-deployment"
api_key = "your-azure-openai-key"
temperature = 0.7
max_tokens = 2048
```

#### Local Models via API (LM Studio, Ollama, etc.)
```toml
[llm]
model_name = "local-model"
base_url = "http://localhost:1234/v1"  # LM Studio default
api_key = "not-needed"  # or your API key if required
temperature = 0.7
max_tokens = 2048
```

#### Ollama
```toml
[llm]
model_name = "llama2"  # or any model you have pulled
base_url = "http://localhost:11434/v1"
api_key = "not-needed"
temperature = 0.7
max_tokens = 2048
```

### Local Models (Offline Mode)

Run models locally with llama.cpp, without an internet connection:

#### Prerequisites

1. **HuggingFace Account**: Sign up at [huggingface.co](https://huggingface.co) (free)
2. **HuggingFace CLI**: Install and login:
   ```bash
   pip install huggingface_hub
   huggingface-cli login  # Login with your token
   ```

#### Local Model Configuration

```toml
[llm]
# Remote config still needed for auto mode fallback
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "your-api-key-here"
temperature = 0.7
max_tokens = 2048

# Local model configuration for offline mode
[local]
huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
model_file = "Qwen3-0.6B-Q4_K_M.gguf"
temperature = 0.7
max_tokens = 2048
context_size = 8192  # Optional, defaults to 4096
```

### Auto Mode Configuration (Remote + Local)

For maximum flexibility, configure both remote and local models to enable auto mode:

```toml
[llm]
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "your-api-key-here"
temperature = 0.7
max_tokens = 2048

# Local model, used first in auto mode (the remote API is the fallback)
[local]
huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
model_file = "Qwen3-0.6B-Q4_K_M.gguf"
temperature = 0.7
max_tokens = 2048
```
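
Before relying on auto mode, you can confirm that both halves are actually present by loading the config and checking the optional `local` section. A minimal sketch; `config.local` is the same optional field used in the validation examples later in this guide:

```rust
use helios_engine::Config;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = Config::from_file("config.toml")?;

    // [llm] is always required; [local] is optional and enables the local backend.
    if config.local.is_some() {
        println!("Remote and local models configured: auto mode will prefer the local model.");
    } else {
        println!("No [local] section found: auto mode will always use the remote API.");
    }

    Ok(())
}
```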

## Local Inference Setup

### Setting Up Local Models

1. **Find a GGUF Model**: Browse [HuggingFace Models](https://huggingface.co/models?library=gguf) for compatible models

2. **Update Configuration**: Add local model config to your `config.toml`:
   ```toml
   [local]
   huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
   model_file = "Qwen3-0.6B-Q4_K_M.gguf"
   temperature = 0.7
   max_tokens = 2048
   context_size = 8192
   ```

3. **Run in Offline Mode**:
   ```bash
   # First run downloads the model (~400MB for Qwen3-0.6B)
   helios-engine --mode offline ask "Hello world"

   # Subsequent runs use cached model
   helios-engine --mode offline chat
   ```

### Recommended Models

| Model | Size | Use Case | Repository |
|-------|------|----------|------------|
| Qwen3-0.6B | ~400MB | Fast, good quality | `unsloth/Qwen3-0.6B-GGUF` |
| Llama-3.2-1B | ~700MB | Balanced performance | `unsloth/Llama-3.2-1B-Instruct-GGUF` |
| Mistral-7B | ~4GB | High quality | `TheBloke/Mistral-7B-Instruct-v0.1-GGUF` |
| Llama-3-8B | ~5GB | Excellent quality | `unsloth/Meta-Llama-3-8B-Instruct-GGUF` |

### Performance & Features

- **GPU Acceleration**: Models automatically use the GPU if one is available (via llama.cpp's `n_gpu_layers` parameter)
- **Model Caching**: Downloaded models are cached locally (`~/.cache/huggingface`)
- **Memory Usage**: Larger models need more RAM/VRAM
- **First Run**: The initial model download may take time, depending on connection speed
- **Clean Output Mode**: Suppresses verbose llama.cpp debugging output for a clean user experience

### Local Model Parameters

```toml
[local]
# HuggingFace repository and model file (required)
huggingface_repo = "unsloth/Qwen3-0.6B-GGUF"
model_file = "Qwen3-0.6B-Q4_K_M.gguf"

# Generation parameters (optional, defaults provided)
temperature = 0.7        # 0.0-2.0, controls randomness
max_tokens = 2048        # Maximum tokens to generate
context_size = 8192      # Context window size

# Advanced parameters (optional)
top_k = 40              # Top-k sampling
top_p = 0.9             # Nucleus sampling
repeat_penalty = 1.1    # Repetition penalty

# Hardware acceleration (optional)
n_gpu_layers = -1       # -1 = use all available GPU layers, 0 = CPU only
n_threads = -1          # -1 = use all available CPU threads
```
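
For reference, these keys map onto plain typed fields when the file is deserialized. The struct below is purely illustrative (a hypothetical mirror of the `[local]` table written with `serde` and the `toml` crate), not Helios Engine's actual configuration type:

```rust
use serde::Deserialize;

// Hypothetical mirror of the [local] table, for illustration only.
#[derive(Debug, Deserialize)]
struct LocalSection {
    huggingface_repo: String,
    model_file: String,
    // Generation parameters are optional; the engine supplies defaults when omitted.
    temperature: Option<f32>,
    max_tokens: Option<u32>,
    #[serde(default = "default_context_size")]
    context_size: u32, // documented default: 4096
    // Advanced and hardware parameters stay optional as well.
    top_k: Option<u32>,
    top_p: Option<f32>,
    repeat_penalty: Option<f32>,
    n_gpu_layers: Option<i32>,
    n_threads: Option<i32>,
}

fn default_context_size() -> u32 {
    4096
}

#[derive(Debug, Deserialize)]
struct ConfigFile {
    local: Option<LocalSection>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let text = std::fs::read_to_string("config.toml")?;
    let parsed: ConfigFile = toml::from_str(&text)?;
    if let Some(local) = parsed.local {
        println!("Would load {} from {}", local.model_file, local.huggingface_repo);
    }
    Ok(())
}
```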

## Operation Modes

### Auto Mode
Uses the local model if one is available and configured; otherwise falls back to the remote API:
```bash
helios-engine --mode auto chat
```

### Online Mode
Forces remote API usage and ignores any local configuration:
```bash
helios-engine --mode online chat
```

### Offline Mode
Uses only local models and fails if none is configured:
```bash
helios-engine --mode offline chat
```
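
The three modes boil down to a small decision rule. The function below is only an illustration of the behaviour described above (a hypothetical `Mode` enum, not the engine's internal API):

```rust
/// Hypothetical flag mirroring the CLI's --mode option, for illustration only.
enum Mode {
    Auto,
    Online,
    Offline,
}

/// Which backend a given mode ends up using, following the rules described above.
fn backend(mode: Mode, local_configured: bool) -> Result<&'static str, &'static str> {
    match mode {
        Mode::Online => Ok("remote API"),
        Mode::Offline if local_configured => Ok("local model"),
        Mode::Offline => Err("offline mode requires a [local] section in config.toml"),
        Mode::Auto if local_configured => Ok("local model"),
        Mode::Auto => Ok("remote API"),
    }
}

fn main() {
    assert_eq!(backend(Mode::Auto, true), Ok("local model"));
    assert_eq!(backend(Mode::Auto, false), Ok("remote API"));
    assert!(backend(Mode::Offline, false).is_err());
}
```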

## Environment Variables

Use environment variables for sensitive configuration:

```bash
export OPENAI_API_KEY="sk-your-key-here"
export LLM_BASE_URL="https://api.openai.com/v1"
export LLM_MODEL="gpt-4"
```

```rust
use helios_engine::config::LLMConfig;

let config = LLMConfig {
    model_name: std::env::var("LLM_MODEL")
        .unwrap_or_else(|_| "gpt-3.5-turbo".to_string()),
    base_url: std::env::var("LLM_BASE_URL")
        .unwrap_or_else(|_| "https://api.openai.com/v1".to_string()),
    api_key: std::env::var("OPENAI_API_KEY")
        .expect("OPENAI_API_KEY must be set"),
    temperature: 0.7,
    max_tokens: 2048,
};
```

## Advanced Configuration

### Custom HTTP Client

For production deployments with connection pooling:

```rust
use helios_engine::config::LLMConfig;
use reqwest::Client;

let http_client = Client::builder()
    .pool_max_idle_per_host(10)
    .pool_idle_timeout(std::time::Duration::from_secs(30))
    .tcp_keepalive(std::time::Duration::from_secs(60))
    .build()?;

let config = LLMConfig {
    model_name: "gpt-4".to_string(),
    base_url: "https://api.openai.com/v1".to_string(),
    api_key: std::env::var("OPENAI_API_KEY").unwrap(),
    temperature: 0.7,
    max_tokens: 2048,
    client: Some(http_client),
};
```

### Multiple Configurations

Manage different configurations for different use cases:

```rust
use helios_engine::{Config, Agent};

// Load multiple configs
let prod_config = Config::from_file("config.prod.toml")?;
let dev_config = Config::from_file("config.dev.toml")?;
let local_config = Config::from_file("config.local.toml")?;

// Create agents with different configs
let mut prod_agent = Agent::builder("ProductionAgent")
    .config(prod_config)
    .build()
    .await?;

let mut dev_agent = Agent::builder("DevelopmentAgent")
    .config(dev_config)
    .build()
    .await?;
```

### Configuration Validation

Validate configuration before use:

```rust
use helios_engine::Config;

let config = Config::from_file("config.toml")?;

// Validate LLM configuration
config.validate_llm_config()?;

// Validate local model configuration (if present)
if let Some(local_config) = &config.local {
    local_config.validate()?;
}

// Configuration is ready to use
let mut agent = Agent::builder("ValidatedAgent")
    .config(config)
    .build()
    .await?;
```

## Organizing Configuration Files

### Project Structure

```
my-project/
├── config.toml          # Main configuration
├── config.prod.toml     # Production settings
├── config.dev.toml      # Development settings
├── config.local.toml    # Local model settings
└── src/
    └── main.rs
```

### Environment-Specific Configs

**Development (config.dev.toml):**
```toml
[llm]
model_name = "gpt-3.5-turbo"
base_url = "https://api.openai.com/v1"
api_key = "sk-dev-key"
temperature = 0.9  # More creative for development
max_tokens = 1024  # Shorter for faster iteration
```

**Production (config.prod.toml):**
```toml
[llm]
model_name = "gpt-4"
base_url = "https://api.openai.com/v1"
api_key = "sk-prod-key"
temperature = 0.3  # More deterministic for production
max_tokens = 2048
```
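
One straightforward way to pick between these files at runtime is to key the path off an environment variable. A minimal sketch (the `APP_ENV` variable name is just an example, not something Helios Engine defines):

```rust
use helios_engine::Config;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // APP_ENV is an illustrative name; any selection scheme works here.
    let env = std::env::var("APP_ENV").unwrap_or_else(|_| "dev".to_string());
    let path = match env.as_str() {
        "prod" => "config.prod.toml",
        "local" => "config.local.toml",
        _ => "config.dev.toml",
    };

    let config = Config::from_file(path)?;
    println!("Loaded configuration from {path}");

    // Hand `config` to Agent::builder(...) as shown earlier in this guide.
    let _ = config;
    Ok(())
}
```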

## Troubleshooting Configuration

### Common Issues

**"API key not found"**
- Ensure your API key is set in an environment variable or the config file
- Check that the environment variable name matches exactly
- Verify the API key is valid and has proper permissions

**"Model not found"**
- Check that the model name is correct for your provider
- Ensure the model is available in your OpenAI/Azure account
- For local models, verify the HuggingFace repository and file exist

**"Connection failed"**
- Verify the base_url is correct and accessible
- Check firewall/proxy settings
- Ensure the API endpoint is responding

**"Local model download failed"**
- Verify HuggingFace CLI is installed and logged in
- Check available disk space (models can be several GB)
- Ensure stable internet connection for initial download

**"GPU not detected"**
- Install CUDA/cuBLAS for GPU acceleration
- Check that llama.cpp was compiled with GPU support
- Set `n_gpu_layers = 0` to force CPU-only mode

### Configuration Validation

Create a validation script to check your configuration:

```rust
use helios_engine::Config;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    println!("Validating Helios Engine configuration...");

    // Test config loading
    let config = Config::from_file("config.toml")?;
    println!(" Configuration file loaded successfully");

    // Test LLM client creation
    let _client = helios_engine::LLMClient::new(config.llm_provider()).await?;
    println!(" LLM client created successfully");

    // Test local model (if configured)
    if let Some(local_config) = &config.local {
        println!("Testing local model configuration...");
        // Note: Actual model loading would happen on first use
        println!("Local model configuration is valid");
    }

    println!("All configuration tests passed!");
    Ok(())
}
```

## Next Steps

- **[Installation Guide](INSTALLATION.md)** - How to install Helios Engine
- **[Usage Guide](USAGE.md)** - Common usage patterns
- **[Tools Guide](TOOLS.md)** - Available tools and custom tool creation
- **[Examples](../examples/)** - Working configuration examples