helios-engine 0.5.5

A powerful and flexible Rust framework for building LLM-powered agents with tool support, both locally and online
# ✅ Setup Complete: Qwen2.5-0.5B-Instruct with Candle

## What Was Done

### 1. ✅ Code Changes
- **Modified `src/candle_provider.rs`** to add automatic HuggingFace cache loading
  - Added `find_model_in_cache()` function to search local cache
  - Modified `download_model_and_tokenizer()` to check cache first before downloading
  - Supports `HF_HOME` environment variable for custom cache locations
  - Automatically detects cache structure: `~/.cache/huggingface/hub/models--{repo}/snapshots/`

### 2. ✅ Configuration Updated
- **Updated `config.example.toml`** with Qwen2.5-0.5B-Instruct configuration
  - Added documentation about cache loading
  - Set optimal defaults for the 0.5B model
  - Included alternative model examples

### 3. ✅ Documentation Created
- **`CANDLE_LOCAL_SETUP.md`** - Comprehensive setup guide with:
  - Prerequisites and installation steps
  - Model download instructions
  - Configuration reference
  - Code examples
  - Troubleshooting guide
  - Performance tips

- **`QUICK_START_QWEN2.5.md`** - Quick reference with:
  - 3-step TL;DR setup
  - All essential commands
  - Configuration table
  - Alternative models
  - Performance tips

---

## Your Model Location

Your Qwen2.5-0.5B-Instruct model is already cached at:

```
~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/
└── snapshots/
    └── 7ae557604adf67be50417f59c2c2f167def9a775/
        ├── model.safetensors           (Main model file)
        ├── tokenizer.json              (Tokenizer)
        ├── config.json                 (Model config)
        ├── generation_config.json
        ├── tokenizer_config.json
        └── ... (other config files)
```

---

## Quick Start Commands

### Step 1: Verify Model Cache
```bash
ls ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/*/model.safetensors
```

### Step 2: Create config.toml
```bash
cp config.example.toml config.toml
```

### Step 3: Build
```bash
cargo build --features candle --release
```

### Step 4: Run
```bash
cargo run --features candle --release
```

---

## How Cache Loading Works

The Candle provider now automatically:

1. **Checks local cache first**
   - Looks in `~/.cache/huggingface/hub/`
   - Converts repo name to cache format: `Qwen/Qwen2.5-0.5B-Instruct` → `models--Qwen--Qwen2.5-0.5B-Instruct`
   - Searches in `snapshots/` directory

2. **Loads model and tokenizer**
   - Finds `model.safetensors` (the actual model)
   - Finds `tokenizer.json` (for text tokenization)
   - Both from the same snapshot directory

3. **Falls back to download** (if not cached)
   - Only if model not found locally
   - Downloads from HuggingFace Hub
   - Stores in the standard cache location

**Result**: ⚡ No cache configuration is needed. A model that is already cached is picked up automatically.
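
The lookup steps above can be sketched in a few lines of standard-library Rust. This is only an illustration of the technique, not the code in `src/candle_provider.rs`: the function names `repo_to_cache_dir_name` and `find_snapshot_with_files` and their signatures are assumptions made for this example.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Convert a Hub repo id such as "Qwen/Qwen2.5-0.5B-Instruct" into the
/// cache directory name "models--Qwen--Qwen2.5-0.5B-Instruct".
fn repo_to_cache_dir_name(repo: &str) -> String {
    format!("models--{}", repo.replace('/', "--"))
}

/// Hypothetical lookup: return the first snapshot directory under
/// `<hub_dir>/<cache dir name>/snapshots/` that contains every file in `required`.
fn find_snapshot_with_files(hub_dir: &Path, repo: &str, required: &[&str]) -> Option<PathBuf> {
    let snapshots = hub_dir.join(repo_to_cache_dir_name(repo)).join("snapshots");
    // A missing or unreadable directory is treated as "not cached".
    let entries = fs::read_dir(&snapshots).ok()?;
    entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| path.is_dir())
        // Model and tokenizer must come from the same snapshot.
        .find(|dir| required.iter().all(|name| dir.join(name).exists()))
}
```

A caller would pass the hub directory together with `&["model.safetensors", "tokenizer.json"]` and fall back to downloading only when the result is `None`.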

---

## Configuration (config.toml)

```toml
[candle]
# Repository on HuggingFace Hub
huggingface_repo = "Qwen/Qwen2.5-0.5B-Instruct"

# Model file (model.safetensors for single-file checkpoints; larger models are often split into shards)
model_file = "model.safetensors"

# Context window - how many tokens the model can process
# Qwen2.5-0.5B supports up to 32,768 tokens
context_size = 32768

# Sampling temperature (0.0 = deterministic; higher values = more random)
temperature = 0.7

# Maximum tokens to generate per request
max_tokens = 2048

# Use GPU if available (CUDA, Metal, etc.)
use_gpu = true
```
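
To catch obviously inconsistent values before a model is loaded, the settings above could be checked with a small helper. The sketch below is hypothetical: `CandleSettings` and `problems()` are names invented for this example and are not part of the helios-engine API, and the rules it applies (for instance that `max_tokens` should not exceed `context_size`) are assumptions rather than documented constraints.

```rust
/// Hypothetical mirror of the `[candle]` table; field names follow the keys above.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct CandleSettings {
    huggingface_repo: String,
    model_file: String,
    context_size: usize,
    temperature: f32,
    max_tokens: usize,
    use_gpu: bool,
}

impl Default for CandleSettings {
    /// Defaults copied from the example configuration above.
    fn default() -> Self {
        Self {
            huggingface_repo: "Qwen/Qwen2.5-0.5B-Instruct".to_string(),
            model_file: "model.safetensors".to_string(),
            context_size: 32768,
            temperature: 0.7,
            max_tokens: 2048,
            use_gpu: true,
        }
    }
}

impl CandleSettings {
    /// Return human-readable problems; an empty list means the values look sane.
    fn problems(&self) -> Vec<String> {
        let mut problems = Vec::new();
        // A repo id is expected to have exactly two non-empty parts: owner/name.
        let mut parts = self.huggingface_repo.split('/');
        let repo_ok = matches!(
            (parts.next(), parts.next(), parts.next()),
            (Some(owner), Some(name), None) if !owner.is_empty() && !name.is_empty()
        );
        if !repo_ok {
            problems.push(format!("huggingface_repo should look like owner/name, got {:?}", self.huggingface_repo));
        }
        if self.model_file.is_empty() {
            problems.push("model_file must not be empty".to_string());
        }
        if self.context_size == 0 {
            problems.push("context_size must be greater than zero".to_string());
        }
        if self.max_tokens > self.context_size {
            problems.push("max_tokens should not exceed context_size".to_string());
        }
        if !self.temperature.is_finite() || self.temperature < 0.0 {
            problems.push("temperature must be a finite, non-negative number".to_string());
        }
        problems
    }
}
```

Returning every problem at once, instead of failing on the first one, lets a user fix a broken `config.toml` in a single pass.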

---

## Key Features

### 🚀 Automatic Cache Loading
- No manual cache configuration needed
- Respects `HF_HOME` environment variable
- Supports standard HuggingFace cache layout

### 📦 Offline Support
- Model loads from the local cache after the first download, with no re-download
- No internet needed after initial download

### ⚡ Performance
- GPU support with `use_gpu = true`
- 0.5B model is very fast (ideal for local development)
- Lightweight (roughly 1 GB of weights in BF16)

### 🔄 Multiple Model Support
- Same setup works for Llama, Gemma, Mistral, etc.
- Just change `huggingface_repo` in config

---

## Running Examples

```bash
# Basic chat interaction
cargo run --example basic_chat --features candle --release

# Direct LLM usage
cargo run --example direct_llm_usage --features candle --release

# Agent with tools
cargo run --example agent_with_tools --features candle --release

# Complete demo
cargo run --example complete_demo --features candle --release

# Forest of agents (multi-agent system)
cargo run --example forest_of_agents --features candle --release
```

---

## Environment Variables

### HF_HOME (Optional)
By default, the HuggingFace cache lives in `~/.cache/huggingface/`, with model repositories under its `hub/` subdirectory.

To use a custom location:
```bash
export HF_HOME=/path/to/custom/cache
cargo run --features candle --release
```
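
One way the cache directory could be resolved is shown below. It is a sketch under the assumption that the provider follows the standard HuggingFace convention (models live in `$HF_HOME/hub`, or in `~/.cache/huggingface/hub` when `HF_HOME` is unset). The function names `hub_dir_from` and `resolve_hub_dir` are made up for this example, and other variables that HuggingFace tooling may honor are deliberately ignored.

```rust
use std::env;
use std::path::PathBuf;

/// Pure helper: pick the hub cache directory from optional `HF_HOME` and `HOME` values.
fn hub_dir_from(hf_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match hf_home {
        // A non-empty HF_HOME wins; model repos live in its `hub/` subdirectory.
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir).join("hub")),
        // Otherwise fall back to the default location under the home directory.
        _ => home.map(|h| PathBuf::from(h).join(".cache").join("huggingface").join("hub")),
    }
}

/// Read the real environment and delegate to the pure helper.
/// Returns `None` when neither variable is available.
fn resolve_hub_dir() -> Option<PathBuf> {
    let hf_home = env::var("HF_HOME").ok();
    let home = env::var("HOME").ok();
    hub_dir_from(hf_home.as_deref(), home.as_deref())
}
```

Keeping the decision in a pure function makes it easy to test without touching the process environment.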

---

## Alternative Models

The same cache loading mechanism works with other models. Just update `config.toml`:

```toml
[candle]
# Set exactly one huggingface_repo; TOML does not allow duplicate keys.

# Qwen2 (7B, better quality but larger)
huggingface_repo = "Qwen/Qwen2-7B-Instruct"

# Or Llama 2
# huggingface_repo = "meta-llama/Llama-2-7b-chat-hf"

# Or Gemma
# huggingface_repo = "google/gemma-7b-it"

# Or Mistral
# huggingface_repo = "mistralai/Mistral-7B-Instruct-v0.1"
```

Then download and run:
```bash
huggingface-cli download {your-model}
cargo run --features candle --release
```

---

## Troubleshooting

### Model not found in cache?
```bash
# Download the model
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct

# Verify it's cached
ls ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/*/model.safetensors
```

### Out of memory?
Reduce these values in `config.toml`:
```toml
context_size = 16384      # Instead of 32768
max_tokens = 1024         # Instead of 2048
```
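
A back-of-the-envelope estimate shows why a smaller context can help. The sketch below assumes that memory for the attention key/value cache grows linearly with the number of tokens held in context; whether the Candle provider allocates it up front or on demand is not stated here. The shape values used in the usage note (24 layers, 2 key/value heads, head dimension 64) are assumptions that should be checked against the `config.json` in your snapshot directory.

```rust
/// Rough size in bytes of a key/value cache holding `tokens` positions.
/// The leading factor 2 accounts for storing both keys and values.
fn kv_cache_bytes(layers: u64, kv_heads: u64, head_dim: u64, tokens: u64, bytes_per_value: u64) -> u64 {
    2 * layers * kv_heads * head_dim * tokens * bytes_per_value
}

/// Convert a byte count to mebibytes for display.
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

With the assumed shape and 2 bytes per value (a 16-bit dtype), `to_mib(kv_cache_bytes(24, 2, 64, 32768, 2))` gives 384 MiB, and halving the context to 16384 tokens halves that figure. Weights and activations come on top of this, so treat the number as a lower bound.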

### Slow inference?
Make sure GPU is enabled in `config.toml`:
```toml
use_gpu = true
```

### Tokenizer not found?
Ensure `tokenizer.json` exists in the snapshot directory:
```bash
ls ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/*/tokenizer.json
```

If missing, re-download:
```bash
rm -rf ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct
```

---

## Files Modified/Created

### Modified
- `src/candle_provider.rs` - Added cache lookup function
- `config.example.toml` - Updated with Qwen2.5 config

### Created (Documentation)
- `CANDLE_LOCAL_SETUP.md` - Detailed setup guide
- `QUICK_START_QWEN2.5.md` - Quick reference
- `SETUP_COMPLETE.md` - This file

---

## Next Steps

1. **Copy the config template** (optional; you can also edit `config.example.toml` directly)
   ```bash
   cp config.example.toml config.toml
   ```

2. **Build the project**
   ```bash
   cargo build --features candle --release
   ```

3. **Run an example or the application**
   ```bash
   cargo run --features candle --release
   ```

4. **Explore other examples** in the `examples/` directory

---

## API Integration Example

```rust
use helios_engine::{Config, Agent};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load configuration
    let config = Config::from_file("config.toml")?;
    
    // Create agent with Candle backend
    let mut agent = Agent::new(config).await?;
    
    // Send message and get response
    let response = agent.chat("What is machine learning?").await?;
    println!("Response: {}", response);
    
    // Continue conversation
    let response = agent.chat("Can you explain it more simply?").await?;
    println!("Response: {}", response);
    
    Ok(())
}
```

---

## Verification Checklist

- ✅ Candle provider modified to load from cache
- ✅ Config example updated with Qwen2.5-0.5B-Instruct
- ✅ Model already cached at `~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/`
- ✅ Code compiles without errors
- ✅ Documentation complete
- ✅ Ready to use!

---

## Support & Documentation

For detailed information, see:
- `CANDLE_LOCAL_SETUP.md` - Complete setup guide
- `QUICK_START_QWEN2.5.md` - Quick reference
- `config.example.toml` - Configuration template
- `examples/` - Working code examples

Enjoy running Qwen2.5-0.5B-Instruct locally! 🚀