# ✅ Setup Complete: Qwen2.5-0.5B-Instruct with Candle
## What Was Done
### 1. ✅ Code Changes
- **Modified `src/candle_provider.rs`** to add automatic HuggingFace cache loading
  - Added a `find_model_in_cache()` function to search the local cache (sketched below)
  - Modified `download_model_and_tokenizer()` to check the cache before downloading
  - Supports the `HF_HOME` environment variable for custom cache locations
  - Automatically detects the cache structure: `~/.cache/huggingface/hub/models--{repo}/snapshots/`
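As a rough illustration of that lookup, here is a minimal sketch in the spirit of `find_model_in_cache()` (the function name comes from the bullets above; the signature, error handling, and other details are assumptions, and the real code in `src/candle_provider.rs` may differ):

```rust
use std::path::{Path, PathBuf};

/// Illustrative sketch: map "Qwen/Qwen2.5-0.5B-Instruct" to
/// "models--Qwen--Qwen2.5-0.5B-Instruct" and search its snapshots for the
/// requested file (e.g. "model.safetensors" or "tokenizer.json").
fn find_model_in_cache(cache_root: &Path, repo: &str, file: &str) -> Option<PathBuf> {
    // Cache layout: <root>/hub/models--{owner}--{name}/snapshots/<revision>/<file>
    let repo_dir = format!("models--{}", repo.replace('/', "--"));
    let snapshots = cache_root.join("hub").join(repo_dir).join("snapshots");

    for entry in std::fs::read_dir(snapshots).ok()? {
        let candidate = entry.ok()?.path().join(file);
        if candidate.exists() {
            return Some(candidate);
        }
    }
    None
}
```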
### 2. ✅ Configuration Updated
- **Updated `config.example.toml`** with Qwen2.5-0.5B-Instruct configuration
- Added documentation about cache loading
- Set optimal defaults for the 0.5B model
- Included alternative model examples
### 3. ✅ Documentation Created
- **`CANDLE_LOCAL_SETUP.md`** - Comprehensive setup guide with:
  - Prerequisites and installation steps
  - Model download instructions
  - Configuration reference
  - Code examples
  - Troubleshooting guide
  - Performance tips
- **`QUICK_START_QWEN2.5.md`** - Quick reference with:
  - 3-step TL;DR setup
  - All essential commands
  - Configuration table
  - Alternative models
  - Performance tips
---
## Your Model Location
Your Qwen2.5-0.5B-Instruct model is already cached at:
```
~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/
└── snapshots/
    └── 7ae557604adf67be50417f59c2c2f167def9a775/
        ├── model.safetensors        (Main model file)
        ├── tokenizer.json           (Tokenizer)
        ├── config.json              (Model config)
        ├── generation_config.json
        ├── tokenizer_config.json
        └── ... (other config files)
```
---
## Quick Start Commands
### Step 1: Verify Model Cache
```bash
ls ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/*/model.safetensors
```
### Step 2: Create config.toml
```bash
cp config.example.toml config.toml
```
### Step 3: Build
```bash
cargo build --features candle --release
```
### Step 4: Run
```bash
cargo run --features candle --release
```
---
## How Cache Loading Works
The Candle provider now automatically:
1. **Checks local cache first**
   - Looks in `~/.cache/huggingface/hub/`
   - Converts repo name to cache format: `Qwen/Qwen2.5-0.5B-Instruct` → `models--Qwen--Qwen2.5-0.5B-Instruct`
   - Searches in `snapshots/` directory
2. **Loads model and tokenizer**
   - Finds `model.safetensors` (the actual model)
   - Finds `tokenizer.json` (for text tokenization)
   - Both from the same snapshot directory
3. **Falls back to download** (if not cached)
   - Only if model not found locally
   - Downloads from HuggingFace Hub
   - Stores in the standard cache location
**Result**: ⚡ No configuration needed; it just works automatically (the overall flow is sketched below).
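A minimal sketch of that flow, building on the `find_model_in_cache` sketch earlier in this document. The download fallback assumes the `hf-hub` crate's blocking API (commonly used in Candle-based projects, and it writes into the same cache layout); the real `download_model_and_tokenizer()` may be structured differently:

```rust
use std::path::{Path, PathBuf};

/// Cache-first lookup with a download fallback (illustrative only; assumes
/// the `hf-hub` and `anyhow` crates).
fn locate_or_download(cache_root: &Path, repo: &str, file: &str) -> anyhow::Result<PathBuf> {
    // 1. Local cache first: instant, and works offline.
    if let Some(path) = find_model_in_cache(cache_root, repo, file) {
        return Ok(path);
    }
    // 2. Not cached: download from the Hub. hf-hub stores the file in the
    //    standard cache location, so the next run is served from the cache.
    let api = hf_hub::api::sync::Api::new()?;
    Ok(api.model(repo.to_string()).get(file)?)
}

// Usage: weights and tokenizer come from the same repo.
// let weights   = locate_or_download(&cache_root, "Qwen/Qwen2.5-0.5B-Instruct", "model.safetensors")?;
// let tokenizer = locate_or_download(&cache_root, "Qwen/Qwen2.5-0.5B-Instruct", "tokenizer.json")?;
```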
---
## Configuration (config.toml)
```toml
[candle]
# Repository on HuggingFace Hub
huggingface_repo = "Qwen/Qwen2.5-0.5B-Instruct"
# Weights file in the snapshot (Qwen2.5-0.5B ships as a single model.safetensors; larger models may shard weights across several files)
model_file = "model.safetensors"
# Context window - how many tokens the model can process
# Qwen2.5-0.5B supports up to 32,768 tokens
context_size = 32768
# Sampling temperature (lower = more deterministic, higher = more varied)
temperature = 0.7
# Maximum tokens to generate per request
max_tokens = 2048
# Use GPU if available (CUDA, Metal, etc.)
use_gpu = true
```
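For reference, the keys above map naturally onto a small serde struct. The sketch below is purely illustrative (it assumes the `serde`, `toml`, and `anyhow` crates) and is not `helios_engine`'s actual `Config` type, which the API example further down loads via `Config::from_file`:

```rust
use serde::Deserialize;

/// Illustrative mapping of the `[candle]` table shown above.
#[derive(Debug, Deserialize)]
struct CandleConfig {
    huggingface_repo: String,
    model_file: String,
    context_size: usize,
    temperature: f64,
    max_tokens: usize,
    use_gpu: bool,
}

#[derive(Debug, Deserialize)]
struct AppConfig {
    candle: CandleConfig,
}

/// Read and parse a config file such as `config.toml`.
fn load_config(path: &str) -> anyhow::Result<AppConfig> {
    Ok(toml::from_str(&std::fs::read_to_string(path)?)?)
}
```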
---
## Key Features
### 🚀 Automatic Cache Loading
- No manual cache configuration needed
- Respects `HF_HOME` environment variable
- Supports standard HuggingFace cache layout
### 📦 Offline Support
- Model loads instantly from cache after first download
- No internet needed after initial download
### ⚡ Performance
- GPU support with `use_gpu = true` (device selection sketched below)
- The 0.5B model is very fast (ideal for local development)
- Lightweight (roughly 1 GB of weights on disk)
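A sketch of how a `use_gpu` flag is typically honoured with Candle's device API (assuming the `candle_core` crate; the provider's actual selection logic, including any Metal handling, may differ):

```rust
use candle_core::Device;

/// Pick a device from the `use_gpu` flag. `cuda_if_available` falls back to
/// the CPU when no CUDA device is present. Illustrative sketch only.
fn select_device(use_gpu: bool) -> candle_core::Result<Device> {
    if use_gpu {
        Device::cuda_if_available(0)
    } else {
        Ok(Device::Cpu)
    }
}
```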
### 🔄 Multiple Model Support
- Same setup works for Llama, Gemma, Mistral, etc.
- Just change `huggingface_repo` in config
---
## Running Examples
```bash
# Basic chat interaction
cargo run --example basic_chat --features candle --release
# Direct LLM usage
cargo run --example direct_llm_usage --features candle --release
# Agent with tools
cargo run --example agent_with_tools --features candle --release
# Complete demo
cargo run --example complete_demo --features candle --release
# Forest of agents (multi-agent system)
cargo run --example forest_of_agents --features candle --release
```
---
## Environment Variables
### HF_HOME (Optional)
By default, the HuggingFace cache lives in `~/.cache/huggingface/`.
To use a custom location instead:
```bash
export HF_HOME=/path/to/custom/cache
cargo run --features candle --release
```
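On the Rust side, honouring `HF_HOME` amounts to a small lookup like the sketch below (assuming the `dirs` crate for the home-directory fallback; the provider's actual resolution may differ):

```rust
use std::path::PathBuf;

/// Resolve the HuggingFace cache root: prefer HF_HOME when set, otherwise
/// fall back to ~/.cache/huggingface. Illustrative only.
fn hf_cache_root() -> PathBuf {
    std::env::var("HF_HOME")
        .map(PathBuf::from)
        .unwrap_or_else(|_| {
            dirs::home_dir()
                .unwrap_or_default()
                .join(".cache")
                .join("huggingface")
        })
}
```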
---
## Alternative Models
The same cache loading mechanism works with other models. Just update `config.toml`:
```toml
[candle]
# Pick exactly one repo and leave the others commented out.

# Qwen2 (7B, better quality but larger)
huggingface_repo = "Qwen/Qwen2-7B-Instruct"

# Or Llama 2
# huggingface_repo = "meta-llama/Llama-2-7b-chat-hf"

# Or Gemma
# huggingface_repo = "google/gemma-7b-it"

# Or Mistral
# huggingface_repo = "mistralai/Mistral-7B-Instruct-v0.1"
```
Then download and run:
```bash
huggingface-cli download {your-model}
cargo run --features candle --release
```
---
## Troubleshooting
### Model not found in cache?
```bash
# Download the model
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct
# Verify it's cached
ls ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/*/model.safetensors
```
### Out of memory?
Reduce these values in `config.toml`:
```toml
context_size = 16384 # Instead of 32768
max_tokens = 1024 # Instead of 2048
```
### Slow inference?
Make sure GPU is enabled in `config.toml`:
```toml
use_gpu = true
```
### Tokenizer not found?
Ensure `tokenizer.json` exists:
```bash
ls ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/snapshots/*/tokenizer.json
```
If missing, re-download:
```bash
rm -rf ~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct
```
---
## Files Modified/Created
### Modified
- ✅ `src/candle_provider.rs` - Added cache lookup function
- ✅ `config.example.toml` - Updated with Qwen2.5 config
### Created (Documentation)
- ✅ `CANDLE_LOCAL_SETUP.md` - Detailed setup guide
- ✅ `QUICK_START_QWEN2.5.md` - Quick reference
- ✅ `SETUP_COMPLETE.md` - This file
---
## Next Steps
1. **Copy config template** (optional, you can edit config.example.toml directly)
```bash
cp config.example.toml config.toml
```
2. **Build the project**
```bash
cargo build --features candle --release
```
3. **Run an example or the application**
```bash
cargo run --features candle --release
```
4. **Explore other examples** in the `examples/` directory
---
## API Integration Example
```rust
use helios_engine::{Config, Agent};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load configuration
    let config = Config::from_file("config.toml")?;

    // Create agent with Candle backend
    let mut agent = Agent::new(config).await?;

    // Send message and get response
    let response = agent.chat("What is machine learning?").await?;
    println!("Response: {}", response);

    // Continue conversation
    let response = agent.chat("Can you explain it more simply?").await?;
    println!("Response: {}", response);

    Ok(())
}
```
---
## Verification Checklist
- ✅ Candle provider modified to load from cache
- ✅ Config example updated with Qwen2.5-0.5B-Instruct
- ✅ Model already cached at `~/.cache/huggingface/hub/models--Qwen--Qwen2.5-0.5B-Instruct/`
- ✅ Code compiles without errors
- ✅ Documentation complete
- ✅ Ready to use!
---
## Support & Documentation
For detailed information, see:
- `CANDLE_LOCAL_SETUP.md` - Complete setup guide
- `QUICK_START_QWEN2.5.md` - Quick reference
- `config.example.toml` - Configuration template
- `examples/` - Working code examples
Enjoy running Qwen2.5-0.5B-Instruct locally! 🚀