# helios-engine 0.2.2

A powerful and flexible Rust framework for building LLM-powered agents with tool support, both locally and online.
# Fixes Applied

## Issue: Empty Replies in Offline Mode

### Problem
When using local models in offline mode (both `ask` and `chat` commands), the agent was returning empty replies or not streaming output properly.

### Root Cause
The issue was with output suppression in the streaming implementation. The `suppress_output()` function was redirecting stdout/stderr file descriptors globally for the entire process, which prevented the streamed tokens from being printed even though they were being generated and sent through the channel.
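
For context, a process-wide suppressor of this kind typically relies on fd-level redirection, as in the sketch below (an assumed shape for illustration, not the actual `suppress_output()` code). Because `dup2` replaces the process-wide stdout/stderr descriptors, every write in the process is silenced, including streamed tokens printed from the `on_chunk` callback, until the saved descriptors are restored:

```rust
use std::fs::OpenOptions;
use std::os::fd::AsRawFd;

/// Sketch of a process-wide output suppressor (assumed shape, not the real
/// `suppress_output()`): redirect fds 1 and 2 to /dev/null and return the
/// saved originals so the caller can restore them later with `dup2`.
fn suppress_output_sketch() -> std::io::Result<(i32, i32)> {
    let devnull = OpenOptions::new().write(true).open("/dev/null")?;
    let (saved_stdout, saved_stderr) = unsafe { (libc::dup(1), libc::dup(2)) };
    unsafe {
        // From here on, all stdout/stderr writes in the process go to
        // /dev/null, including tokens printed by a streaming callback.
        libc::dup2(devnull.as_raw_fd(), 1);
        libc::dup2(devnull.as_raw_fd(), 2);
    }
    Ok((saved_stdout, saved_stderr))
}
```

This is why suppression of this kind is reasonable around one-shot setup work such as model loading, but not around a generation loop that is expected to print tokens.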

### Solution Applied

1. **Removed global output suppression during generation**: The `suppress_output()` calls were moved out of the token generation loop. Output suppression is now only used during model loading in `LocalLLMProvider::new()`, not during the streaming generation.

2. **Fixed streaming architecture**: The streaming implementation (sketched after this list) now correctly:
   - Generates tokens in a blocking task
   - Sends tokens through an unbounded channel
   - Receives and processes tokens in the async context
   - Calls the `on_chunk` callback for each token without output suppression

3. **Updated main.rs**: Changed the `ask_once` function to use streaming for both local and remote models, ensuring consistent behavior.
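
A minimal sketch of that flow, assuming a tokio runtime (the function name, token source, and error type here are placeholders rather than the actual helios-engine API):

```rust
use tokio::sync::mpsc;
use tokio::task;

/// Sketch of the streaming flow described above (placeholder names, not the
/// real helios-engine API): generate tokens on a blocking thread, forward
/// them through an unbounded channel, and invoke the callback in async code.
async fn stream_reply<F>(prompt: String, mut on_chunk: F) -> Result<String, Box<dyn std::error::Error>>
where
    F: FnMut(&str),
{
    let (tx, mut rx) = mpsc::unbounded_channel::<String>();

    // Token generation is synchronous (e.g. a llama.cpp decode loop), so it
    // runs on a blocking thread to avoid stalling the async runtime.
    let generator = task::spawn_blocking(move || {
        // Stand-in for the real decode loop: emit one "token" per word.
        for word in prompt.split_whitespace() {
            // A send error means the receiver was dropped; stop quietly.
            if tx.send(format!("{word} ")).is_err() {
                break;
            }
        }
        // `tx` drops here, closing the channel and ending the loop below.
    });

    // Receive tokens in the async context and hand each one to the callback,
    // with no stdout/stderr redirection in effect.
    let mut reply = String::new();
    while let Some(token) = rx.recv().await {
        on_chunk(token.as_str());
        reply.push_str(&token);
    }

    generator.await?;
    Ok(reply)
}
```

With a flow like this, the same path can serve both ask and chat modes: the callback prints each token as it arrives, and the accumulated reply is returned once generation finishes.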

### Files Modified
- `src/llm.rs`: Removed output suppression from `chat_stream_local` method
- `src/main.rs`: Unified streaming for both local and remote models in ask mode

### Testing
Both modes now work correctly:

```bash
# Ask mode with local model
cargo run -- --mode offline ask "what is 2+2?"
# Output: : 4

# Chat mode with local model  
cargo run -- --mode offline chat
# Now streams responses token-by-token
```

### Known Behavior
The llama.cpp library may output debug information to stderr during context creation. This is normal behavior from the underlying library and doesn't affect functionality. To suppress these messages, redirect stderr:

```bash
cargo run -- --mode offline ask "hello" 2>/dev/null
```

### Status
✅ **FIXED** - Both ask and chat modes now properly stream responses with local models.