# Streaming Support in CloudLLM
This document explains how to use the streaming feature in CloudLLM to receive LLM responses in real time, as tokens arrive.
## Overview
Streaming support allows you to display LLM responses incrementally as they are generated, rather than waiting for the complete response. This dramatically reduces perceived latency and provides a better user experience.
## Benefits
- **Reduced Perceived Latency**: Users see tokens appear immediately as the LLM generates them
- **Better UX**: The "typing" effect feels more responsive and natural
- **Easy to Use**: Similar API to non-streaming methods
- **Backward Compatible**: Existing code continues to work unchanged
## Basic Usage
### Using LLMSession
```rust
use cloudllm::clients::openai::{Model, OpenAIClient};
use cloudllm::client_wrapper::Role;
use cloudllm::LLMSession;
use futures_util::StreamExt;
use std::sync::Arc;
#[tokio::main]
async fn main() {
    let secret_key = std::env::var("OPEN_AI_SECRET").expect("OPEN_AI_SECRET not set");
    let client = OpenAIClient::new_with_model_enum(&secret_key, Model::GPT41Nano);
    
    let mut session = LLMSession::new(
        Arc::new(client),
        "You are a helpful assistant.".to_string(),
        8192,
    );
    // Send a message with streaming enabled
    match session
        .send_message_stream(Role::User, "Write a haiku about Rust.".to_string(), None)
        .await
    {
        Ok(Some(mut stream)) => {
            // Stream is available - process chunks as they arrive
            let mut full_response = String::new();
            
            while let Some(chunk_result) = stream.next().await {
                match chunk_result {
                    Ok(chunk) => {
                        // Display the incremental content
                        if !chunk.content.is_empty() {
                            print!("{}", chunk.content);
                            full_response.push_str(&chunk.content);
                        }
                        
                        // Check if streaming is complete
                        if let Some(reason) = chunk.finish_reason {
                            println!("\n[Finished: {}]", reason);
                        }
                    }
                    Err(e) => {
                        eprintln!("Error in stream: {}", e);
                        break;
                    }
                }
            }
            
            println!("\nReceived {} chars", full_response.len());
        }
        Ok(None) => {
            // Streaming not supported by this client
            println!("Streaming not available");
        }
        Err(e) => {
            eprintln!("Error: {}", e);
        }
    }
}
```
## MessageChunk Structure
Each chunk in the stream is a `MessageChunk` with:
- `content: String` - The incremental text content (may be empty)
- `finish_reason: Option<String>` - Indicates why streaming ended (e.g., "stop", "length")
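For reference, the chunk type boils down to a small struct along these lines; this is a sketch based only on the two fields listed above, and the actual definition in CloudLLM may carry additional derives, visibility modifiers, or fields:
```rust
/// Sketch of the chunk type, reconstructed from the fields listed above.
/// The real definition in cloudllm may differ in derives or extra fields.
pub struct MessageChunk {
    /// Incremental text produced since the previous chunk (may be empty).
    pub content: String,
    /// Set on the final chunk, e.g. Some("stop".to_string()) or Some("length".to_string()).
    pub finish_reason: Option<String>,
}
```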
## Important Notes
### Token Usage Tracking
⚠️ Token usage tracking is **not available** for streaming responses. If you need accurate token counts, use the non-streaming `send_message()` method instead.
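If exact token counts matter for a particular turn, send that turn through the non-streaming path; a minimal sketch reusing the `session` from the example above:
```rust
// Non-streaming call for this turn: token usage tracking applies as usual.
// (Streaming the same turn would show output sooner but report no usage.)
let _response = session
    .send_message(Role::User, "Summarize our discussion so far.".to_string(), None)
    .await?;
```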
### Conversation History
When using `LLMSession::send_message_stream()`:
1. The user message is automatically added to conversation history
2. The assistant response is **not** automatically added
3. You can manually add the accumulated response to history if needed:
```rust
// After collecting the full streamed response
if !full_response.is_empty() {
    session.send_message(Role::Assistant, full_response, None).await?;
}
```
### Supported Clients
- ✅ **OpenAIClient**: Full streaming support
- ✅ **GrokClient**: Full streaming support (delegates to OpenAI)
- ⏳ **Other clients**: Return `None` (not yet implemented)
You can check whether a client supports streaming at runtime:
```rust
match client.send_message_stream(&messages, None).await? {
    Some(stream) => { /* streaming available */ }
    None => { /* fall back to non-streaming */ }
}
```
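At the `LLMSession` level, the same check corresponds to the `Ok(None)` branch shown earlier. Here is a hedged sketch of falling back to `send_message()` when streaming is unavailable (assumes `futures_util::StreamExt` is in scope for `.next()`):
```rust
let prompt = "Hello".to_string();

match session.send_message_stream(Role::User, prompt.clone(), None).await? {
    Some(mut stream) => {
        // Streaming path: print chunks as they arrive.
        while let Some(chunk_result) = stream.next().await {
            if let Ok(chunk) = chunk_result {
                print!("{}", chunk.content);
            }
        }
    }
    None => {
        // Fallback path: this client does not implement streaming yet.
        let _response = session.send_message(Role::User, prompt, None).await?;
    }
}
```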
## Error Handling
Streaming can fail at two points:
1. **Initiation**: The request to start streaming fails
   ```rust
   .await?   // `?` propagates an error if the stream cannot be started
   ```
2. **During streaming**: Individual chunks may fail
   ```rust
   match chunk_result {
       Ok(chunk) => { /* append or display chunk.content */ }
       Err(e) => { /* report the error and stop reading the stream */ }
   }
   ```
## Performance Considerations
- Streaming provides **better perceived performance** but may use slightly more bandwidth
- For very short responses, non-streaming might be faster
- For longer responses, streaming provides immediate feedback
## Complete Example
See `examples/streaming_example.rs` for a complete working example that demonstrates:
- Basic streaming usage
- Error handling
- Accumulating the full response
- Managing conversation history
## Migration from Non-Streaming
Existing code using `send_message()` continues to work without changes:
```rust
// Old code - still works!
let response = session.send_message(Role::User, "Hello".to_string(), None).await?;
```
To add streaming:
```rust
// New streaming code (bring `futures_util::StreamExt` into scope for `.next()`)
if let Some(mut stream) = session.send_message_stream(Role::User, "Hello".to_string(), None).await? {
    while let Some(chunk_result) = stream.next().await {
        if let Ok(chunk) = chunk_result {
            print!("{}", chunk.content);
        }
    }
}
```
## Future Improvements
Planned enhancements:
- Token usage tracking for streaming responses
- Automatic conversation history management for streamed responses
- Streaming support for additional providers (Claude, Gemini, etc.)