# Streaming Support in CloudLLM
This document explains how to use the streaming feature in CloudLLM to receive LLM responses in real time, chunk by chunk, as tokens arrive.
## Overview
Streaming support lets you display an LLM response incrementally as it is generated, instead of waiting for the complete response. Users see the first tokens as soon as the model produces them, so perceived latency drops even though the total generation time stays roughly the same.
## Benefits
- **Reduced Perceived Latency**: Users see tokens appear immediately as the LLM generates them
- **Better UX**: The "typing" effect feels more responsive and natural
- **Easy to Use**: `send_message_stream()` takes the same arguments as the non-streaming `send_message()`
- **Backward Compatible**: Existing code continues to work unchanged
## Basic Usage
### Using LLMSession
```rust
use cloudllm::clients::openai::{Model, OpenAIClient};
use cloudllm::client_wrapper::Role;
use cloudllm::LLMSession;
use futures_util::StreamExt;
use std::io::Write;
use std::sync::Arc;

#[tokio::main]
async fn main() {
    let secret_key = std::env::var("OPEN_AI_SECRET").expect("OPEN_AI_SECRET not set");
    let client = OpenAIClient::new_with_model_enum(&secret_key, Model::GPT5Nano);
    let mut session = LLMSession::new(
        Arc::new(client),
        "You are a helpful assistant.".to_string(),
        8192,
    );

    // Send a message with streaming enabled
    match session
        .send_message_stream(Role::User, "Write a haiku about Rust.".to_string(), None)
        .await
    {
        Ok(Some(mut stream)) => {
            // Stream is available: process chunks as they arrive
            let mut full_response = String::new();
            while let Some(chunk_result) = stream.next().await {
                match chunk_result {
                    Ok(chunk) => {
                        // Display the incremental content
                        if !chunk.content.is_empty() {
                            print!("{}", chunk.content);
                            // stdout is line-buffered; flush so partial lines show up immediately
                            std::io::stdout().flush().ok();
                            full_response.push_str(&chunk.content);
                        }
                        // A finish reason marks the end of the stream
                        if let Some(reason) = chunk.finish_reason {
                            println!("\n[Finished: {}]", reason);
                        }
                    }
                    Err(e) => {
                        eprintln!("Error in stream: {}", e);
                        break;
                    }
                }
            }
            // len() counts bytes; chars().count() counts characters
            println!("\nReceived {} chars", full_response.chars().count());
        }
        Ok(None) => {
            // Streaming not supported by this client
            println!("Streaming not available");
        }
        Err(e) => {
            eprintln!("Error: {}", e);
        }
    }
}
```
## MessageChunk Structure
Each chunk in the stream is a `MessageChunk` with:
- `content: String` - The incremental text content (may be empty)
- `finish_reason: Option<String>` - Indicates why streaming ended (e.g., "stop", "length")
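
Accumulating chunks comes down to concatenating `content` and remembering the last `finish_reason`. The std-only sketch below uses a local mirror of the two documented fields (the real `MessageChunk` lives in CloudLLM and may carry more) and a plain iterator in place of the async stream; `collect_response` is a hypothetical helper, not CloudLLM API.

```rust
/// Hypothetical local mirror of the documented `MessageChunk` fields.
#[derive(Debug, Clone)]
struct MessageChunk {
    content: String,
    finish_reason: Option<String>,
}

/// Concatenates chunk contents and keeps the finish reason, if one arrives.
fn collect_response<I>(chunks: I) -> (String, Option<String>)
where
    I: IntoIterator<Item = MessageChunk>,
{
    let mut text = String::new();
    let mut finish = None;
    for chunk in chunks {
        // `content` may be empty, e.g. a chunk that only carries a finish reason
        text.push_str(&chunk.content);
        if chunk.finish_reason.is_some() {
            finish = chunk.finish_reason;
        }
    }
    (text, finish)
}

fn main() {
    let chunks = vec![
        MessageChunk { content: "Hello".to_string(), finish_reason: None },
        MessageChunk { content: ", world".to_string(), finish_reason: None },
        MessageChunk { content: String::new(), finish_reason: Some("stop".to_string()) },
    ];
    let (text, finish) = collect_response(chunks);
    // prints "Hello, world [stop]"
    println!("{text} [{}]", finish.as_deref().unwrap_or("none"));
}
```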
## Important Notes
### Token Usage Tracking
⚠️ Token usage tracking is **not available** for streaming responses. If you need accurate token counts, use the non-streaming `send_message()` method instead.
### Conversation History
When using `LLMSession::send_message_stream()`:
1. The user message is automatically added to conversation history
2. The assistant response is **not** automatically added
3. You can manually add the accumulated response to history if needed:
```rust
// After collecting the full streamed response
if !full_response.is_empty() {
    session.send_message(Role::Assistant, full_response, None).await?;
}
```
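The three rules above amount to an ordering: the user turn exists as soon as the stream opens, and the assistant turn exists only once you add it after the stream ends. The std-only sketch below models that with a plain `Vec` as the history; `Role` here is a two-variant stand-in and `append_streamed_reply` is a hypothetical helper, neither is CloudLLM API.

```rust
/// Hypothetical stand-in for CloudLLM's `Role`, reduced to the variants used here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
}

/// Appends the streamed reply as an assistant turn once the stream has finished.
/// Returns `false` and leaves the history untouched if nothing was streamed.
fn append_streamed_reply(history: &mut Vec<(Role, String)>, chunks: &[&str]) -> bool {
    let full_response: String = chunks.concat();
    if full_response.is_empty() {
        return false;
    }
    history.push((Role::Assistant, full_response));
    true
}

fn main() {
    // Step 1: the user turn is recorded when the stream is opened
    let mut history = vec![(Role::User, "Write a haiku about Rust.".to_string())];
    // Step 2: while chunks arrive, history still holds only the user turn
    let chunks = ["Borrow ", "checker ", "hums"];
    assert_eq!(history.len(), 1);
    // Step 3: after the stream ends, append the assistant turn yourself
    append_streamed_reply(&mut history, &chunks);
    for (role, text) in &history {
        println!("{role:?}: {text}");
    }
}
```

The empty-response guard mirrors the `if !full_response.is_empty()` check in the snippet above.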
### Supported Clients
- ✅ **OpenAIClient**: Full streaming support
- ✅ **GrokClient**: Full streaming support (delegates to the OpenAI client implementation)
- ⏳ **Other clients**: `send_message_stream()` returns `Ok(None)` (not yet implemented)

You can check whether a client supports streaming by matching on the returned `Option`:
```rust
match client.send_message_stream(&messages, None).await? {
    Some(stream) => { /* streaming available */ }
    None => { /* fall back to non-streaming */ }
}
```
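The fall-back branch can be sketched as "prefer streaming, otherwise make one blocking call". The std-only sketch below assumes a hypothetical `ChatClient` trait whose `stream` method returns `None` for providers without streaming, simplifying the real async `Result<Option<...>>` return type; the trait, both client types, and `respond` are illustrative, not CloudLLM API.

```rust
/// Hypothetical client abstraction: `stream` returns `None` when unsupported.
trait ChatClient {
    fn stream(&self, prompt: &str) -> Option<Vec<String>>;
    fn complete(&self, prompt: &str) -> String;
}

struct StreamingClient;
struct BlockingClient;

impl ChatClient for StreamingClient {
    fn stream(&self, prompt: &str) -> Option<Vec<String>> {
        // Pretend the provider splits its answer into two chunks
        Some(vec!["echo: ".to_string(), prompt.to_string()])
    }
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

impl ChatClient for BlockingClient {
    fn stream(&self, _prompt: &str) -> Option<Vec<String>> {
        None // streaming not implemented for this provider
    }
    fn complete(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

/// Prefers streaming and falls back to a single blocking call on `None`.
/// Returns the full text and whether streaming was used.
fn respond(client: &dyn ChatClient, prompt: &str) -> (String, bool) {
    match client.stream(prompt) {
        Some(chunks) => {
            let mut text = String::new();
            for chunk in chunks {
                text.push_str(&chunk);
            }
            (text, true)
        }
        None => (client.complete(prompt), false),
    }
}

fn main() {
    let (a, streamed_a) = respond(&StreamingClient, "hi");
    let (b, streamed_b) = respond(&BlockingClient, "hi");
    println!("{a} (streamed: {streamed_a}) / {b} (streamed: {streamed_b})");
}
```

Both paths return the same text, so calling code does not need to know which provider it is talking to.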
## Error Handling
Streaming can fail at two points:
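1. **Initiation**: The request to start streaming fails. The outer `Result` returned by `send_message_stream()` carries this error, so `?` propagates it:
```rust
let maybe_stream = session
    .send_message_stream(Role::User, "Hello".to_string(), None)
    .await?;
```
2. **During streaming**: Individual chunks may fail. Each item the stream yields is its own `Result`:
```rust
match chunk_result {
    Ok(chunk) => { /* display or accumulate chunk.content */ }
    Err(e) => { /* report e and stop reading, as in the basic example */ }
}
```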
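Both failure points can be handled in one function. In this std-only sketch, the nested `Result<Option<Vec<Result<String, String>>>, String>` stands in for the real return type of `send_message_stream()` (a `Result` wrapping an optional stream of `Result` chunks), with errors simplified to `String`; `consume` is a hypothetical helper, not part of CloudLLM.

```rust
/// Drains a simulated stream, distinguishing the two failure points.
/// `Ok(None)` means the client has no streaming support.
fn consume(
    start: Result<Option<Vec<Result<String, String>>>, String>,
) -> Result<Option<String>, String> {
    // Failure point 1: initiation. `?` hands the error straight to the caller.
    let Some(chunks) = start? else {
        return Ok(None);
    };
    let mut text = String::new();
    for chunk in chunks {
        // Failure point 2: an individual chunk
        match chunk {
            Ok(piece) => text.push_str(&piece),
            Err(e) => {
                return Err(format!("stream interrupted after {} bytes: {e}", text.len()));
            }
        }
    }
    Ok(Some(text))
}

fn main() {
    let ok = consume(Ok(Some(vec![Ok("Hello".to_string()), Ok(", world".to_string())])));
    let broken = consume(Ok(Some(vec![Ok("Hel".to_string()), Err("connection reset".to_string())])));
    println!("{ok:?}\n{broken:?}");
}
```

Reporting how much text arrived before a mid-stream error is a design choice: by then the partial text has usually already been displayed, so the caller needs to know the output is incomplete.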
## Performance Considerations
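- Streaming improves **perceived** performance rather than total generation time, and each chunk arrives with its own transport framing, so it uses slightly more bandwidth
- For very short responses the latency benefit is negligible, and the non-streaming `send_message()` is simpler and also reports token usage
- For longer responses, streaming shows output immediately instead of only after the full response is ready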
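
The perceived-latency gain can be quantified as time to first token versus total response time; with a non-streaming call, nothing is visible until the total time has elapsed. The std-only sketch below measures both over any sequence of chunks. `measure` and the simulated 50 ms per-chunk delay are illustrative assumptions, not CloudLLM API or real provider timings.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Returns (time to first non-empty chunk, total time to drain the source).
/// The clock starts when this function is called.
fn measure<I, S>(chunks: I) -> (Option<Duration>, Duration)
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let start = Instant::now();
    let mut first_token = None;
    for chunk in chunks {
        if first_token.is_none() && !chunk.as_ref().is_empty() {
            first_token = Some(start.elapsed());
        }
    }
    (first_token, start.elapsed())
}

fn main() {
    // Simulated provider: the iterator is lazy, so each chunk "takes" ~50 ms
    let simulated = (0..5).map(|i| {
        thread::sleep(Duration::from_millis(50));
        format!("token{i} ")
    });
    let (ttft, total) = measure(simulated);
    println!("first token after {:?}, full response after {:?}", ttft, total);
}
```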
## Complete Example
See `examples/streaming_example.rs` for a complete working example that demonstrates:
- Basic streaming usage
- Error handling
- Accumulating the full response
- Managing conversation history
## Migration from Non-Streaming
Existing code using `send_message()` continues to work without changes:
```rust
// Old code - still works!
let response = session.send_message(Role::User, "Hello".to_string(), None).await?;
```
To add streaming:
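```rust
// New streaming code; `.next()` comes from the `StreamExt` trait
use futures_util::StreamExt;

if let Some(mut stream) = session.send_message_stream(Role::User, "Hello".to_string(), None).await? {
    while let Some(chunk_result) = stream.next().await {
        // Process chunks: handle Ok(chunk) and Err(e) as in the basic example
    }
}
```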
## Future Improvements
Planned enhancements:
- Token usage tracking for streaming responses
- Automatic conversation history management for streamed responses
- Streaming support for additional providers (Claude, Gemini, etc.)