# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
**llmrs** is an unofficial Rust SDK **focused on calling IBM WatsonX APIs**:
- **watsonx.ai** - Text generation (streaming/non-streaming), model listing, batch generation, chat completion
- **watsonx.orchestrate** - Agents, threads, send/stream messages
**watsonx.data** and **watsonx.governance** are optional (feature-gated; off by default).
## Common Development Commands
### Build and Test
```bash
cargo build # Standard build
cargo build --release # Release build (optimized for size)
cargo check --all-targets # Quick type check
cargo test # Run all tests
cargo test -v # Run with verbose output
cargo test test_name # Run specific test
cargo test --no-fail-fast # Run all tests without stopping on first failure
```
### Code Quality
```bash
cargo fmt # Format code
cargo clippy # Run linter
cargo clippy -- -W clippy::pedantic # Run with pedantic rules
```
### Examples
```bash
cargo run --example basic_simple # WatsonX AI one-line connect + generate
cargo run --example streaming_generation # Streaming text generation
cargo run --example list_models # List available models
cargo run --example orchestrate_chat # Orchestrate agents and chat
cargo run --example batch_generation # Batch generation
```
## Architecture
### Module Structure
The SDK follows a **modular architecture** with clear separation between different WatsonX services:
```
src/
├── lib.rs # Re-exports public API
├── client.rs # WatsonxClient for AI operations
├── connection.rs # Simplified one-line connection builders
├── config.rs # Configuration management
├── auth.rs # IAM authentication
├── error.rs # Comprehensive error types (thiserror)
├── models.rs # Model ID constants (e.g., GRANITE_4_H_SMALL)
├── types.rs # Common data types
├── sse.rs # Server-Sent Events (streaming) parser
├── orchestrate/ # WatsonX Orchestrate module
│ ├── client.rs # OrchestrateClient
│ ├── config.rs # OrchestrateConfig
│ ├── connection.rs # OrchestrateConnection builder
│ └── types.rs # Orchestrate-specific types
├── data/ # WatsonX Data module (disabled)
└── governance/ # WatsonX Governance module (disabled)
```
### Connection Pattern
The SDK uses **convenient connection builders** for simplified setup:
- `WatsonxConnection::new().from_env().await?` - One-line connection for WatsonX AI
- `OrchestrateConnection::new().from_env().await?` - One-line connection for Orchestrate
With the `data` or `governance` features enabled, `DataConnection` and `GovernanceConnection` become available; the default build includes only the core AI and Orchestrate APIs.
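A minimal sketch of the connection pattern above. The builder call matches the one-liner shown; `generate_text` and its signature are assumptions for illustration, not confirmed API:

```rust
use llmrs::{models, GenerationConfig, WatsonxConnection};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reads credentials and project settings from the environment
    let client = WatsonxConnection::new().from_env().await?;

    // A model must be chosen explicitly (see "Model Specification" below)
    let config = GenerationConfig::default().with_model(models::GRANITE_4_H_SMALL);

    // Hypothetical non-streaming call; see examples/ for the real entry points
    let text = client.generate_text("Hello, WatsonX!", &config).await?;
    println!("{text}");
    Ok(())
}
```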
### Configuration System
Configuration is primarily **environment-based**:
- Credential and environment variable names live in **`src/env.rs`** and **`.env.example`**; do not document credentials elsewhere — point to those files.
- `WATSONX_PROJECT_ID` - WatsonX project ID (for AI)
- `WXO_INSTANCE_ID` - Orchestrate instance ID
See `config.rs` and `orchestrate/config.rs` for full environment variable lists.
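The non-credential variables above can be exported directly; the placeholder values here are illustrative:

```bash
# Non-credential settings referenced in this file; see src/env.rs
# and .env.example for the credential variable names.
export WATSONX_PROJECT_ID="your-project-id"
export WXO_INSTANCE_ID="your-instance-id"
```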
### Error Handling
The SDK uses `thiserror` for comprehensive error handling with actionable guidance:
- `Error::Authentication` - Authentication failures
- `Error::Api` - API errors with retryable/non-retryable classification
- `Error::Timeout` - Request timeouts
- `Error::Validation` - Configuration validation errors
All errors include troubleshooting suggestions.
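A sketch of matching on these variants. It assumes `Error` is exported at the crate root and that `Error::Api` carries a retryable flag; the payload shapes and field names are illustrative, not confirmed API:

```rust
use llmrs::Error;

// Hypothetical error-handling helper showing the variant classification
fn report(err: &Error) {
    match err {
        Error::Authentication(msg) => eprintln!("check credentials: {msg}"),
        Error::Api { message, retryable } if *retryable => {
            eprintln!("transient API error, retrying may help: {message}");
        }
        Error::Api { message, .. } => eprintln!("non-retryable API error: {message}"),
        Error::Timeout(msg) => eprintln!("request timed out: {msg}"),
        Error::Validation(msg) => eprintln!("fix configuration: {msg}"),
        other => eprintln!("unexpected error: {other}"),
    }
}
```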
### Streaming Architecture
Streaming is handled via Server-Sent Events (SSE):
- `sse.rs` contains the SSE event parser
- `generate_text_stream()` accepts a callback: `|chunk: &str| { print!("{}", chunk); }`
- Proper chunking and buffer management for real-time output
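Putting the callback form above together, a sketch of a streaming call. It assumes an already-connected `client` and a `config` with a model set; only `generate_text_stream` and the `&str` callback are taken from this file:

```rust
use std::io::Write;

// Each SSE chunk is delivered to the callback as it arrives
client
    .generate_text_stream("Tell me a story", &config, |chunk: &str| {
        print!("{chunk}");
        // Flush so partial output appears in real time rather than line-buffered
        std::io::stdout().flush().ok();
    })
    .await?;
```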
## Key Patterns
### 1. Model Specification
**Important**: You must always specify a model before generating text:
```rust
let config = GenerationConfig::default().with_model(models::GRANITE_4_H_SMALL);
```
### 2. Batch Generation
Use `generate_batch_simple()` for uniform configuration across multiple prompts, or `generate_batch()` with `BatchRequest` for per-request customization.
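The two batch entry points might be used like this. This is a sketch: the `BatchRequest` constructor and `with_config` builder are assumptions, as are the argument shapes:

```rust
// Uniform configuration across several prompts
let results = client
    .generate_batch_simple(&["prompt one", "prompt two"], &config)
    .await?;

// Per-request customization via BatchRequest (hypothetical builder names)
let requests = vec![
    BatchRequest::new("short answer").with_config(config_a),
    BatchRequest::new("long essay").with_config(config_b),
];
let results = client.generate_batch(requests).await?;
```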
### 3. Orchestrate Thread Management
Conversations use **thread-based context**:
- `send_message()` returns `(response, thread_id)` for continuation
- Pass `thread_id` to subsequent messages to maintain context
- Use `create_thread()` to explicitly create threads
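Thread continuation could look like the sketch below, assuming `send_message` accepts an optional thread ID and returns the `(response, thread_id)` pair described above:

```rust
// First message: no thread yet, so the service assigns one
let (reply, thread_id) = orchestrate.send_message("Hi", None).await?;
println!("{reply}");

// Pass the thread_id back to keep conversational context
let (followup, _) = orchestrate
    .send_message("What did I just say?", Some(&thread_id))
    .await?;
println!("{followup}");
```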
### 4. Graceful Degradation
The Orchestrate API has multiple endpoints with varying availability. The client handles unavailable endpoints gracefully with proper error messages.
### 5. Async/Await Throughout
All network operations are async using Tokio runtime. All examples use `#[tokio::main]`.
## Dependencies
Key dependencies:
- `reqwest` - HTTP client with streaming support
- `tokio` - Async runtime
- `serde/serde_json` - Serialization
- `thiserror` - Error handling
- `futures` - Async streams
- `uuid` - UUID generation for Orchestrate
## Disabled Modules
The WatsonX Data and Governance modules are temporarily disabled, but their code is complete:
- Data: Pending API endpoint discovery
- Governance: Pending Cloud Pak for Data (CPD) authentication support
See `docs/disabled-modules/` for details on re-enabling.
## Build Profiles
- **release**: Optimized for size (LTO, strip symbols, minimal binary)
- **dev**: Fast compilation for development
- **minimal**: Ultra-minimal build for deployment
- **bench**: Optimized for benchmarking with debug info