# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**llmrs** is an unofficial Rust SDK **focused on calling IBM WatsonX APIs**:
- **watsonx.ai** - Text generation (streaming/non-streaming), list models, batch, chat completion
- **watsonx.orchestrate** - Agents, threads, send/stream messages

**watsonx.data** and **watsonx.governance** are optional (feature-gated; off by default).

## Common Development Commands

### Build and Test
```bash
cargo build                      # Standard build
cargo build --release           # Release build (optimized for size)
cargo check --all-targets       # Quick type check
cargo test                      # Run all tests
cargo test -- --nocapture       # Show test output (println!) instead of capturing it
cargo test test_name            # Run specific test
cargo test --no-fail-fast       # Run all tests without stopping on first failure
```

### Code Quality
```bash
cargo fmt                       # Format code
cargo clippy                    # Run linter
cargo clippy -- -W clippy::pedantic  # Run with pedantic rules
```

### Examples
```bash
cargo run --example basic_simple          # WatsonX AI one-line connect + generate
cargo run --example streaming_generation  # Streaming text generation
cargo run --example list_models           # List available models
cargo run --example orchestrate_chat      # Orchestrate agents and chat
cargo run --example batch_generation      # Batch generation
```

## Architecture

### Module Structure

The SDK follows a **modular architecture** with clear separation between different WatsonX services:

```
src/
├── lib.rs              # Re-exports public API
├── client.rs           # WatsonxClient for AI operations
├── connection.rs       # Simplified one-line connection builders
├── config.rs           # Configuration management
├── auth.rs             # IAM authentication
├── error.rs            # Comprehensive error types (thiserror)
├── models.rs           # Model ID constants (e.g., GRANITE_4_H_SMALL)
├── types.rs            # Common data types
├── sse.rs              # Server-Sent Events (streaming) parser
├── orchestrate/        # WatsonX Orchestrate module
│   ├── client.rs       # OrchestrateClient
│   ├── config.rs       # OrchestrateConfig
│   ├── connection.rs   # OrchestrateConnection builder
│   └── types.rs        # Orchestrate-specific types
├── data/               # WatsonX Data module (disabled)
└── governance/         # WatsonX Governance module (disabled)
```

### Connection Pattern

The SDK uses **convenient connection builders** for simplified setup:
- `WatsonxConnection::new().from_env().await?` - One-line connection for WatsonX AI
- `OrchestrateConnection::new().from_env().await?` - One-line connection for Orchestrate
- `DataConnection` / `GovernanceConnection` - Available behind the `data` / `governance` features; the default build is API-focused

### Configuration System

Configuration is primarily **environment-based**:
- Credentials and env var names are defined in **`src/env.rs`** and **`.env.example`**; point docs to these files rather than listing credentials
- `WATSONX_PROJECT_ID` - WatsonX project ID (for AI)
- `WXO_INSTANCE_ID` - Orchestrate instance ID

See `config.rs` and `orchestrate/config.rs` for full environment variable lists.

### Error Handling

The SDK uses `thiserror` for comprehensive error handling with actionable guidance:
- `Error::Authentication` - Authentication failures
- `Error::Api` - API errors with retryable/non-retryable classification
- `Error::Timeout` - Request timeouts
- `Error::Validation` - Configuration validation errors

All errors include troubleshooting suggestions.

### Streaming Architecture

Streaming is handled via Server-Sent Events (SSE):
- `sse.rs` contains the SSE event parser
- `generate_text_stream()` accepts a callback: `|chunk: &str| { print!("{}", chunk); }`
- Proper chunking and buffer management for real-time output

## Key Patterns

### 1. Model Specification
**Important**: You must always specify a model before generating text:
```rust
let config = GenerationConfig::default().with_model(models::GRANITE_4_H_SMALL);
```

### 2. Batch Generation
Use `generate_batch_simple()` for uniform configuration across multiple prompts, or `generate_batch()` with `BatchRequest` for per-request customization.

### 3. Orchestrate Thread Management
Conversations use **thread-based context**:
- `send_message()` returns `(response, thread_id)` for continuation
- Pass `thread_id` to subsequent messages to maintain context
- Use `create_thread()` to explicitly create threads

### 4. Graceful Degradation
The Orchestrate API has multiple endpoints with varying availability. The client handles unavailable endpoints gracefully with proper error messages.

### 5. Async/Await Throughout
All network operations are async using Tokio runtime. All examples use `#[tokio::main]`.

## Dependencies

Key dependencies:
- `reqwest` - HTTP client with streaming support
- `tokio` - Async runtime
- `serde/serde_json` - Serialization
- `thiserror` - Error handling
- `futures` - Async streams
- `uuid` - UUID generation for Orchestrate

## Disabled Modules

The WatsonX Data and Governance modules are temporarily disabled, though their code is complete:
- Data: Pending API endpoint discovery
- Governance: Pending Cloud Pak for Data (CPD) authentication support

See `docs/disabled-modules/` for details on re-enabling.

## Build Profiles

- **release**: Optimized for size (LTO, strip symbols, minimal binary)
- **dev**: Fast compilation for development
- **minimal**: Ultra-minimal build for deployment
- **bench**: Optimized for benchmarking with debug info