# llmrs 0.1.0 TODO

Unofficial Rust SDK for the IBM WatsonX AI platform.

## Recent Improvements (2026-02)

✅ **WatsonX Handshake Improvements**
- Token caching with 50-minute expiry (learned from TypeScript sdlcagent)
- Automatic token refresh using `Arc<Mutex<Option<CachedToken>>>`
- Fixed endpoint to use `/ml/v1/text/chat` (correct for WatsonX.ai)
- Added `project_id` to request body
- Updated both `chat_completion` and `chat_completion_stream` methods
- All compiler warnings fixed (0 warnings)
- Removed redundant test example files
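
The token-caching scheme above can be sketched with std-only types; the `CachedToken` field names, the `get_token` helper, and the injected `fetch` closure are illustrative, not the SDK's actual API, but the `Arc<Mutex<Option<CachedToken>>>` slot and the 50-minute expiry mirror what is described:

```rust
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// A bearer token plus the moment it was fetched.
#[derive(Clone)]
struct CachedToken {
    token: String,
    fetched_at: Instant,
}

/// Reuse tokens for 50 minutes (IAM tokens live ~60), then refresh.
const TOKEN_TTL: Duration = Duration::from_secs(50 * 60);

/// Shared cache slot, as in `Arc<Mutex<Option<CachedToken>>>`.
type TokenCache = Arc<Mutex<Option<CachedToken>>>;

/// Return the cached token if still fresh, otherwise fetch a new one.
/// `fetch` stands in for the real IAM token exchange.
fn get_token(cache: &TokenCache, fetch: impl Fn() -> String) -> String {
    let mut slot = cache.lock().unwrap();
    match slot.as_ref() {
        // Cache hit: the stored token has not passed its TTL yet.
        Some(c) if c.fetched_at.elapsed() < TOKEN_TTL => c.token.clone(),
        // Miss or expired: fetch, store with a fresh timestamp, return.
        _ => {
            let token = fetch();
            *slot = Some(CachedToken {
                token: token.clone(),
                fetched_at: Instant::now(),
            });
            token
        }
    }
}
```

Because the slot is behind an `Arc<Mutex<...>>`, both `chat_completion` and `chat_completion_stream` can share one cache across clones of the client.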

✅ **Codebase Simplification**
- Removed empty `docs/` folder
- Removed `.claude/` folder
- Removed `scripts/` folder
- Removed disabled example files
- Cleaned up all dead_code warnings with `#[allow(dead_code)]`

## Current Status

The SDK is fully functional with:

### WatsonX AI Features
- ✅ Real-time streaming text generation (`generate_text_stream()`)
- ✅ Standard text generation (`generate_text()`)
- ✅ Batch generation with concurrent execution (`generate_batch()`, `generate_batch_simple()`)
- ✅ **Chat completion API support (`chat_completion()`, `chat_completion_stream()`)** (NEW)
- ✅ **Multi-turn conversation support with system/user/assistant messages** (NEW)
- ✅ Proper SSE parsing for WatsonX streaming endpoint
- ✅ Environment-based configuration
- ✅ Multiple model support with updated constants
- ✅ Model listing API integration (`list_models()`)
- ✅ Quality assessment tools
- ✅ Comprehensive error handling
- ✅ Working examples with consistent method names
- ✅ Batch generation example with color-coded visualization
- ✅ **Chat completion examples (chat, code generation)** (NEW)
- ✅ **Simplified connection with `WatsonxConnection` builder** (NEW)
- ✅ **One-line connection: `WatsonxConnection::new().from_env().await?`** (NEW)
- ✅ **Four connection methods for flexibility** (NEW)
- ✅ **Groq GPT-OSS models (GPT-OSS-120B, GPT-OSS-20B)** (NEW - v2.0.0)
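
The multi-turn system/user/assistant support can be sketched as below; the `Role` enum, `ChatMessage` struct, and `build_history` helper are hypothetical names for illustration, not the crate's actual types:

```rust
/// Roles accepted by a chat completion endpoint.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    System,
    User,
    Assistant,
}

/// One turn in a conversation.
#[derive(Debug, Clone)]
struct ChatMessage {
    role: Role,
    content: String,
}

/// Build a multi-turn history: one system prompt, then alternating
/// user/assistant turns, ready to send as the `messages` array.
fn build_history(system: &str, turns: &[(&str, &str)]) -> Vec<ChatMessage> {
    let mut msgs = vec![ChatMessage {
        role: Role::System,
        content: system.to_string(),
    }];
    for (user, assistant) in turns {
        msgs.push(ChatMessage { role: Role::User, content: user.to_string() });
        msgs.push(ChatMessage { role: Role::Assistant, content: assistant.to_string() });
    }
    msgs
}
```

Keeping earlier turns in the list is what gives the model conversational context on each new request.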

### WatsonX Orchestrate Features
- ✅ Agent discovery (`list_agents()`, `get_agent()`)
- ✅ Non-streaming chat with conversation continuity (`send_message()`)
- ✅ Streaming chat with real-time callbacks (`stream_message()`)
- ✅ Thread management (`list_threads()`, `get_thread_messages()`, `create_thread()`, `delete_thread()`)
- ✅ Run management (`get_run()`, `list_runs()`, `cancel_run()`)
- ✅ Skills management (`list_skills()`, `get_skill()`)
- ✅ Tools management (list, get, execute, update, delete, test)
- ✅ **Communication channels management (Twilio WhatsApp, SMS, Slack, Genesys Bot Connector)** (NEW - v2.1.0)
- ✅ **Voice configuration management (Deepgram, ElevenLabs)** (NEW - v2.1.0)
- ✅ Tool versioning (`get_tool_versions()`)
- ✅ Tool execution history (`get_tool_execution_history()`)
- ✅ Chat with documents (`chat_with_docs()`, `stream_chat_with_docs()`, `get_chat_with_docs_status()`)
- ✅ Batch operations (`send_batch_messages()`)
- ✅ Document collection operations (`list_collections()`, `get_collection()`, `get_document()`, `delete_document()`)
- ✅ Simplified configuration (`from_env()` with just WXO_INSTANCE_ID and WXO_REGION)
- ✅ Matches wxo-client-main pattern and API structure
- ✅ Complete chat example (`orchestrate_chat.rs`)
- ✅ Document collection and vector search capabilities
- ✅ Advanced execution tracking and tool integration
- ✅ Graceful handling of unavailable endpoints (404 errors)
- ✅ Flexible response parsing for API variations
- ✅ Comprehensive examples (basic, chat, advanced, use cases, chat with documents)
- ✅ Modular code organization (config, client, types modules)
- ✅ **Simplified connection with `OrchestrateConnection` builder** (NEW)
- ✅ **One-line connection: `OrchestrateConnection::new().from_env().await?`** (NEW)

## Future Improvements

### Potential Features
- [x] Implement retry logic with exponential backoff ✅ **COMPLETED**
- [ ] Add support for more granular streaming control
- [ ] Implement connection pooling for better performance
- [ ] Add metrics and observability features
- [x] Support for batch requests ✅ **COMPLETED**
- [x] Chat completion API support ✅ **COMPLETED**
- [x] Add examples for different use cases (chat, code generation, etc.) ✅ **COMPLETED**
- [x] Implement caching for authentication tokens ✅ **COMPLETED**
- [ ] Investigate macOS sandbox crash (`system-configuration` panic) and make tests resilient:
  - Prefer disabling system proxy auto-detection in the HTTP client (e.g. configure client to not consult OS proxy settings)
  - Consider switching TLS/backend feature set to avoid `system-configuration` dependency if possible
  - Add a regression note/test to ensure `cargo test` works in restricted environments (CI/sandbox) without panicking
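
The completed retry-with-exponential-backoff item can be sketched as follows; the base delay, cap, and function names are illustrative assumptions, not the SDK's actual constants:

```rust
use std::time::Duration;

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped.
fn backoff_delay(attempt: u32) -> Duration {
    let base_ms: u64 = 500;
    let cap_ms: u64 = 8_000;
    // Saturating shift-and-multiply keeps large attempts from overflowing.
    let ms = base_ms.saturating_mul(1u64 << attempt.min(10)).min(cap_ms);
    Duration::from_millis(ms)
}

/// Retry `op` up to `max_attempts` times, sleeping between failures.
fn retry<T, E>(max_attempts: u32, mut op: impl FnMut() -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt));
                attempt += 1;
            }
        }
    }
}
```

A production version would also add jitter and retry only on transient errors (timeouts, 429, 5xx), not on authentication failures.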

### Documentation
- [ ] Add more detailed API documentation
- [ ] Add performance benchmarks
- [ ] Add troubleshooting guide
- [ ] Add migration guide from other SDKs

### Testing
- [ ] Add more integration tests
- [ ] Add real-API “contract” tests (no mocks) that validate request/response shapes against WatsonX endpoints
- [ ] Improve test coverage
- [ ] Add load testing scenarios

### Code Quality
- [ ] Add more code comments
- [x] Eliminate current `dead_code` warnings (keep `cargo test` and `cargo clippy` clean) ✅ **COMPLETED**
- [x] Refactor for better separation of concerns ✅ **COMPLETED**
- [x] Add clippy lints ✅ **COMPLETED**
- [x] Improve error messages ✅ **COMPLETED**

## Notes

### WatsonX AI (watsonx.ai)
- Supports both streaming (`generate_text_stream`) and non-streaming (`generate_text`) endpoints
- Authentication tokens are cached with expiry and auto-refresh (reduces IAM calls)
- Configuration is primarily via `.env` files for security

### WatsonX Orchestrate (watsonx.orchestrate)
- Simplified configuration following wxo-client-main pattern (only instance_id and region)
- Uses `/runs/stream` endpoint for all chat interactions (matches wxo-client)
- Supports both streaming (`stream_message`) and non-streaming (`send_message`) chat
- Maintains conversation context via thread_id (returned and managed by caller)
- Uses `IAM-API_KEY` header authentication (not Bearer token)
- Parses Orchestrate-specific SSE events (message.created, message.delta)
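
The SSE parsing mentioned above can be sketched with a minimal std-only accumulator; the `SseEvent` struct and `parse_sse` function are illustrative names, and the sketch assumes each event carries at most one `event:` line and one `data:` line, separated from the next event by a blank line:

```rust
/// One parsed SSE event: optional `event:` name plus its `data:` payload.
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: Option<String>,
    data: String,
}

/// Split a buffered SSE chunk into events. Each event may carry an
/// `event:` line (e.g. `message.delta`) and a `data:` line with the
/// JSON payload; a blank line terminates the event.
fn parse_sse(buffer: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut name: Option<String> = None;
    let mut data: Option<String> = None;
    for line in buffer.lines() {
        if let Some(v) = line.strip_prefix("event:") {
            name = Some(v.trim().to_string());
        } else if let Some(v) = line.strip_prefix("data:") {
            data = Some(v.trim().to_string());
        } else if line.trim().is_empty() {
            // Blank line: flush the completed event, if any.
            if let Some(d) = data.take() {
                events.push(SseEvent { event: name.take(), data: d });
            }
            name = None;
        }
    }
    // Flush a trailing event that had no terminating blank line.
    if let Some(d) = data.take() {
        events.push(SseEvent { event: name.take(), data: d });
    }
    events
}
```

A real streaming client would keep the tail of a partially received event in a buffer between network reads rather than parsing each chunk in isolation.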