0.4.0 OCT/13/2025
- feat: Multi-Agent Council System
- New council.rs module implementing collaborative multi-agent orchestration
- Five council modes for different collaboration patterns:
- Parallel: All agents process the prompt simultaneously, responses aggregated
- RoundRobin: Agents take sequential turns building on previous responses
- Moderated: Agents submit proposals, moderator synthesizes final answer
- Hierarchical: Lead agent coordinates, specialists handle specific aspects
- Debate: Agents discuss and challenge each other until convergence
- Agent identity system with name, expertise, personality, and optional tool access
- Conversation history tracking with CouncilMessage and round metadata
- CouncilResponse includes final answer, message history, rounds executed, convergence score, and total tokens used
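A minimal usage sketch of the council system described above; the `Agent`/`Council` constructor and method names and the `CouncilResponse` field names (other than `total_tokens_used`) are assumptions, not the crate's confirmed API:

```rust
use std::sync::Arc;

// Illustrative only: Agent::new, Council::new, and Council::run are assumed
// names; the CouncilMode variants and total_tokens_used come from this entry.
async fn run_debate(client: Arc<dyn ClientWrapper>) -> Result<(), Box<dyn std::error::Error>> {
    // Each agent carries a name, expertise, and personality (identity system above).
    let economist = Agent::new(client.clone(), "economist", "carbon markets", "skeptical");
    let engineer = Agent::new(client.clone(), "engineer", "process engineering", "pragmatic");

    // Debate mode exchanges rounds until the convergence check (described below) fires.
    let council = Council::new(vec![economist, engineer], CouncilMode::Debate);
    let response = council.run("How should we prioritize carbon capture?").await?;

    println!("answer: {}", response.final_answer);      // assumed field name
    println!("rounds: {}", response.rounds_executed);   // assumed field name
    println!("tokens: {}", response.total_tokens_used); // field named in this entry
    Ok(())
}
```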
- feat: Tool Protocol Abstraction Layer (tool_protocol.rs)
- Flexible abstraction for connecting agents to various tool protocols
- ToolProtocol trait with execute(), list_tools(), get_tool_metadata()
- Support for MCP (Model Context Protocol), custom functions, and user-defined protocols
- ToolResult struct with success status, output, error, and execution metadata
- ToolParameter system with support for String, Number, Integer, Boolean, Array, Object types
- ToolMetadata with parameter definitions and protocol-specific metadata
- ToolRegistry for centralized tool management
- Tool and ToolError types for type-safe tool operations
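A paraphrased shape of the trait, assuming serde_json-style parameter values; the exact argument and return types are not shown in this entry:

```rust
use async_trait::async_trait;

// Sketch of the ToolProtocol surface based on the method names listed above;
// the signatures are assumptions, not the crate's confirmed definitions.
#[async_trait]
pub trait ToolProtocolSketch: Send + Sync {
    /// Execute a named tool with structured parameters, yielding a ToolResult.
    async fn execute(&self, name: &str, parameters: serde_json::Value) -> Result<ToolResult, ToolError>;
    /// Enumerate the tools this protocol exposes.
    async fn list_tools(&self) -> Result<Vec<String>, ToolError>;
    /// Fetch the description and parameter definitions for a single tool.
    async fn get_tool_metadata(&self, name: &str) -> Result<ToolMetadata, ToolError>;
}
```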
- feat: Tool Adapters (tool_adapters.rs)
- CustomToolAdapter: Execute user-defined Rust closures as tools
- MCPToolAdapter: Integration with Model Context Protocol servers
- OpenAIToolAdapter: Compatible with OpenAI function calling format
- All adapters implement async ToolProtocol trait
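A hypothetical registration of a closure-backed tool via CustomToolAdapter; the `register` method name and closure signature are illustrative guesses:

```rust
// Hypothetical API: only CustomToolAdapter's existence and its purpose
// (running user-defined Rust closures as tools) come from this entry.
fn register_add_tool(adapter: &mut CustomToolAdapter) {
    adapter.register(
        "add",               // tool name the agent can call
        "Adds two integers", // description injected into the system prompt
        |params: serde_json::Value| {
            let a = params["a"].as_i64().unwrap_or(0);
            let b = params["b"].as_i64().unwrap_or(0);
            Ok::<_, ToolError>(serde_json::json!({ "sum": a + b }))
        },
    );
}
```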
- feat: Automatic Tool Execution in Agent Generation
- Agents automatically discover and execute tools during response generation
- Tool information injected into system prompts with name, description, and parameters
- JSON-based tool calling format: {"tool_call": {"name": "...", "parameters": {...}}}
- Automatic tool execution loop with max 5 iterations to prevent infinite loops
- Tool results fed back to LLM as user messages for continued generation
- Token usage tracked cumulatively across all LLM calls and tool executions
- New AgentResponse struct returns both content and token usage
- Agent::generate_with_tokens() method for internal use with token tracking
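For reference, a concrete instance of the tool-call payload and the iteration cap; the tool name and parameters below are invented for illustration:

```rust
use serde_json::json;

fn main() {
    // Shape of a tool request emitted by the model, per the format above.
    let call = json!({
        "tool_call": {
            "name": "lookup_emissions", // hypothetical tool
            "parameters": { "country": "NO", "year": 2024 }
        }
    });
    // The generation loop executes the requested tool, feeds the result back
    // to the LLM as a user message, and re-prompts, stopping after at most:
    const MAX_TOOL_ITERATIONS: usize = 5;
    println!("{call} (max iterations: {MAX_TOOL_ITERATIONS})");
}
```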
- feat: Convergence Detection for Debate Mode
- Jaccard similarity-based convergence detection for debate termination
- Compares word sets between consecutive debate rounds
- Configurable convergence threshold (default: 0.75 / 75% similarity)
- Early termination when agents reach consensus, saving tokens and cost
- Convergence score returned in CouncilResponse for inspection
- calculate_convergence_score() and jaccard_similarity() helper methods
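A self-contained re-implementation of the word-set Jaccard similarity described above; the crate's own helper may tokenize or normalize differently:

```rust
use std::collections::HashSet;

// Jaccard similarity of the word sets of two responses:
// |A ∩ B| / |A ∪ B|, where 1.0 means identical vocabularies.
fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let set_a: HashSet<&str> = a.split_whitespace().collect();
    let set_b: HashSet<&str> = b.split_whitespace().collect();
    if set_a.is_empty() && set_b.is_empty() {
        return 1.0; // two empty responses are trivially converged
    }
    let intersection = set_a.intersection(&set_b).count() as f64;
    let union = set_a.union(&set_b).count() as f64;
    intersection / union
}

fn main() {
    let score = jaccard_similarity(
        "capture carbon at point sources",
        "capture carbon from point sources",
    );
    // The debate terminates early once the score crosses the threshold (default 0.75).
    println!("convergence score: {score:.2}");
}
```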
- feat: Token Usage Tracking in Council Modes
- Parallel mode tracks and aggregates tokens from all concurrent agents
- RoundRobin mode accumulates tokens across sequential turns
- Token usage includes all LLM calls plus tool execution overhead
- CouncilResponse.total_tokens_used provides complete cost visibility
- feat: Comprehensive Multi-Agent Tutorial (COUNCIL_TUTORIAL.md)
- Cookbook-style tutorial with progressive complexity
- Five detailed recipes demonstrating each council mode
- Real-world carbon capture strategy problem domain
- Examples use multiple LLM providers (OpenAI, Claude, Gemini, Grok)
- Up to 5 agents per example with distinct expertise and personalities
- Tool integration example with MCPToolAdapter
- Best practices guide for agent design and council mode selection
- Troubleshooting section with common issues and solutions
- Complete multi-stage pipeline example combining multiple modes
- test: Added comprehensive test coverage
- test_agent_with_tool_execution: Validates tool discovery, execution, and result integration
- test_debate_mode_convergence: Validates convergence detection with mock agents
- test_parallel_execution: Tests concurrent agent execution
- test_round_robin_execution: Tests sequential turn-taking
- test_moderated_execution: Tests proposal aggregation
- test_hierarchical_execution: Tests lead-specialist coordination
- test_debate_execution: Tests debate discussion flow
- All 12 tests passing
- refactor: Code quality improvements
- Fixed all compiler warnings
- Switched from std::sync::Mutex to tokio::sync::Mutex for async compatibility
- Removed unused imports and assignments
- Improved error handling in council execution paths
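Why the Mutex switch matters in async code, sketched with an assumed wrapper type; tokio's lock() is awaited rather than blocking the executor thread:

```rust
use tokio::sync::Mutex;

// Illustrative struct: the clients keep usage in a Mutex<Option<TokenUsage>>
// (see 0.2.6 below); the wrapper type and method here are invented for the example.
struct UsageSlot {
    usage: Mutex<Option<TokenUsage>>,
}

impl UsageSlot {
    async fn record(&self, new_usage: TokenUsage) {
        // Awaiting the lock cooperates with the scheduler, and the guard can
        // safely be held across .await points, unlike std::sync::MutexGuard.
        let mut slot = self.usage.lock().await;
        *slot = Some(new_usage);
    }
}
```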
- docs: Added inline documentation for council system and tool protocol
- Module-level documentation with architecture diagrams
- Example code in doc comments
- Detailed parameter and return value documentation
0.3.0 OCT/11/2025
- feat: Add first-class streaming support for LLM responses
- New MessageChunk type for incremental content delivery
- ClientWrapper::send_message_stream() method for streaming responses
- Implemented in OpenAIClient and GrokClient (GrokClient delegates to OpenAIClient)
- LLMSession::send_message_stream() for session-aware streaming
- Dramatically reduces perceived latency - users see tokens as they arrive
- Streaming returns Stream<Item = Result<MessageChunk, Box<dyn Error>>>
- Backward compatible - existing non-streaming code unchanged
- Added streaming_example.rs demonstrating usage
- Added futures-util dependency for Stream trait
- Note: Token usage tracking not available for streaming responses
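A consumption sketch for the streaming API; send_message_stream()'s exact parameters and MessageChunk's field name are assumptions, while the stream item type matches the entry above:

```rust
use futures_util::StreamExt;

// Sketch only: the session method's parameters and the `content` field on
// MessageChunk are assumptions; the stream item type is listed above.
async fn stream_reply(session: &mut LLMSession) -> Result<(), Box<dyn std::error::Error>> {
    let mut stream = session.send_message_stream("Tell me a story".to_string()).await?;
    while let Some(chunk) = stream.next().await {
        let chunk: MessageChunk = chunk?; // each item is a Result<MessageChunk, _>
        print!("{}", chunk.content);      // render tokens as they arrive
    }
    Ok(())
}
```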
- Optimized LLMSession conversation history with VecDeque to eliminate O(N²) insert/remove churn
- Minimize string formatting overhead in hot path error logs
- Replace std::sync::Mutex with tokio::sync::Mutex for async-friendly token usage tracking
- Change ClientWrapper::send_message to accept &[Message] instead of Vec<Message>
- refactor: Move all client tests to external integration test files
- feat: Make token usage retrieval async and clean up Claude/OpenAI client usage
- Implement token count caching for efficient message trimming
- Implement pre-transmission trimming to reduce network overhead and latency
- Arena/bump allocation for message bodies
- Reuse request buffers in LLMSession to avoid repeated allocations on each send
- OpenAI and Gemini clients preallocate formatted_messages with Vec::with_capacity
- feat: Add model_name() to ClientWrapper and LLMSession
- feat: Implement persistent HTTP connection pooling for all provider clients
0.2.12 SEP/21/2025
- Added Claude client implementation at src/cloudllm/clients/claude.rs:
- ClaudeClient struct follows the same delegate pattern as GrokClient, using OpenAIClient internally
- Supports Anthropic API base URL (https://api.anthropic.com/v1)
- Includes 6 Claude model variants: Claude35Sonnet20241022, Claude35Haiku20241022, Claude3Opus20240229, Claude35Sonnet20240620, Claude3Sonnet20240229, Claude3Haiku20240307
- Implements all standard constructor methods and ClientWrapper trait
- Added a test function gated on the CLAUDE_API_KEY environment variable
- Updated README.md to mark Claude as supported (removed "Coming Soon")
- Added Claude example to interactive_session.rs example
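A hypothetical construction of the new client; the constructor name and LLMSession::new's trailing arguments mirror the other clients' pattern but are not confirmed by this entry:

```rust
use std::sync::Arc;

// Assumed constructor name (new_with_model_enum) and session arguments;
// the model variant and CLAUDE_API_KEY variable are from this entry.
fn build_claude_session() -> Result<LLMSession, std::env::VarError> {
    let api_key = std::env::var("CLAUDE_API_KEY")?;
    let client = ClaudeClient::new_with_model_enum(&api_key, claude::Model::Claude35Sonnet20241022);
    Ok(LLMSession::new(
        Arc::new(client),
        "You are a helpful assistant.".to_string(), // assumed system prompt argument
        8192,                                       // assumed max_tokens argument
    ))
}
```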
0.2.10 SEP/21/2025
- New Grok model enums added to the GrokClient:
- grok-4-fast-reasoning
- grok-4-fast-non-reasoning
- grok-code-fast-1
0.2.9
- New OpenAI model enums added to the OpenAIClient:
- gpt-5
- gpt-5-mini
- gpt-5-nano
- gpt-5-chat-latest
- Upgraded tokio to 1.47.1
0.2.8
- Bumped cloudllm version to 0.2.8
- Upgraded tokio dependency from 1.44.5 to 1.46.1
- Updated Grok client model names and enums in src/cloudllm/clients/grok.rs:
- Renamed Grok3MiniFastBeta to Grok3MiniFast, Grok3MiniBeta to Grok3Mini, Grok3FastBeta to Grok3Fast, Grok3Beta to Grok3, and Grok3Latest to Grok4_0709
- Updated model_to_string function to reflect new model names
- Changed test client initialization to use Grok4_0709 instead of Grok3Latest
- Updated Gemini client model names and enums in src/cloudllm/clients/gemini.rs:
- Renamed Gemini25FlashPreview0520 to Gemini25Flash and Gemini25ProPreview0506 to Gemini25Pro to reflect stable releases
- Added new model enum Gemini25FlashLitePreview0617 for lightweight preview model
- Updated model_to_string function to map new enum names: gemini-2.5-flash, gemini-2.5-pro, and gemini-2.5-flash-lite-preview-06-17
0.2.7
- Bumped cloudllm version to 0.2.7
- Upgraded openai-rust2 dependency from 1.5.9 to 1.6.0
- Extended ChatArguments and client wrappers for search and tool support:
- Added `SearchParameters` struct and `with_search_parameters()` builder to `openai_rust::chat::ChatArguments`
- Added `ToolType` enum and `Tool` struct, plus `tools` field and `with_tools()` builder (snake_case serialization)
- Updated `ClientWrapper::send_message` signature to accept `optional_search_parameters: Option<SearchParameters>`
- Modified `clients/common.rs` `send_and_track()` to take and inject `optional_search_parameters`
- Updated `OpenAIClient`, `GeminiClient`, and `GrokClient` to forward `optional_search_parameters` to `send_and_track`
- Exposed `optional_search_parameters` through `LLMSession::send_message` and its callers
- Other updates:
- Added `Grok3Latest` variant to `grok::Model` enum and updated test to use it
- Ensured backward compatibility: all existing call sites default `optional_search_parameters` to `None`
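A sketch of the widened call; only the trailing Option<SearchParameters> argument and its None default are taken from this entry, the other parameters are assumptions about LLMSession::send_message's signature:

```rust
// Existing call sites stay source-compatible by passing None for the new
// optional_search_parameters argument; the role and text parameters shown
// here are assumptions.
let reply = session
    .send_message(Role::User, user_text, None)
    .await?;
```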
0.2.6
- Implemented token usage tracking across the LLMSession and ClientWrapper trait, including:
- New TokenUsage struct for standardized tracking of input, output, and total tokens.
- LLMSession now accumulates actual token usage after each message.
- LLMSession::token_usage() method added for external inspection.
- ClientWrapper trait extended with get_last_usage() (default: None) and new usage_slot() hook.
- Refactored token usage handling in OpenAIClient and GeminiClient:
- Introduced a common send_and_track helper in clients/common.rs to centralize usage capture logic.
- OpenAIClient and GeminiClient now store usage in an internal Mutex<Option<TokenUsage>>.
- Redundant get_last_usage() implementations removed; only usage_slot() is overridden.
- Added multiple constructors to GeminiClient: support for model strings, enums, and base URL configuration.
- Improved LLMSession context management:
- Added max_tokens field with get_max_tokens() accessor.
- Prunes conversation history using estimated token count per message.
- Precise control over total_context_tokens and total_token_count.
- Example interactive_session.rs refactored to:
- Demonstrate integration with OpenAIClient, GrokClient, and GeminiClient.
- Show token usage in real-time after each LLM response.
- Test max_tokens pruning logic with visible metrics.
- Added model variants to GeminiClient enum:
- Gemini25FlashPreview0520
- Gemini25ProPreview0506
- Cleaned up and reorganized internal code:
- Moved constructors and imports for clarity.
- Removed redundant comments and unused stub code.
- Updated README example (interactive_session.md) with new usage patterns.
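A sketch of inspecting the usage accumulated by 0.2.6's tracking via the new token_usage() accessor; the TokenUsage field names are assumptions based on the input/output/total description above:

```rust
// token_usage() is the accessor added in this release; the field names on
// TokenUsage (input_tokens, output_tokens, total_tokens) are assumptions.
let usage = session.token_usage();
println!(
    "input: {}  output: {}  total: {}",
    usage.input_tokens, usage.output_tokens, usage.total_tokens
);
```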
0.2.5
- Bumped tokio from 1.44.2 to 1.44.5
- Updated openai-rust2 from 1.5.8 to 1.5.9, with updated support for image generation models
0.2.4
- New enums for OpenAI client: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano
- Example in interactive_session.rs now uses gpt-4.1-mini
0.2.3
- Modified LLMSession to use Arc<dyn ClientWrapper> instead of generic T: ClientWrapper, enabling dynamic selection of client implementations (e.g., OpenAIClient, GrokClient, GeminiClient) at runtime.
- Updated LLMSession::new to accept Arc<dyn ClientWrapper>, removing Arc::new wrapping inside the constructor.
- Adjusted tests in gemini.rs and grok.rs to use Arc::new and non-generic LLMSession.
- Updated interactive_session.rs example to wrap client in Arc::new.
- Added init_logger function in lib.rs for thread-safe logger initialization using env_logger and Once.
- Replaced env_logger::try_init with crate::init_logger in gemini.rs and grok.rs tests for consistency.
- Updated GrokClient test to use Grok3MiniFastBeta enum variant.
- Updated LLMSession documentation to reflect Arc::new usage.
- Updated ClientWrapper trait to require Send + Sync, ensuring Arc<dyn ClientWrapper> is thread-safe and LLMSession can be used in async tasks (e.g., Tokio spawn).
- Enables safe dynamic client selection in multithreaded contexts
- All tests pass with the new implementation
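A sketch of the runtime client selection this refactor enables; constructor arguments and model identifiers are placeholders, the Arc<dyn ClientWrapper> signature is what the entry describes:

```rust
use std::sync::Arc;

// Illustrative only: constructor arguments and model identifiers are placeholders;
// what this entry guarantees is that LLMSession::new now takes Arc<dyn ClientWrapper>.
fn pick_client(use_gemini: bool, api_key: &str) -> Arc<dyn ClientWrapper> {
    if use_gemini {
        Arc::new(GeminiClient::new(api_key, "gemini-model-id"))
    } else {
        Arc::new(OpenAIClient::new(api_key, "openai-model-id"))
    }
}
// Remaining LLMSession::new arguments elided:
// let session = LLMSession::new(pick_client(true, &key), /* ... */);
```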
0.2.2
- New Grok3 model enums available for the GrokClient
- Dependencies updated; code formatted with cargo fmt
0.2.1
- Added new enums for O4Mini, O4MiniHigh and O3
- New enum for gpt-4.5-preview for the OpenAIClient
0.1.9 - feb.26.2025
- Update dependencies: tokio to 1.43.0, async-trait to 0.1.86, log to 0.4.26
- Refactor send_message in ClientWrapper to remove opt_url_path parameter
- Update GeminiClient to use openai_rust::Client and handle Gemini API directly
- Adjust OpenAIClient, GrokClient, LLMSession, and examples to new send_message signature
0.1.8 - feb.26.2025
- documentation updates
0.1.7 - feb.26.2025
- Update README.md to reflect support for Gemini
- Introduce Model enum in openai.rs for OpenAI models with model_to_string function
- Modify ClientWrapper trait to include optional URL path for API requests
- Update GeminiClient to use new ClientWrapper signature and set base URL to 'https://generativelanguage.googleapis.com/v1beta/'
- Adjust GrokClient to align with updated ClientWrapper signature
- Enhance OpenAIClient with Model enum support and optional URL paths in send_message
- Update LLMSession to pass optional URL path to client in send_message
- Revise examples/interactive_session.rs to use Model enum and new client methods
- Increment openai-rust2 dependency to version 1.5.8
- Fix minor formatting and improve error logging in clients
0.1.6 - feb.23.2025
- Introduced GrokClient in src/cloudllm/clients/grok.rs with support for multiple Grok models.
- Implemented the ClientWrapper trait for GrokClient to enable message sending via OpenAIClient.
- Added a test (test_grok_client) demonstrating basic usage and integration.
- Updated src/cloudllm/clients/mod.rs to include the new grok module.
0.1.5 - jan.23.2025
- Removed the original `openai-rust` dependency.
- Added `openai-rust2` as a new dependency, pointing to a custom fork with improvements.
- Added a new constructor `new_with_base_url` to allow specifying a custom base URL
- Ensured all modules are properly referenced and re-exported for future scalability.