ai-lib-rust

Protocol Runtime for AI-Protocol - A high-performance Rust reference implementation.

ai-lib-rust is the Rust runtime implementation of the AI-Protocol specification. It embodies the core design principle: all logic is operators, all configuration is protocol.

🎯 Design Philosophy

Unlike traditional adapter libraries that hardcode provider-specific logic, ai-lib-rust is a protocol-driven runtime that executes AI-Protocol specifications. This means:

  • Zero hardcoded provider logic: All behavior is driven by protocol manifests (source YAML or dist JSON)
  • Operator-based architecture: Processing is done through composable operators (Decoder → Selector → Accumulator → FanOut → EventMapper)
  • Hot-reloadable: Protocol configurations can be updated without restarting the application
  • Unified interface: Developers interact with a single, consistent API regardless of the underlying provider (see the sketch below)
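For example, switching providers is only a matter of the "provider/model" string passed to the constructor; both strings below also appear in the Quick Start examples:

use ai_lib_rust::AiClient;

#[tokio::main]
async fn main() -> ai_lib_rust::Result<()> {
    // Same client API for every provider; only the model string changes.
    let _openai = AiClient::new("openai/gpt-4o").await?;
    let _anthropic = AiClient::new("anthropic/claude-3-5-sonnet").await?;
    Ok(())
}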

🏗️ Architecture

The library is organized into three layers:

1. Protocol Specification Layer (protocol/)

  • Loader: Loads protocol files from local filesystem, embedded assets, or remote URLs
  • Validator: Validates protocols against JSON Schema
  • Schema: Protocol structure definitions

2. Pipeline Interpreter Layer (pipeline/)

  • Decoder: Parses raw bytes into protocol frames (SSE, JSON Lines, etc.)
  • Selector: Filters frames using JSONPath expressions
  • Accumulator: Accumulates stateful data (e.g., tool call arguments)
  • FanOut: Handles multi-candidate scenarios
  • EventMapper: Converts protocol frames to unified events

3. User Interface Layer (client/, types/)

  • Client: Unified client interface
  • Types: Standard type system based on AI-Protocol standard_schema

🔄 V2 Protocol Alignment

Starting with v0.7.0, ai-lib-rust aligns with the AI-Protocol V2 specification. V0.8.0 adds full V2 runtime support including V2 manifest parsing, provider drivers, MCP, Computer Use, and extended multimodal.

Standard Error Codes (V2)

All provider errors are classified into 13 standard error codes with unified retry/fallback semantics:

Code   Name               Retryable  Fallbackable
E1001  invalid_request    No         No
E1002  authentication     No         Yes
E1003  permission_denied  No         No
E1004  not_found          No         No
E1005  request_too_large  No         No
E2001  rate_limited       Yes        Yes
E2002  quota_exhausted    No         Yes
E3001  server_error       Yes        Yes
E3002  overloaded         Yes        Yes
E3003  timeout            Yes        Yes
E4001  conflict           Yes        No
E4002  cancelled          No         No
E9999  unknown            No         No

Classification follows a priority pipeline: provider-specific error code → HTTP status override → standard HTTP mapping → E9999.
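As an illustration only (the real classifier lives inside the runtime; the helper names below are hypothetical), the priority pipeline can be sketched as:

// Hypothetical sketch of the classification priority; actual runtime
// types and function names may differ.
fn lookup_provider_mapping(_code: &str) -> Option<&'static str> { None } // stub
fn lookup_status_override(_status: u16) -> Option<&'static str> { None } // stub

fn classify(provider_code: Option<&str>, http_status: u16) -> &'static str {
    // 1. Provider-specific error code declared in the manifest
    if let Some(std_code) = provider_code.and_then(lookup_provider_mapping) {
        return std_code;
    }
    // 2. HTTP status override declared in the manifest
    if let Some(std_code) = lookup_status_override(http_status) {
        return std_code;
    }
    // 3. Standard HTTP mapping (e.g. 429 → E2001 rate_limited, 5xx → E3001 server_error)
    match http_status {
        429 => "E2001",
        500..=599 => "E3001",
        // 4. Everything else falls through to E9999 unknown
        _ => "E9999",
    }
}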

Compliance Tests

Cross-runtime behavioral consistency is verified by a shared YAML-based test suite from the ai-protocol repository:

# Run compliance tests
cargo test --test compliance

# With explicit compliance directory
COMPLIANCE_DIR=../ai-protocol/tests/compliance cargo test --test compliance

For details, see CROSS_RUNTIME.md.

Testing with ai-protocol-mock

For integration and MCP tests without real API calls, use ai-protocol-mock:

# Start mock server (from ai-protocol-mock repo)
docker-compose up -d

# Run tests with mock
MOCK_HTTP_URL=http://localhost:4010 MOCK_MCP_URL=http://localhost:4010/mcp cargo test -- --ignored --nocapture

# Run specific mock integration tests
MOCK_HTTP_URL=http://localhost:4010 cargo test test_sse_streaming_via_mock test_error_classification_via_mock -- --ignored --nocapture

Or in code: AiClientBuilder::new().base_url_override("http://localhost:4010").build(...)

🧩 Feature flags & re-exports

ai-lib-rust keeps the runtime core small and exposes optional capabilities behind feature flags, in line with the V2 "lean core, progressive complexity" design principle.

For a deeper overview, see docs/ARCHITECTURE.md.

  • Always available re-exports (crate root):
    • AiClient, AiClientBuilder, CancelHandle, CallStats, ChatBatchRequest, ClientMetrics, EndpointExt
    • Message, MessageRole, StreamingEvent, ToolCall
    • Result<T>, Error, ErrorContext
    • FeedbackEvent, FeedbackSink (core feedback types)
  • Capability features (V2 aligned):
    • embeddings: embedding generation (EmbeddingClient)
    • batch: batch API processing (BatchExecutor)
    • guardrails: input/output validation
    • tokens: token counting and cost estimation
    • telemetry: advanced observability sinks (InMemoryFeedbackSink, ConsoleFeedbackSink, etc.)
    • mcp: MCP (Model Context Protocol) tool bridge — namespace-based tool conversion and filtering
    • computer_use: Computer Use abstraction — safety policies, domain allowlists, action validation
    • multimodal: Extended multimodal support — vision, audio, video modality validation and format checks
    • reasoning: Extended reasoning / chain-of-thought support
  • Infrastructure features:
    • routing_mvp: pure logic model management helpers (CustomModelManager, ModelArray, etc.)
    • interceptors: application-layer call hooks (InterceptorPipeline, Interceptor, RequestContext)
  • Meta-feature:
    • full: enables all capability and infrastructure features

Enable with:

[dependencies]
# Lean core (default)
ai-lib-rust = "0.8.0"

# With specific capabilities
ai-lib-rust = { version = "0.8.0", features = ["embeddings", "telemetry"] }

# Everything enabled
ai-lib-rust = { version = "0.8.0", features = ["full"] }

🗺️ Capability map (layered tools)

This is a structured view of what the crate provides, grouped by layers.

1) Protocol layer (src/protocol/)

  • ProtocolLoader: load provider manifests from local paths / env paths / GitHub raw URLs
  • ProtocolValidator: JSON Schema validation (supports offline via embedded schema)
  • ProtocolManifest: typed representation of provider manifests
  • UnifiedRequest: provider-agnostic request payload used by the runtime

2) Transport layer (src/transport/)

  • HttpTransport: reqwest-based transport with proxy/timeout defaults and env knobs
  • API key resolution: keyring → <PROVIDER_ID>_API_KEY env

3) Pipeline layer (src/pipeline/)

  • Operator pipeline: decoder → selector → accumulator → fanout → event mapper
  • Streaming normalization: maps provider frames to StreamingEvent

4) Client layer (src/client/)

  • AiClient: runtime entry point; model-driven ("provider/model")
  • Chat builder: client.chat().messages(...).stream().execute_stream()
  • Batch: chat_batch, chat_batch_smart
  • Observability: call_model_with_stats returns CallStats
  • Cancellation: execute_stream_with_cancel() → CancelHandle
  • Services: EndpointExt for calling services declared in protocol manifests

5) Resilience layer (src/resilience/ + client/policy)

  • Policy engine: capability validation + retry/fallback decisions
  • Rate limiter: token-bucket + adaptive header-driven mode
  • Circuit breaker: minimal breaker with env or builder defaults
  • Backpressure: max in-flight permit gating

6) Types layer (src/types/)

  • Messages: Message, MessageRole, MessageContent, ContentBlock
  • Tools: ToolDefinition, FunctionDefinition, ToolCall
  • Events: StreamingEvent

7) Telemetry layer (src/telemetry/)

  • FeedbackSink / FeedbackEvent: opt-in feedback reporting
  • Extended feedback types: RatingFeedback, ThumbsFeedback, TextFeedback, CorrectionFeedback, RegenerateFeedback, StopFeedback
  • Multiple sinks: InMemoryFeedbackSink, ConsoleFeedbackSink, CompositeFeedbackSink
  • Global sink management: get_feedback_sink(), set_feedback_sink(), report_feedback()
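A minimal sketch of the global sink management above, assuming InMemoryFeedbackSink has a new() constructor and set_feedback_sink accepts a shared sink (check the crate docs for the exact signatures); requires the telemetry feature:

use std::sync::Arc;
use ai_lib_rust::telemetry::{set_feedback_sink, InMemoryFeedbackSink};

fn install_sink() {
    // Assumption: the sink is installed once, globally, so that
    // report_feedback() has a destination.
    set_feedback_sink(Arc::new(InMemoryFeedbackSink::new()));
}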

8) Embedding layer (src/embeddings/) - NEW in v0.6.5

  • EmbeddingClient / EmbeddingClientBuilder: Generate embeddings from text
  • Types: Embedding, EmbeddingRequest, EmbeddingResponse, EmbeddingUsage
  • Vector operations: cosine_similarity, dot_product, euclidean_distance, manhattan_distance
  • Utilities: normalize_vector, average_vectors, weighted_average_vectors, find_most_similar
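A sketch of the vector helpers above, assuming they take f32 slices and that the module path mirrors src/embeddings/ (exact signatures are not shown in this README):

use ai_lib_rust::embeddings::cosine_similarity;

fn similarity_demo() {
    let a = vec![0.1_f32, 0.2, 0.3];
    let b = vec![0.3_f32, 0.2, 0.1];
    // Assumption: cosine_similarity(&[f32], &[f32]) -> f32
    let sim = cosine_similarity(&a, &b);
    println!("cosine similarity = {sim}");
}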

9) Cache layer (src/cache/) - NEW in v0.6.5

  • CacheBackend trait with MemoryCache and NullCache implementations
  • CacheManager: TTL-based caching with statistics
  • CacheKey / CacheKeyGenerator: Deterministic cache key generation

10) Token layer (src/tokens/) - NEW in v0.6.5

  • TokenCounter trait: CharacterEstimator, AnthropicEstimator, CachingCounter
  • ModelPricing: Pre-configured pricing for GPT-4o, Claude models
  • CostEstimate: Calculate request costs

11) Batch layer (src/batch/) - NEW in v0.6.5

  • BatchCollector / BatchConfig: Accumulate requests for batch processing
  • BatchExecutor: Execute batches with configurable strategies
  • BatchResult: Structured batch execution results

12) Plugin layer (src/plugins/) - NEW in v0.6.5

  • Plugin trait with lifecycle hooks
  • PluginRegistry: Centralized plugin management
  • Hook system: HookType, Hook, HookManager
  • Middleware: Middleware, MiddlewareChain for request/response transformation

13) Utils (src/utils/)

  • JSONPath mapping helpers, tool-call assembler, and small runtime utilities

14) Optional helpers (feature-gated)

  • routing_mvp (src/routing/): model selection + endpoint array load balancing (pure logic)
  • interceptors (src/interceptors/): hooks around calls for logging/metrics/audit

🚀 Quick Start

Sharing the client across tasks

AiClient intentionally does not implement Clone (to keep API-key handling and provider ToS compliance explicit). Use Arc<AiClient> to share it across async tasks:

use ai_lib_rust::{AiClient, Message};
use std::sync::Arc;

#[tokio::main]
async fn main() -> ai_lib_rust::Result<()> {
    let client = Arc::new(AiClient::new("openai/gpt-4o").await?);
    // Pass Arc::clone(&client) to spawned tasks
    let handle = tokio::spawn({
        let c = Arc::clone(&client);
        async move { c.chat().messages(vec![Message::user("Hi")]).execute().await }
    });
    let _ = handle.await?;
    Ok(())
}

Basic Usage

use ai_lib_rust::{AiClient, Message};
use ai_lib_rust::types::events::StreamingEvent;
use futures::StreamExt;

#[tokio::main]
async fn main() -> ai_lib_rust::Result<()> {
    // Create client directly using provider/model string
    // This is fully protocol-driven and supports any provider defined in ai-protocol manifests
    let client = AiClient::new("anthropic/claude-3-5-sonnet").await?;

    let messages = vec![Message::user("Hello!")];

    // Streaming (unified events)
    let mut stream = client
        .chat()
        .messages(messages)
        .temperature(0.7)
        .stream()
        .execute_stream()
        .await?;

    while let Some(event) = stream.next().await {
        match event? {
            StreamingEvent::PartialContentDelta { content, .. } => print!("{content}"),
            StreamingEvent::StreamEnd { .. } => break,
            _ => {}
        }
    }

    Ok(())
}
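The non-streaming form uses the same builder, ending with execute() instead of execute_stream() (as in the Arc-sharing example above); a fragment to place inside a main like the one above:

// Non-streaming: one complete response instead of an event stream.
let response = client
    .chat()
    .messages(vec![Message::user("Hello again!")])
    .execute()
    .await?;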

Multimodal (Image / Audio)

Multimodal inputs are represented as MessageContent::Blocks(Vec<ContentBlock>).

use ai_lib_rust::{Message, MessageRole};
use ai_lib_rust::types::message::{MessageContent, ContentBlock};

fn multimodal_message(image_path: &str) -> ai_lib_rust::Result<Message> {
    let blocks = vec![
        ContentBlock::text("Describe this image briefly."),
        ContentBlock::image_from_file(image_path)?,
    ];
    Ok(Message::with_content(
        MessageRole::User,
        MessageContent::blocks(blocks),
    ))
}
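Sending it goes through the same chat builder as any other message (the model string and image path here are illustrative):

use ai_lib_rust::AiClient;

#[tokio::main]
async fn main() -> ai_lib_rust::Result<()> {
    let client = AiClient::new("openai/gpt-4o").await?;
    let msg = multimodal_message("./photo.png")?; // helper defined above
    let _resp = client.chat().messages(vec![msg]).execute().await?;
    Ok(())
}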

Useful environment variables

  • AI_PROTOCOL_DIR / AI_PROTOCOL_PATH: path to your local ai-protocol repo root (containing v1/)
  • AI_LIB_ATTEMPT_TIMEOUT_MS: per-attempt timeout guard used by the unified policy engine
  • AI_LIB_BATCH_CONCURRENCY: override concurrency limit for batch operations

Custom Protocol

use ai_lib_rust::protocol::ProtocolLoader;

let loader = ProtocolLoader::new()
    .with_base_path("./ai-protocol")
    .with_hot_reload(true);

let manifest = loader.load_provider("openai").await?;

📦 Installation

Add to your Cargo.toml:

[dependencies]
ai-lib-rust = "0.8.0"
tokio = { version = "1.0", features = ["full"] }
futures = "0.3"

🔧 Configuration

The library automatically looks for protocol manifests in the following locations (in order):

  1. Custom path set via ProtocolLoader::with_base_path()
  2. AI_PROTOCOL_DIR / AI_PROTOCOL_PATH (local path or GitHub raw URL)
  3. Common dev paths: ai-protocol/, ../ai-protocol/, ../../ai-protocol/
  4. Last resort: GitHub raw hiddenpath/ai-protocol (main)

For each base path, provider manifests are resolved in a backward-compatible order: dist/v1/providers/<id>.json → v1/providers/<id>.yaml.

Protocol manifests should follow the AI-Protocol v1.5 specification structure. The runtime validates manifests against the official JSON Schema from the AI-Protocol repository.

🔐 Provider Requirements (API Keys)

Most providers require an API key. The runtime reads keys from (in order):

  1. OS Keyring (optional, convenience feature)

    • Windows: Uses Windows Credential Manager
    • macOS: Uses Keychain
    • Linux: Uses Secret Service API
    • Service: ai-protocol, Username: provider id
    • Note: Keyring is optional and may not work in containers/WSL. Falls back to environment variables automatically.
  2. Environment Variable (recommended for production)

    • Format: <PROVIDER_ID>_API_KEY (e.g. DEEPSEEK_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY)
    • Recommended for: CI/CD, containers, WSL, production deployments

Example:

# Set API key via environment variable (recommended)
export DEEPSEEK_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Or use keyring (optional, for local development)
# Windows: Stored in Credential Manager
# macOS: Stored in Keychain

Provider-specific details vary, but ai-lib-rust normalizes them behind a unified client API.

🌐 Proxy / Timeout / Backpressure (Production knobs)

  • Proxy: set AI_PROXY_URL (e.g. http://user:pass@host:port)
  • HTTP timeout: set AI_HTTP_TIMEOUT_SECS (fallback: AI_TIMEOUT_SECS)
  • In-flight limit: set AI_LIB_MAX_INFLIGHT or use AiClientBuilder::max_inflight(n)
  • Rate limiting (optional): set either
    • AI_LIB_RPS (requests per second), or
    • AI_LIB_RPM (requests per minute)
  • Circuit breaker (optional): enable via AiClientBuilder::circuit_breaker_default() or env
    • AI_LIB_BREAKER_FAILURE_THRESHOLD (default 5)
    • AI_LIB_BREAKER_COOLDOWN_SECS (default 30)
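The same knobs can also be set in code through the builder, using the methods named above; a sketch (build(...) arguments are elided here, as in the mock-server example):

use ai_lib_rust::AiClientBuilder;

// In-code equivalents of the env knobs above.
let builder = AiClientBuilder::new()
    .max_inflight(8)            // cap on concurrent requests (AI_LIB_MAX_INFLIGHT)
    .circuit_breaker_default(); // breaker with default threshold/cooldown
// let client = builder.build(...)?;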

📊 Observability: CallStats

If you need per-call stats (latency, retries, request ids, endpoint), use:

let (resp, stats) = client.call_model_with_stats(unified_req).await?;
println!("client_request_id={}", stats.client_request_id);

🛑 Cancellable Streaming

let (mut stream, cancel) = client.chat().messages(messages).stream().execute_stream_with_cancel().await?;
// cancel.cancel(); // emits StreamEnd{finish_reason:"cancelled"}, drops the underlying network stream, and releases inflight permit
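A fuller sketch, continuing the snippet above: wait for the first event, cancel on a content delta, then drain until StreamEnd.

use futures::StreamExt;
use ai_lib_rust::types::events::StreamingEvent;

if let Some(event) = stream.next().await {
    if let StreamingEvent::PartialContentDelta { .. } = event? {
        // Stop early; the stream then yields StreamEnd { finish_reason: "cancelled" }.
        cancel.cancel();
    }
}
while let Some(event) = stream.next().await {
    if let StreamingEvent::StreamEnd { .. } = event? {
        break;
    }
}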

🧾 Optional Feedback (Choice Selection)

Telemetry is opt-in. You can inject a FeedbackSink and report feedback explicitly:

use ai_lib_rust::telemetry::{FeedbackEvent, ChoiceSelectionFeedback};

client.report_feedback(FeedbackEvent::ChoiceSelection(ChoiceSelectionFeedback {
    request_id: stats.client_request_id.clone(),
    chosen_index: 0,
    rejected_indices: None,
    latency_to_select_ms: None,
    ui_context: None,
    candidate_hashes: None,
})).await?;

🎨 Key Features

Protocol-Driven Architecture

No hardcoded match-on-provider branches. All logic is derived from protocol configuration:

// The pipeline is built dynamically from protocol manifest
let pipeline = Pipeline::from_manifest(&manifest)?;

// Operators are configured via manifests (YAML/JSON), not hardcoded
// Adding a new provider requires zero code changes

Multi-Candidate Support

Automatically handles multi-candidate scenarios through the FanOut operator:

streaming:
  candidate:
    candidate_id_path: "$.choices[*].index"
    fan_out: true

Tool Accumulation

Stateful accumulation of tool call arguments:

streaming:
  accumulator:
    stateful_tool_parsing: true
    key_path: "$.delta.partial_json"
    flush_on: "$.type == 'content_block_stop'"

Hot Reload

Protocol configurations can be updated at runtime:

let loader = ProtocolLoader::new().with_hot_reload(true);
// Protocol changes are automatically picked up

📚 Examples

See the examples/ directory:

  • basic_usage.rs: Simple non-streaming chat completion
  • deepseek_chat_stream.rs: Streaming chat example
  • deepseek_tool_call_stream.rs: Tool calling with streaming
  • custom_protocol.rs: Loading custom protocol configurations
  • list_models.rs: Listing available models from provider
  • service_discovery.rs: Service discovery and custom service calls
  • test_protocol_loading.rs: Protocol loading sanity check

🧪 Testing

# Run all tests
cargo test

# Run compliance tests (cross-runtime consistency)
cargo test --test compliance

# Run with all features enabled
cargo test --features full

📦 Batch (Chat)

For batch execution (order-preserving), use:

use ai_lib_rust::{AiClient, ChatBatchRequest, Message};

let client = AiClient::new("deepseek/deepseek-chat").await?;

let reqs = vec![
    ChatBatchRequest::new(vec![Message::user("Hello")]),
    ChatBatchRequest::new(vec![Message::user("Explain SSE in one sentence")])
        .temperature(0.2),
];

let results = client.chat_batch(reqs, Some(5)).await;
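Results come back in request order. Assuming each entry is a per-request Result (the call above is used without ?), they can be consumed like this:

for (i, res) in results.into_iter().enumerate() {
    match res {
        Ok(_resp) => println!("request {i}: ok"),
        Err(e) => eprintln!("request {i}: failed: {e}"),
    }
}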

Smart batch tuning

If you prefer a conservative default heuristic, use:

let results = client.chat_batch_smart(reqs).await;

Override concurrency with:

  • AI_LIB_BATCH_CONCURRENCY

🤝 Contributing

Contributions are welcome! Please ensure that:

  1. All protocol configurations follow the AI-Protocol specification (v1.5 / V2)
  2. New operators are properly documented
  3. Tests are included for new features
  4. Compliance tests pass for cross-runtime behaviors (cargo test --test compliance)
  5. Code follows Rust best practices and passes cargo clippy

📄 License

This project is licensed under either of:

  • Apache License, Version 2.0
  • MIT License

at your option.

🔗 Related Projects

  • ai-protocol: the AI-Protocol specification, provider manifests, and compliance test suite
  • ai-protocol-mock: mock server for offline integration and MCP testing

ai-lib-rust - Where protocol meets performance. 🚀