Crate ironclad_llm

§ironclad-llm

LLM client pipeline for the Ironclad agent runtime. Requests flow through a multi-stage pipeline: cache check, routing (heuristic or ML), circuit breaker, dedup, format translation, prompt compression, tier adaptation, and HTTP forwarding.

§Key Types

LlmService – Top-level facade composing all pipeline stages
SemanticCache – 3-level cache (exact hash, tool TTL, semantic cosine)
ModelRouter – Runtime model selection and override control
LlmClient – HTTP/2 client pool with streaming support
EmbeddingClient – Multi-provider embedding client with n-gram fallback
SseChunkStream – SSE byte stream to parsed StreamChunk adapter
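The "semantic cosine" level of SemanticCache implies comparing a query embedding against stored embeddings. The following is a minimal sketch of that idea using only the standard library. The function names `cosine_similarity` and `best_match`, the `(Vec<f32>, String)` entry shape, and the threshold parameter are assumptions for illustration and are not taken from the crate's API.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns 0.0 for empty, mismatched-length, or zero-norm inputs.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical semantic lookup: scan stored (embedding, response) pairs and
/// return the response with the highest similarity at or above `threshold`.
pub fn best_match<'a>(
    query: &[f32],
    entries: &'a [(Vec<f32>, String)],
    threshold: f32,
) -> Option<&'a str> {
    let mut best: Option<(f32, &str)> = None;
    for (embedding, response) in entries {
        let score = cosine_similarity(query, embedding);
        // Keep the entry only if it clears the threshold and beats the current best.
        if score >= threshold && best.map_or(true, |(s, _)| score > s) {
            best = Some((score, response.as_str()));
        }
    }
    best.map(|(_, response)| response)
}
```

A linear scan like this is only reasonable for small caches; a real implementation would likely check the exact-hash level first and fall back to similarity search only on a miss.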

§Modules

cache – Semantic cache with HashMap + SQLite persistence
router – Heuristic model router (feature extraction, complexity scoring)
ml_router – Logistic regression backend + preference learning
tiered – Tiered inference with confidence evaluation and escalation
cascade – Cascade optimizer (cheapest-first, fallback chain)
circuit – Per-provider circuit breaker with exponential backoff
dedup – In-flight duplicate request detection
format – API format translation (OpenAI, Ollama, Google, Anthropic)
compression – Prompt compression and token estimation
tier – Tier-based prompt adaptation (T1 strip, T2 preamble, T3/T4 pass)
client – HTTP client pool, request forwarding, cost tracking
provider – Provider definitions and registry
embedding – Multi-provider embedding client
capacity – TPM/RPM sliding-window capacity tracking
accuracy – Per-model quality tracking
oauth – OAuth2 token management and refresh
transform – Request/response transform pipeline
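The circuit module is described as a per-provider circuit breaker with exponential backoff. As a rough illustration of that pattern (not the crate's actual implementation), the sketch below doubles the cooldown each time the breaker trips and caps it at a maximum. The names `SketchBreaker` and `SketchState`, the failure threshold, and the use of a caller-supplied `now` timestamp are all assumptions chosen to keep the example deterministic.

```rust
use std::time::Duration;

/// Hypothetical breaker states; the crate's own CircuitState may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker for a single provider.
pub struct SketchBreaker {
    failure_threshold: u32,
    base_cooldown: Duration,
    max_cooldown: Duration,
    consecutive_failures: u32,
    trips: u32,
    /// Time of the last trip, measured from an arbitrary caller-chosen epoch.
    opened_at: Option<Duration>,
}

impl SketchBreaker {
    pub fn new(failure_threshold: u32, base_cooldown: Duration, max_cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            base_cooldown,
            max_cooldown,
            consecutive_failures: 0,
            trips: 0,
            opened_at: None,
        }
    }

    /// Exponential backoff: base * 2^(trips - 1), capped at `max_cooldown`.
    pub fn cooldown(&self) -> Duration {
        let exp = self.trips.saturating_sub(1).min(16);
        self.base_cooldown
            .saturating_mul(1u32 << exp)
            .min(self.max_cooldown)
    }

    /// Open until the cooldown elapses, then half-open to allow one probe.
    pub fn state(&self, now: Duration) -> SketchState {
        match self.opened_at {
            None => SketchState::Closed,
            Some(t) if now.saturating_sub(t) >= self.cooldown() => SketchState::HalfOpen,
            Some(_) => SketchState::Open,
        }
    }

    fn trip(&mut self, now: Duration) {
        self.trips += 1;
        self.opened_at = Some(now);
    }

    pub fn record_failure(&mut self, now: Duration) {
        match self.state(now) {
            // Calls are rejected while open, so there is nothing to record.
            SketchState::Open => {}
            // A failed probe re-opens the breaker with a longer cooldown.
            SketchState::HalfOpen => self.trip(now),
            SketchState::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.failure_threshold {
                    self.trip(now);
                }
            }
        }
    }

    /// Any success closes the breaker and resets the backoff.
    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.trips = 0;
        self.opened_at = None;
    }
}
```

Passing `now` explicitly instead of reading a clock makes the state machine easy to unit test; a per-provider registry could hold one such breaker per provider name.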

Re-exports§

pub use accuracy::QualityTracker;
pub use cache::CachedResponse;
pub use cache::ExportedCacheEntry;
pub use cache::SemanticCache;
pub use capacity::CapacityTracker;
pub use cascade::CascadeOptimizer;
pub use cascade::CascadeOutcome;
pub use cascade::CascadeStrategy;
pub use circuit::CircuitBreakerRegistry;
pub use circuit::CircuitState;
pub use client::LlmClient;
pub use compression::CompressionEstimate;
pub use compression::PromptCompressor;
pub use dedup::DedupTracker;
pub use embedding::EmbeddingClient;
pub use embedding::EmbeddingConfig;
pub use ml_router::LogisticBackend;
pub use ml_router::PreferenceCollector;
pub use ml_router::PreferenceRecord;
pub use oauth::OAuthManager;
pub use profile::MetascoreBreakdown;
pub use profile::ModelProfile;
pub use profile::build_model_profiles;
pub use profile::select_by_metascore;
pub use provider::Provider;
pub use provider::ProviderRegistry;
pub use router::ModelRouter;
pub use router::classify_complexity;
pub use router::extract_features;
pub use tiered::ConfidenceEvaluator;
pub use tiered::EscalationTracker;
pub use tiered::InferenceTier;
pub use format::StreamChunk;

Modules§

accuracy
cache
capacity
cascade
circuit
client
compression
dedup
embedding
eval_harness: Offline routing evaluation harness for replaying historical decisions.
format
ml_router
oauth
profile: Per-model composite profiles and metascore computation.
provider
router
tier
tiered

Structs§

LlmService
SseChunkStream: A Stream adapter that converts raw SSE byte chunks from an LLM provider into parsed StreamChunk items. Handles buffering across chunk boundaries with proper incremental UTF-8 decoding.
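To illustrate what incremental UTF-8 decoding across chunk boundaries involves, here is a small standard-library sketch. It is not the SseChunkStream implementation: the type `Utf8Carry` and its `push` method are hypothetical, and SSE event parsing is left out. The idea is to decode the longest valid prefix of the buffered bytes and carry any incomplete trailing sequence over to the next chunk.

```rust
/// Hypothetical helper that carries incomplete UTF-8 sequences between chunks.
#[derive(Default)]
pub struct Utf8Carry {
    pending: Vec<u8>,
}

impl Utf8Carry {
    /// Append `chunk` and return all text that can be decoded so far.
    /// An incomplete multi-byte sequence at the end stays buffered;
    /// bytes that can never be valid are replaced with U+FFFD.
    pub fn push(&mut self, chunk: &[u8]) -> String {
        self.pending.extend_from_slice(chunk);
        let mut out = String::new();
        loop {
            let (valid, error_len) = match std::str::from_utf8(&self.pending) {
                Ok(s) => {
                    // Everything buffered is valid: emit it and finish.
                    out.push_str(s);
                    self.pending.clear();
                    return out;
                }
                Err(e) => (e.valid_up_to(), e.error_len()),
            };
            // The prefix up to `valid` is well-formed, so this conversion is lossless.
            out.push_str(&String::from_utf8_lossy(&self.pending[..valid]));
            match error_len {
                // `None` means the input ended mid-sequence: keep the tail for later.
                None => {
                    self.pending.drain(..valid);
                    return out;
                }
                // `Some(n)` means `n` bytes are invalid: replace them and continue.
                Some(bad) => {
                    out.push('\u{FFFD}');
                    self.pending.drain(..valid + bad);
                }
            }
        }
    }
}
```

With this approach, a two-byte character such as "é" that is split across two network chunks is emitted once, intact, when the second chunk arrives, rather than being replaced or dropped.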
