Crate ironclad_llm

§ironclad-llm

LLM client pipeline for the Ironclad agent runtime. Requests flow through a multi-stage pipeline: cache check, routing (heuristic or ML), circuit breaker, dedup, format translation, prompt compression, tier adaptation, and HTTP forwarding.
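As a rough illustration of how a staged pipeline like this can short-circuit (for example, a cache hit answering before any HTTP forwarding happens), here is a minimal sketch. The `Step`, `Stage`, and `run_pipeline` names and the three toy stages are hypothetical and are not part of this crate's API; the real stages are composed by `LlmService`.

```rust
/// Hypothetical: what a stage decides. Either pass the (possibly rewritten)
/// request to the next stage, or answer immediately.
pub enum Step {
    Continue(String),
    Respond(String),
}

/// Hypothetical stage interface; not this crate's trait.
pub trait Stage {
    fn run(&self, request: String) -> Step;
}

/// Runs stages in order and stops at the first one that responds.
pub fn run_pipeline(stages: &[Box<dyn Stage>], request: String) -> Result<String, String> {
    let mut current = request;
    for stage in stages {
        match stage.run(current) {
            Step::Continue(next) => current = next,
            Step::Respond(response) => return Ok(response),
        }
    }
    Err(format!("no stage produced a response for: {current}"))
}

/// Toy cache check: answers a single known prompt, otherwise passes through.
pub struct CacheCheck;
impl Stage for CacheCheck {
    fn run(&self, request: String) -> Step {
        if request == "ping" {
            Step::Respond("cached: pong".to_string())
        } else {
            Step::Continue(request)
        }
    }
}

/// Toy "compression": trims surrounding whitespace.
pub struct Compress;
impl Stage for Compress {
    fn run(&self, request: String) -> Step {
        Step::Continue(request.trim().to_string())
    }
}

/// Toy forwarder: always responds, standing in for the HTTP call.
pub struct Forward;
impl Stage for Forward {
    fn run(&self, request: String) -> Step {
        Step::Respond(format!("forwarded: {request}"))
    }
}
```

With the stages ordered `CacheCheck`, `Compress`, `Forward`, the request `"ping"` is answered by the first stage, while any other request is trimmed and then handed to the forwarder.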

§Key Types

  • LlmService – Top-level facade composing all pipeline stages
  • SemanticCache – 3-level cache (exact hash, tool TTL, semantic cosine)
  • ModelRouter – Runtime model selection and override control
  • LlmClient – HTTP/2 client pool with streaming support
  • EmbeddingClient – Multi-provider embedding client with n-gram fallback
  • SseChunkStream – SSE byte stream to parsed StreamChunk adapter
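To make the "exact hash" and "semantic cosine" levels of a cache like `SemanticCache` concrete, the sketch below tries an exact hash lookup first and falls back to the best cosine-similarity match above a threshold. It is a simplified illustration under stated assumptions: `SketchCache`, its fields, and the threshold parameter are hypothetical, embeddings are supplied by the caller, and the tool-TTL level is omitted because its semantics are not described here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical two-level cache; not the crate's `SemanticCache`.
pub struct SketchCache {
    exact: HashMap<u64, String>,        // level 1: hash of the prompt text
    semantic: Vec<(Vec<f32>, String)>,  // level 2: (embedding, response)
    threshold: f32,                     // minimum cosine similarity for a hit
}

impl SketchCache {
    pub fn new(threshold: f32) -> Self {
        Self { exact: HashMap::new(), semantic: Vec::new(), threshold }
    }

    fn key(prompt: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        prompt.hash(&mut hasher);
        hasher.finish()
    }

    pub fn insert(&mut self, prompt: &str, embedding: Vec<f32>, response: &str) {
        self.exact.insert(Self::key(prompt), response.to_string());
        self.semantic.push((embedding, response.to_string()));
    }

    /// Exact match wins; otherwise return the most similar entry at or
    /// above the threshold, if any.
    pub fn lookup(&self, prompt: &str, embedding: &[f32]) -> Option<&str> {
        if let Some(hit) = self.exact.get(&Self::key(prompt)) {
            return Some(hit.as_str());
        }
        self.semantic
            .iter()
            .map(|(stored, response)| (cosine(stored, embedding), response))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, response)| response.as_str())
    }
}
```

The linear scan over stored embeddings keeps the sketch short; a production cache would more plausibly bound the entry count or use an index.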

§Modules

  • cache – Semantic cache with HashMap + SQLite persistence
  • router – Heuristic model router (feature extraction, complexity scoring)
  • ml_router – Logistic regression backend + preference learning
  • tiered – Tiered inference with confidence evaluation and escalation
  • cascade – Cascade optimizer (cheapest-first, fallback chain)
  • circuit – Per-provider circuit breaker with exponential backoff
  • dedup – In-flight duplicate request detection
  • format – API format translation (OpenAI, Ollama, Google, Anthropic)
  • compression – Prompt compression and token estimation
  • tier – Tier-based prompt adaptation (T1 strip, T2 preamble, T3/T4 pass)
  • client – HTTP client pool, request forwarding, cost tracking
  • provider – Provider definitions and registry
  • embedding – Multi-provider embedding client
  • capacity – TPM/RPM sliding-window capacity tracking
  • accuracy – Per-model quality tracking
  • oauth – OAuth2 token management and refresh
  • transform – Request/response transform pipeline
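The `circuit` entry above mentions a circuit breaker with exponential backoff. One common way to build that is sketched below: the breaker opens after a run of consecutive failures, allows a probe request once a cooldown has elapsed, and doubles the cooldown each time a probe fails. All names here (`Breaker`, `BreakerState`, the constructor parameters) are hypothetical, and time is passed in as plain milliseconds for clarity; this is not the crate's `CircuitBreakerRegistry` or `CircuitState`.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BreakerState {
    Closed,   // requests flow normally
    Open,     // requests are rejected until the cooldown elapses
    HalfOpen, // one probe is allowed through
}

/// Hypothetical single-provider breaker.
pub struct Breaker {
    state: BreakerState,
    consecutive_failures: u32,
    trips: u32, // times the breaker has opened since the last success
    opened_at_ms: u64,
    failure_threshold: u32,
    base_cooldown_ms: u64,
    max_cooldown_ms: u64,
}

impl Breaker {
    pub fn new(failure_threshold: u32, base_cooldown_ms: u64, max_cooldown_ms: u64) -> Self {
        Self {
            state: BreakerState::Closed,
            consecutive_failures: 0,
            trips: 0,
            opened_at_ms: 0,
            failure_threshold,
            base_cooldown_ms,
            max_cooldown_ms,
        }
    }

    /// Cooldown is base * 2^(trips - 1), capped at the maximum.
    fn cooldown_ms(&self) -> u64 {
        let exponent = self.trips.saturating_sub(1).min(32);
        self.base_cooldown_ms
            .saturating_mul(1u64 << exponent)
            .min(self.max_cooldown_ms)
    }

    /// Returns true if a request may be sent at time `now_ms`.
    pub fn allow(&mut self, now_ms: u64) -> bool {
        if self.state == BreakerState::Open
            && now_ms.saturating_sub(self.opened_at_ms) >= self.cooldown_ms()
        {
            self.state = BreakerState::HalfOpen;
        }
        self.state != BreakerState::Open
    }

    pub fn record_success(&mut self) {
        self.state = BreakerState::Closed;
        self.consecutive_failures = 0;
        self.trips = 0;
    }

    pub fn record_failure(&mut self, now_ms: u64) {
        if self.state == BreakerState::Open {
            return; // already rejecting; nothing new to learn
        }
        self.consecutive_failures += 1;
        let probe_failed = self.state == BreakerState::HalfOpen;
        if probe_failed || self.consecutive_failures >= self.failure_threshold {
            self.state = BreakerState::Open;
            self.trips += 1;
            self.opened_at_ms = now_ms;
        }
    }
}
```

A per-provider registry could then be as simple as a map from provider name to one `Breaker`, consulted with `allow` before each forward and updated with the outcome afterwards.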

Re-exports§

pub use accuracy::QualityTracker;
pub use cache::CachedResponse;
pub use cache::ExportedCacheEntry;
pub use cache::SemanticCache;
pub use capacity::CapacityTracker;
pub use cascade::CascadeOptimizer;
pub use cascade::CascadeOutcome;
pub use cascade::CascadeStrategy;
pub use circuit::CircuitBreakerRegistry;
pub use circuit::CircuitState;
pub use client::LlmClient;
pub use compression::CompressionEstimate;
pub use compression::PromptCompressor;
pub use dedup::DedupTracker;
pub use embedding::EmbeddingClient;
pub use embedding::EmbeddingConfig;
pub use ml_router::LogisticBackend;
pub use ml_router::PreferenceCollector;
pub use ml_router::PreferenceRecord;
pub use oauth::OAuthManager;
pub use profile::MetascoreBreakdown;
pub use profile::ModelProfile;
pub use profile::build_model_profiles;
pub use profile::select_by_metascore;
pub use provider::Provider;
pub use provider::ProviderRegistry;
pub use router::ModelRouter;
pub use router::classify_complexity;
pub use router::extract_features;
pub use tiered::ConfidenceEvaluator;
pub use tiered::EscalationTracker;
pub use tiered::InferenceTier;
pub use format::StreamChunk;

Modules§

accuracy
cache
capacity
cascade
circuit
client
compression
dedup
embedding
eval_harness
Offline routing evaluation harness for replaying historical decisions.
format
ml_router
oauth
profile
Per-model composite profiles and metascore computation.
provider
router
tier
tiered

Structs§

LlmService
SseChunkStream
A Stream adapter that converts raw SSE byte chunks from an LLM provider into parsed StreamChunk items. Handles buffering across chunk boundaries with proper incremental UTF-8 decoding.
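The buffering concern described above can be illustrated with a small synchronous sketch. It accumulates raw bytes, decodes only complete lines, and extracts `data:` payloads, so a multi-byte UTF-8 character split across two network chunks is reassembled before decoding. `SseLineBuffer` is a hypothetical helper written for this example; it is not the implementation of `SseChunkStream`, which is an async `Stream` and yields parsed `StreamChunk` items rather than strings.

```rust
/// Hypothetical helper: buffers raw bytes and yields complete `data:` payloads.
#[derive(Default)]
pub struct SseLineBuffer {
    pending: Vec<u8>, // bytes received so far that are not yet newline-terminated
}

impl SseLineBuffer {
    /// Feed one network chunk; returns the payload of every complete
    /// `data:` line now available. Incomplete trailing bytes stay buffered.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);
        let mut payloads = Vec::new();

        // Splitting on the byte 0x0A is safe in UTF-8: that byte never
        // occurs inside a multi-byte sequence.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            // The line is complete, so any split character has been rejoined.
            // Invalid bytes are replaced rather than causing a panic.
            let decoded = String::from_utf8_lossy(&line);
            let text = decoded.trim_end_matches(|c| c == '\r' || c == '\n');

            if let Some(payload) = text.strip_prefix("data:") {
                // SSE allows one optional space after the colon.
                let payload = payload.strip_prefix(' ').unwrap_or(payload);
                payloads.push(payload.to_string());
            }
            // Blank lines (event separators) and other fields are ignored here.
        }
        payloads
    }
}
```

Feeding `b"data: caf\xC3"` followed by `b"\xA9\n\n"` returns nothing for the first chunk and the single payload `café` for the second, because the two bytes of `é` are only decoded once the line is complete.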