§ironclad-llm
LLM client pipeline for the Ironclad agent runtime. Requests flow through a multi-stage pipeline: cache check, routing (heuristic or ML), circuit breaker, dedup, format translation, prompt compression, tier adaptation, and HTTP forwarding.
§Key Types
- LlmService – Top-level facade composing all pipeline stages
- SemanticCache – 3-level cache (exact hash, tool TTL, semantic cosine)
- ModelRouter – Runtime model selection and override control
- LlmClient – HTTP/2 client pool with streaming support
- EmbeddingClient – Multi-provider embedding client with n-gram fallback
- SseChunkStream – SSE byte stream to parsed StreamChunk adapter
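To illustrate how a multi-level lookup like the one described for SemanticCache might behave, here is a minimal sketch, not the crate's implementation. `ToyCache`, `Entry`, `cosine`, and the `threshold` parameter are all hypothetical names. The sketch covers only two of the three levels (an exact match on the prompt text, then a cosine-similarity scan over stored embeddings) and omits the tool TTL level.

```rust
use std::collections::HashMap;

/// Hypothetical cache entry: a prompt embedding plus the stored response.
struct Entry {
    embedding: Vec<f32>,
    response: String,
}

/// Toy two-level cache (assumption: not the real SemanticCache API).
struct ToyCache {
    exact: HashMap<String, String>, // level 1: exact prompt match
    semantic: Vec<Entry>,           // level 2: nearest embedding
    threshold: f32,                 // minimum cosine similarity for a hit
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

impl ToyCache {
    fn new(threshold: f32) -> Self {
        ToyCache { exact: HashMap::new(), semantic: Vec::new(), threshold }
    }

    fn insert(&mut self, prompt: &str, embedding: Vec<f32>, response: &str) {
        self.exact.insert(prompt.to_string(), response.to_string());
        self.semantic.push(Entry { embedding, response: response.to_string() });
    }

    /// Try the cheap exact match first, then fall back to the best
    /// semantic match at or above the similarity threshold.
    fn lookup(&self, prompt: &str, embedding: &[f32]) -> Option<&str> {
        if let Some(r) = self.exact.get(prompt) {
            return Some(r.as_str());
        }
        self.semantic
            .iter()
            .map(|e| (cosine(&e.embedding, embedding), e))
            .filter(|(score, _)| *score >= self.threshold)
            .max_by(|a, b| a.0.total_cmp(&b.0))
            .map(|(_, e)| e.response.as_str())
    }
}
```

Checking the exact level first means the linear similarity scan only runs on a miss. A production cache would likely use an index rather than scanning every entry.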
§Modules
- cache – Semantic cache with HashMap + SQLite persistence
- router – Heuristic model router (feature extraction, complexity scoring)
- ml_router – Logistic regression backend + preference learning
- tiered – Tiered inference with confidence evaluation and escalation
- cascade – Cascade optimizer (cheapest-first, fallback chain)
- circuit – Per-provider circuit breaker with exponential backoff
- dedup – In-flight duplicate request detection
- format – API format translation (OpenAI, Ollama, Google, Anthropic)
- compression – Prompt compression and token estimation
- tier – Tier-based prompt adaptation (T1 strip, T2 preamble, T3/T4 pass)
- client – HTTP client pool, request forwarding, cost tracking
- provider – Provider definitions and registry
- embedding – Multi-provider embedding client
- capacity – TPM/RPM sliding-window capacity tracking
- accuracy – Per-model quality tracking
- oauth – OAuth2 token management and refresh
- transform – Request/response transform pipeline
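As an illustration of what a per-provider circuit breaker with exponential backoff can look like, here is a minimal sketch under stated assumptions. It is not the circuit module's code. `ToyBreakerRegistry`, `ToyState`, and all field names are hypothetical, and timestamps are plain millisecond counters passed in by the caller so the behaviour is deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical breaker state (the crate's CircuitState may differ).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyState {
    Closed,
    Open { until_ms: u64 },
}

struct Breaker {
    failures: u32, // consecutive failures since the last trip or success
    trips: u32,    // how many times the breaker has opened in a row
    state: ToyState,
}

/// Toy registry keyed by provider name.
struct ToyBreakerRegistry {
    failure_threshold: u32,
    base_backoff_ms: u64,
    max_backoff_ms: u64,
    breakers: HashMap<String, Breaker>,
}

impl ToyBreakerRegistry {
    fn new(failure_threshold: u32, base_backoff_ms: u64, max_backoff_ms: u64) -> Self {
        ToyBreakerRegistry {
            failure_threshold,
            base_backoff_ms,
            max_backoff_ms,
            breakers: HashMap::new(),
        }
    }

    /// A request is allowed unless the provider's breaker is still open.
    fn allow(&self, provider: &str, now_ms: u64) -> bool {
        match self.breakers.get(provider).map(|b| b.state) {
            Some(ToyState::Open { until_ms }) => now_ms >= until_ms,
            _ => true,
        }
    }

    /// Count a failure; on reaching the threshold, open the breaker for
    /// base * 2^trips milliseconds, capped at max_backoff_ms.
    fn record_failure(&mut self, provider: &str, now_ms: u64) {
        let (threshold, base, max) =
            (self.failure_threshold, self.base_backoff_ms, self.max_backoff_ms);
        let b = self.breakers.entry(provider.to_string()).or_insert(Breaker {
            failures: 0,
            trips: 0,
            state: ToyState::Closed,
        });
        b.failures += 1;
        if b.failures >= threshold {
            let factor = 1u64.checked_shl(b.trips).unwrap_or(u64::MAX);
            let backoff = base.saturating_mul(factor).min(max);
            b.state = ToyState::Open { until_ms: now_ms.saturating_add(backoff) };
            b.trips = b.trips.saturating_add(1);
            b.failures = 0;
        }
    }

    /// A success closes the breaker and resets the backoff.
    fn record_success(&mut self, provider: &str) {
        if let Some(b) = self.breakers.get_mut(provider) {
            b.failures = 0;
            b.trips = 0;
            b.state = ToyState::Closed;
        }
    }
}
```

Keeping one breaker per provider means a failing backend does not block requests that can be routed elsewhere. The doubling delay avoids hammering a provider that is still recovering.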
Re-exports§
pub use accuracy::QualityTracker;
pub use cache::CachedResponse;
pub use cache::ExportedCacheEntry;
pub use cache::SemanticCache;
pub use capacity::CapacityTracker;
pub use cascade::CascadeOptimizer;
pub use cascade::CascadeOutcome;
pub use cascade::CascadeStrategy;
pub use circuit::CircuitBreakerRegistry;
pub use circuit::CircuitState;
pub use client::LlmClient;
pub use compression::CompressionEstimate;
pub use compression::PromptCompressor;
pub use dedup::DedupTracker;
pub use embedding::EmbeddingClient;
pub use embedding::EmbeddingConfig;
pub use ml_router::LogisticBackend;
pub use ml_router::PreferenceCollector;
pub use ml_router::PreferenceRecord;
pub use oauth::OAuthManager;
pub use profile::MetascoreBreakdown;
pub use profile::ModelProfile;
pub use profile::build_model_profiles;
pub use profile::select_by_metascore;
pub use provider::Provider;
pub use provider::ProviderRegistry;
pub use router::ModelRouter;
pub use router::classify_complexity;
pub use router::extract_features;
pub use tiered::ConfidenceEvaluator;
pub use tiered::EscalationTracker;
pub use tiered::InferenceTier;
pub use format::StreamChunk;
Modules§
- accuracy
- cache
- capacity
- cascade
- circuit
- client
- compression
- dedup
- embedding
- eval_harness - Offline routing evaluation harness for replaying historical decisions.
- format
- ml_router
- oauth
- profile - Per-model composite profiles and metascore computation.
- provider
- router
- tier
- tiered
Structs§
- LlmService
- SseChunkStream - A Stream adapter that converts raw SSE byte chunks from an LLM provider into parsed StreamChunk items. Handles buffering across chunk boundaries with proper incremental UTF-8 decoding.
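
The buffering and incremental UTF-8 decoding described for SseChunkStream can be sketched roughly as follows. This is a simplified, synchronous illustration and not the crate's code. `ToySseDecoder` and its fields are hypothetical, and each `data:` line is emitted as its own payload instead of being assembled into full SSE events or parsed into StreamChunk values.

```rust
/// Hypothetical decoder illustrating buffering across chunk boundaries.
#[derive(Default)]
struct ToySseDecoder {
    pending: Vec<u8>, // bytes not yet decoded (may end mid-character)
    text: String,     // decoded text that does not yet form a full line
}

impl ToySseDecoder {
    /// Feed one raw byte chunk; returns the payloads of any `data:` lines
    /// completed by this chunk.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.pending.extend_from_slice(chunk);

        loop {
            // Find the longest valid UTF-8 prefix of the pending bytes.
            let (valid, invalid_len) = match std::str::from_utf8(&self.pending) {
                Ok(s) => (s.len(), None),
                Err(e) => (e.valid_up_to(), e.error_len()),
            };
            let prefix = std::str::from_utf8(&self.pending[..valid])
                .expect("prefix was validated above");
            self.text.push_str(prefix);
            match invalid_len {
                // Genuinely invalid bytes: substitute U+FFFD and keep going.
                Some(n) => {
                    self.text.push('\u{FFFD}');
                    self.pending.drain(..valid + n);
                }
                // Fully decoded, or an incomplete sequence at the end:
                // keep the tail bytes until the next chunk arrives.
                None => {
                    self.pending.drain(..valid);
                    break;
                }
            }
        }

        // Emit every complete line that carries a `data:` field.
        let mut out = Vec::new();
        while let Some(pos) = self.text.find('\n') {
            let line: String = self.text.drain(..=pos).collect();
            let line = line.trim_end_matches(|c| c == '\n' || c == '\r');
            if let Some(data) = line.strip_prefix("data:") {
                // SSE allows one optional space after the colon.
                out.push(data.strip_prefix(' ').unwrap_or(data).to_string());
            }
        }
        out
    }
}
```

The decoder holds back an incomplete trailing sequence and does not decode each chunk lossily, so a multi-byte character split across two network chunks is never corrupted.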