Model Serving Ecosystem
Unified interface for local and remote model serving across the ML ecosystem.
§Components
- ChatTemplateEngine - Unified prompt templating (Llama2, Mistral, ChatML)
- BackendSelector - Intelligent backend selection with privacy tiers
- CostCircuitBreaker - Daily budget limits to prevent runaway costs
- ContextManager - Automatic token counting and truncation
- StatefulFailover - Streaming failover with context preservation
- SpilloverRouter - Hybrid cloud spillover routing
- LambdaDeployer - AWS Lambda inference deployment
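To illustrate the idea behind the cost circuit breaker, here is a minimal, self-contained sketch of a daily-budget breaker. The type and method names below are illustrative only and do not mirror this crate's actual `CostCircuitBreaker` API:

```rust
// Illustrative sketch of a daily-budget cost circuit breaker.
// Names, fields, and signatures are assumptions, not the crate's API.

struct CostCircuitBreaker {
    daily_budget_usd: f64,
    spent_today_usd: f64,
}

impl CostCircuitBreaker {
    fn new(daily_budget_usd: f64) -> Self {
        Self { daily_budget_usd, spent_today_usd: 0.0 }
    }

    /// Records the cost and returns true if the request fits in the
    /// remaining daily budget; refuses (returns false) otherwise.
    fn try_spend(&mut self, cost_usd: f64) -> bool {
        if self.spent_today_usd + cost_usd > self.daily_budget_usd {
            return false; // trip: reject rather than overspend
        }
        self.spent_today_usd += cost_usd;
        true
    }
}

fn main() {
    let mut breaker = CostCircuitBreaker::new(1.0);
    assert!(breaker.try_spend(0.6));  // within the $1.00 budget
    assert!(!breaker.try_spend(0.6)); // would exceed the budget: rejected
    assert!(breaker.try_spend(0.4));  // remaining budget still covers this
    println!("spent: {:.2}", breaker.spent_today_usd);
}
```

The key design point is that the breaker rejects a request *before* incurring its cost, so a runaway loop of expensive calls fails fast instead of accumulating charges.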
§Toyota Way Principles
- Standardized Work: Chat templates ensure consistent model interaction
- Poka-Yoke: Privacy gates prevent accidental data leakage
- Jidoka: Stateful failover maintains context on errors
- Muda Elimination: Cost circuit breakers prevent waste
§Re-exports
pub use backends::BackendSelector;
pub use backends::LatencyTier;
pub use backends::PrivacyTier;
pub use backends::ServingBackend;
pub use circuit_breaker::CircuitBreakerConfig;
pub use circuit_breaker::CostCircuitBreaker;
pub use circuit_breaker::TokenPricing;
pub use context::ContextManager;
pub use context::ContextWindow;
pub use context::TokenEstimator;
pub use context::TruncationStrategy;
pub use failover::FailoverConfig;
pub use failover::FailoverManager;
pub use failover::StreamingContext;
pub use lambda::LambdaConfig;
pub use lambda::LambdaDeployer;
pub use lambda::LambdaRuntime;
pub use router::RejectReason;
pub use router::RouterConfig;
pub use router::RoutingDecision;
pub use router::SpilloverRouter;
pub use templates::ChatMessage;
pub use templates::ChatTemplateEngine;
pub use templates::Role;
pub use templates::TemplateFormat;