
Crate car_inference


§car-inference

Local model inference for the Common Agent Runtime.

Provides on-device inference using Candle with automatic hardware detection:

  • macOS: Metal (Apple Silicon GPU)
  • Linux: CUDA (NVIDIA GPU) or CPU fallback
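The fallback order above can be sketched in plain Rust. Note this is an illustrative stand-in: the `Device` enum and `detect_device` function below are invented for this sketch and are not the crate's actual `Device` type or detection API.

```rust
/// Stand-in for the crate's device enum, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Device {
    Metal,
    Cuda,
    Cpu,
}

/// Pick a device from the compile-time target OS and a runtime GPU probe:
/// Metal on macOS, CUDA on Linux when available, CPU otherwise.
pub fn detect_device(cuda_available: bool) -> Device {
    if cfg!(target_os = "macos") {
        Device::Metal // Apple Silicon GPU via Metal
    } else if cfg!(target_os = "linux") && cuda_available {
        Device::Cuda // NVIDIA GPU via CUDA
    } else {
        Device::Cpu // portable fallback
    }
}
```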

Ships with Qwen3 models downloaded on first use from Hugging Face. Supports remote API models (OpenAI, Anthropic, Google) via the same schema.
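A minimal sketch of the "one schema for local and remote" idea follows. The variant and field names are assumptions for illustration, not the crate's actual `ModelSchema` or `ModelSource` definitions.

```rust
/// Illustrative stand-in: a single source type covers both local
/// checkpoints and remote API models.
#[derive(Debug)]
enum ModelSource {
    Local { weights_path: String },             // e.g. a downloaded Qwen3 checkpoint
    Remote { provider: String, model: String }, // e.g. a cloud API model
}

/// Illustrative stand-in for a unified model schema entry.
#[derive(Debug)]
struct ModelSchema {
    id: String,
    source: ModelSource,
}

/// Both kinds of model answer the same questions through one schema.
fn is_local(schema: &ModelSchema) -> bool {
    matches!(schema.source, ModelSource::Local { .. })
}
```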

§Architecture

Models are first-class typed resources described by ModelSchema (analogous to ToolSchema). The UnifiedRegistry holds local and remote models. The AdaptiveRouter selects the best model using a three-phase strategy: filter → score → explore. The OutcomeTracker learns from results to improve routing over time.
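The filter → score → explore strategy can be sketched as follows. The struct fields, exploration threshold, and selection rules are invented for this sketch; they are not the crate's `AdaptiveRouter` or `ModelProfile` API.

```rust
/// Illustrative learned profile for one model; fields are assumptions.
#[derive(Debug)]
struct Profile {
    name: &'static str,
    supports_code: bool, // capability checked by the filter phase
    quality: f64,        // learned score, updated from tracked outcomes
    trials: u32,         // how often this model has been routed to
}

/// Phase 1: filter out models lacking the required capability.
/// Phase 2: score the remainder by learned quality.
/// Phase 3: explore an under-tried model so its profile keeps improving.
fn route<'a>(models: &'a [Profile], need_code: bool, min_trials: u32) -> Option<&'a Profile> {
    let eligible: Vec<&Profile> = models
        .iter()
        .filter(|m| !need_code || m.supports_code)
        .collect();
    // Explore: an eligible model with too little data wins outright.
    if let Some(under_tried) = eligible.iter().find(|m| m.trials < min_trials) {
        return Some(*under_tried);
    }
    // Exploit: otherwise take the highest learned quality.
    eligible.into_iter().max_by(|a, b| a.quality.total_cmp(&b.quality))
}
```

In this sketch exploration is a deterministic "try anything under-sampled first" rule; a real router would more likely mix exploration in probabilistically (e.g. epsilon-greedy).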

§Dual purpose

  1. Internal — powers skill learning/repair, semantic memory, policy evaluation
  2. Service — exposes infer, embed, classify as built-in CAR tools

Re-exports§

pub use adaptive_router::AdaptiveRouter;
pub use adaptive_router::AdaptiveRoutingDecision;
pub use adaptive_router::RoutingConfig;
pub use adaptive_router::RoutingStrategy;
pub use outcome::CodeOutcome;
pub use outcome::InferenceOutcome;
pub use outcome::InferenceTask;
pub use outcome::InferredOutcome;
pub use outcome::ModelProfile;
pub use outcome::OutcomeTracker;
pub use registry::ModelFilter;
pub use registry::ModelInfo;
pub use registry::UnifiedRegistry;
pub use remote::RemoteBackend;
pub use schema::ApiProtocol;
pub use schema::CostModel;
pub use schema::ModelCapability;
pub use schema::ModelSchema;
pub use schema::ModelSource;
pub use schema::PerformanceEnvelope;
pub use adaptive_router::TaskComplexity;
pub use backend::CandleBackend;
pub use backend::EmbeddingBackend;
pub use hardware::HardwareInfo;
pub use models::ModelRegistry;
pub use models::ModelRole;
pub use router::ModelRouter;
pub use router::RoutingDecision;
pub use tasks::ClassifyRequest;
pub use tasks::ClassifyResult;
pub use tasks::EmbedRequest;
pub use tasks::GenerateParams;
pub use tasks::GenerateRequest;

Modules§

adaptive_router
Adaptive model routing — three-phase routing with learned performance profiles.
backend
Inference backends — Candle-based local generation and embedding.
hardware
Hardware detection — auto-configure models and context based on system capabilities.
models
Model registry — tracks available Qwen3 models, handles download-on-first-use.
outcome
Outcome tracking — learn from inference results to improve routing.
registry
Unified model registry — local and remote models under one schema.
remote
Remote inference backend — HTTP client for cloud API models.
router
Intelligent model routing — select the best model based on prompt characteristics.
schema
Model schema — declarative metadata for models, analogous to ToolSchema for tools.
service
Inference service — exposes inference as built-in CAR tools.
tasks
Task types — request and result structs for generate, embed, and classify.

Structs§

InferenceConfig
Configuration for the inference engine.
InferenceEngine
The main inference engine. Thread-safe, lazily loads models.
InferenceResult
Result of an inference call, including trace ID for outcome tracking.

Enums§

Device
Which device to run inference on.
InferenceError