car-inference 0.1.0

car-inference

Local model inference for the Common Agent Runtime.

Provides on-device inference using Candle with automatic hardware detection:

  • macOS: Metal (Apple Silicon GPU)
  • Linux: CUDA (NVIDIA GPU) or CPU fallback
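The detection order above can be sketched in plain Rust. This is an illustrative sketch, not the crate's actual code: the real implementation presumably constructs a Candle Device, and detect_backend is a hypothetical name.

```rust
// Hypothetical sketch of the backend selection order described above.
// cfg! resolves the OS at compile time; CUDA availability is a runtime
// question, modeled here as a plain boolean parameter.
fn detect_backend(cuda_available: bool) -> &'static str {
    if cfg!(target_os = "macos") {
        "metal" // Apple Silicon GPU
    } else if cfg!(target_os = "linux") && cuda_available {
        "cuda" // NVIDIA GPU
    } else {
        "cpu" // fallback
    }
}

fn main() {
    println!("selected backend: {}", detect_backend(false));
}
```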

Ships with Qwen3 models, downloaded from Hugging Face on first use. Remote API models (OpenAI, Anthropic, Google) are supported through the same schema.
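One schema covering both local and remote models might look like the sketch below. The field and variant names are hypothetical; the README only states that local and remote models share a single schema.

```rust
// Hypothetical shape for a unified model schema; field names and values
// are illustrative, not the crate's actual API.
#[derive(Debug)]
enum Provider {
    Local { repo: String }, // a Hugging Face repo downloaded on first use
    Remote { api: String }, // e.g. "openai", "anthropic", "google"
}

#[derive(Debug)]
struct ModelSchema {
    name: String,
    provider: Provider,
    context_length: usize,
}

fn main() {
    // Illustrative values only; the crate's real defaults are not documented here.
    let local = ModelSchema {
        name: "qwen3".to_string(),
        provider: Provider::Local { repo: "Qwen/Qwen3-0.6B".to_string() },
        context_length: 32_768,
    };
    println!("{:?}", local);
}
```

The point of one schema is that routing code can treat a local Qwen3 checkpoint and a remote API model uniformly.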

Architecture

Models are first-class typed resources described by ModelSchema, analogous to ToolSchema for tools.

  • UnifiedRegistry: holds both local and remote models
  • AdaptiveRouter: selects the best model in three phases (filter → score → explore)
  • OutcomeTracker: records results and feeds them back into routing, so model selection improves over time
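The filter → score → explore strategy can be sketched in plain Rust. Candidate, select, and the explore parameter are hypothetical names; the actual AdaptiveRouter API is not shown in this README.

```rust
// Illustrative sketch of three-phase model selection.
#[derive(Debug)]
struct Candidate {
    name: &'static str,
    supports_task: bool, // phase 1: hard constraint
    score: f32,          // phase 2: learned quality score
}

fn select<'a>(models: &'a [Candidate], explore: f32) -> Option<&'a Candidate> {
    // Phase 1: filter out models that cannot handle the request at all.
    let mut eligible: Vec<&Candidate> =
        models.iter().filter(|m| m.supports_task).collect();
    if eligible.is_empty() {
        return None;
    }
    // Phase 2: rank the rest by score, best first.
    eligible.sort_by(|a, b| b.score.total_cmp(&a.score));
    // Phase 3: occasionally pick a non-best candidate so the outcome
    // tracker keeps gathering data on alternatives (crudely modeled here).
    let idx = if explore > 0.0 && eligible.len() > 1 { 1 } else { 0 };
    eligible.into_iter().nth(idx)
}

fn main() {
    let models = vec![
        Candidate { name: "qwen3-large", supports_task: true, score: 0.9 },
        Candidate { name: "qwen3-small", supports_task: true, score: 0.6 },
    ];
    if let Some(m) = select(&models, 0.0) {
        println!("routed to: {}", m.name);
    }
}
```

A real implementation would explore probabilistically and score on outcome history; this sketch only shows the phase ordering.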

Dual purpose

  1. Internal — powers skill learning/repair, semantic memory, policy evaluation
  2. Service — exposes infer, embed, classify as built-in CAR tools
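The service side can be pictured as a small dispatch layer over the three built-in tools. This is a hypothetical sketch; the actual CAR tool interface is not documented in this README.

```rust
// Hypothetical enumeration of the three built-in tools named above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BuiltinTool {
    Infer,    // text generation
    Embed,    // vector embeddings, e.g. for semantic memory
    Classify, // label assignment, e.g. for policy evaluation
}

impl BuiltinTool {
    fn name(self) -> &'static str {
        match self {
            BuiltinTool::Infer => "infer",
            BuiltinTool::Embed => "embed",
            BuiltinTool::Classify => "classify",
        }
    }
}

fn main() {
    for t in [BuiltinTool::Infer, BuiltinTool::Embed, BuiltinTool::Classify] {
        println!("built-in tool: {}", t.name());
    }
}
```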