# car-inference
Local model inference for the Common Agent Runtime.
Provides on-device inference using Candle with automatic hardware detection:
- macOS: Metal (Apple Silicon GPU)
- Linux: CUDA (NVIDIA GPU) or CPU fallback
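The detection logic above can be sketched roughly as follows. This is a std-only illustration of the decision, not the crate's actual code; the real implementation would use Candle's device types, and all names here are hypothetical:

```rust
// Illustrative sketch of backend selection. The real crate uses Candle's
// device handling; `Backend` and `detect_backend` are hypothetical names.
#[derive(Debug, PartialEq)]
enum Backend {
    Metal, // macOS: Apple Silicon GPU
    Cuda,  // Linux: NVIDIA GPU
    Cpu,   // fallback everywhere else
}

/// Pick a backend given the OS name and whether a CUDA device was probed.
fn detect_backend(os: &str, cuda_available: bool) -> Backend {
    match os {
        "macos" => Backend::Metal,
        "linux" if cuda_available => Backend::Cuda,
        _ => Backend::Cpu,
    }
}

fn main() {
    // At runtime the OS comes from std::env::consts::OS and the CUDA
    // probe from the GPU runtime; `false` here is just a placeholder.
    let backend = detect_backend(std::env::consts::OS, false);
    println!("{backend:?}");
}
```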
Ships with Qwen3 models, downloaded from HuggingFace on first use. Remote API models (OpenAI, Anthropic, Google) are supported via the same schema.
## Architecture
Models are first-class typed resources described by ModelSchema (analogous
to ToolSchema). The UnifiedRegistry holds local and remote models.
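One way a single schema type can describe both local and remote models, held in one registry, is sketched below. This is a hedged, std-only illustration; the field names, variants, and methods are assumptions, not the crate's actual API:

```rust
use std::collections::HashMap;

// Hypothetical sketch: one schema type for local and remote models,
// stored in a single registry. All names here are illustrative.
#[derive(Debug, Clone)]
enum Provider {
    Local { backend: String }, // Candle on Metal, CUDA, or CPU
    Remote { api: String },    // e.g. "openai", "anthropic", "google"
}

#[derive(Debug, Clone)]
struct ModelSchema {
    name: String,
    provider: Provider,
}

#[derive(Default)]
struct UnifiedRegistry {
    models: HashMap<String, ModelSchema>,
}

impl UnifiedRegistry {
    /// Add a model under its name, local or remote alike.
    fn register(&mut self, schema: ModelSchema) {
        self.models.insert(schema.name.clone(), schema);
    }

    /// Look a model up by name.
    fn get(&self, name: &str) -> Option<&ModelSchema> {
        self.models.get(name)
    }
}

fn main() {
    let mut registry = UnifiedRegistry::default();
    registry.register(ModelSchema {
        name: "qwen3-local".into(),
        provider: Provider::Local { backend: "metal".into() },
    });
    registry.register(ModelSchema {
        name: "remote-model".into(),
        provider: Provider::Remote { api: "openai".into() },
    });
    println!("{:?}", registry.get("qwen3-local"));
}
```

The point of the single registry is that routing code never needs to branch on where a model runs; the provider is just data on the schema.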
The AdaptiveRouter selects the best model using a three-phase strategy:
filter → score → explore. The OutcomeTracker learns from results to
improve routing over time.
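The three-phase strategy can be sketched as follows. This is a simplified, std-only illustration under assumed names and fields (a real router would use richer capability filters and scores fed back by the OutcomeTracker):

```rust
// Hypothetical sketch of the filter → score → explore routing strategy.
// Struct fields and the threshold parameter are illustrative assumptions.
#[derive(Debug, Clone)]
struct ModelEntry {
    name: &'static str,
    supports_task: bool, // phase 1: hard capability filter
    quality_score: f64,  // phase 2: learned from past outcomes
    trial_count: u32,    // phase 3: how often this model has been tried
}

/// Select a model: drop incapable models, then prefer an under-tried
/// candidate (exploration), otherwise take the best-scoring one.
fn route(models: &[ModelEntry], explore_threshold: u32) -> Option<&ModelEntry> {
    // Phase 1: filter — remove models that cannot serve the task.
    let candidates: Vec<&ModelEntry> =
        models.iter().filter(|m| m.supports_task).collect();

    // Phase 3: explore — pick a model with too few trials so the
    // outcome tracker can gather data on it.
    if let Some(&untried) = candidates
        .iter()
        .find(|m| m.trial_count < explore_threshold)
    {
        return Some(untried);
    }

    // Phase 2: score — otherwise take the highest-scoring candidate.
    candidates
        .into_iter()
        .max_by(|a, b| a.quality_score.total_cmp(&b.quality_score))
}

fn main() {
    let models = vec![
        ModelEntry { name: "qwen3-local", supports_task: true, quality_score: 0.8, trial_count: 50 },
        ModelEntry { name: "remote-model", supports_task: true, quality_score: 0.9, trial_count: 40 },
        ModelEntry { name: "embed-only", supports_task: false, quality_score: 0.99, trial_count: 5 },
    ];
    let chosen = route(&models, 10).expect("at least one candidate");
    println!("{}", chosen.name);
}
```

Feeding each result back into the scores is what lets the exploration phase pay off: a rarely-tried model either earns a higher score or stops being selected.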
## Dual purpose
- Internal — powers skill learning/repair, semantic memory, policy evaluation
- Service — exposes `infer`, `embed`, `classify` as built-in CAR tools
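As a service, those three entry points are reached by tool name. A minimal sketch of name-based dispatch with stubbed handlers follows; the handler bodies and the error shape are illustrative assumptions, not the crate's actual tool interface:

```rust
// Illustrative dispatch of the three built-in tool names to stubbed
// handlers. Real handlers would run inference; these return labels.
fn dispatch(tool: &str, input: &str) -> Result<String, String> {
    match tool {
        // Generate a completion for the prompt.
        "infer" => Ok(format!("completion for: {input}")),
        // Return a vector representation (stubbed as a label here).
        "embed" => Ok(format!("embedding for: {input}")),
        // Map the input to a label.
        "classify" => Ok(format!("label for: {input}")),
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    for tool in ["infer", "embed", "classify"] {
        println!("{}", dispatch(tool, "hello").unwrap());
    }
}
```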