car-inference 0.1.0

car-inference

Local model inference for the Common Agent Runtime.

Provides on-device inference using Candle with automatic hardware detection:

  • macOS: Metal (Apple Silicon GPU)
  • Linux: CUDA (NVIDIA GPU) or CPU fallback
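The detection order above can be sketched in plain Rust. This is an illustrative sketch, not the crate's actual code: the real implementation presumably constructs a Candle Device, and detect_backend is a hypothetical name.

```rust
// Hypothetical sketch of the backend selection order described above.
// cfg! resolves the OS at compile time; CUDA availability is a runtime
// question, modeled here as a plain boolean parameter.
fn detect_backend(cuda_available: bool) -> &'static str {
    if cfg!(target_os = "macos") {
        "metal" // Apple Silicon GPU
    } else if cfg!(target_os = "linux") && cuda_available {
        "cuda" // NVIDIA GPU
    } else {
        "cpu" // fallback
    }
}

fn main() {
    println!("selected backend: {}", detect_backend(false));
}
```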

Ships with Qwen3 models, downloaded from Hugging Face on first use. Remote API models (OpenAI, Anthropic, Google) are supported through the same schema.
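One schema covering both local and remote models might look like the sketch below. The field and variant names are hypothetical; the README only states that local and remote models share a single schema.

```rust
// Hypothetical shape for a unified model schema; field names and values
// are illustrative, not the crate's actual API.
#[derive(Debug)]
enum Provider {
    Local { repo: String }, // a Hugging Face repo downloaded on first use
    Remote { api: String }, // e.g. "openai", "anthropic", "google"
}

#[derive(Debug)]
struct ModelSchema {
    name: String,
    provider: Provider,
    context_length: usize,
}

fn main() {
    // Illustrative values only; the crate's real defaults are not documented here.
    let local = ModelSchema {
        name: "qwen3".to_string(),
        provider: Provider::Local { repo: "Qwen/Qwen3-0.6B".to_string() },
        context_length: 32_768,
    };
    println!("{:?}", local);
}
```

The point of one schema is that routing code can treat a local Qwen3 checkpoint and a remote API model uniformly.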

Architecture

Models are first-class typed resources described by ModelSchema, analogous to ToolSchema for tools.

  • UnifiedRegistry: holds both local and remote models
  • AdaptiveRouter: selects the best model in three phases (filter → score → explore)
  • OutcomeTracker: records results and feeds them back into routing, so model selection improves over time
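The filter → score → explore strategy can be sketched in plain Rust. Candidate, select, and the explore parameter are hypothetical names; the actual AdaptiveRouter API is not shown in this README.

```rust
// Illustrative sketch of three-phase model selection.
#[derive(Debug)]
struct Candidate {
    name: &'static str,
    supports_task: bool, // phase 1: hard constraint
    score: f32,          // phase 2: learned quality score
}

fn select<'a>(models: &'a [Candidate], explore: f32) -> Option<&'a Candidate> {
    // Phase 1: filter out models that cannot handle the request at all.
    let mut eligible: Vec<&Candidate> =
        models.iter().filter(|m| m.supports_task).collect();
    if eligible.is_empty() {
        return None;
    }
    // Phase 2: rank the rest by score, best first.
    eligible.sort_by(|a, b| b.score.total_cmp(&a.score));
    // Phase 3: occasionally pick a non-best candidate so the outcome
    // tracker keeps gathering data on alternatives (crudely modeled here).
    let idx = if explore > 0.0 && eligible.len() > 1 { 1 } else { 0 };
    eligible.into_iter().nth(idx)
}

fn main() {
    let models = vec![
        Candidate { name: "qwen3-large", supports_task: true, score: 0.9 },
        Candidate { name: "qwen3-small", supports_task: true, score: 0.6 },
    ];
    if let Some(m) = select(&models, 0.0) {
        println!("routed to: {}", m.name);
    }
}
```

A real implementation would explore probabilistically and score on outcome history; this sketch only shows the phase ordering.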

Dual purpose

  1. Internal — powers skill learning/repair, semantic memory, policy evaluation
  2. Service — exposes infer, embed, classify as built-in CAR tools
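The service side can be pictured as a small dispatch layer over the three built-in tools. This is a hypothetical sketch; the actual CAR tool interface is not documented in this README.

```rust
// Hypothetical enumeration of the three built-in tools named above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BuiltinTool {
    Infer,    // text generation
    Embed,    // vector embeddings, e.g. for semantic memory
    Classify, // label assignment, e.g. for policy evaluation
}

impl BuiltinTool {
    fn name(self) -> &'static str {
        match self {
            BuiltinTool::Infer => "infer",
            BuiltinTool::Embed => "embed",
            BuiltinTool::Classify => "classify",
        }
    }
}

fn main() {
    for t in [BuiltinTool::Infer, BuiltinTool::Embed, BuiltinTool::Classify] {
        println!("built-in tool: {}", t.name());
    }
}
```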