# car-inference
Local and remote model inference for the Common Agent Runtime.
## What it does
Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models
downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google)
via the same typed `ModelSchema` interface. The `AdaptiveRouter` selects the best model using
a filter-score-explore strategy, learning from outcomes over time via `OutcomeTracker`.
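The filter-score-explore idea can be sketched in a few lines. This is a hypothetical illustration, not the crate's implementation: `Candidate`, `rate`, and `route` are made-up names, and the real `AdaptiveRouter` and `OutcomeTracker` have their own types and scoring.

```rust
// Hypothetical sketch of a filter-score-explore routing pass.
// All names here are illustrative, not the crate's actual API.

struct Candidate {
    name: &'static str,
    supports_tools: bool, // capability checked by the filter step
    successes: u32,       // outcome history, as a tracker would record it
    attempts: u32,
}

/// Empirical success rate; untried models score zero here.
fn rate(c: &Candidate) -> f64 {
    if c.attempts == 0 {
        0.0
    } else {
        c.successes as f64 / c.attempts as f64
    }
}

/// Filter out incapable models, explore under-tried ones, then
/// exploit the best observed success rate.
fn route(candidates: &[Candidate], need_tools: bool, min_trials: u32) -> Option<&'static str> {
    // 1. Filter: drop models that cannot serve the request.
    let eligible: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| !need_tools || c.supports_tools)
        .collect();

    // 2. Explore: give under-tried models a chance to build history.
    if let Some(c) = eligible.iter().find(|c| c.attempts < min_trials) {
        return Some(c.name);
    }

    // 3. Score: pick the best empirical performer.
    eligible
        .into_iter()
        .max_by(|a, b| rate(a).total_cmp(&rate(b)))
        .map(|c| c.name)
}
```

The explore step runs before scoring so that a newly added model accumulates enough outcomes to be scored fairly, rather than losing forever to an incumbent with established history.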
## Usage

A minimal sketch; the exact module path, type, and method signatures were lost in extraction, so `InferenceEngine` stands in for the real names -- see the crate docs.

```rust
use car_inference::InferenceEngine; // assumed name

let engine = InferenceEngine::new();
let result = engine.generate(prompt).await?;
```
## Crate features

- `metal` -- Apple Silicon GPU acceleration
- `cuda` -- NVIDIA GPU acceleration
- `ast` -- AST-aware code generation via `car-ast`
Part of CAR -- see the main repo for full documentation.