# car-inference

Local and remote model inference for the Common Agent Runtime.

## What it does
Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models
downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google)
via the same typed `ModelSchema` interface. The `AdaptiveRouter` selects the best model using
a filter-score-explore strategy, learning from outcomes over time via `OutcomeTracker`.
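The filter-score-explore idea can be sketched as below. This is an illustrative outline only — the `Candidate` struct, field names, and `route` function are assumptions, not the crate's real API:

```rust
// Hypothetical sketch of a filter-score-explore pass (names assumed):
// drop candidates that can't serve the request, rank the rest by a
// learned score, then occasionally explore a runner-up so the
// OutcomeTracker keeps gathering signal on non-best models.
#[derive(Debug, Clone)]
struct Candidate {
    id: &'static str,
    supports_tools: bool,
    score: f64, // quality estimate learned from past outcomes
}

fn route(candidates: &[Candidate], needs_tools: bool, explore: f64) -> Option<Candidate> {
    // Filter: remove candidates lacking a required capability.
    let mut viable: Vec<Candidate> = candidates
        .iter()
        .filter(|c| !needs_tools || c.supports_tools)
        .cloned()
        .collect();
    // Score: best learned score first.
    viable.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    // Explore: with some probability, pick the runner-up instead.
    if viable.len() > 1 && explore > 0.5 {
        return Some(viable[1].clone());
    }
    viable.first().cloned()
}
```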
## Usage

```rust
// Type and constructor names below are illustrative; see the crate docs.
use car_inference::InferenceEngine;

let engine = InferenceEngine::new();
let result = engine.generate(request).await?;
```
## Apple FoundationModels backend (macOS 26+)

On Apple Silicon Macs running macOS 26 or later with Apple Intelligence provisioned, CAR can route inference to the system LLM through Apple's FoundationModels framework — no model download, no API key, no weights to manage. The OS owns everything.
The integration is a small Swift shim
(`car-inference/swift/CarFoundationModels.swift`) compiled by
`build.rs` and linked into the crate. The framework is weak-linked
so the produced binary still loads on pre-26 macOS; a runtime
availability check (`is_available()`, cached for 5 seconds) gates
calls.
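A TTL-cached availability gate can be sketched as follows. The struct and its layout are assumptions for illustration; the real check calls through the Swift shim rather than a passed-in closure:

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Hypothetical 5-second-TTL cache around an expensive availability
// probe, so repeated calls skip the FFI round-trip.
struct AvailabilityCache {
    ttl: Duration,
    state: Mutex<Option<(Instant, bool)>>,
}

impl AvailabilityCache {
    fn new() -> Self {
        Self { ttl: Duration::from_secs(5), state: Mutex::new(None) }
    }

    fn is_available(&self, probe: impl Fn() -> bool) -> bool {
        let mut state = self.state.lock().unwrap();
        if let Some((checked_at, value)) = *state {
            if checked_at.elapsed() < self.ttl {
                return value; // fresh cached answer; skip the probe
            }
        }
        let value = probe();
        *state = Some((Instant::now(), value));
        value
    }
}
```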
```rust
// Routing happens automatically — request the apple/foundation:default
// model id, or let the AdaptiveRouter pick it for you.
let result = engine.generate(request).await?;
```
The catalog entry is tagged `["builtin", "local", "low_latency", "private"]`.
The adaptive router scores `low_latency` and `private` together via
`SYSTEM_LLM_BONUS` (0.12) so system-owned models compete fairly with
MLX 4B for short fast-turn tasks (autocomplete, summarize, classify)
without claiming heavy reasoning workloads they can't serve.
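The bonus mechanism reduces to a small scoring tweak. The constant value comes from the text above; the `score` function and its signature are assumptions for illustration:

```rust
// Hypothetical scoring helper: a model tagged both low_latency and
// private receives a flat bonus on top of its learned base score, so
// the system LLM stays competitive on short fast-turn tasks.
const SYSTEM_LLM_BONUS: f64 = 0.12;

fn score(base: f64, tags: &[&str]) -> f64 {
    let bonus = if tags.contains(&"low_latency") && tags.contains(&"private") {
        SYSTEM_LLM_BONUS
    } else {
        0.0
    };
    base + bonus
}
```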
What's wired in v1:

- Single-shot text generation via `generate()`
- Token-by-token streaming via `stream()` with prefix-diffing on Apple's cumulative snapshots — `StreamEvent::Done.text` carries the full assembled output, matching the Candle/MLX shape
- Graceful fallthrough on tools / vision / audio / video — returns `InferenceError::UnsupportedMode` so the router picks the next candidate instead of silently dropping capabilities
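The prefix-diffing step above can be illustrated in isolation. The `delta` helper is an assumed sketch, not the crate's function: Apple's stream yields cumulative snapshots, and only the new suffix should be forwarded as an incremental token event:

```rust
// Hypothetical prefix-diff over cumulative snapshots: given the
// previously seen snapshot and the new one, emit only the suffix.
// If the new snapshot rewrote earlier text (prefix mismatch), fall
// back to resending the whole snapshot.
fn delta<'a>(prev: &str, snapshot: &'a str) -> &'a str {
    match snapshot.strip_prefix(prev) {
        Some(suffix) => suffix, // usual case: snapshot extends prev
        None => snapshot,       // rewrite: resend everything
    }
}
```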
What isn't:

- Tool calling. The `Tool` protocol takes `Arguments: Generable` (a Swift-static, macro-derived protocol); bridging dynamic JSON schemas requires either `DynamicGenerationSchema` (macOS 26-only) or a single-dispatch shim. Design notes are in `backend/foundation_models.rs`'s module docs.
Build requirements:

- Full Xcode (not just Command Line Tools) on the build host — needed for `xcrun swiftc` and the FoundationModels SDK
- macOS 15+ deployment floor (overridable via `MACOSX_DEPLOYMENT_TARGET`)
Linux and Intel-Mac builds skip the Swift compile entirely; the
`ModelSource::AppleFoundationModels` schema variant still serializes
(so registries can describe the model on any platform), but dispatch
errors out with `UnsupportedMode` before reaching the bridge.
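The platform gate can be sketched with a compile-time check. The function and error shape here are assumptions mirroring the names above, not the crate's real dispatch path:

```rust
// Hypothetical dispatch gate: the error variant name mirrors the docs
// above. Off-platform builds short-circuit before the Swift bridge.
#[derive(Debug, PartialEq)]
enum InferenceError {
    UnsupportedMode(&'static str),
}

fn dispatch_apple(prompt: &str) -> Result<String, InferenceError> {
    if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        // On Apple Silicon macOS this would call into the Swift shim.
        Ok(format!("(bridged) {prompt}"))
    } else {
        // Anywhere else, fail fast so the router tries the next model.
        Err(InferenceError::UnsupportedMode("apple/foundation:default"))
    }
}
```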
## Crate features

- `metal` -- Apple Silicon GPU acceleration
- `cuda` -- NVIDIA GPU acceleration
- `ast` -- AST-aware code generation via `car-ast`
Part of CAR -- see the main repo for full documentation.