# car-inference
Local and remote model inference for the [Common Agent Runtime](https://github.com/Parslee-ai/car).
## What it does
Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models
downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google)
via the same typed `ModelSchema` interface. The `AdaptiveRouter` selects the best model using
a filter-score-explore strategy, learning from outcomes over time via `OutcomeTracker`.
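The filter-score-explore strategy can be sketched as: filter the catalog to eligible models, rank by a learned score, and occasionally try a runner-up to keep learning. The sketch below is illustrative only; `Candidate`, `select`, and the scores are not the crate's actual API.

```rust
// Illustrative sketch of a filter-score-explore selection loop.
// None of these names are car-inference's real identifiers.

struct Candidate {
    id: &'static str,
    tags: &'static [&'static str],
    score: f64, // running quality estimate, e.g. fed by an outcome tracker
}

/// Filter to models carrying a required tag, rank by score, and
/// occasionally explore a non-top candidate instead of exploiting.
fn select<'a>(models: &'a [Candidate], required_tag: &str, explore: bool) -> Option<&'a Candidate> {
    let mut eligible: Vec<&Candidate> = models
        .iter()
        .filter(|m| m.tags.iter().any(|t| *t == required_tag))
        .collect();
    eligible.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap());
    if explore && eligible.len() > 1 {
        eligible.get(1).copied() // exploration: try the runner-up
    } else {
        eligible.first().copied() // exploitation: best-known model
    }
}

fn main() {
    let models = [
        Candidate { id: "apple/foundation:default", tags: &["local", "low_latency"], score: 0.8 },
        Candidate { id: "qwen3-4b", tags: &["local"], score: 0.7 },
    ];
    let picked = select(&models, "local", false).unwrap();
    println!("{}", picked.id); // prints the highest-scoring eligible model
}
```

In a real router the `explore` decision would be stochastic (e.g. epsilon-greedy) and the scores updated from observed outcomes.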
## Usage
```rust
use car_inference::{InferenceEngine, InferenceConfig, GenerateRequest, GenerateParams};
let engine = InferenceEngine::new(InferenceConfig::default());
let result = engine.generate(GenerateRequest {
prompt: "Explain quicksort".into(),
params: GenerateParams::default(),
..Default::default()
}).await?;
```
## Apple FoundationModels backend (macOS 26+)
On Apple Silicon Macs running macOS 26 or later with Apple Intelligence
provisioned, CAR can route inference to the **system LLM** through
Apple's FoundationModels framework — no model to download, no API
key, no weights on disk. The OS owns everything.
The integration is a small Swift shim
(`car-inference/swift/CarFoundationModels.swift`) compiled by
`build.rs` and linked into the crate. The framework is **weak-linked**
so the produced binary still loads on pre-26 macOS; a runtime
availability check (`is_available()`, cached for 5 seconds) gates
calls.
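A time-bounded availability cache like the one described (re-query at most every 5 seconds) might look roughly like this. Everything here is an illustrative sketch; `query_os_availability` stands in for the actual call across the Swift bridge.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Hypothetical stand-in for the FFI call into the Swift shim.
// On a host without Apple Intelligence it would report false.
fn query_os_availability() -> bool {
    false
}

struct CachedAvailability {
    ttl: Duration,
    state: Mutex<Option<(Instant, bool)>>,
}

impl CachedAvailability {
    fn new(ttl: Duration) -> Self {
        Self { ttl, state: Mutex::new(None) }
    }

    /// Return the cached answer if still fresh, otherwise re-query the OS.
    fn is_available(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        if let Some((checked_at, avail)) = *state {
            if checked_at.elapsed() < self.ttl {
                return avail;
            }
        }
        let avail = query_os_availability();
        *state = Some((Instant::now(), avail));
        avail
    }
}

fn main() {
    let cache = CachedAvailability::new(Duration::from_secs(5));
    // First call queries the OS; calls within the TTL reuse the result.
    println!("available: {}", cache.is_available());
}
```

Caching matters here because the availability probe crosses an FFI boundary on every inference request otherwise.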
```rust
// Request the apple/foundation:default model id explicitly, or omit
// model_id and let the AdaptiveRouter pick it for you.
let result = engine.generate(GenerateRequest {
prompt: "summarize this in one sentence: ...".into(),
model_id: Some("apple/foundation:default".into()),
..Default::default()
}).await?;
```
The catalog entry is tagged `["builtin", "local", "low_latency", "private"]`.
The adaptive router adds a `SYSTEM_LLM_BONUS` (0.12) when a model
carries both the `low_latency` and `private` tags, so system-owned
models compete fairly with MLX 4B for short fast-turn tasks
(autocomplete, summarize, classify) without claiming heavy reasoning
workloads they can't serve.
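The tag-gated bonus amounts to a small additive term on the base score. The 0.12 constant comes from the text above; the scoring function itself is an illustrative sketch, not the crate's code.

```rust
// SYSTEM_LLM_BONUS matches the value documented for the catalog entry;
// apply_system_llm_bonus is a hypothetical name for illustration.
const SYSTEM_LLM_BONUS: f64 = 0.12;

/// Apply the bonus only when a model is both low-latency and private.
fn apply_system_llm_bonus(base_score: f64, tags: &[&str]) -> f64 {
    if tags.contains(&"low_latency") && tags.contains(&"private") {
        base_score + SYSTEM_LLM_BONUS
    } else {
        base_score
    }
}

fn main() {
    let tags = ["builtin", "local", "low_latency", "private"];
    println!("{}", apply_system_llm_bonus(0.5, &tags));
    println!("{}", apply_system_llm_bonus(0.5, &["local"])); // no bonus
}
```

Requiring both tags keeps the bonus from leaking onto models that are fast but cloud-hosted, or private but slow.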
**What's wired in v1:**
- Single-shot text generation via `generate()`.
- Token-by-token streaming via `stream()` with prefix-diffing on
Apple's cumulative snapshots — `StreamEvent::Done.text` carries the
full assembled output, matching Candle/MLX shape.
- Graceful fallthrough on tools / vision / audio / video — returns
`InferenceError::UnsupportedMode` so the router picks the next
candidate instead of silently dropping capabilities.
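The prefix-diffing mentioned above — turning Apple's cumulative snapshots into incremental deltas — can be sketched like this. The function name and loop are illustrative, not the crate's code.

```rust
/// Given the previously emitted text and the new cumulative snapshot,
/// return only the newly appended suffix (the incremental delta).
fn prefix_diff<'a>(previous: &str, snapshot: &'a str) -> &'a str {
    match snapshot.strip_prefix(previous) {
        Some(delta) => delta,
        // Snapshot is not a superset of what we emitted (shouldn't
        // happen with cumulative snapshots); re-emit it whole.
        None => snapshot,
    }
}

fn main() {
    // Apple's stream yields growing snapshots; we emit only deltas.
    let snapshots = ["Quick", "Quicksort is", "Quicksort is O(n log n)"];
    let mut emitted = String::new();
    for snap in snapshots {
        let delta = prefix_diff(&emitted, snap);
        print!("{delta}"); // token-by-token output
        emitted.push_str(delta);
    }
    // `emitted` now equals the final snapshot — the same full
    // assembled text a Done event would carry.
    assert_eq!(emitted, "Quicksort is O(n log n)");
}
```

Diffing against the accumulated prefix is what lets the Apple backend present the same delta-based streaming shape as Candle/MLX despite the OS reporting cumulative text.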
**What isn't:**
- Tool calling. The `Tool` protocol takes `Arguments: Generable` (a
Swift-static, macro-derived protocol); bridging dynamic JSON
schemas requires either `DynamicGenerationSchema` (macOS 26-only)
or a single-dispatch shim. Design notes are in
`backend/foundation_models.rs`'s module docs.
**Build requirements:**
- Full Xcode (not just Command Line Tools) on the build host —
needed for `xcrun swiftc` and the FoundationModels SDK.
- macOS 15+ deployment floor (overridable via `MACOSX_DEPLOYMENT_TARGET`).
Linux and Intel-Mac builds skip the Swift compile entirely; the
`ModelSource::AppleFoundationModels` schema variant still serializes
(so registries can describe the model on any platform), but dispatch
errors out with `UnsupportedMode` before reaching the bridge.
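The platform split described here — the schema serializes everywhere, but dispatch fails fast off-platform — can be sketched with a `cfg` gate. The identifiers below are illustrative, not necessarily the crate's real ones.

```rust
// Illustrative sketch of the platform gate; names are hypothetical.
#[derive(Debug, PartialEq)]
enum InferenceError {
    UnsupportedMode,
}

/// The schema can describe the model on any platform; only capable
/// hosts would ever reach the Swift bridge.
fn dispatch_foundation_models(_prompt: &str) -> Result<String, InferenceError> {
    if cfg!(all(target_os = "macos", target_arch = "aarch64")) {
        // A real build would run the availability check here and,
        // if it passes, cross into the Swift shim.
    }
    // This sketch always takes the path a Linux or Intel-Mac build
    // takes: fail fast so the router tries the next candidate.
    Err(InferenceError::UnsupportedMode)
}

fn main() {
    let err = dispatch_foundation_models("hello").unwrap_err();
    println!("fell through with {err:?}");
}
```

Failing before the bridge (rather than inside it) keeps non-macOS builds free of any Swift-linked code path.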
## Crate features
- `metal` -- Apple Silicon GPU acceleration
- `cuda` -- NVIDIA GPU acceleration
- `ast` -- AST-aware code generation via `car-ast`
Part of [CAR](https://github.com/Parslee-ai/car) -- see the main repo for full documentation.