car-inference
Local and remote model inference for the Common Agent Runtime.
What it does
Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models
downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google)
via the same typed ModelSchema interface. The AdaptiveRouter selects the best model using
a filter-score-explore strategy, learning from outcomes over time via OutcomeTracker.
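The filter-score-explore idea can be illustrated with a small sketch. It is not the AdaptiveRouter's real code: Candidate, select, score, the prior and success_rate fields, and the even 50/50 blend are all hypothetical. The success_rate field only stands in for whatever OutcomeTracker actually learns. The random draws are passed in as plain numbers so the example stays deterministic and needs no external crates.

```rust
// Hypothetical sketch: Candidate, select, score, and all field names are
// illustrative assumptions, not car-inference's AdaptiveRouter API.

#[derive(Debug)]
struct Candidate {
    id: &'static str,
    capabilities: Vec<&'static str>,
    prior: f64,        // assumed static quality estimate, 0.0..=1.0
    success_rate: f64, // assumed to be learned from recorded outcomes, 0.0..=1.0
}

// Assumed scoring rule: an even blend of prior and observed success.
fn score(c: &Candidate) -> f64 {
    0.5 * c.prior + 0.5 * c.success_rate
}

/// roll and pick are caller-supplied uniform samples in [0, 1), which keeps
/// the sketch deterministic and free of external RNG crates.
fn select<'a>(
    candidates: &'a [Candidate],
    required: &str,
    explore_rate: f64,
    roll: f64,
    pick: f64,
) -> Option<&'a Candidate> {
    // Filter: keep only models that advertise the required capability.
    let eligible: Vec<&'a Candidate> = candidates
        .iter()
        .filter(|c| c.capabilities.iter().any(|cap| *cap == required))
        .collect();
    if eligible.is_empty() {
        return None;
    }
    // Explore: with probability explore_rate, try a uniformly chosen survivor.
    if roll < explore_rate {
        let idx = ((pick * eligible.len() as f64) as usize).min(eligible.len() - 1);
        return Some(eligible[idx]);
    }
    // Score: otherwise exploit the highest-scoring survivor.
    eligible
        .into_iter()
        .max_by(|a, b| score(a).total_cmp(&score(b)))
}
```

Exploration lets a model with a lower current score keep collecting outcomes, so a stale estimate can still improve over time.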
Usage
use ;
let engine = new;
let result = engine.generate.await?;
Apple FoundationModels backend (macOS 26+)
On Apple Silicon Macs running macOS 26 or later with Apple Intelligence provisioned, CAR can route inference to the system LLM through Apple's FoundationModels framework. There are no model weights to download and no API key to configure: the OS owns the model and everything around it.
The integration is a small Swift shim
(car-inference/swift/CarFoundationModels.swift) compiled by
build.rs and linked into the crate. The framework is weak-linked
so the produced binary still loads on pre-26 macOS; a runtime
availability check (is_available(), cached for 5 seconds) gates
calls.
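A time-bounded availability cache of the kind described above might look like this minimal sketch. AvailabilityCache and its methods are hypothetical names; the expensive system query is modeled as an injected closure (an assumption, since the real check lives behind the Swift shim), and is_available_at takes an explicit timestamp so the 5-second window can be exercised without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical TTL cache around an availability probe. Not the crate's API.
struct AvailabilityCache<F: FnMut() -> bool> {
    probe: F,                         // assumed stand-in for the real system check
    ttl: Duration,                    // e.g. 5 seconds, per the text
    cached: Option<(bool, Instant)>,  // last answer and when it was computed
}

impl<F: FnMut() -> bool> AvailabilityCache<F> {
    fn new(probe: F, ttl: Duration) -> Self {
        Self { probe, ttl, cached: None }
    }

    /// Returns the cached answer while it is younger than ttl;
    /// otherwise re-runs the probe and stores the fresh result.
    fn is_available_at(&mut self, now: Instant) -> bool {
        if let Some((value, checked_at)) = self.cached {
            if now.duration_since(checked_at) < self.ttl {
                return value;
            }
        }
        let value = (self.probe)();
        self.cached = Some((value, now));
        value
    }

    /// Convenience wrapper using the current time.
    fn is_available(&mut self) -> bool {
        self.is_available_at(Instant::now())
    }
}
```

A short TTL keeps per-request overhead low while still noticing within a few seconds if the availability state changes (for example, if Apple Intelligence is not yet provisioned).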
// Routing happens automatically — request the apple/foundation:default
// model id, or let the AdaptiveRouter pick it for you.
let result = engine.generate.await?;
The catalog entry is tagged ["builtin", "local", "low_latency", "private"].
The adaptive router adds SYSTEM_LLM_BONUS (0.12) only to models tagged
with both low_latency and private, so system-owned models compete fairly
with MLX 4B on short, fast-turn tasks (autocomplete, summarize, classify)
without claiming heavy reasoning workloads they can't serve.
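The bonus rule can be sketched as follows. The constant value and the tag names come from the text above; the function names system_llm_bonus and adjusted_score, and the idea that the bonus is simply added to a base score, are assumptions made for illustration.

```rust
/// Value taken from the text; everything else here is a hypothetical sketch.
const SYSTEM_LLM_BONUS: f64 = 0.12;

/// Returns the bonus only when BOTH tags are present; either tag alone earns nothing.
fn system_llm_bonus(tags: &[&str]) -> f64 {
    let has = |t: &str| tags.iter().any(|tag| *tag == t);
    if has("low_latency") && has("private") {
        SYSTEM_LLM_BONUS
    } else {
        0.0
    }
}

/// Assumed additive combination with whatever base score the router computed.
fn adjusted_score(base: f64, tags: &[&str]) -> f64 {
    base + system_llm_bonus(tags)
}
```

Requiring both tags keeps the bonus narrow: it helps on-device system models in latency-sensitive work without inflating every local or every private model.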
What's wired:
- Single-shot text generation via generate().
- Token-by-token streaming via stream(), with prefix-diffing on Apple's cumulative snapshots. StreamEvent::Done.text carries the full assembled output, matching the Candle/MLX shape.
- Tool calling via generate_with_tools(). JSON-Schema tool definitions are converted to per-tool DynamicGenerationSchemas so the model sees real schemas. A capture-only Swift Tool records the first invocation and ends the turn; the call comes back in the standard ToolCall shape the remote backends emit, and the runtime executes it (CAR's propose/validate/execute contract). At most one tool call is made per turn: tool use is sequential, with no parallel calls. The catalog entry claims tool_use but deliberately NOT multi_tool_call, which is the router-readable "no parallel calls" signal. When a call is captured, text is empty and any assistant prose preceding the call is discarded.
- Structured output via generate_structured(). ResponseFormat::JsonSchema maps onto FM's native constrained decoding (respond(to:schema:)). JsonObject has no native FM mode, so it falls back to instruction injection with a warning.
- Graceful fallthrough on vision, audio, and video: these return InferenceError::UnsupportedMode (streaming included) so the router picks the next candidate instead of silently dropping capabilities.
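Prefix-diffing over cumulative snapshots works because each snapshot contains the whole response so far, so only the new suffix needs to be emitted as a delta. The sketch below is hypothetical: SnapshotDiffer, push, and finish are illustrative names, and the handling of a snapshot that rewrites earlier text (resync silently, emit nothing) is an assumption, since the text does not say what the backend does in that case.

```rust
/// Hypothetical sketch of prefix-diffing; not the crate's streaming API.
struct SnapshotDiffer {
    emitted: String, // everything delivered so far
}

impl SnapshotDiffer {
    fn new() -> Self {
        Self { emitted: String::new() }
    }

    /// Returns the new suffix to stream for this snapshot, or None if nothing is new.
    fn push(&mut self, snapshot: &str) -> Option<String> {
        match snapshot.strip_prefix(self.emitted.as_str()) {
            Some("") => None, // duplicate snapshot, nothing to emit
            Some(delta) => {
                self.emitted.push_str(delta);
                Some(delta.to_string())
            }
            None => {
                // Assumption: if earlier text changed, resync without re-emitting;
                // the final text below still reflects the latest snapshot.
                self.emitted = snapshot.to_string();
                None
            }
        }
    }

    /// Full assembled output, analogous to what a final Done event would carry.
    fn finish(self) -> String {
        self.emitted
    }
}
```

Because the final text is taken from the accumulated snapshot rather than from the concatenated deltas, consumers get the complete output even if an intermediate delta was skipped.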
What isn't:
- Multimodal input — the public FoundationModels API is text-only.
- Full JSON-Schema fidelity. The DynamicGenerationSchema conversion covers object/string/number/integer/boolean/array and string enums. Typeless nodes (oneOf/anyOf/$ref), union types (["number","null"]), unrecognized types, and other non-string enums degrade to permissive string fields (never to an empty object), while numeric enums keep the base type but lose the value constraint. Every degradation is detected on the Rust side and logged via tracing::warn! before crossing the FFI.
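The type-level part of that degradation policy can be modeled in a few lines. This is a simplified, hypothetical sketch: TypeField, FmKind, map_type, and convert are illustrative names; enum handling is left out; and eprintln! stands in for tracing::warn! so the example has no dependencies.

```rust
/// Hypothetical, simplified view of a JSON-Schema node's "type" keyword.
#[derive(Debug)]
enum TypeField<'a> {
    Missing,             // typeless node, e.g. oneOf / anyOf / $ref
    Single(&'a str),     // "type": "string"
    Union(Vec<&'a str>), // "type": ["number", "null"]
}

/// Kinds the conversion supports directly, per the text.
#[derive(Debug, PartialEq)]
enum FmKind { Object, String, Number, Integer, Boolean, Array }

/// Maps a node's type to a supported kind and reports whether it degraded.
/// Unsupported shapes fall back to a permissive string, never an empty object.
fn map_type(field: &TypeField) -> (FmKind, bool) {
    match field {
        TypeField::Single(t) => match *t {
            "object" => (FmKind::Object, false),
            "string" => (FmKind::String, false),
            "number" => (FmKind::Number, false),
            "integer" => (FmKind::Integer, false),
            "boolean" => (FmKind::Boolean, false),
            "array" => (FmKind::Array, false),
            _ => (FmKind::String, true), // unrecognized type
        },
        TypeField::Missing | TypeField::Union(_) => (FmKind::String, true),
    }
}

/// Converts one node, logging any degradation before it would cross the FFI.
fn convert(path: &str, field: &TypeField) -> FmKind {
    let (kind, degraded) = map_type(field);
    if degraded {
        eprintln!("warning: schema node {path} ({field:?}) degraded to a permissive string");
    }
    kind
}
```

Falling back to a string rather than an empty object keeps the model able to produce some value for the field, which the caller can still validate or parse afterwards.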
Build requirements:
- Full Xcode (not just the Command Line Tools) on the build host, needed for xcrun swiftc and the FoundationModels SDK.
- A macOS 15+ deployment floor (overridable via MACOSX_DEPLOYMENT_TARGET).
Linux and Intel-Mac builds skip the Swift compile entirely; the
ModelSource::AppleFoundationModels schema variant still serializes
(so registries can describe the model on any platform), but dispatch
errors out with UnsupportedMode before reaching the bridge.
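The "describe everywhere, dispatch only where supported" split can be sketched as below. ModelSource::AppleFoundationModels and InferenceError::UnsupportedMode mirror names from the text, but the enum shapes, the extra Local variant, the dispatch function, and the BRIDGE_COMPILED constant are hypothetical. The bridge flag is passed in as a parameter so the gating can be tested on any host; serialization is not shown.

```rust
/// Hypothetical shapes; only the variant names echo the text.
#[derive(Debug, Clone, PartialEq)]
enum ModelSource {
    AppleFoundationModels,
    Local, // illustrative stand-in for other backends
}

#[derive(Debug, PartialEq)]
enum InferenceError {
    UnsupportedMode(String),
}

/// Assumed compile-time flag: true only where build.rs would compile the Swift shim.
const BRIDGE_COMPILED: bool = cfg!(all(target_os = "macos", target_arch = "aarch64"));

/// The variant always exists, but dispatch refuses it before reaching the bridge
/// when the shim was not built for this target.
fn dispatch(source: &ModelSource, bridge_compiled: bool) -> Result<&'static str, InferenceError> {
    match source {
        ModelSource::AppleFoundationModels if !bridge_compiled => Err(
            InferenceError::UnsupportedMode("FoundationModels bridge not built for this target".into()),
        ),
        ModelSource::AppleFoundationModels => Ok("foundation-models bridge"),
        ModelSource::Local => Ok("local backend"),
    }
}
```

Returning UnsupportedMode instead of panicking matches the fallthrough behavior described earlier: the router can treat the error as "try the next candidate".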
Crate features
- metal -- Apple Silicon GPU acceleration
- cuda -- NVIDIA GPU acceleration
- ast -- AST-aware code generation via car-ast
Part of CAR -- see the main repo for full documentation.