car-inference 0.32.1

Local model inference for CAR — Candle backend with Qwen3 models

car-inference

Local and remote model inference for the Common Agent Runtime.

What it does

Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google) via the same typed ModelSchema interface. The AdaptiveRouter selects the best model using a filter-score-explore strategy, learning from outcomes over time via OutcomeTracker.
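As a rough illustration of what a filter-score-explore selection could look like, here is a minimal sketch. The `Candidate` struct, the `pick` function, and the epsilon-style exploration are all hypothetical and are not car-inference's actual AdaptiveRouter or OutcomeTracker API; the random draw is passed in by the caller so the example stays deterministic and dependency-free.

```rust
// Hypothetical types for illustration only; not the crate's real API.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: &'static str,
    supports_tools: bool,
    // Running success rate learned from past outcomes, in [0, 1].
    success_rate: f64,
}

/// Filter by capability, rank the survivors by score, and occasionally
/// explore a non-best candidate.
/// `draw` is a caller-supplied uniform sample in [0, 1).
fn pick(
    candidates: &[Candidate],
    need_tools: bool,
    epsilon: f64,
    draw: f64,
) -> Option<&'static str> {
    // 1. Filter: drop candidates that cannot serve the request.
    let mut eligible: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| !need_tools || c.supports_tools)
        .collect();
    if eligible.is_empty() {
        return None;
    }
    // 2. Score: sort best-first by learned success rate.
    eligible.sort_by(|a, b| b.success_rate.total_cmp(&a.success_rate));
    // 3. Explore: with probability `epsilon`, pick one of the runners-up.
    if draw < epsilon && eligible.len() > 1 {
        let span = eligible.len() - 1;
        // Map draw in [0, epsilon) onto indices 1..=span.
        let idx = 1 + ((draw / epsilon) * span as f64) as usize;
        return Some(eligible[idx.min(span)].id);
    }
    Some(eligible[0].id)
}
```

With three candidates where only two support tools, a draw above `epsilon` returns the highest-scoring eligible model and a draw below it returns a runner-up, which is how a router of this kind keeps gathering outcome data on models it would otherwise never try.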

Usage

use car_inference::{InferenceEngine, InferenceConfig, GenerateRequest, GenerateParams};

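// generate() is awaited and the snippet uses `?`, so run this inside an
// async function that returns a Result.
let engine = InferenceEngine::new(InferenceConfig::default());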
let result = engine.generate(GenerateRequest {
    prompt: "Explain quicksort".into(),
    params: GenerateParams::default(),
    ..Default::default()
}).await?;

Apple FoundationModels backend (macOS 26+)

On Apple Silicon Macs running macOS 26 or later with Apple Intelligence provisioned, CAR can route inference to the system LLM through Apple's FoundationModels framework. There is no model file to download and no API key to configure: the OS owns and manages the model weights.

The integration is a small Swift shim (car-inference/swift/CarFoundationModels.swift) compiled by build.rs and linked into the crate. The framework is weak-linked so the produced binary still loads on pre-26 macOS; a runtime availability check (is_available(), cached for 5 seconds) gates calls.

// Routing happens automatically — request the apple/foundation:default
// model id, or let the AdaptiveRouter pick it for you.
let result = engine.generate(GenerateRequest {
    prompt: "summarize this in one sentence: ...".into(),
    model_id: Some("apple/foundation:default".into()),
    ..Default::default()
}).await?;
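The 5-second cache on the availability check mentioned above can be pictured as a small time-to-live wrapper around an expensive probe. The sketch below is an assumption about the general technique, not the crate's implementation; `CachedProbe` and its methods are hypothetical names, and the closure stands in for the real FFI call.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Caches the result of an expensive boolean probe for a fixed TTL.
/// Hypothetical helper, not part of car-inference.
struct CachedProbe {
    ttl: Duration,
    last: Mutex<Option<(Instant, bool)>>,
}

impl CachedProbe {
    fn new(ttl: Duration) -> Self {
        Self { ttl, last: Mutex::new(None) }
    }

    /// Returns the cached value while it is fresh; otherwise runs `probe`
    /// and stores its result together with the current time.
    fn get(&self, probe: impl FnOnce() -> bool) -> bool {
        let mut last = self.last.lock().unwrap();
        if let Some((at, value)) = *last {
            if at.elapsed() < self.ttl {
                return value;
            }
        }
        let value = probe();
        *last = Some((Instant::now(), value));
        value
    }
}
```

A caller would construct it once with `CachedProbe::new(Duration::from_secs(5))` and call `get` before each request, so repeated requests within the window do not re-run the probe.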

The catalog entry is tagged ["builtin", "local", "low_latency", "private"]. The adaptive router applies SYSTEM_LLM_BONUS (0.12) when low_latency and private appear together, so system-owned models compete fairly with MLX 4B on short, fast-turn tasks (autocomplete, summarize, classify) without claiming heavy reasoning workloads they can't serve.
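To make the "both tags together" condition concrete, here is a minimal sketch of how such a bonus might be applied. Only the constant name and its value (0.12) come from the text; the `apply_system_llm_bonus` function and the additive scoring are assumptions made for illustration.

```rust
/// Value quoted in the text; the rest of this sketch is illustrative.
const SYSTEM_LLM_BONUS: f64 = 0.12;

/// Adds the bonus only when a catalog entry carries BOTH tags.
/// Hypothetical helper, not the router's real scoring function.
fn apply_system_llm_bonus(base_score: f64, tags: &[&str]) -> f64 {
    let low_latency = tags.contains(&"low_latency");
    let private = tags.contains(&"private");
    if low_latency && private {
        base_score + SYSTEM_LLM_BONUS
    } else {
        base_score
    }
}
```

An entry tagged only `low_latency`, or only `private`, keeps its base score under this scheme.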

What's wired:

  • Single-shot text generation via generate()
  • Token-by-token streaming via stream() with prefix-diffing on Apple's cumulative snapshots — StreamEvent::Done.text carries the full assembled output, matching Candle/MLX shape.
  • Tool calling via generate_with_tools(). JSON-Schema tool definitions are converted to per-tool DynamicGenerationSchemas so the model sees real schemas. A capture-only Swift Tool records the first invocation and ends the turn; the call comes back in the standard ToolCall shape that the remote backends emit, and the runtime executes it (CAR's propose/validate/execute contract). At most one tool call is made per turn: tool use is sequential, with no parallel calls. The catalog entry claims tool_use but deliberately not multi_tool_call, which is the router-readable "no parallel calls" signal. When a call is captured, text is empty because pre-call assistant prose is discarded.
  • Structured output via generate_structured(): ResponseFormat::JsonSchema maps onto FM's native constrained decoding (respond(to:schema:)). JsonObject has no native FM mode and falls back to instruction injection with a warning.
  • Graceful fallthrough on vision / audio / video — returns InferenceError::UnsupportedMode (streaming included) so the router picks the next candidate instead of silently dropping capabilities.
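The prefix-diffing used for streaming in the list above can be sketched as follows: each cumulative snapshot is compared against the previous one, and only the newly appended suffix is emitted. The function names are hypothetical, and the fallback of emitting the whole snapshot when it does not extend the previous text is an assumption rather than documented crate behavior.

```rust
/// Returns the part of `snapshot` that follows `previous`.
/// Assumption: if the snapshot does not extend the previous text,
/// the whole snapshot is emitted.
fn delta<'a>(previous: &str, snapshot: &'a str) -> &'a str {
    snapshot.strip_prefix(previous).unwrap_or(snapshot)
}

/// Turns cumulative snapshots into incremental deltas and also returns
/// the final assembled text (what a Done event would carry).
fn stream_deltas(snapshots: &[&str]) -> (Vec<String>, String) {
    let mut seen = String::new();
    let mut deltas = Vec::new();
    for &snap in snapshots {
        let d = delta(&seen, snap);
        // Skip repeated snapshots that add nothing new.
        if !d.is_empty() {
            deltas.push(d.to_string());
        }
        seen = snap.to_string();
    }
    (deltas, seen)
}
```

Feeding it the snapshots "Hel", "Hello", "Hello", "Hello, world" yields the deltas "Hel", "lo", ", world" and the full text "Hello, world".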

What isn't:

  • Multimodal input — the public FoundationModels API is text-only.
  • Full JSON-Schema fidelity: the DynamicGenerationSchema conversion covers object/string/number/integer/boolean/array and string enums; typeless nodes (oneOf/anyOf/$ref), union types (["number","null"]), unrecognized types, and non-string enums degrade to permissive string fields (never to an empty object), and numeric enums keep the base type but lose the value constraint. Every degradation is detected on the Rust side and logged via tracing::warn! before crossing the FFI.
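The degradation rule for the `type` keyword described above can be pictured with a simplified model. This sketch does not parse real JSON and does not cover enums; `TypeKeyword`, `Mapped`, and `map_type` are hypothetical names, and the returned flag stands in for the point where the real code would log a warning.

```rust
/// Simplified stand-in for a JSON-Schema node's `type` keyword.
#[derive(Debug, Clone, PartialEq)]
enum TypeKeyword {
    Missing,            // typeless node, e.g. oneOf / anyOf / $ref
    Single(String),     // "type": "string"
    Union(Vec<String>), // "type": ["number", "null"]
}

/// Kinds the text lists as covered by the conversion.
#[derive(Debug, PartialEq)]
enum Mapped {
    Object,
    Str,
    Number,
    Integer,
    Boolean,
    Array,
}

/// Returns the mapped kind plus a flag saying whether fidelity was lost.
/// Anything unsupported degrades to a permissive string field.
fn map_type(t: &TypeKeyword) -> (Mapped, bool) {
    match t {
        TypeKeyword::Single(name) => match name.as_str() {
            "object" => (Mapped::Object, false),
            "string" => (Mapped::Str, false),
            "number" => (Mapped::Number, false),
            "integer" => (Mapped::Integer, false),
            "boolean" => (Mapped::Boolean, false),
            "array" => (Mapped::Array, false),
            // Unrecognized type name: degrade and report it.
            _ => (Mapped::Str, true),
        },
        // Typeless nodes and union types degrade as well.
        TypeKeyword::Missing | TypeKeyword::Union(_) => (Mapped::Str, true),
    }
}
```

Returning the flag rather than logging inside the function keeps the mapping pure; a caller could emit a warning whenever the flag is true.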

Build requirements:

  • Full Xcode (not just Command Line Tools) on the build host — needed for xcrun swiftc and the FoundationModels SDK.
  • macOS 15+ deployment floor (overridable via MACOSX_DEPLOYMENT_TARGET).
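A build script could resolve the deployment floor along these lines. This is a sketch under assumptions: the "15.0" string format, the helper names, and the treatment of an empty override as unset are not taken from the crate's build.rs.

```rust
use std::env;

/// Default floor from the text; the exact string format is an assumption.
const DEFAULT_DEPLOYMENT_TARGET: &str = "15.0";

/// Picks the override when it is present and non-empty, else the default.
fn resolve_target(override_value: Option<String>) -> String {
    override_value
        .filter(|v| !v.trim().is_empty())
        .unwrap_or_else(|| DEFAULT_DEPLOYMENT_TARGET.to_string())
}

/// Reads MACOSX_DEPLOYMENT_TARGET from the environment, as a build
/// script might before invoking the Swift compiler.
fn deployment_target() -> String {
    resolve_target(env::var("MACOSX_DEPLOYMENT_TARGET").ok())
}
```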

Linux and Intel-Mac builds skip the Swift compile entirely; the ModelSource::AppleFoundationModels schema variant still serializes (so registries can describe the model on any platform), but dispatch errors out with UnsupportedMode before reaching the bridge.
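The platform gating and the resulting fallthrough can be sketched with compile-time `cfg!` checks. Everything here is illustrative: `DispatchError`, `dispatch_apple`, `echo_backend`, and `generate_with_fallthrough` are hypothetical names, and the placeholder return value stands in for the real bridge call.

```rust
#[derive(Debug, PartialEq)]
enum DispatchError {
    UnsupportedMode(&'static str),
}

/// True only when compiling for Apple Silicon macOS.
fn bridge_compiled_in() -> bool {
    cfg!(all(target_os = "macos", target_arch = "aarch64"))
}

/// Errors out before touching the bridge on unsupported platforms.
fn dispatch_apple(prompt: &str) -> Result<String, DispatchError> {
    if !bridge_compiled_in() {
        return Err(DispatchError::UnsupportedMode(
            "FoundationModels bridge not built for this platform",
        ));
    }
    // Placeholder for the real FFI call into the Swift shim.
    Ok(format!("(bridge would handle: {prompt})"))
}

/// Stand-in for any other backend that can always answer.
fn echo_backend(prompt: &str) -> Result<String, DispatchError> {
    Ok(format!("echo: {prompt}"))
}

type Backend = fn(&str) -> Result<String, DispatchError>;

/// Tries each backend in order and moves on when one reports an error,
/// returning the last error if none succeeds.
fn generate_with_fallthrough(
    backends: &[Backend],
    prompt: &str,
) -> Result<String, DispatchError> {
    let mut last = DispatchError::UnsupportedMode("no backends configured");
    for backend in backends {
        match backend(prompt) {
            Ok(text) => return Ok(text),
            Err(e) => last = e,
        }
    }
    Err(last)
}
```

On Linux or an Intel Mac, a list of `[dispatch_apple, echo_backend]` resolves to the second backend, which mirrors the idea of the router moving to the next candidate instead of failing the request.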

Crate features

  • metal -- Apple Silicon GPU acceleration
  • cuda -- NVIDIA GPU acceleration
  • ast -- AST-aware code generation via car-ast

Part of CAR -- see the main repo for full documentation.