car-inference 0.1.1


car-inference

Local and remote model inference for the Common Agent Runtime.

What it does

Provides on-device inference using Candle (Metal on macOS, CUDA on Linux) with Qwen3 models downloaded from HuggingFace on first use. Also supports remote APIs (OpenAI, Anthropic, Google) via the same typed ModelSchema interface. The AdaptiveRouter selects the best model using a filter-score-explore strategy, learning from outcomes over time via OutcomeTracker.

Usage

use car_inference::{InferenceEngine, InferenceConfig, GenerateRequest, GenerateParams};

// Run inside an async fn returning a Result: `generate` is awaited
// and its error is propagated with `?`.
let engine = InferenceEngine::new(InferenceConfig::default());
let result = engine.generate(GenerateRequest {
    prompt: "Explain quicksort".into(),
    params: GenerateParams::default(),
    ..Default::default()
}).await?;

Crate features

  • metal -- Apple Silicon GPU acceleration
  • cuda -- NVIDIA GPU acceleration
  • ast -- AST-aware code generation via car-ast
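Features are enabled from the dependent crate's Cargo.toml in the usual way; for example, for Apple Silicon (version taken from this release):

```toml
[dependencies]
car-inference = { version = "0.1.1", features = ["metal"] }
```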

Part of CAR -- see the main repo for full documentation.