Crate candelabra

Candelabra - a desktop-friendly wrapper around Candle for quantized GGUF models (LLaMA, Qwen, Phi, Gemma).

This crate provides:

  • Async model downloads with progress reporting
  • Automatic hardware detection with Metal/CUDA/CPU fallback
  • Reusable model/tokenizer state for repeated inference runs
  • A small, GUI-friendly API for token streaming and cancellation
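
The cancellation hook in the last bullet is a plain `Arc<AtomicBool>` shared between the GUI and the generation loop. A minimal, self-contained sketch of that pattern (independent of candelabra; the `generate` function here only simulates token emission):

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Arc,
};
use std::thread;
use std::time::Duration;

/// Simulated generation loop: emits "tokens" until the flag is set.
fn generate(cancel: Arc<AtomicBool>) -> Vec<String> {
    let mut tokens = Vec::new();
    for i in 0.. {
        if cancel.load(Ordering::Relaxed) {
            break; // cooperative cancellation, checked once per token
        }
        tokens.push(format!("tok{i}"));
        thread::sleep(Duration::from_millis(1));
    }
    tokens
}

fn main() {
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = cancel.clone();
    // Worker thread generates; the "GUI" side cancels after a short delay.
    let handle = thread::spawn(move || generate(flag));
    thread::sleep(Duration::from_millis(20));
    cancel.store(true, Ordering::Relaxed);
    let tokens = handle.join().unwrap();
    assert!(!tokens.is_empty());
}
```

Because the flag is only polled between tokens, cancellation is cooperative: the loop finishes the token in flight and then stops.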

§Scope

candelabra supports multi-architecture inference for quantized GGUF files, dynamically extracting the architecture string from the file's metadata (llama, phi3, qwen2, etc.) to invoke the proper candle-transformers backend.
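
Conceptually, this dispatch is a match on the GGUF `general.architecture` metadata value. The sketch below is illustrative only; the backend names on the right are placeholders, not the crate's actual internals:

```rust
/// Illustrative mapping from a GGUF `general.architecture` string to a
/// quantized backend name. Placeholder names; the real crate dispatches
/// to candle-transformers model implementations.
fn backend_for(arch: &str) -> Option<&'static str> {
    match arch {
        "llama" => Some("quantized_llama"),
        "phi3" => Some("quantized_phi3"),
        "qwen2" => Some("quantized_qwen2"),
        "gemma" | "gemma2" => Some("quantized_gemma"),
        _ => None, // unsupported architecture -> error in the real crate
    }
}

fn main() {
    assert_eq!(backend_for("llama"), Some("quantized_llama"));
    assert_eq!(backend_for("unknown"), None);
}
```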

§Example

use candelabra::{download_model, load_tokenizer_from_repo, Model, InferenceConfig, run_inference};
use std::sync::{Arc, atomic::AtomicBool};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Download the GGUF weights (or reuse the local Hugging Face cache).
    let model_path = download_model(
        "bartowski/SmolLM2-360M-Instruct-GGUF",
        "SmolLM2-360M-Instruct-Q4_K_M.gguf",
    )?;
    // The tokenizer ships with the original (non-GGUF) model repository.
    let tokenizer = load_tokenizer_from_repo("HuggingFaceTB/SmolLM2-360M-Instruct")?;
    let mut model = Model::load(&model_path)?;
    // Set this flag from another thread to stop generation early.
    let cancel_token = Arc::new(AtomicBool::new(false));
    let config = InferenceConfig::default();

    // The closure is invoked with each token as it is generated.
    let _result = run_inference(
        &mut model,
        &tokenizer,
        &config,
        cancel_token,
        |_| Ok(()),
    )?;

    Ok(())
}

Structs§

DownloadProgress
Progress information emitted during model downloads.
InferenceConfig
Configuration for inference runs.
InferenceResult
Result produced by a completed inference run.
Model
A loaded model ready for inference.

Enums§

CandelabraError
Error type for all candelabra operations.
DeviceType
The type of compute device available for inference.

Functions§

check_model_cached
Checks if a model file is already present in the local Hugging Face cache.
download_model
Downloads a model file via hf-hub and returns the local cached path.
download_model_with_channel
Downloads a model file with progress reporting via Tokio channel.
download_model_with_progress
Downloads a model file with progress reporting via callback.
download_tokenizer
Downloads a tokenizer file via hf-hub and returns the local cached path.
download_tokenizer_with_channel
Downloads a tokenizer file with progress reporting via Tokio channel.
download_tokenizer_with_progress
Downloads a tokenizer file with progress reporting via callback.
get_best_device
Returns the best available compute device with automatic fallback.
get_device
Returns the best available compute device with detailed error information.
load_tokenizer
Loads a tokenizer from disk.
load_tokenizer_from_repo
Downloads and loads a tokenizer from the Hugging Face cache.
run_inference
Runs inference using reusable model and tokenizer state.
run_inference_with_channel
Runs inference and streams tokens over a Tokio channel.
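
The `_with_channel` variants stream tokens over a Tokio channel instead of a callback. The producer/consumer shape can be illustrated with `std::sync::mpsc` as a stand-in (an assumption for the sketch; the actual functions use a Tokio sender, and `fake_inference` here only simulates generation):

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a channel-based inference run: sends each generated
/// token through the channel, then drops the sender so the receiver's
/// iterator ends, signalling completion.
fn fake_inference(tx: mpsc::Sender<String>) {
    for tok in ["Hello", ",", " world"] {
        if tx.send(tok.to_string()).is_err() {
            return; // receiver dropped -> stop generating
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || fake_inference(tx));
    // Consumer side: drain tokens as they arrive and append to the display.
    let text: String = rx.iter().collect();
    worker.join().unwrap();
    assert_eq!(text, "Hello, world");
}
```

Treating a closed channel as "receiver went away, stop generating" is what lets a GUI cancel a stream simply by dropping its receiver.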

Type Aliases§

Result