Candelabra - a desktop-friendly wrapper around Candle for quantized GGUF models (LLaMA, Qwen, Phi, Gemma).
This crate provides:
- Async model downloads with progress reporting
- Automatic hardware detection with Metal/CUDA/CPU fallback
- Reusable model/tokenizer state for repeated inference runs
- A small, GUI-friendly API for token streaming and cancellation
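Cancellation is driven by a plain `Arc<AtomicBool>` that a GUI can flip from another thread while inference runs. A minimal sketch of that pattern, using only the standard library (the inference loop that polls the flag is assumed, not shown):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Request cancellation; run_inference is assumed to poll this flag
/// between generated tokens and return early once it is set.
fn request_cancel(token: &AtomicBool) {
    token.store(true, Ordering::Relaxed);
}

fn main() {
    let cancel_token = Arc::new(AtomicBool::new(false));

    // e.g. a GUI "Stop" button handler running on another thread
    let ui_handle = Arc::clone(&cancel_token);
    let stop_button = thread::spawn(move || request_cancel(&ui_handle));
    stop_button.join().unwrap();

    // The inference loop would observe the flag and stop generating.
    assert!(cancel_token.load(Ordering::Relaxed));
    println!("cancellation requested");
}
```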
§Scope
candelabra supports multi-architecture inference for quantized GGUFs,
dynamically extracting the architecture string (llama, phi3, qwen2, etc.)
to invoke the proper candle-transformers backend.
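The dispatch step can be sketched as follows; the helper name and the returned backend labels are illustrative assumptions, not the crate's actual code or candle-transformers module names:

```rust
/// Hypothetical helper: map a GGUF "general.architecture" metadata value
/// to a backend label. Architectures outside the supported set are rejected.
fn backend_for_architecture(arch: &str) -> Result<&'static str, String> {
    match arch {
        "llama" => Ok("quantized_llama"),
        "phi3" => Ok("quantized_phi3"),
        "qwen2" => Ok("quantized_qwen2"),
        other => Err(format!("unsupported architecture: {other}")),
    }
}

fn main() {
    assert_eq!(backend_for_architecture("qwen2"), Ok("quantized_qwen2"));
    assert!(backend_for_architecture("mamba").is_err());
}
```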
§Example
use candelabra::{download_model, load_tokenizer_from_repo, Model, InferenceConfig, run_inference};
use std::sync::{Arc, atomic::AtomicBool};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model_path = download_model(
        "bartowski/SmolLM2-360M-Instruct-GGUF",
        "SmolLM2-360M-Instruct-Q4_K_M.gguf",
    )?;
    let tokenizer = load_tokenizer_from_repo("HuggingFaceTB/SmolLM2-360M-Instruct")?;
    let mut model = Model::load(&model_path)?;
    let cancel_token = Arc::new(AtomicBool::new(false));
    let config = InferenceConfig::default();
    let _result = run_inference(
        &mut model,
        &tokenizer,
        &config,
        cancel_token,
        |_| Ok(()),
    )?;
    Ok(())
}

Structs§
- DownloadProgress - Progress information emitted during model downloads.
- InferenceConfig - Configuration for inference runs.
- InferenceResult - Result produced by a completed inference run.
- Model - A loaded model ready for inference.

Enums§
- CandelabraError - Error type for all candelabra operations.
- DeviceType - The type of compute device available for inference.
Functions§
- check_model_cached - Checks if a model file is already present in the local Hugging Face cache.
- download_model - Downloads a model file via hf-hub and returns the local cached path.
- download_model_with_channel - Downloads a model file with progress reporting via a Tokio channel.
- download_model_with_progress - Downloads a model file with progress reporting via callback.
- download_tokenizer - Downloads a tokenizer file via hf-hub and returns the local cached path.
- download_tokenizer_with_channel - Downloads a tokenizer file with progress reporting via a Tokio channel.
- download_tokenizer_with_progress - Downloads a tokenizer file with progress reporting via callback.
- get_best_device - Returns the best available compute device with automatic fallback.
- get_device - Returns the best available compute device with detailed error information.
- load_tokenizer - Loads a tokenizer from disk.
- load_tokenizer_from_repo - Downloads and loads a tokenizer from the Hugging Face cache.
- run_inference - Runs inference using reusable model and tokenizer state.
- run_inference_with_channel - Runs inference and streams tokens over a Tokio channel.
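For the *_with_progress variants, the callback receives a progress payload that a GUI can turn into a progress bar. The struct fields below (downloaded/total bytes) are assumptions about DownloadProgress's shape, shown only to illustrate the wiring; check the DownloadProgress docs for the actual field names:

```rust
// Assumed shape of the progress payload, for illustration only.
struct DownloadProgress {
    downloaded: u64,
    total: Option<u64>,
}

/// Convert a progress update into a 0..=100 percentage for a progress bar.
/// Returns None when the server did not report a content length.
fn percent(p: &DownloadProgress) -> Option<u8> {
    let total = p.total?;
    if total == 0 {
        return None;
    }
    Some(((p.downloaded as f64 / total as f64) * 100.0).round() as u8)
}

fn main() {
    let p = DownloadProgress { downloaded: 256, total: Some(1024) };
    assert_eq!(percent(&p), Some(25));
    // Unknown content length: no percentage, fall back to a spinner.
    assert_eq!(percent(&DownloadProgress { downloaded: 10, total: None }), None);
}
```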