# candelabra
candelabra is a small Rust crate for desktop applications that want to run
quantized GGUF models (LLaMA, Qwen, Phi, Gemma, etc.) with candle-core,
candle-transformers, and hf-hub.
It focuses on the pieces GUI apps usually need:
- Hugging Face downloads that respect the local `hf-hub` cache
- tokenizer loading helpers
- automatic Metal or CUDA fallback to CPU
- reusable loaded model state
- token streaming with cancellation support
## Current Scope
candelabra natively supports quantized GGUF checkpoints with dynamic architecture detection.
Supported architectures include:
- `llama` / `mistral` / `mixtral`
- `gemma` / `gemma2`
- `phi3`
- `qwen2` (Qwen 2, Qwen 2.5, QwQ)
- `qwen3` / `qwen3moe`
- `gemma3`
- `glm4`
That means the crate is a good fit if you want a lightweight Rust API for local
desktop inference on models such as Qwen 2.5 or SmolLM GGUF variants.
It abstracts the various `candle_transformers::models` paths behind a single unified `Model` type.
## Installation

Add the crate to your Cargo.toml:

```toml
[dependencies]
candelabra = "0.1"
```
## Quick Start

```rust
use candelabra::{download_model, load_tokenizer_from_repo};
use candelabra::{Model, run_inference};
```
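The pieces above can be combined into an end-to-end sketch. The repo id, GGUF filename, and the exact signatures of `download_model`, `load_tokenizer_from_repo`, `Model::load`, and `run_inference` below are illustrative assumptions, not taken verbatim from the crate's documentation:

```rust
use candelabra::{download_model, load_tokenizer_from_repo};
use candelabra::{Model, run_inference};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetch a GGUF checkpoint through the local hf-hub cache
    // (repo id and filename are placeholders).
    let model_path = download_model(
        "Qwen/Qwen2.5-0.5B-Instruct-GGUF",
        "qwen2.5-0.5b-instruct-q4_k_m.gguf",
    )?;
    let tokenizer = load_tokenizer_from_repo("Qwen/Qwen2.5-0.5B-Instruct")?;

    // Loads onto Metal/CUDA when available, otherwise CPU, picking the
    // architecture from the GGUF metadata.
    let mut model = Model::load(&model_path)?;
    println!("running on {}", model.device_used);

    // Stream generated tokens through a callback.
    run_inference(&mut model, &tokenizer, "Write a haiku about Rust.", |token| {
        print!("{token}");
        true // return false to cancel generation
    })?;
    Ok(())
}
```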
## Main API

- `download_model()` downloads a model file through the local Hugging Face cache.
- `download_model_with_progress()` and `download_model_with_channel()` emit progress updates suitable for UI progress bars.
- `load_tokenizer_from_repo()` downloads and loads `tokenizer.json`.
- `Model::load()` loads a quantized GGUF model onto the best available device, dynamically instantiating the correct candle architecture based on its metadata.
- `run_inference()` streams generated tokens through a callback.
- `run_inference_with_channel()` streams generated tokens over a Tokio channel.
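The channel-based variant is the natural fit for GUI apps, where generation runs on a background task and tokens are forwarded to the event loop. A minimal sketch, assuming `run_inference_with_channel` returns a `tokio::sync::mpsc` receiver of token strings (its exact signature is an assumption):

```rust
use candelabra::{run_inference_with_channel, Model};

// Consume streamed tokens as they arrive; in a real app you would
// append each token to a text widget instead of printing it.
async fn stream_reply(model: &mut Model, prompt: &str) {
    let mut rx = run_inference_with_channel(model, prompt);
    while let Some(token) = rx.recv().await {
        print!("{token}");
    }
}
```

Dropping the receiver is the usual way to signal cancellation with channel-based streaming: the sending side sees the channel close and stops generating.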
## Platform Notes
- On macOS, the crate prefers Metal and falls back to CPU.
- On non-macOS platforms, the crate prefers CUDA and falls back to CPU.
- The public `device_used` string is intended to be easy to surface directly in desktop UIs.
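For example, `device_used` can feed a status-bar label directly; a small sketch (the exact values of the string, e.g. "Metal", "CUDA", or "CPU", are assumptions):

```rust
use candelabra::Model;

// Build a human-readable label for a status bar or about dialog.
fn device_label(model: &Model) -> String {
    format!("Inference device: {}", model.device_used)
}
```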
## License
Licensed under either of these, at your option:
- Apache License, Version 2.0
- MIT license