Candle implementations for various deep learning models
This crate provides implementations of popular machine learning models and architectures for different modalities.
- Large language models: llama, phi3, mamba, mixtral, bert, …
- Text to text models: t5, …
- Image to text models: blip, …
- Text to image models: stable_diffusion and wuerstchen, …
- Audio models: whisper, encodec, metavoice, parler_tts, …
- Computer vision models: dinov2, convmixer, efficientnet, …
Some of the models also have quantized variants, e.g. quantized_blip, quantized_llama and quantized_qwen2.
The implementations aim to be readable while maintaining good performance. For more information on each model, see its module docs via the links below.
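As a sketch of how the crate is typically pulled in, here is a hypothetical Cargo.toml fragment; the version numbers and the companion candle-core / candle-nn crates are assumptions, so check crates.io for the current release:

```toml
# Hypothetical versions — pin to the current candle release on crates.io.
[dependencies]
candle-core = "0.8"
candle-nn = "0.8"
candle-transformers = "0.8"
```

Individual models are then reachable as submodules of `candle_transformers::models`, e.g. `candle_transformers::models::bert` or `candle_transformers::models::quantized_llama`.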
Modules
- based
- The Based architecture from the Stanford Hazy Research group.
- beit
- Based on the BEiT vision model.
- bert
- BERT (Bidirectional Encoder Representations from Transformers)
- bigcode
- BigCode implementation in Rust based on the GPT-BigCode model.
- blip
- Based on the BLIP paper from Salesforce Research.
- blip_text
- Implementation of the BLIP text encoder/decoder.
- chatglm
- Implementation of the ChatGLM2/3 models from THUDM.
- chinese_clip
- Chinese Contrastive Language-Image Pre-Training
- clip
- Contrastive Language-Image Pre-Training
- codegeex4_9b
- CodeGeeX4, a multi-language code generation model
- colpali
- Colpali Model for text/image similarity scoring.
- convmixer
- ConvMixer implementation.
- convnext
- ConvNeXt implementation.
- csm
- Implementation of the Conversational Speech Model (CSM) from Sesame
- dac
- Implementation of the Descript Audio Codec (DAC) model
- debertav2
- deepseek2
- depth_anything_v2
- Implementation of the Depth Anything V2 model.
- dinov2
- Implementation of the DINOv2 models from Meta Research.
- dinov2reg4
- Implementation of the DINOv2 variant with 4 register tokens.
- distilbert
- Implementation of DistilBert, a distilled version of BERT.
- efficientnet
- Implementation of EfficientNet, an efficient convolutional network for computer vision tasks.
- efficientvit
- EfficientViT (MSRA) inference implementation based on timm.
- encodec
- EnCodec neural audio codec implementation.
- eva2
- EVA-2 inference implementation.
- falcon
- Falcon language model inference implementation
- fastvit
- FastViT inference implementation based on timm
- flux
- Flux Model
- gemma
- Gemma inference implementation.
- gemma2
- Gemma LLM architecture (Google) inference implementation.
- gemma3
- Gemma LLM architecture (Google) inference implementation.
- glm4
- GLM-4 inference implementation.
- glm4_new
- granite
- Granite is a Long Context Transformer Language Model.
- granitemoehybrid
- GraniteMoeHybrid is a Long Context Transformer Language Model.
- helium
- Helium inference implementation.
- hiera
- Hiera inference implementation based on timm.
- jina_bert
- JinaBERT inference implementation
- llama
- Llama inference implementation.
- llama2_c
- Llama2 inference implementation.
- llama2_c_weights
- Llama2 inference implementation.
- llava
- The LLaVA (Large Language and Vision Assistant) model.
- mamba
- Mamba inference implementation.
- mamba2
- Mamba2 inference implementation.
- marian
- Marian Neural Machine Translation
- metavoice
- MetaVoice Studio ML Models
- mimi
- Mimi model
- mistral
- Mistral model implementation.
- mixformer
- MixFormer (Microsoft’s Phi Architecture)
- mixtral
- Mixtral Model, a sparse mixture of expert model based on the Mistral architecture
- mmdit
- MMDiT, the Multimodal Diffusion Transformer (used in Stable Diffusion 3).
- mobileclip
- Mobile CLIP model, combining a lightweight vision encoder with a text encoder
- mobilenetv4
- MobileNet-v4
- mobileone
- MobileOne
- modernbert
- ModernBERT
- moondream
- Moondream vision-to-text model
- mpt
- Module implementing the MPT (MosaicML Pretrained Transformer) model
- nomic_bert
- NomicBERT
- nvembed_v2
- NV-Embed-v2
- olmo
- OLMo (Open Language Model) implementation
- olmo2
- OLMo 2 (Open Language Model) implementation
- openclip
- Open Contrastive Language-Image Pre-Training
- paddleocr_vl
- PaddleOCR-VL Vision-Language Model for OCR.
- paligemma
- Multimodal multi-purpose model combining Gemma-based language model with SigLIP image understanding
- parler_tts
- Parler model implementation for parler_tts text-to-speech synthesis
- persimmon
- Persimmon Model
- phi
- Microsoft Phi model implementation
- phi3
- Microsoft Phi-3 model implementation
- pixtral
- Pixtral vision-language model.
- quantized_blip
- BLIP model implementation with quantization support.
- quantized_blip_text
- Quantized BLIP text module implementation.
- quantized_gemma3
- Gemma 3 model implementation with quantization support.
- quantized_glm4
- GLM4 implementation with quantization support.
- quantized_lfm2
- quantized_llama
- Quantized Llama model implementation.
- quantized_llama2_c
- Quantized Llama2 model implementation.
- quantized_metavoice
- Quantized MetaVoice model implementation.
- quantized_mistral
- Mistral model implementation with quantization support.
- quantized_mixformer
- Quantized MixFormer model implementation.
- quantized_moondream
- Quantized Moondream vision-language model implementation.
- quantized_mpt
- Quantized MPT model implementation.
- quantized_phi
- Phi-2 model implementation with quantization support.
- quantized_phi3
- Phi-3 model implementation with quantization support.
- quantized_qwen2
- Qwen2 model implementation with quantization support.
- quantized_qwen3
- Qwen3 implementation with quantization support.
- quantized_qwen3_moe
- quantized_recurrent_gemma
- Recurrent Gemma model implementation with quantization support.
- quantized_rwkv_v5
- RWKV v5 model implementation with quantization support.
- quantized_rwkv_v6
- RWKV v6 model implementation with quantization support.
- quantized_stable_lm
- Quantized StableLM implementation.
- quantized_t5
- T5 model implementation with quantization support.
- qwen2
- Qwen2 model implementation.
- qwen2_moe
- Qwen2 model implementation with Mixture of Experts support.
- qwen3
- qwen3_moe
- qwen3_vl
- recurrent_gemma
- Recurrent Gemma model implementation
- repvgg
- RepVGG inference implementation
- resnet
- ResNet Implementation
- rwkv_v5
- RWKV v5 model implementation.
- rwkv_v6
- RWKV v6 model implementation.
- rwkv_v7
- RWKV v7 “Goose” (x070) model implementation.
- segformer
- Segformer model implementation for semantic segmentation and image classification.
- segment_anything
- Segment Anything Model (SAM)
- siglip
- Siglip model implementation.
- smol
- SmolLM model family implementations.
- snac
- Implementation of the Multi-Scale Neural Audio Codec (SNAC)
- stable_diffusion
- Stable Diffusion
- stable_lm
- StableLM model implementation.
- starcoder2
- StarCoder2 model implementation.
- stella_en_v5
- Stella v5 model implementation.
- t5
- T5 model implementation.
- trocr
- TrOCR model implementation.
- vgg
- VGG-16 model implementation.
- vit
- Vision Transformer (ViT) implementation.
- voxtral
- whisper
- Whisper Model Implementation
- with_tracing
- wuerstchen
- Würstchen Efficient Diffusion Model
- xlm_roberta
- yi
- Yi model implementation.
- z_image
- Z-Image Model