Candle implementations for various deep learning models
This crate provides implementations of popular machine learning models and architectures for different modalities.
- Large language models: llama, phi3, mamba, mixtral, bert, …
- Text to text models: t5, …
- Image to text models: blip, …
- Text to image models: stable_diffusion and wuerstchen, …
- Audio models: whisper, encodec, metavoice, parler_tts, …
- Computer vision models: dinov2, convmixer, efficientnet, …
Some of the models also have quantized variants, e.g. quantized_blip, quantized_llama and quantized_qwen2.
The implementations aim to be readable while maintaining good performance. For more information on each model, see its module docs via the links below.
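As a sketch of how the crate is typically pulled in, here is a hypothetical Cargo.toml fragment; the version numbers and the companion candle-core / candle-nn crates are assumptions, so check crates.io for the current release:

```toml
# Hypothetical versions — pin to the current candle release on crates.io.
[dependencies]
candle-core = "0.8"
candle-nn = "0.8"
candle-transformers = "0.8"
```

Individual models are then reachable as submodules of `candle_transformers::models`, e.g. `candle_transformers::models::bert` or `candle_transformers::models::quantized_llama`.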
Modules
- based
- The Based architecture from the Stanford Hazy Research group.
- beit
- Based on the BEiT vision model.
- bert
- BERT (Bidirectional Encoder Representations from Transformers)
- bigcode
- BigCode implementation in Rust based on the GPT-BigCode model.
- blip
- Based on the BLIP paper from Salesforce Research.
- blip_text
- Implementation of the BLIP text encoder/decoder.
- chatglm
- Implementation of the ChatGLM2/3 models from THUDM.
- chinese_clip
- Chinese Contrastive Language-Image Pre-Training
- clip
- Contrastive Language-Image Pre-Training
- codegeex4_9b
- CodeGeeX4, a multi-language code generation model
- colpali
- Colpali Model for text/image similarity scoring.
- convmixer
- ConvMixer implementation.
- convnext
- ConvNeXt implementation.
- csm
- Implementation of the Conversational Speech Model (CSM) from Sesame
- dac
- Implementation of the Descript Audio Codec (DAC) model
- debertav2
- deepseek2
- depth_anything_v2
- Implementation of the Depth Anything V2 model.
- dinov2
- Implementation of the DINOv2 models from Meta Research.
- dinov2reg4
- Implementation of the DINOv2 variant with 4 register tokens.
- distilbert
- Implementation of DistilBert, a distilled version of BERT.
- efficientnet
- Implementation of EfficientNet, an efficient convolutional network for computer vision tasks.
- efficientvit
- EfficientViT (MSRA) inference implementation based on timm.
- encodec
- EnCodec neural audio codec implementation.
- eva2
- EVA-2 inference implementation.
- falcon
- Falcon language model inference implementation
- fastvit
- FastViT inference implementation based on timm
- flux
- Flux Model
- gemma
- Gemma inference implementation.
- gemma2
- Gemma LLM architecture (Google) inference implementation.
- gemma3
- Gemma LLM architecture (Google) inference implementation.
- glm4
- GLM-4 inference implementation.
- glm4_new
- granite
- Granite is a Long Context Transformer Language Model.
- granitemoehybrid
- GraniteMoeHybrid is a Long Context Transformer Language Model.
- helium
- Helium inference implementation.
- hiera
- Hiera inference implementation based on timm.
- jina_bert
- JinaBERT inference implementation
- llama
- Llama inference implementation.
- llama2_c
- Llama2 inference implementation.
- llama2_c_weights
- Llama2 inference implementation.
- llava
- The LLaVA (Large Language and Vision Assistant) model.
- mamba
- Mamba inference implementation.
- mamba2
- Mamba2 inference implementation.
- marian
- Marian Neural Machine Translation
- metavoice
- MetaVoice Studio ML Models
- mimi
- Mimi model
- mistral
- Mistral model implementation.
- mixformer
- MixFormer (Microsoft’s Phi Architecture)
- mixtral
- Mixtral Model, a sparse mixture of expert model based on the Mistral architecture
- mmdit
- MMDiT, the Multimodal Diffusion Transformer (used in Stable Diffusion 3).
- mobileclip
- Mobile CLIP model, combining a lightweight vision encoder with a text encoder
- mobilenetv4
- MobileNet-v4
- mobileone
- MobileOne
- modernbert
- ModernBERT
- moondream
- Moondream vision-to-text model
- mpt
- Module implementing the MPT (MosaicML Pretrained Transformer) model
- nomic_bert
- NomicBERT
- nvembed_v2
- NV-Embed-v2
- olmo
- OLMo (Open Language Model) implementation
- olmo2
- OLMo 2 (Open Language Model) implementation
- openclip
- Open Contrastive Language-Image Pre-Training
- paddleocr_vl
- PaddleOCR-VL Vision-Language Model for OCR.
- paligemma
- Multimodal multi-purpose model combining Gemma-based language model with SigLIP image understanding
- parler_tts
- Parler model implementation for parler_tts text-to-speech synthesis
- persimmon
- Persimmon Model
- phi
- Microsoft Phi model implementation
- phi3
- Microsoft Phi-3 model implementation
- pixtral
- Pixtral vision-language model.
- quantized_blip
- BLIP model implementation with quantization support.
- quantized_blip_text
- Quantized BLIP text module implementation.
- quantized_gemma3
- Gemma 3 model implementation with quantization support.
- quantized_glm4
- GLM4 implementation with quantization support.
- quantized_lfm2
- quantized_llama
- Quantized Llama model implementation.
- quantized_llama2_c
- Quantized Llama2 model implementation.
- quantized_metavoice
- Quantized MetaVoice model implementation.
- quantized_mistral
- Mistral model implementation with quantization support.
- quantized_mixformer
- Quantized MixFormer model implementation.
- quantized_moondream
- Quantized Moondream vision-language model implementation.
- quantized_mpt
- Quantized MPT model implementation.
- quantized_phi
- Phi-2 model implementation with quantization support.
- quantized_phi3
- Phi-3 model implementation with quantization support.
- quantized_qwen2
- Qwen2 model implementation with quantization support.
- quantized_qwen3
- Qwen3 implementation with quantization support.
- quantized_qwen3_moe
- quantized_recurrent_gemma
- Recurrent Gemma model implementation with quantization support.
- quantized_rwkv_v5
- RWKV v5 model implementation with quantization support.
- quantized_rwkv_v6
- RWKV v6 model implementation with quantization support.
- quantized_stable_lm
- Quantized StableLM implementation.
- quantized_t5
- T5 model implementation with quantization support.
- qwen2
- Qwen2 model implementation.
- qwen2_moe
- Qwen2 model implementation with Mixture of Experts support.
- qwen3
- qwen3_moe
- qwen3_vl
- recurrent_gemma
- Recurrent Gemma model implementation
- repvgg
- RepVGG inference implementation
- resnet
- ResNet Implementation
- rwkv_v5
- RWKV v5 model implementation.
- rwkv_v6
- RWKV v6 model implementation.
- rwkv_v7
- RWKV v7 “Goose” (x070) model implementation.
- segformer
- Segformer model implementation for semantic segmentation and image classification.
- segment_anything
- Segment Anything Model (SAM)
- siglip
- Siglip model implementation.
- smol
- SmolLM model family implementations.
- snac
- Implementation of the Multi-Scale Neural Audio Codec (SNAC)
- stable_diffusion
- Stable Diffusion
- stable_lm
- StableLM model implementation.
- starcoder2
- StarCoder2 model implementation.
- stella_en_v5
- Stella v5 model implementation.
- t5
- T5 model implementation.
- trocr
- TrOCR model implementation.
- vgg
- VGG-16 model implementation.
- vit
- Vision Transformer (ViT) implementation.
- voxtral
- whisper
- Whisper Model Implementation
- with_tracing
- wuerstchen
- Würstchen Efficient Diffusion Model
- xlm_roberta
- yi
- Yi model implementation.
- z_image
- Z-Image Model