Module efficientvit


EfficientViT (MSRA) inference implementation based on timm.

This crate provides an implementation of the EfficientViT model from Microsoft Research Asia for efficient image classification. The model uses cascaded group attention modules to achieve strong performance while maintaining low memory usage.

The model was originally described in the paper: “EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention”

This implementation is based on the reference implementation from pytorch-image-models.
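The cascading pattern behind the attention modules mentioned above can be sketched in plain Rust. This is an illustrative sketch of the cascaded group attention *structure* only, not the real EfficientViT module: the input channels are split into groups, each head processes its own group, and the previous head's output is added to the next head's input so later heads refine earlier results. The `head` function here is a hypothetical stand-in (elementwise scaling) for a real single-head attention, and all names are assumptions for illustration.

```rust
// Stand-in for one attention head; the real model computes Q/K/V attention
// over the group's channels. Here it is just elementwise scaling.
fn head(input: &[f32], scale: f32) -> Vec<f32> {
    input.iter().map(|&x| x * scale).collect()
}

// Cascaded group attention pattern: split channels into `groups` slices,
// run one head per slice, and feed each head's output into the next
// head's input (the "cascade"). Outputs are concatenated; the real model
// additionally applies a final projection.
fn cascaded_group_attention(x: &[f32], groups: usize) -> Vec<f32> {
    let group_dim = x.len() / groups;
    let mut outputs = Vec::with_capacity(x.len());
    let mut prev: Vec<f32> = vec![0.0; group_dim];
    for g in 0..groups {
        let slice = &x[g * group_dim..(g + 1) * group_dim];
        // Add the previous head's output to this head's input slice.
        let input: Vec<f32> = slice.iter().zip(&prev).map(|(a, b)| a + b).collect();
        prev = head(&input, 0.5);
        outputs.extend_from_slice(&prev);
    }
    outputs
}

fn main() {
    // Toy input: 4 channels split across 2 heads.
    let out = cascaded_group_attention(&[1.0, 2.0, 3.0, 4.0], 2);
    println!("{out:?}");
}
```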

§Example Usage

This candle implementation uses a pre-trained EfficientViT (from Microsoft Research Asia) network for inference. The classification head has been trained on the ImageNet dataset and returns the probabilities for the top-5 classes.

cargo run \
  --example efficientvit \
  --release -- \
  --image candle-examples/examples/yolo-v8/assets/bike.jpg \
  --which m1

> loaded image Tensor[dims 3, 224, 224; f32]
> model built
> mountain bike, all-terrain bike, off-roader: 69.80%
> unicycle, monocycle     : 13.03%
> bicycle-built-for-two, tandem bicycle, tandem: 9.28%
> crash helmet            : 2.25%
> alp                     : 0.46%
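The percentages above come from applying a softmax to the classification head's raw logits and keeping the five largest entries. A minimal sketch of that post-processing step in plain Rust (no candle dependency; the 6-class logits are hypothetical toy data, a real run uses the 1000 ImageNet classes):

```rust
// Numerically stable softmax: subtract the max logit before exponentiating.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

// Pair each probability with its class index and keep the k largest.
fn top_k(probs: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = probs.iter().cloned().enumerate().collect();
    indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    indexed.truncate(k);
    indexed
}

fn main() {
    // Hypothetical logits for a tiny 6-class head.
    let logits = [2.3f32, 0.1, -1.0, 4.2, 1.7, 0.5];
    let probs = softmax(&logits);
    for (class_idx, p) in top_k(&probs, 5) {
        println!("class {class_idx}: {:.2}%", 100.0 * p);
    }
}
```

In the actual example the class indices are then mapped to ImageNet label strings before printing.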

Structs§

Config

Functions§

efficientvit
efficientvit_no_final_layer