
OpenCLIP embedding in Rust

Easily run pre-trained open_clip compatible embedding models in Rust via ONNX Runtime.

Prerequisites

  1. Rust & Cargo.
  2. uv - to generate ONNX files from HuggingFace models.
  3. onnxruntime - loaded dynamically, as I had issues with static linking.

Usage: Export Model to ONNX

Use the provided pull_onnx.py script to download and export an OpenCLIP model from Hugging Face.

# Run the export script - uv will handle the dependencies
# Example: export MobileCLIP2-S2
uv run pull_onnx.py --id "timm/MobileCLIP2-S2-OpenCLIP"

Usage: Inference in Rust

Option 1: Clip struct

The Clip struct is built for ease of use, handling both vision and text together, with convenience functions for similarity rankings.

use open_clip_inference::Clip;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model_id = "timm/MobileCLIP2-S2-OpenCLIP";
    let mut clip = Clip::from_model_id(model_id)?;

    let img = image::open(Path::new("assets/img/cat_face.jpg"))?;
    let labels = &["cat", "dog", "beignet"];

    let results = clip.classify(&img, labels)?;

    for (label, prob) in results {
        println!("{}: {:.2}%", label, prob * 100.0);
    }

    Ok(())
}

Option 2: Individual vision & text embedders

Use VisionEmbedder or TextEmbedder on their own when you only need to produce embeddings from images or text.

use open_clip_inference::{TextEmbedder, VisionEmbedder};
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model_id = "timm/MobileCLIP2-S2-OpenCLIP";
    let mut vision = VisionEmbedder::from_model_id(model_id)?;
    let mut text = TextEmbedder::from_model_id(model_id)?;

    let img = image::open(Path::new("assets/img/cat_face.jpg"))?;
    let img_emb = vision.embed_image(&img)?;
    // Now you may put the embeddings in a database like Postgres with PgVector to set up semantic image search.
    
    let text_embs = text.embed_text("a cat")?;
    // You can search with the text embedding through images using cosine similarity.
    // All embeddings produced are already l2 normalized.

    Ok(())
}

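Because all embeddings are already L2-normalized, cosine similarity reduces to a plain dot product. A minimal sketch, assuming the embeddings are available as f32 slices (the crate's actual return type may differ):

// Cosine similarity of two L2-normalized embeddings is just their dot product.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Example: score the image embedding against the text embedding from above.
// let score = cosine_similarity(&img_emb, &text_embs);
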
Examples

Run the included examples (ensure you have exported the relevant model first):

# Simple generic example
cargo run --example basic

# Semantic image search demo
cargo run --example search

Tested Models

The following models have been tested to work with pull_onnx.py and this Rust crate. I picked them because they are among the best performing in benchmarks or the most popular on HuggingFace.

  • timm/MobileCLIP2-S4-OpenCLIP
  • laion/CLIP-ViT-B-32-laion2B-s34B-b79K
  • timm/ViT-SO400M-16-SigLIP2-384
  • timm/vit_base_patch32_clip_224.openai
  • microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
  • imageomics/bioclip
  • Marqo/marqo-fashionSigLIP
  • timm/PE-Core-bigG-14-448
  • timm/ViT-SO400M-14-SigLIP-384

Verified Embeddings

The following models have been verified to produce embeddings in Rust that match the Python reference implementation (a rough comparison sketch follows the list below).

Python reference implementations: https://github.com/RuurdBijlsma/clip-model-research

  • timm/ViT-SO400M-16-SigLIP2-384
  • timm/MobileCLIP2-S4-OpenCLIP
  • timm/vit_base_patch32_clip_224.openai
  • timm/ViT-SO400M-14-SigLIP-384
  • Marqo/marqo-fashionSigLIP

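As a rough illustration of what "match" means here, a check like the one below compares a Rust embedding against the corresponding Python reference embedding. This is a hypothetical helper, not part of the crate, and it assumes both vectors are available as f32 slices of equal length:

// Largest element-wise difference between a Rust embedding and its Python reference.
// Small deviations (float32 rounding, ONNX export) are expected; large ones indicate a mismatch.
fn max_abs_diff(rust_emb: &[f32], python_emb: &[f32]) -> f32 {
    rust_emb
        .iter()
        .zip(python_emb)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f32, f32::max)
}
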
Troubleshooting

ONNX Runtime Library Not Found

ONNX Runtime is loaded dynamically; if the library cannot be found, download the matching onnxruntime build from GitHub Releases.

Then add the directory containing the dll/so/dylib to your PATH, or point the ORT_DYLIB_PATH environment variable directly at the library file.

PowerShell example:

# Adjust the path to where the DLL is located.
$env:ORT_DYLIB_PATH="C:/Apps/onnxruntime/lib/onnxruntime.dll"

Shell example:

export ORT_DYLIB_PATH="/usr/local/lib/libonnxruntime.so"