Crate next_plaid_onnx

§Next-Plaid ONNX

Fast ColBERT inference using ONNX Runtime with automatic hardware acceleration.

Also includes hierarchical clustering utilities compatible with scipy.

§Quick Start

use next_plaid_onnx::Colbert;

// Simple usage with defaults (auto-detects threads and hardware)
let model = Colbert::new("models/GTE-ModernColBERT-v1")?;

// Encode documents
let doc_embeddings = model.encode_documents(&["Paris is the capital of France."], None)?;

// Encode queries
let query_embeddings = model.encode_queries(&["What is the capital of France?"])?;
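
Multi-vector embeddings like these are usually compared with ColBERT's MaxSim late-interaction score: for each query token, take the highest dot product against any document token, then sum over query tokens. The following is a minimal sketch of that scoring step. It uses plain `Vec<Vec<f32>>` (one inner vector per token) as a stand-in, since the concrete return type of `encode_documents` and `encode_queries` is not shown on this page, and the names `dot` and `maxsim` are illustrative rather than part of this crate.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim score: for every query token, keep the best-matching
/// document token similarity, then sum those maxima.
/// Note: an empty `doc` yields negative infinity for a non-empty query.
fn maxsim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Ranking a set of documents for one query then amounts to computing `maxsim` against each document's token matrix and sorting by the result.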

§Configuration

Use the builder pattern for advanced configuration:

use next_plaid_onnx::{Colbert, ExecutionProvider};

let model = Colbert::builder("models/GTE-ModernColBERT-v1")
    .with_quantized(true)                              // Use INT8 model for ~2x speedup
    .with_parallel(25)                                 // 25 parallel ONNX sessions
    .with_batch_size(2)                                // Batch size per session
    .with_execution_provider(ExecutionProvider::Cuda)  // Force CUDA
    .build()?;
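
To get a feel for how the `with_parallel` and `with_batch_size` settings interact, the sketch below splits a list of texts into batches and distributes them round-robin over a number of sessions. This is only an illustration of the arithmetic (25 sessions with batch size 2 means up to 50 texts in flight). How the crate actually schedules work is not described on this page, and `plan_batches` is a hypothetical helper, not a crate function.

```rust
/// Split `texts` into batches of `batch_size`, then assign batch `i`
/// to session `i % parallel`. Returns one list of batches per session.
fn plan_batches<'a>(
    texts: &'a [&'a str],
    parallel: usize,
    batch_size: usize,
) -> Vec<Vec<&'a [&'a str]>> {
    assert!(parallel > 0 && batch_size > 0);
    let mut sessions: Vec<Vec<&[&str]>> = vec![Vec::new(); parallel];
    for (i, batch) in texts.chunks(batch_size).enumerate() {
        sessions[i % parallel].push(batch);
    }
    sessions
}
```

With few documents, a large session count leaves most sessions idle, so the two values are worth tuning together against the actual workload.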

§Hardware Acceleration

Enable GPU acceleration by enabling the appropriate Cargo feature:

  • cuda - NVIDIA CUDA (Linux/Windows)
  • tensorrt - NVIDIA TensorRT (optimized CUDA)
  • coreml - Apple Silicon (macOS)
  • directml - Windows GPUs (DirectX 12)

When a GPU feature is enabled, the library automatically uses the GPU if one is available and falls back to the CPU otherwise.
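
The fallback rule above can be pictured as a small decision function. The sketch below is a simplified model, not the crate's implementation: `Provider` is a local stand-in for `ExecutionProvider` restricted to two cases, and `pick_provider` is a hypothetical name. In real code the two booleans would come from `is_force_cpu()` and `is_cuda_available()`.

```rust
/// Local stand-in for the crate's `ExecutionProvider`, reduced to two cases.
#[derive(Debug, PartialEq)]
enum Provider {
    Cuda,
    Cpu,
}

/// Prefer CUDA when it is usable and CPU-only mode is not forced;
/// otherwise fall back to the CPU.
fn pick_provider(force_cpu: bool, cuda_available: bool) -> Provider {
    if !force_cpu && cuda_available {
        Provider::Cuda
    } else {
        Provider::Cpu
    }
}
```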

Modules§

hierarchy
Hierarchical clustering implementation compatible with scipy.cluster.hierarchy.

Structs§

Colbert
ColBERT model for encoding documents and queries into multi-vector embeddings.
ColbertBuilder
Builder for configuring Colbert.
ColbertConfig
Configuration for ColBERT model behavior.

Enums§

ExecutionProvider
Hardware acceleration provider for ONNX Runtime.

Functions§

is_cuda_available
Check whether the CUDA execution provider is available. Always returns false when the cuda feature is not enabled.
is_force_cpu
Check whether CPU-only mode is forced via an environment variable. Set NEXT_PLAID_FORCE_CPU=1 to disable CUDA entirely and avoid the overhead of loading CUDA libraries.
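
As an illustration of the environment-variable switch, the sketch below shows one way such a check could be written with the standard library. It assumes only the value `1` enables CPU-only mode, as documented above. Whether the crate accepts other values (such as `true`) is not stated here, and both function names are hypothetical.

```rust
use std::env;

/// Interpret an optional variable value: only "1" (ignoring surrounding
/// whitespace) counts as forcing CPU-only mode in this sketch.
fn force_cpu_from(value: Option<&str>) -> bool {
    matches!(value.map(str::trim), Some("1"))
}

/// Read NEXT_PLAID_FORCE_CPU from the process environment.
/// An unset or non-UTF-8 variable is treated as "not forced".
fn force_cpu_from_env() -> bool {
    force_cpu_from(env::var("NEXT_PLAID_FORCE_CPU").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the rule easy to test without mutating process-wide state.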