# Rust OCR and ML Ecosystem Analysis for ruvector-scipix
## Executive Summary
This document provides a comprehensive analysis of the Rust ecosystem for OCR (Optical Character Recognition) and machine learning, focusing on libraries suitable for the ruvector-scipix project. The analysis covers seven primary OCR/ML libraries, examines ONNX Runtime integration options, evaluates GPU acceleration capabilities, and provides technology stack recommendations optimized for performance, memory efficiency, and cross-platform deployment.
**Key Finding**: The optimal stack for ruvector-scipix combines `ort` (ONNX Runtime bindings) for inference with `image`/`imageproc` for preprocessing, plus optional pure Rust alternatives (`tract`, `candle`) for WASM targets.
---
## 1. Library Comparison Matrix
### OCR Libraries
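| Library | Type | Model Support | ONNX Support | GPU Support | Maturity | Performance | Dependencies |
|---|---|---|---|---|---|---|---|
| **ocrs** | Native Rust | ONNX (RTen engine) | ✅ Yes | ❌ No | 🟡 Preview | Medium | Minimal (Pure Rust) |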
| **oar-ocr** | ONNX Wrapper | PaddleOCR ONNX | ✅ Yes | ✅ CUDA | 🟢 Stable | High | ort (ONNX Runtime) |
| **kalosm-ocr** | Pure Rust | TrOCR (candle) | ✅ Yes | ✅ WGPU/Metal/CUDA | 🟡 Alpha | Medium | candle ML framework |
| **leptess** | FFI Bindings | Tesseract C++ | ❌ No | ❌ No | 🟢 Mature | High (CPU) | Tesseract C++ library |
| **paddle-ocr-rs** | ONNX Wrapper | PaddleOCR v4/v5 | ✅ Yes | ✅ CUDA/TensorRT | 🟢 Stable | Very High | ort (ONNX Runtime) |
| **pure-onnx-ocr** | Pure ONNX | PaddleOCR DBNet+SVTR | ✅ Yes | ✅ Via ONNX RT | 🟢 Active (2025) | High | No C/C++ deps |
### ML Inference Engines
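| Library | Type | Model Formats | ONNX Support | GPU Support | Performance | Maturity |
|---|---|---|---|---|---|---|
| **ort** | ONNX Runtime | ONNX | ✅ Yes | ✅ CUDA/TensorRT/OpenVINO | **Very High** | 🟢 Production |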
| **candle** | ML Framework | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | High | 🟢 Stable (HuggingFace) |
| **tract** | ONNX/TF Inference | ONNX, NNEF, TF | ✅ Yes | ❌ Limited | High (CPU) | 🟢 Mature (Sonos) |
| **burn** | Deep Learning | Multiple | ✅ Yes | ✅ CUDA/Metal/WGPU | Very High | 🟢 Active |
### Performance Benchmarks
Based on research findings:
- **ort + PaddleOCR**: 73.1% latency reduction for recognition, 40.4% for detection (NVIDIA T4)
- **ONNX conversion**: Up to 5x faster than PaddlePaddle native inference
- **tract**: 70μs (RPi Zero), 11μs (RPi 3) for CNN models
- **Tesseract (leptess)**: Baseline CPU performance, requires preprocessing
- **ocrs**: Early preview, moderate performance on clear text
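The percentage reductions above can be read as speedup factors via `1 / (1 - reduction)`. The following is a minimal sketch of that arithmetic; the function name `speedup_from_reduction` is illustrative and not part of any library.

```rust
/// Converts a fractional latency reduction (0.731 for 73.1%) into a speedup factor.
/// Returns `None` when the reduction is outside the range [0, 1).
pub fn speedup_from_reduction(reduction: f64) -> Option<f64> {
    if !(0.0..1.0).contains(&reduction) {
        return None;
    }
    Some(1.0 / (1.0 - reduction))
}
```

Under this reading, a 73.1% reduction corresponds to roughly a 3.7x speedup and a 40.4% reduction to roughly 1.7x.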
---
## 2. ONNX Runtime Integration Options
### 2.1 The `ort` Crate (Recommended)
**Overview**: `ort` by pykeio is the premier ONNX Runtime binding for Rust, offering production-grade performance and extensive hardware acceleration support.
**Key Features**:
- **Hardware Acceleration**: CUDA, TensorRT, OpenVINO, Qualcomm QNN, Huawei CANN
- **Dynamic Loading**: Runtime linking for flexibility (`load-dynamic` feature)
- **Alternative Backends**: Support for tract and candle backends
- **Minimal Builds**: RTTI-free, optimized binary sizes for production
- **Float16/BFloat16**: Via `half` crate integration
- **Production Proven**: Used by Twitter (homepage recommendations), Google (Magika), Bloop, SurrealDB
**Cargo Features**:
```toml
[dependencies]
ort = { version = "2.0.0-rc", features = [
    "half",         # Float16/BFloat16 support
    "load-dynamic", # Runtime dynamic linking
    "cuda",         # NVIDIA GPU acceleration (requires CUDA 11.6+)
    "tensorrt",     # TensorRT optimization (requires TensorRT 8.4+)
] }
```
**Performance Characteristics**:
- Significantly faster than PyTorch for inference
- Supports model quantization (int8, float16)
- Multi-GPU distribution via NCCL
- Optimal for batch processing and real-time inference
**Integration Example**:
```rust
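// Module paths and the `run` signature follow recent ort 2.0 release candidates;
// earlier versions re-export these types at the crate root.
use ort::session::{builder::GraphOptimizationLevel, Session};
use ort::value::Tensor;

// Load ONNX model
let mut session = Session::builder()?
    .with_optimization_level(GraphOptimizationLevel::Level3)?
    .with_intra_threads(4)?
    .commit_from_file("model.onnx")?;

// Run inference; `input_tensor` is an owned ndarray array such as Array4<f32>
let input = Tensor::from_array(input_tensor)?;
let outputs = session.run(ort::inputs![input])?;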
```
### 2.2 Alternative: `tract` Backend
**Use Case**: When ONNX Runtime binaries are problematic or WASM target required
**Advantages**:
- Pure Rust implementation
- No external C++ dependencies
- Excellent WASM support
- Passes 85% of ONNX backend tests
- Lightweight and maintainable
**Limitations**:
- No tensor sequences or optional tensors
- Limited GPU support compared to ort
- TensorFlow 2 support via ONNX conversion only
### 2.3 Alternative: `candle` Backend
**Use Case**: When integrating with Hugging Face ecosystem or needing pure Rust
**Advantages**:
- Minimalist design, fast compilation
- Native Hugging Face model support (LLaMA, Whisper, Stable Diffusion)
- WASM + WebGPU acceleration
- Small binary size for serverless deployment
- CUDA, Metal, MKL, Accelerate backends
**Limitations**:
- Younger ecosystem than ONNX Runtime
- Fewer pre-optimized OCR models available
- Focus on inference over training
---
## 3. Pure Rust ML with Candle/Tract
### 3.1 Candle Framework (Hugging Face)
**Architecture**: Minimalist ML framework emphasizing inference efficiency and cross-platform deployment.
**Supported Models**:
- **Language Models**: LLaMA (v1/v2/v3), Mistral 7b, Mixtral 8x7b, Phi 1/2/3, Gemma, StarCoder
- **Vision Models**: Stable Diffusion (1.5, 2.1, SDXL), YOLO (v3/v8), Segment Anything
- **Speech**: Whisper ASR
**Backend Support**:
| Backend | Hardware | Performance | Use Case |
|---|---|---|---|
| CUDA | NVIDIA GPU | Very High | Production inference |
| Metal | Apple Silicon | High | macOS/iOS deployment |
| CPU (MKL) | x86 Intel | Medium-High | CPU-only servers |
| CPU (Accelerate) | Apple | Medium-High | macOS CPU fallback |
| WGPU | WebGPU-enabled | Medium | Browser deployment |
**Design Philosophy**:
- Remove Python from production workloads
- Minimize binary size (critical for edge/serverless)
- Fast startup times (first token ~120ms on M2 MacBook Air)
- Rust's safety guarantees for ML workloads
**Example Usage**:
```rust
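use std::collections::HashMap;
use candle_core::{Device, Tensor};

// Load model (an ONNX ModelProto)
let model = candle_onnx::read_file("model.onnx")?;
let graph = model.graph.as_ref().unwrap();
// Create device (CUDA if available, otherwise CPU)
let device = Device::cuda_if_available(0)?;
// Run inference: bind a tensor to the graph's first input and evaluate
let input = Tensor::randn(0f32, 1f32, (1, 3, 224, 224), &device)?;
let inputs = HashMap::from([(graph.input[0].name.clone(), input)]);
let outputs = candle_onnx::simple_eval(&model, inputs)?;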
```
### 3.2 Tract Framework (Sonos)
**Architecture**: Pure Rust ONNX/TensorFlow inference engine optimized for embedded devices.
**Key Capabilities**:
- **ONNX Support**: 85% of ONNX backend tests passing
- **Operator Set**: ONNX 1.4.1 (opset 9) through 1.13.0 (opset 18)
- **Proven Models**: AlexNet, DenseNet, Inception, ResNet, VGG, SqueezeNet, etc.
- **Pulsing**: Streaming inference for time-series models (e.g., WaveNet)
- **Quantization**: Built-in int8 quantization support
**Performance Characteristics**:
- Optimized for CPU inference
- Excellent for edge devices (Raspberry Pi, embedded systems)
- Minimal memory footprint
- No RTTI or runtime overhead
**Example Usage**:
```rust
use tract_onnx::prelude::*;

// Load and optimize model
let model = tract_onnx::onnx()
    .model_for_path("model.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;
// Run inference on a placeholder input (replace the zeros with real image data)
let input: Tensor = tract_ndarray::Array4::<f32>::zeros((1, 3, 224, 224)).into();
let result = model.run(tvec![input.into()])?;
```
**Quantization Support**:
```rust
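// Load a model that was quantized offline (e.g. an ONNX file with QLinear operators).
// Assumption: "model_int8.onnx" is a hypothetical file name; tract's builder is
// not known to expose an automatic `.quantize()` step.
let model = tract_onnx::onnx()
    .model_for_path("model_int8.onnx")?
    .with_input_fact(0, f32::fact([1, 3, 224, 224]).into())?
    .into_optimized()?
    .into_runnable()?;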
```
### 3.3 Comparison: Candle vs Tract vs ort
| Feature | Candle | Tract | ort |
|---|---|---|---|
| **Performance (GPU)** | Very High | N/A | Very High |
| **Performance (CPU)** | High | Very High | Very High |
| **Binary Size** | Small | Very Small | Large |
| **Startup Time** | Fast | Very Fast | Medium |
| **WASM Support** | Excellent | Excellent | Good (with backends) |
| **Model Ecosystem** | Hugging Face | ONNX/TF | ONNX (largest) |
| **GPU Backends** | CUDA/Metal/WGPU | Limited | CUDA/TensorRT/OpenVINO |
| **Quantization** | Manual | Built-in | Excellent (ONNX tools) |
| **Maturity** | Stable (2024+) | Mature (2018+) | Production (Microsoft) |
**Recommendation**:
- **ort**: Primary choice for maximum performance and hardware acceleration
- **candle**: Secondary choice for WASM targets or Hugging Face integration
- **tract**: Fallback for pure Rust requirements or extreme size constraints
---
## 4. Image Processing in Rust
### 4.1 The `image` Crate (Foundation)
**Purpose**: Core image encoding/decoding and basic manipulation.
**Supported Formats**:
- JPEG, PNG, GIF, WebP, TIFF, BMP, ICO, PNM, DDS, TGA, OpenEXR, AVIF
**Key Features**:
```rust
use image::{imageops, DynamicImage, GenericImageView};

// Load image
let img = image::open("input.jpg")?;
// Methods on DynamicImage
let resized = img.resize(800, 600, imageops::FilterType::Lanczos3);
let grayscale = img.grayscale();
// Free functions in the imageops module
let blurred = imageops::blur(&img, 2.0);
let contrast_adjusted = imageops::contrast(&img, 30.0);
```
### 4.2 The `imageproc` Crate (Advanced Processing)
**Purpose**: Advanced image processing algorithms for computer vision.
**Modules**:
| Module | Capabilities |
|---|---|
| **Contrast** | Histogram equalization, adaptive thresholding, CLAHE |
| **Corners** | Harris, FAST, Shi-Tomasi corner detection |
| **Distance Transform** | Euclidean distance maps, morphological operations |
| **Edges** | Canny edge detection, Sobel/Scharr operators |
| **Filter** | Gaussian, median, bilateral filtering |
| **Geometric** | Rotation, affine, projective transformations |
| **Morphology** | Erosion, dilation, opening, closing |
| **Drawing** | Shapes, text, anti-aliased primitives |
| **Contours** | Border tracing, contour extraction |
**Parallelism**: CPU-based multithreading via `rayon` (not GPU acceleration)
**OCR Preprocessing Example**:
```rust
use image::{DynamicImage, GrayImage, Luma};
use imageproc::contrast::adaptive_threshold;
use imageproc::filter::gaussian_blur_f32;
use imageproc::geometric_transformations::{rotate_about_center, Interpolation};

// Preprocessing pipeline for OCR
fn preprocess_for_ocr(img: &DynamicImage) -> GrayImage {
    // Convert to grayscale
    let gray = img.to_luma8();
    // Denoise with Gaussian blur (sigma = 1.0)
    let blurred = gaussian_blur_f32(&gray, 1.0);
    // Adaptive thresholding for varying lighting (block radius = 21 pixels)
    let binary = adaptive_threshold(&blurred, 21);
    // Deskew if needed; `detect_skew` is a user-supplied function returning radians
    let angle = detect_skew(&binary);
    rotate_about_center(&binary, angle, Interpolation::Bilinear, Luma([255u8]))
}
```
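Detection and recognition models typically consume a planar float tensor rather than the interleaved bytes produced by image decoding. The sketch below shows one way to convert an RGB buffer into CHW layout with per-channel normalization. It is a std-only illustration; the function name and the `mean`/`std` values are assumptions that depend on the model being used.

```rust
/// Converts interleaved RGB bytes (HWC layout) into a planar CHW `f32` buffer,
/// applying per-channel normalization `(x / 255 - mean) / std`.
/// Returns `None` if the buffer length does not match `width * height * 3`.
pub fn hwc_to_chw_normalized(
    rgb: &[u8],
    width: usize,
    height: usize,
    mean: [f32; 3],
    std: [f32; 3],
) -> Option<Vec<f32>> {
    let pixels = width.checked_mul(height)?;
    if rgb.len() != pixels.checked_mul(3)? {
        return None; // buffer does not match the stated dimensions
    }
    let mut out = vec![0.0f32; pixels * 3];
    for (i, px) in rgb.chunks_exact(3).enumerate() {
        for c in 0..3 {
            let v = px[c] as f32 / 255.0;
            // Channel `c` occupies its own contiguous plane of `pixels` values
            out[c * pixels + i] = (v - mean[c]) / std[c];
        }
    }
    Some(out)
}
```

With `mean = std = [0.5; 3]`, pixel values map to the range [-1, 1]; the resulting buffer can be wrapped in a `[1, 3, height, width]` tensor by whichever inference engine is in use.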
### 4.3 GPU Acceleration Options for Image Processing
**Current State**: `imageproc` does NOT provide GPU acceleration. For GPU-accelerated image processing, consider:
**Option 1: `wgpu` + Custom Compute Shaders**
```rust
// GPU compute shader for image processing (`device` is an initialized wgpu::Device)
let shader = device.create_shader_module(wgpu::ShaderModuleDescriptor {
    label: Some("Image Processing"),
    source: wgpu::ShaderSource::Wgsl(include_str!("process.wgsl").into()),
});
```
**Option 2: OpenCV-Rust Bindings** (if CUDA needed)
- Provides GPU-accelerated operations via CUDA
- Requires OpenCV C++ installation
- Not pure Rust
**Option 3: Integrate with ML Framework GPU Ops**
- Use candle/ort tensor operations for preprocessing
- Leverage existing GPU context
- Keep preprocessing on same device as inference
**Recommendation for ruvector-scipix**:
- Use `image` + `imageproc` for CPU preprocessing (fast enough for most cases)
- For GPU pipeline, implement preprocessing as ONNX graph nodes or candle operations
- Leverage rayon parallelism for batch processing
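The batch-parallel recommendation can be sketched without extra dependencies. With `rayon`, the same shape is usually written as `images.par_iter().map(preprocess).collect()`; the version below uses scoped threads from the standard library as a stand-in, and the name `parallel_map` is illustrative.

```rust
use std::thread;

/// Applies `f` to every item using up to `workers` scoped threads and
/// returns the results in input order.
pub fn parallel_map<T, R, F>(items: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.clamp(1, items.len());
    let chunk_len = (items.len() + workers - 1) / workers; // ceiling division
    let f = &f;
    thread::scope(|scope| {
        // One thread per contiguous chunk; chunks keep their original order
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

In a real pipeline `f` would be the per-image preprocessing function; rayon's work-stealing scheduler is generally the better choice when images vary widely in size.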
---
## 5. GPU Acceleration Options
### 5.1 Cross-Platform GPU Support in 2025
The Rust ML ecosystem has achieved robust cross-platform GPU support through standardization around WebGPU and established APIs.
**Unified Backend: `wgpu` (WebGPU Standard)**
- **Targets**: Vulkan (Linux/Windows/Android), Metal (macOS/iOS), DirectX 12 (Windows), WebGPU (browsers)
- **Use Case**: Portable GPU compute without vendor lock-in
- **Frameworks**: Burn, Candle (WGPU backend), kalosm
**Performance Profile**:
| Backend | Hardware | Speedup vs CPU | Use Case |
|---|---|---|---|
| CUDA | NVIDIA GPU | 10-50x | Production ML inference |
| TensorRT | NVIDIA GPU | 15-70x | Optimized ONNX models |
| Metal | Apple Silicon | 8-30x | macOS/iOS deployment |
| OpenVINO | Intel | 5-20x | Intel CPU/GPU optimization |
| WGPU | WebGPU-capable | 3-15x | Browser/cross-platform |
| ROCm | AMD GPU | 10-40x | AMD GPU acceleration |
### 5.2 CUDA Support
**Primary Library**: `cudarc` (Low-level CUDA bindings)
**Integration via ONNX Runtime**:
```toml
[dependencies]
ort = { version = "2.0.0-rc", features = ["cuda"] }
```
**Requirements**:
- CUDA Toolkit 11.6+ (for ort)
- NVIDIA GPU: Maxwell (7xx series) or newer
- Compute Capability 5.0+
**Benefits**:
- Industry-standard ML acceleration
- Mature ecosystem and tooling
- Extensive operator coverage
- Best-in-class performance for training and inference
### 5.3 Metal Support (Apple Silicon)
**Framework Integration**:
- **Candle**: Native Metal backend via `metal` crate
- **Burn**: Metal support through `burn-metal` backend
- **ONNX Runtime**: CoreML execution provider (Metal-accelerated)
**Example (Candle)**:
```rust
use candle_core::{Device, Tensor};

let device = Device::new_metal(0)?; // First Metal device
let tensor = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
```
**Performance**: 8-30x speedup vs CPU, optimized for M1/M2/M3 chips
### 5.4 WebGPU/WGPU
**Purpose**: Cross-platform GPU compute for WASM and native
**Frameworks with WGPU Support**:
- **Burn**: First-class WGPU backend
- **Candle**: WGPU support for browser deployment
- **Kalosm**: WGPU acceleration via Fusor (0.5 release)
**Browser Deployment**:
```rust
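// WASM build: upstream candle's `Device` offers Cpu, Cuda, and Metal variants,
// so browser inference runs on Device::Cpu unless a separate WebGPU backend is wired in
#[cfg(target_arch = "wasm32")]
use candle_core::Device;
#[cfg(target_arch = "wasm32")]
let device = Device::Cpu;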
```
**Benefits**:
- Browser-based ML inference without server
- Works on AMD GPUs (unlike CUDA)
- Portable across desktop and web
- Future-proof standard (W3C specification)
**Limitations**:
- Lower performance than native CUDA/Metal
- Browser memory constraints (typically 2-8GB)
- First token latency: ~120ms (acceptable for many use cases)
### 5.5 TensorRT (NVIDIA Optimization)
**Purpose**: Optimized ONNX model execution on NVIDIA GPUs
**Requirements**:
- NVIDIA GPU: GeForce 9xx series or newer
- TensorRT 8.4+
- CUDA 11.6+
**Integration**:
```toml
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt"] }
```
**Benefits**:
- Automatic kernel fusion and layer optimization
- Mixed precision (FP32/FP16/INT8)
- Up to 2-5x faster than standard CUDA
- Optimal for high-throughput production deployment
### 5.6 OpenVINO (Intel)
**Target**: Intel CPUs (6th gen+) and Intel integrated GPUs
**Use Case**:
- Intel-based servers without discrete GPU
- Edge devices with Intel processors
- Cost-effective acceleration without NVIDIA hardware
**Integration**:
```toml
ort = { version = "2.0.0-rc", features = ["openvino"] }
```
**Performance**: 5-20x CPU speedup depending on model and hardware
### 5.7 GPU Acceleration Recommendation for ruvector-scipix
**Tiered Approach**:
1. **Primary (Production)**: `ort` with CUDA/TensorRT
- Maximum performance for server deployment
- Best operator coverage for PaddleOCR models
- Production-proven reliability
2. **Secondary (Apple Ecosystem)**: `candle` with Metal
- Native Apple Silicon support
- Good for macOS/iOS deployment
- Smaller binary size than ONNX Runtime
3. **Tertiary (WASM/Browser)**: `candle` or `tract` with WGPU
- Client-side OCR in browser
- Privacy-preserving (no server upload)
- Acceptable performance for interactive use
4. **Fallback (CPU-only)**: `tract` or `ort` with optimized CPU execution
- MKL/OpenBLAS acceleration
- Rayon parallelism
- Still faster than Python alternatives
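The tiered approach amounts to a priority-ordered selection over detected capabilities. A minimal sketch follows; the `Capabilities` struct stands in for hypothetical probe results, and all names are illustrative rather than part of `ort`, `candle`, or `tract`.

```rust
/// Inference backends from the tiered approach, in priority order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    OrtCudaTensorRt,
    CandleMetal,
    WasmWgpu,
    Cpu,
}

/// Capabilities detected on the host (hypothetical probe results).
#[derive(Debug, Clone, Copy, Default)]
pub struct Capabilities {
    pub nvidia_gpu: bool,
    pub apple_silicon: bool,
    pub wasm_with_webgpu: bool,
}

/// Picks the highest-priority backend the host supports, falling back to CPU.
pub fn select_backend(caps: Capabilities) -> Backend {
    if caps.nvidia_gpu {
        Backend::OrtCudaTensorRt
    } else if caps.apple_silicon {
        Backend::CandleMetal
    } else if caps.wasm_with_webgpu {
        Backend::WasmWgpu
    } else {
        Backend::Cpu
    }
}
```

Keeping the decision in one function makes the fallback order explicit and easy to test independently of any GPU hardware.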
---
## 6. WebAssembly Compilation Considerations
### 6.1 WASM for ML: Current State (2025)
**Key Finding**: Rust + WASM is the optimal combination for browser-based ML inference, outperforming C++ and other alternatives.
**Performance Characteristics**:
- Rust compiles to WASM **faster** than C++
- Rust produces **smaller binaries** than C++ WASM
- **Memory efficiency**: Rust's ownership model translates well to WASM linear memory
- Consistent performance across browsers
### 6.2 Memory Constraints and Optimization
**Browser Memory Limits**:
- Typical: 2-4GB per tab (Chrome/Firefox)
- Maximum: 4-8GB (varies by browser/OS)
- **Critical Issue**: Running multiple models can exhaust memory quickly
**Memory Optimization Strategies**:
**1. Model Quantization**
```rust
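// INT8 quantization reduces weight memory by roughly 4x (FP32 -> INT8)
// FP16 quantization reduces it by roughly 2x
// Quantize offline and ship the smaller file; `load_model` is a placeholder
// for whichever engine-specific loader is in use
let quantized_model = load_model("model_int8.onnx")?;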
```
**2. Memory Reuse**
```rust
// Pre-allocate tensors, reuse across inferences
struct InferenceContext {
    input_buffer: Vec<f32>,
    output_buffer: Vec<f32>,
}

// `Model` is a placeholder for the engine-specific model type
impl InferenceContext {
    fn run_inference(&mut self, model: &Model, data: &[f32]) -> Result<&[f32]> {
        // copy_from_slice panics unless `data` matches the pre-allocated length
        self.input_buffer.copy_from_slice(data);
        model.run(&self.input_buffer, &mut self.output_buffer)?;
        Ok(&self.output_buffer)
    }
}
```
**3. Lazy Loading with Streaming Compile**
```rust
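// WebAssembly.instantiateStreaming compiles a .wasm module while it downloads.
// Load modules (and the models they serve) on demand, not at initialization.
use js_sys::{Object, WebAssembly};
use wasm_bindgen::JsValue;
use wasm_bindgen_futures::JsFuture;

async fn load_module_lazy(url: &str) -> Result<JsValue, JsValue> {
    let window = web_sys::window().ok_or_else(|| JsValue::from_str("no window"))?;
    let response = window.fetch_with_str(url); // Promise resolving to a Response
    let instantiated = WebAssembly::instantiate_streaming(&response, &Object::new());
    JsFuture::from(instantiated).await
}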
```
**4. wasm-opt Optimization**
```bash
# Optimize WASM binary size and performance
wasm-opt -Oz --enable-simd --enable-bulk-memory input.wasm -o output.wasm
```
**5. Model Cleanup**
```rust
// Explicit cleanup when switching models
impl Drop for ModelContext {
    fn drop(&mut self) {
        // Free GPU resources
        self.gpu_buffers.clear();
        // Trigger garbage collection hint (if available)
    }
}
```
### 6.3 Bundle Size Considerations
**Challenge**: Rust-derived WASM bundles often exceed 300KB (uncompressed), delaying first paint.
**Mitigation Strategies**:
**1. Code Splitting**
```rust
// Load OCR functionality separately from main bundle
#[wasm_bindgen]
pub async fn init_ocr() -> Result<OcrEngine, JsValue> {
    // Lazy-load OCR model
    let model = load_model("ocr.onnx").await?;
    Ok(OcrEngine::new(model))
}
```
**2. Minimal Features**
```toml
[dependencies]
ort = { version = "2.0", default-features = false, features = ["minimal-build"] }
tract-onnx = { version = "0.22", default-features = false }
```
**3. Compression**
```bash
# Brotli compression (recommended by Chrome)
brotli -q 11 output.wasm -o output.wasm.br
# Gzip fallback
gzip -9 output.wasm
```
**4. Tree Shaking**
```toml
[profile.release]
opt-level = "z" # Optimize for size
lto = true
codegen-units = 1
panic = "abort"
strip = true
```
**Expected Sizes**:
| Configuration | Uncompressed | Brotli | Gzip |
|---|---|---|---|
| Minimal tract | ~800KB | ~250KB | ~320KB |
| Full ort | ~3MB | ~900KB | ~1.1MB |
| Candle (minimal) | ~600KB | ~180KB | ~240KB |
### 6.4 WASM-Specific Limitations
**1. Threading Constraints**
- SharedArrayBuffer required for multi-threading
- COEP/COOP headers needed for isolation
- Not all browsers support WASM threads
**2. SIMD Support**
- WASM SIMD enabled by default in modern browsers
- Significant performance boost for ML operations
- Check browser compatibility: `wasm-feature-detect`
**3. No Direct File System Access**
- Use IndexedDB or Cache API for model storage
- Stream models from network (HTTP/2)
- Consider embedding small models in binary
**4. GPU Access**
- WebGPU required for GPU acceleration
- Not universally supported (as of 2025, Chrome/Edge primarily)
- Fallback to CPU inference needed
### 6.5 Recommended WASM Frameworks for ruvector-scipix
**Primary: `candle` with WGPU**
- Smallest binary size
- Native WASM support
- WebGPU acceleration when available
- Hugging Face ecosystem
**Secondary: `tract`**
- Pure Rust, no C++ dependencies
- Excellent WASM support
- Proven in production (Sonos)
- CPU-optimized
**Alternative: `ort` with WASM backend**
- Full ONNX operator support
- Can use tract or candle as backend
- Larger bundle size
**Example WASM Integration**:
```rust
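use candle_core::{Device, Tensor};
use candle_onnx::onnx::ModelProto;
use prost::Message;
use std::collections::HashMap;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub struct OcrEngine {
    model: ModelProto,
    device: Device,
}

#[wasm_bindgen]
impl OcrEngine {
    // Async factory function. `fetch_model`, `preprocess_image`, and
    // `decode_predictions` are application-defined helpers, not library APIs.
    pub async fn create() -> Result<OcrEngine, JsValue> {
        // Upstream candle runs on the CPU in the browser (Device has Cpu/Cuda/Metal variants)
        let device = Device::Cpu;
        // Load model bytes from a URL and decode the ONNX protobuf
        let model_bytes: Vec<u8> = fetch_model("model.onnx").await?;
        let model = ModelProto::decode(model_bytes.as_slice())
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        Ok(OcrEngine { model, device })
    }

    pub fn recognize_text(&self, image_data: &[u8]) -> Result<String, JsValue> {
        // Preprocess image into an input tensor
        let tensor: Tensor = preprocess_image(image_data, &self.device)?;
        // Bind the tensor to the graph's first input and evaluate
        let graph = self
            .model
            .graph
            .as_ref()
            .ok_or_else(|| JsValue::from_str("model has no graph"))?;
        let inputs = HashMap::from([(graph.input[0].name.clone(), tensor)]);
        let outputs = candle_onnx::simple_eval(&self.model, inputs)
            .map_err(|e| JsValue::from_str(&e.to_string()))?;
        // Decode output tensors into text
        decode_predictions(outputs)
    }
}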
```
### 6.6 WASM Deployment Checklist
- [ ] Enable WASM SIMD in build (`RUSTFLAGS='-C target-feature=+simd128'`)
- [ ] Optimize bundle size (`opt-level = "z"`, LTO, strip)
- [ ] Implement lazy loading for models
- [ ] Set up proper CORS headers for model fetching
- [ ] Add WebGPU feature detection with CPU fallback
- [ ] Configure Brotli/Gzip compression on CDN
- [ ] Test memory usage across browsers (especially mobile)
- [ ] Implement model cleanup on tab close
- [ ] Add loading indicators for async model initialization
- [ ] Consider service worker for model caching
---
## 7. Memory Management for Large Models
### 7.1 Memory Challenges in ML Inference
**Typical OCR Model Sizes**:
- PaddleOCR Detection: 3-10MB (FP32)
- PaddleOCR Recognition: 5-15MB (FP32)
- TrOCR: 50-300MB (depending on variant)
- Tesseract trained data: 10-50MB per language
**Memory Consumption Beyond Model Weights**:
- Input tensors: Image size × channels × precision
- Intermediate activations: Varies by architecture (can exceed model size)
- Output buffers: Sequence length × vocab size
- KV cache (for transformers): Context length × hidden size × layers
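The tensor-size terms above are simple products, so they are easy to estimate before loading a model. The sketch below covers the input tensor and output buffer terms; the type and function names are illustrative, and the dimensions used in the usage note are example values, not measurements.

```rust
/// Element precisions discussed in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Fp32,
    Fp16,
    Int8,
}

impl Precision {
    /// Bytes per element for this precision.
    pub const fn bytes(self) -> usize {
        match self {
            Precision::Fp32 => 4,
            Precision::Fp16 => 2,
            Precision::Int8 => 1,
        }
    }
}

/// Input tensor size: batch x channels x height x width x bytes per element.
pub fn input_tensor_bytes(
    batch: usize,
    channels: usize,
    height: usize,
    width: usize,
    precision: Precision,
) -> usize {
    batch * channels * height * width * precision.bytes()
}

/// Output buffer size for a recognition head: sequence length x vocabulary size.
pub fn output_buffer_bytes(seq_len: usize, vocab_size: usize, precision: Precision) -> usize {
    seq_len * vocab_size * precision.bytes()
}
```

For example, a single 3x224x224 FP32 input occupies 602,112 bytes, and an output of 40 time steps over a 5,000-entry vocabulary occupies 800,000 bytes. Intermediate activations are architecture-specific and are not covered by this estimate.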
### 7.2 Quantization Strategies
**INT8 Quantization** (4x memory reduction)
```python
# ONNX Runtime quantization is an offline step in the Python tooling;
# the `ort` crate then loads the resulting INT8 model like any other ONNX file.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    "model.onnx",
    "model_int8.onnx",
    weight_type=QuantType.QInt8,
    per_channel=True,
    reduce_range=True,
)
```
**Benefits**:
- 75% memory reduction (FP32 → INT8)
- Minimal accuracy loss (typically <1% for OCR)
- Faster inference on integer-optimized hardware
- Reduced cache pressure
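The 4x figure follows from storing one byte per weight instead of four, at the cost of a bounded rounding error. The sketch below illustrates asymmetric (affine) INT8 quantization on a plain slice. It shows the arithmetic only and is not ONNX Runtime's implementation; all names are illustrative.

```rust
/// Parameters of an affine INT8 quantization: real = scale * (q - zero_point).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QuantParams {
    pub scale: f32,
    pub zero_point: i32,
}

/// Derives parameters that map the observed value range onto [-128, 127].
pub fn quant_params(values: &[f32]) -> QuantParams {
    // Include 0.0 in the range so that zero stays exactly representable
    let min = values.iter().copied().fold(0.0f32, f32::min);
    let max = values.iter().copied().fold(0.0f32, f32::max);
    let range = max - min;
    if range == 0.0 {
        return QuantParams { scale: 1.0, zero_point: 0 };
    }
    let scale = range / 255.0;
    let zero_point = (-128.0 - min / scale).round() as i32;
    QuantParams { scale, zero_point }
}

/// Quantizes FP32 values to INT8, saturating at the type's bounds.
pub fn quantize(values: &[f32], p: QuantParams) -> Vec<i8> {
    values
        .iter()
        .map(|&v| {
            let q = (v / p.scale).round() as i32 + p.zero_point;
            q.clamp(-128, 127) as i8
        })
        .collect()
}

/// Recovers approximate FP32 values from INT8 codes.
pub fn dequantize(codes: &[i8], p: QuantParams) -> Vec<f32> {
    codes
        .iter()
        .map(|&q| p.scale * (q as i32 - p.zero_point) as f32)
        .collect()
}
```

Each round-tripped value differs from the original by at most about one quantization step (`scale`), which is why accuracy loss stays small when the weight range is narrow.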
**FP16 Quantization** (2x memory reduction)
```rust
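// Using ort with the `half` feature enabled
use half::f16;
// Convert FP32 data to FP16 before building the input tensor
// (`input_f32` is an existing Vec<f32> or slice)
let input_f16: Vec<f16> = input_f32.iter().copied().map(f16::from_f32).collect();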
```
**Benefits**:
- Better accuracy preservation than INT8
- Native support on modern GPUs (Tensor Cores)
- Still significant memory savings
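**Pre-Quantized Models** (tract)
```rust
// tract runs models that were quantized ahead of time; `dims` is the input shape.
// Assumption: "model_int8.onnx" is a hypothetical pre-quantized file; tract's
// builder is not known to expose an automatic `.quantize()` step.
let model = tract_onnx::onnx()
    .model_for_path("model_int8.onnx")?
    .with_input_fact(0, InferenceFact::dt_shape(f32::datum_type(), dims))?
    .into_optimized()?
    .into_runnable()?;
```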
### 7.3 Memory Pooling and Reuse
**Tensor Buffer Reuse**:
```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use parking_lot::Mutex;

struct TensorPool {
    buffers: Vec<Arc<Mutex<Vec<f32>>>>,
    next: AtomicUsize, // round-robin cursor
}

impl TensorPool {
    fn new(pool_size: usize, buffer_size: usize) -> Self {
        let buffers = (0..pool_size)
            .map(|_| Arc::new(Mutex::new(vec![0.0f32; buffer_size])))
            .collect();
        TensorPool { buffers, next: AtomicUsize::new(0) }
    }

    fn acquire(&self) -> Option<Arc<Mutex<Vec<f32>>>> {
        if self.buffers.is_empty() {
            return None;
        }
        // Round-robin selection; callers lock the buffer before use
        let index = self.next.fetch_add(1, Ordering::Relaxed) % self.buffers.len();
        Some(Arc::clone(&self.buffers[index]))
    }
}
```
**Session Pooling** (ONNX Runtime):
```rust
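use std::sync::atomic::{AtomicUsize, Ordering};
use once_cell::sync::Lazy;
use ort::session::Session;
use parking_lot::Mutex;

const POOL_SIZE: usize = 4;

// Each session sits behind a Mutex because recent ort release candidates
// take `&mut self` in `Session::run`
static SESSION_POOL: Lazy<Vec<Mutex<Session>>> = Lazy::new(|| {
    (0..POOL_SIZE)
        .map(|_| {
            let session = Session::builder()
                .unwrap()
                .commit_from_file("model.onnx")
                .unwrap();
            Mutex::new(session)
        })
        .collect()
});
static NEXT_SESSION: AtomicUsize = AtomicUsize::new(0);

// Round-robin selection across the pool
fn get_session() -> &'static Mutex<Session> {
    &SESSION_POOL[NEXT_SESSION.fetch_add(1, Ordering::Relaxed) % POOL_SIZE]
}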
```
### 7.4 Streaming and Batching
**Batch Processing** (Amortize overhead):
```rust
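use image::DynamicImage;
use ort::session::Session;
use ort::value::Tensor;

// `preprocess_into_buffer` and `decode_batch_predictions` are application-defined helpers
fn process_batch(images: &[DynamicImage], model: &mut Session) -> Result<Vec<String>> {
    const IMAGE_LEN: usize = 3 * 224 * 224; // channels * height * width
    let batch_size = images.len();
    // Create batched tensor [batch_size, channels, height, width]
    let mut batch_tensor = vec![0.0f32; batch_size * IMAGE_LEN];
    for (img, slot) in images.iter().zip(batch_tensor.chunks_exact_mut(IMAGE_LEN)) {
        // Each image writes only into its own slot
        preprocess_into_buffer(img, slot);
    }
    // Single inference call for entire batch (API as of recent ort 2.0 release candidates)
    let input = Tensor::from_array(([batch_size, 3, 224, 224], batch_tensor))?;
    let output = model.run(ort::inputs![input])?;
    // Decode batch results
    decode_batch_predictions(output, batch_size)
}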
```
**Streaming Inference** (For large documents):
```rust
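use futures::{Stream, StreamExt};

// `Image` and `recognize_text` are application-defined
fn process_document_streaming<'a>(
    pages: impl Stream<Item = Image> + 'a,
    model: &'a Session,
) -> impl Stream<Item = Result<String>> + 'a {
    // Process one page at a time so only one page is resident in memory
    pages.map(move |page| recognize_text(&page, model))
}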
```
### 7.5 Model Sharding and Lazy Loading
**Lazy Model Loading**:
```rust
use once_cell::sync::OnceCell;
use ort::session::Session;

static DETECTION_MODEL: OnceCell<Session> = OnceCell::new();
static RECOGNITION_MODEL: OnceCell<Session> = OnceCell::new();

// The model is loaded on first use and shared afterwards
fn get_detection_model() -> &'static Session {
    DETECTION_MODEL.get_or_init(|| {
        Session::builder()
            .unwrap()
            .commit_from_file("detection.onnx")
            .unwrap()
    })
}
```
**Conditional Loading**:
```rust
// Only load language-specific models when needed
use std::collections::HashMap;
use once_cell::sync::OnceCell;

struct OcrEngine {
    detection: Session,
    recognition_models: HashMap<Language, OnceCell<Session>>,
}

impl OcrEngine {
    fn recognize(&self, img: &Image, lang: Language) -> Result<String> {
        let boxes = self.detect(img)?;
        // The map holds one (initially empty) cell per supported language
        let rec_model = self.recognition_models
            .get(&lang)
            .unwrap()
            .get_or_init(|| load_recognition_model(lang));
        self.recognize_boxes(img, &boxes, rec_model)
    }
}
```
### 7.6 Memory Mapping (Large Models)
**Using `memmap2` for Model Files**:
```rust
use memmap2::Mmap;
use std::fs::File;
fn load_model_mmap(path: &str) -> Result<Mmap> {
    let file = File::open(path)?;
    let mmap = unsafe { Mmap::map(&file)? };
    Ok(mmap)
}
// Model data stays on disk, paged in as needed
// Useful for models >100MB
```
**Benefits**:
- Reduced resident memory
- Faster startup (no full load)
- Shared memory across processes
**Limitations**:
- Not available in WASM
- Requires file system access
- May have higher latency on first access
### 7.7 GPU Memory Management
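**Automatic GPU Memory Management** (ort):
```rust
use ort::execution_providers::CUDAExecutionProvider;
use ort::session::Session;

// ort manages GPU allocations for the session
let session = Session::builder()?
    .with_execution_providers([CUDAExecutionProvider::default().build()])?
    .commit_from_file("model.onnx")?;
// Input and output tensors are copied to and from the GPU as needed
```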
**Manual GPU Memory Control** (candle):
```rust
use candle_core::{Device, Tensor};
let device = Device::new_cuda(0)?;
// Allocate on GPU
let tensor_gpu = Tensor::randn(0f32, 1f32, (1024, 1024), &device)?;
// Transfer to CPU when needed
let tensor_cpu = tensor_gpu.to_device(&Device::Cpu)?;
// Explicit cleanup
drop(tensor_gpu);
```
### 7.8 Memory Profiling and Monitoring
**Rust Memory Profiling Tools**:
- `valgrind --tool=massif`: Heap profiling
- `heaptrack`: Heap memory profiler (Linux)
- `dhat`: Dynamic heap analysis tool
- `tokio-console`: Async runtime monitoring
**Custom Memory Tracking**:
```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct TrackingAllocator;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::SeqCst);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::SeqCst);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: TrackingAllocator = TrackingAllocator;

fn get_memory_usage() -> usize {
    ALLOCATED.load(Ordering::SeqCst)
}
```
### 7.9 Memory Optimization Recommendations for ruvector-scipix
**Priority Strategies**:
1. **Quantize Models** (INT8 for production)
- 4x memory reduction
- Minimal accuracy impact for OCR
- Use ONNX Runtime quantization tools
2. **Implement Tensor Pooling**
- Reuse buffers for repeated inferences
- Align with ruvector-core's memory management patterns
- Use `parking_lot` for efficient synchronization
3. **Lazy Load Language Models**
- Only load recognition models for requested languages
- Use `OnceCell` for thread-safe initialization
- Share models across threads
4. **Batch Processing**
- Group multiple images into single inference call
- Amortize overhead, improve GPU utilization
- Integrate with ruvector's parallel processing
5. **GPU Memory Awareness**
- Monitor GPU memory usage
- Implement fallback to CPU if GPU OOM
- Use smaller batch sizes on memory-constrained devices
6. **Profile Real Workloads**
- Measure memory with actual ruvector data
- Identify bottlenecks (model weights vs activations)
- Optimize based on data
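The GPU memory awareness strategy (smaller batches when memory runs out) can be expressed as a retry loop that halves the batch size on out-of-memory errors. The sketch below is engine-agnostic: `infer` and `is_oom` are caller-supplied closures, and the function name is illustrative.

```rust
/// Runs `infer` with the requested batch size and halves the batch size each
/// time `is_oom` classifies the error as an out-of-memory condition.
/// Returns the result together with the batch size that succeeded.
pub fn run_with_backoff<T, E>(
    mut batch_size: usize,
    mut infer: impl FnMut(usize) -> Result<T, E>,
    is_oom: impl Fn(&E) -> bool,
) -> Result<(T, usize), E> {
    loop {
        match infer(batch_size) {
            Ok(out) => return Ok((out, batch_size)),
            // Retry with half the batch while there is still room to shrink
            Err(e) if batch_size > 1 && is_oom(&e) => batch_size /= 2,
            // Any other error, or OOM at batch size 1, is reported to the caller
            Err(e) => return Err(e),
        }
    }
}
```

When even a batch of one fails, the caller can treat the returned error as the signal to switch to the CPU fallback.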
---
## 8. Recommended Technology Stack for ruvector-scipix
### 8.1 Primary Stack (Production Deployment)
**Inference Engine**: `ort` (ONNX Runtime)
- **Version**: `2.0.0-rc` or latest stable
- **Features**: `cuda`, `tensorrt`, `half`, `load-dynamic`
- **Rationale**:
- Best-in-class performance (73% latency reduction)
- Extensive GPU support (CUDA, TensorRT, OpenVINO)
- Production-proven (Twitter, Google, SurrealDB)
- Largest ONNX model ecosystem
**OCR Models**: PaddleOCR v5 (ONNX format)
- **Detection**: `ch_PP-OCRv5_mobile_det.onnx`
- **Recognition**: `ch_PP-OCRv5_mobile_rec.onnx`
- **Rationale**:
- State-of-the-art accuracy
- Optimized for speed (5x faster in ONNX)
- Multi-language support (80+ languages)
- Active development (2025 updates)
**Image Processing**: `image` + `imageproc`
- **Version**: Latest stable
- **Rationale**:
- Comprehensive format support
- CPU parallelism via rayon (already in workspace)
- Mature, well-tested
- Pure Rust (no C++ dependencies)
**Dependencies Integration**:
```toml
[dependencies]
# Inference
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing
image = "0.25"
imageproc = "0.25"
# Existing ruvector-core dependencies (reuse)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
serde = { workspace = true }
```
### 8.2 Alternative Stack (WASM/Browser Deployment)
**Inference Engine**: `candle` with WGPU backend
- **Version**: Latest stable from Hugging Face
- **Features**: default features disabled for `wasm32` builds (see the dependency block below)
- **Rationale**:
- Smallest WASM bundle size
- Native WebGPU support
- Fast startup times
- Pure Rust
**OCR Models**: TrOCR (via candle-onnx) or lightweight PaddleOCR
- Smaller models for browser constraints
- Quantized INT8 versions
**WASM-Specific Stack**:
```toml
[target.'cfg(target_arch = "wasm32")'.dependencies]
candle-core = { version = "0.8", default-features = false }
candle-onnx = { version = "0.8" }
wasm-bindgen = { workspace = true }
web-sys = { workspace = true }
```
### 8.3 Fallback Stack (Pure Rust/No External Dependencies)
**Inference Engine**: `tract`
- **Use Case**: When ONNX Runtime binaries unavailable or pure Rust required
- **Rationale**:
- No C++ dependencies
- Excellent WASM support
- Mature (Sonos production use)
- Passes 85% ONNX tests
**Stack**:
```toml
[dependencies]
tract-onnx = "0.22"
image = "0.25"
imageproc = "0.25"
```
### 8.4 Architecture Design
```
┌─────────────────────────────────────────────────────────────┐
│                       ruvector-scipix                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐     ┌──────────────┐    ┌──────────────┐  │
│  │ Image Input  │────▶│ Preprocessing│───▶│  Detection   │  │
│  │   (image)    │     │ (imageproc)  │    │  (ort/ONNX)  │  │
│  └──────────────┘     └──────────────┘    └──────┬───────┘  │
│                                                  │          │
│                                                  ▼          │
│                                           ┌──────────────┐  │
│                                           │  Text Boxes  │  │
│                                           └──────┬───────┘  │
│                                                  │          │
│                        ┌─────────────────────────┘          │
│                        │                                    │
│                        ▼                                    │
│                 ┌──────────────┐      ┌──────────────┐      │
│                 │ Recognition  │─────▶│  Post-Proc.  │      │
│                 │  (ort/ONNX)  │      │   (decode)   │      │
│                 └──────────────┘      └──────┬───────┘      │
│                                              │              │
│                                              ▼              │
│                                       ┌──────────────┐      │
│                                       │ Vector Store │      │
│                                       │  (ruvector-  │      │
│                                       │    core)     │      │
│                                       └──────────────┘      │
│                                                             │
└─────────────────────────────────────────────────────────────┘
GPU Acceleration Layers:
├─ CUDA/TensorRT (NVIDIA)
├─ Metal (Apple Silicon)
├─ OpenVINO (Intel)
└─ WGPU (Cross-platform/Browser)
```
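The two-stage flow in the diagram can be sketched as a pair of traits, so that an `ort`, `tract`, or `candle` backend could sit behind the same interface. This is an illustrative sketch only: `TextBox`, `Detector`, `Recognizer`, `Pipeline`, and the two stub implementations are hypothetical names, and raw `&[u8]` stands in for whatever image type the real crate would use.

```rust
/// Axis-aligned region of the input image believed to contain text.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct TextBox {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Stage 1: finds text regions. A real implementation would wrap an ONNX session.
pub trait Detector {
    fn detect(&self, image: &[u8]) -> Vec<TextBox>;
}

/// Stage 2: reads the text inside one region.
pub trait Recognizer {
    fn recognize(&self, image: &[u8], region: &TextBox) -> String;
}

/// Wires the two stages together; generic so backends can be swapped.
pub struct Pipeline<D, R> {
    pub detector: D,
    pub recognizer: R,
}

impl<D: Detector, R: Recognizer> Pipeline<D, R> {
    /// Runs detection, then recognition on every detected box.
    pub fn run(&self, image: &[u8]) -> Vec<(TextBox, String)> {
        self.detector
            .detect(image)
            .into_iter()
            .map(|region| {
                let text = self.recognizer.recognize(image, &region);
                (region, text)
            })
            .collect()
    }
}

/// Stub detector for tests: always reports the same boxes.
pub struct FixedDetector(pub Vec<TextBox>);

impl Detector for FixedDetector {
    fn detect(&self, _image: &[u8]) -> Vec<TextBox> {
        self.0.clone()
    }
}

/// Stub recognizer for tests: labels each box by its width.
pub struct WidthRecognizer;

impl Recognizer for WidthRecognizer {
    fn recognize(&self, _image: &[u8], region: &TextBox) -> String {
        format!("w={}", region.width)
    }
}
```

Keeping the stages behind traits also lets unit tests run with stubs like these, without loading any model files.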
### 8.5 Module Structure
```
examples/scipix/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Public API
│   ├── engine.rs           # OCR engine orchestration
│   ├── detection.rs        # Text detection (ONNX)
│   ├── recognition.rs      # Text recognition (ONNX)
│   ├── preprocessing.rs    # Image preprocessing (imageproc)
│   ├── postprocessing.rs   # Result decoding and formatting
│   ├── models.rs           # Model loading and management
│   └── config.rs           # Configuration
├── models/                 # ONNX model files (gitignored)
│   ├── detection.onnx
│   ├── recognition.onnx
│   └── dict.txt
├── tests/
│   ├── integration_test.rs
│   └── benchmark.rs
└── docs/
    ├── 01_REQUIREMENTS.md
    ├── 02_ARCHITECTURE.md
    └── 03_RUST_ECOSYSTEM.md  # This document
```
### 8.6 Performance Targets
Based on PaddleOCR benchmarks and Rust optimizations:
| Metric | Target | Hardware / Conditions |
|--------|--------|-----------------------|
| **Detection Latency** | <50ms | NVIDIA T4 (TensorRT) |
| **Recognition Latency** | <20ms | NVIDIA T4 (TensorRT) |
| **End-to-End (single image)** | <100ms | NVIDIA T4 |
| **Throughput (batched)** | >100 images/sec | NVIDIA T4 |
| **CPU Latency** | <500ms | Modern multi-core CPU |
| **WASM Latency** | <1s | Browser (WebGPU) |
| **Memory Usage** | <500MB | With INT8 quantization |
### 8.7 Development Phases
**Phase 1: Core Implementation (ort + PaddleOCR)**
- Implement detection and recognition pipelines
- Integrate with ruvector-core storage
- CPU-only inference initially
- Basic preprocessing (resize, normalize)
**Phase 2: GPU Acceleration**
- Add CUDA/TensorRT support
- Benchmark and optimize performance
- Implement batching for throughput
- Memory pooling and reuse
**Phase 3: Production Hardening**
- Model quantization (INT8)
- Error handling and fallbacks
- Metrics and monitoring
- Load testing
**Phase 4: WASM Support (Optional)**
- Port to candle or tract
- Browser deployment
- WebGPU acceleration
- Client-side OCR
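The "normalize" step listed under Phase 1 usually means turning interleaved RGB bytes into a planar `f32` tensor before inference. A minimal sketch follows, assuming a CHW layout and per-channel mean/std normalization; the function name and the choice of mean/std values are assumptions, and the correct constants depend on how the chosen model was trained.

```rust
/// Converts interleaved RGB bytes (HWC layout) into a planar CHW `f32` buffer,
/// applying `(pixel / 255 - mean) / std` per channel.
/// Returns `None` when the buffer length does not match `width * height * 3`.
pub fn hwc_u8_to_chw_f32(
    pixels: &[u8],
    width: usize,
    height: usize,
    mean: [f32; 3],
    std: [f32; 3],
) -> Option<Vec<f32>> {
    let plane = width.checked_mul(height)?;
    if pixels.len() != plane.checked_mul(3)? {
        return None; // buffer does not match the stated dimensions
    }

    let mut out = vec![0.0f32; plane * 3];
    for (i, rgb) in pixels.chunks_exact(3).enumerate() {
        for c in 0..3 {
            let scaled = rgb[c] as f32 / 255.0;
            // Channel `c` occupies its own contiguous plane in the output.
            out[c * plane + i] = (scaled - mean[c]) / std[c];
        }
    }
    Some(out)
}
```

With `mean = [0.5; 3]` and `std = [0.5; 3]`, pixel values map from `0..=255` onto `-1.0..=1.0`. The resulting buffer can be wrapped in an `ndarray` array of shape `[1, 3, height, width]` before being handed to the inference engine.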
### 8.8 Testing Strategy
**Unit Tests**:
- Image preprocessing correctness
- Model loading and initialization
- Tensor shape validation
- Output decoding accuracy
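For "output decoding accuracy", the decoder needs to be testable in isolation from the model. The sketch below shows greedy CTC decoding, a common way to decode recognition output, under two stated assumptions: class 0 is the CTC blank, and class `k` maps to entry `k - 1` of the character dictionary. The function name is hypothetical, and the actual index convention should be checked against the exported model and its `dict.txt`.

```rust
/// Greedy CTC decoding over per-time-step class scores.
/// Assumes class 0 is the CTC blank and class `k` maps to `dict[k - 1]`.
pub fn ctc_greedy_decode(scores: &[Vec<f32>], dict: &[char]) -> String {
    let mut text = String::new();
    let mut prev: Option<usize> = None;

    for step in scores {
        // Arg-max: index of the highest-scoring class at this time step.
        let best = step
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i);
        let Some(idx) = best else { continue };

        // Emit only when the class changes and is not the blank.
        if prev != Some(idx) && idx != 0 {
            if let Some(&ch) = dict.get(idx - 1) {
                text.push(ch);
            }
        }
        prev = Some(idx);
    }
    text
}
```

Because repeats are collapsed before blanks are dropped, a blank between two identical classes is what allows doubled letters to survive decoding.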
**Integration Tests**:
```rust
#[test]
fn test_end_to_end_ocr() {
    // Requires the ONNX models referenced by `OcrConfig::default()` to be present.
    let engine = OcrEngine::new(OcrConfig::default()).unwrap();
    let img = image::open("tests/fixtures/sample.jpg").unwrap();

    let result = engine.recognize_text(&img).unwrap();
    assert!(result.text.contains("expected text"));
}
```
**Benchmarks** (using Criterion):
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_detection(c: &mut Criterion) {
    // `setup_engine` and `load_test_image` are test helpers defined elsewhere.
    let engine = setup_engine();
    let img = load_test_image();

    c.bench_function("detection", |b| {
        b.iter(|| engine.detect(black_box(&img)))
    });
}

criterion_group!(benches, benchmark_detection);
criterion_main!(benches);
```
**Performance Tests**:
- Latency under various image sizes
- Throughput with batching
- Memory usage over time
- GPU utilization
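Latency targets are easier to enforce when they are checked against a percentile rather than a mean, since a few slow outliers can hide behind a good average. A small sketch using only the standard library follows; `percentile` and `within_budget` are hypothetical helper names, and the nearest-rank method is one of several reasonable percentile definitions.

```rust
use std::time::Duration;

/// Nearest-rank percentile of a set of latency samples.
/// Returns `None` for an empty slice or a `p` outside `0..=100`.
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();

    // Nearest-rank method: rank = ceil(p/100 * n), clamped to at least 1.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// True when the given percentile of `samples` is within `budget`.
pub fn within_budget(samples: &[Duration], p: f64, budget: Duration) -> bool {
    percentile(samples, p).map_or(false, |v| v <= budget)
}
```

A performance test could collect one sample per image size and assert, for example, that the 95th percentile stays under the end-to-end target from Section 8.6.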
---
## 9. Integration with ruvector-core Dependencies
### 9.1 Shared Workspace Dependencies
The ruvector-scipix implementation can leverage numerous existing workspace dependencies, minimizing new additions and ensuring consistency.
**Already Available (from workspace)**:
| Dependency | Use in ruvector-core | Use in scipix |
|------------|----------------------|---------------|
| `rayon` | Parallel distance computation | Batch image preprocessing, parallel OCR |
| `ndarray` | Vector operations | Tensor manipulation, image arrays |
| `parking_lot` | Fast `Mutex`/`RwLock` primitives | Model pool synchronization |
| `dashmap` | Concurrent hash maps | Model cache, result cache |
| `tokio` | Async runtime | Async inference, streaming |
| `serde` / `serde_json` | Serialization | Config, results serialization |
| `thiserror` / `anyhow` | Error handling | OCR error types |
| `tracing` | Logging | Inference timing, debugging |
| `uuid` | Unique identifiers | Request tracking |
| `chrono` | Timestamps | Inference metrics |
**Benefits**:
- **Minimal new dependencies**: Only add OCR-specific crates
- **Consistent patterns**: Same error handling, logging, async across codebase
- **Binary size**: Shared dependencies not duplicated
- **Maintenance**: Updates to workspace deps benefit all crates
### 9.2 Parallel Processing Integration
**Leverage rayon for Batch OCR**:
```rust
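use image::DynamicImage;
use rayon::prelude::*;

/// Runs OCR on every image in parallel; one failed image does not abort the batch.
fn process_image_batch(images: &[DynamicImage], engine: &OcrEngine) -> Vec<Result<OcrResult>> {
    images
        .par_iter()
        .map(|img| engine.recognize_text(img))
        .collect()
}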
```
**Consistency**: Matches ruvector-core's parallel distance computation pattern
### 9.3 Storage Integration
**Store OCR Results in ruvector-core**:
```rust
use ruvector_core::{VectorStore, Vector};

struct OcrResult {
    text: String,
    embedding: Vec<f32>, // From embedding model
    bounding_boxes: Vec<BoundingBox>,
}

impl OcrResult {
    fn store_in_ruvector(&self, store: &mut VectorStore) -> Result<uuid::Uuid> {
        let vector = Vector::new(self.embedding.clone());
        let id = store.insert(vector)?;

        // Store metadata
        store.set_metadata(id, "text", &self.text)?;
        store.set_metadata(id, "boxes", &self.bounding_boxes)?;

        Ok(id)
    }
}
```
**Vector Search for OCR Results**:
```rust
// Find similar documents by text embedding
let query_embedding = embed_text("search query")?;
let similar_docs = store.search(&query_embedding, 10)?;
```
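Conceptually, a search like the one above ranks stored embeddings by similarity to the query. The brute-force sketch below illustrates that idea with cosine similarity; it is not ruvector-core's implementation (which may use an index and a different metric), and `cosine_similarity` and `top_k` are hypothetical names.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns `None` for mismatched lengths, empty input, or a zero vector.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Returns up to `k` `(document index, similarity)` pairs, best match first.
/// Documents that cannot be compared with the query are skipped.
pub fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .filter_map(|(i, d)| cosine_similarity(query, d).map(|s| (i, s)))
        .collect();

    // Sort by similarity, highest first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A linear scan like this is only practical for small collections, which is the reason to delegate search to the vector store.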
### 9.4 WASM Compatibility
**ruvector-core WASM Patterns**:
- `memory-only` feature for WASM targets
- `wasm-bindgen` for browser interop
- `getrandom` with `wasm_js` feature
**Apply to scipix**:
```toml
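[dependencies]
# Backends must be optional so the features below can switch them on
ort = { version = "2.0.0-rc", optional = true }
candle-core = { version = "0.8", default-features = false, optional = true }
candle-onnx = { version = "0.8", optional = true }

[target.'cfg(target_arch = "wasm32")'.dependencies]
wasm-bindgen = { workspace = true }
getrandom = { workspace = true, features = ["wasm_js"] }

[features]
default = ["ort-backend"]
ort-backend = ["dep:ort"]
candle-backend = ["dep:candle-core", "dep:candle-onnx"]
wasm = ["candle-backend"]  # WASM uses candle; build with --no-default-features --features wasm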
```
### 9.5 Error Handling Patterns
**Consistent with ruvector-core**:
```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum OcrError {
    #[error("Model loading failed: {0}")]
    ModelLoadError(String),

    #[error("Inference failed: {0}")]
    InferenceError(String),

    #[error("Image preprocessing failed: {0}")]
    PreprocessingError(#[from] image::ImageError),

    #[error("ONNX Runtime error: {0}")]
    OrtError(#[from] ort::Error),

    #[error("IO error: {0}")]
    IoError(#[from] std::io::Error),
}

pub type Result<T> = std::result::Result<T, OcrError>;
```
### 9.6 Configuration Pattern
**Similar to ruvector-core config**:
```rust
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OcrConfig {
    /// Path to detection model
    pub detection_model_path: String,

    /// Path to recognition model
    pub recognition_model_path: String,

    /// Use GPU acceleration if available
    pub use_gpu: bool,

    /// Batch size for parallel processing
    pub batch_size: usize,

    /// Detection confidence threshold
    pub detection_threshold: f32,

    /// Number of inference threads
    pub num_threads: usize,
}

impl Default for OcrConfig {
    fn default() -> Self {
        Self {
            detection_model_path: "models/detection.onnx".into(),
            recognition_model_path: "models/recognition.onnx".into(),
            use_gpu: true,
            batch_size: 8,
            detection_threshold: 0.7,
            num_threads: rayon::current_num_threads(),
        }
    }
}
```
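Two of these settings translate directly into small pieces of pipeline logic: `detection_threshold` filters candidate boxes by score, and `batch_size` controls how work is grouped. The sketch below shows one way this might look; `ScoredBox`, `filter_by_confidence`, and `batches` are hypothetical names that do not come from any existing crate.

```rust
/// A candidate text region with the detector's confidence score.
/// Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredBox {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
    pub score: f32,
}

/// Keeps only boxes whose score reaches `detection_threshold`.
pub fn filter_by_confidence(boxes: Vec<ScoredBox>, detection_threshold: f32) -> Vec<ScoredBox> {
    boxes
        .into_iter()
        .filter(|b| b.score >= detection_threshold)
        .collect()
}

/// Splits work into groups of at most `batch_size` items.
/// A `batch_size` of 0 is treated as 1, because `chunks(0)` would panic.
pub fn batches<T>(items: &[T], batch_size: usize) -> std::slice::Chunks<'_, T> {
    items.chunks(batch_size.max(1))
}
```

Guarding against a zero batch size here means a misconfigured value degrades to one-at-a-time processing instead of a panic at runtime.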
### 9.7 Async Integration
**Use tokio for async OCR**:
```rust
use std::sync::Arc;

use futures::{Stream, StreamExt}; // `futures` supplies the Stream trait and `then`
use image::DynamicImage;
use tokio::task;

pub struct AsyncOcrEngine {
    engine: Arc<OcrEngine>,
}

impl AsyncOcrEngine {
    pub async fn recognize_text(&self, image: DynamicImage) -> Result<OcrResult> {
        let engine = Arc::clone(&self.engine);

        // Run blocking OCR on tokio's blocking thread pool
        task::spawn_blocking(move || engine.recognize_text(&image))
            .await
            // A JoinError means the blocking task panicked or was cancelled
            .map_err(|e| OcrError::InferenceError(e.to_string()))?
    }

    /// Maps a stream of images to a stream of OCR results, one image at a time.
    pub fn process_stream<'a>(
        &'a self,
        images: impl Stream<Item = DynamicImage> + 'a,
    ) -> impl Stream<Item = Result<OcrResult>> + 'a {
        images.then(move |img| self.recognize_text(img))
    }
}
```
### 9.8 Metrics Integration
**Use existing tracing infrastructure**:
```rust
use tracing::{debug, info, instrument};

#[instrument(skip(self, image))]
pub fn recognize_text(&self, image: &DynamicImage) -> Result<OcrResult> {
    let start = std::time::Instant::now();
    debug!("Starting OCR for image {}x{}", image.width(), image.height());

    // Elapsed times below are cumulative from the start of the call.
    let preprocessed = self.preprocess(image)?;
    debug!("Preprocessing finished (elapsed {:?})", start.elapsed());

    let boxes = self.detect(&preprocessed)?;
    debug!("Detection found {} boxes (elapsed {:?})", boxes.len(), start.elapsed());

    let text = self.recognize(&preprocessed, &boxes)?;
    info!(
        "OCR completed in {:?}, extracted {} characters",
        start.elapsed(),
        text.chars().count()
    );

    Ok(OcrResult {
        text,
        embedding: Vec::new(), // filled in later by the embedding model
        bounding_boxes: boxes,
    })
}
```
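When per-stage durations are needed rather than a running total, each stage can be wrapped in a small timing helper. The following is a sketch using only the standard library; `timed` is a hypothetical helper name.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with how long that call took.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}
```

A stage would then be written as `let (boxes, detect_time) = timed(|| self.detect(&preprocessed));`, with `detect_time` passed to `debug!` or recorded as a span field.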
### 9.9 Testing Infrastructure Reuse
**Use workspace test dependencies**:
```toml
[dev-dependencies]
criterion = { workspace = true }
proptest = { workspace = true }
mockall = { workspace = true }
tempfile = "3.13"
```
**Property-Based Testing** (like ruvector-core):
```rust
use image::DynamicImage;
use proptest::prelude::*;

proptest! {
    #[test]
    fn test_preprocessing_preserves_aspect_ratio(
        width in 100u32..2000u32,
        height in 100u32..2000u32
    ) {
        let img = DynamicImage::new_rgb8(width, height);
        let processed = preprocess_image(&img)?;

        let original_ratio = width as f32 / height as f32;
        let processed_ratio = processed.width() as f32 / processed.height() as f32;

        prop_assert!((original_ratio - processed_ratio).abs() < 0.01);
    }
}
```
### 9.10 Dependency Summary for scipix
**New Dependencies Required**:
```toml
[dependencies]
# OCR/ML (new)
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half"] }
image = "0.25"
imageproc = "0.25"
# Reuse from workspace (no version needed)
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
dashmap = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
uuid = { workspace = true }
chrono = { workspace = true }
# Integration with ruvector-core
ruvector-core = { path = "../../crates/ruvector-core" }
```
**Total New Dependencies**: 3 (ort, image, imageproc)
**Reused Dependencies**: 12 from workspace
---
## 10. License Compatibility
### 10.1 ruvector Project License
**Current License**: MIT (from workspace `Cargo.toml`)
**Requirement**: All dependencies must be MIT-compatible for redistribution.
### 10.2 Recommended Dependencies License Analysis
| Library | License | MIT-Compatible | Notes |
|---------|---------|----------------|-------|
| **ort** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed, fully compatible |
| **candle** | MIT OR Apache-2.0 | ✅ Yes | Hugging Face, dual-licensed |
| **tract** | MIT OR Apache-2.0 | ✅ Yes | Dual-licensed (except ONNX protos) |
| **image** | MIT OR Apache-2.0 | ✅ Yes | Pure Rust, dual-licensed |
| **imageproc** | MIT | ✅ Yes | Permissive, MIT-only |
| **ndarray** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **rayon** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
| **wasm-bindgen** | MIT OR Apache-2.0 | ✅ Yes | Already in workspace |
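**Libraries to Use with Caution**:
| Library | License | Concern |
|---------|---------|---------|
| **leptess** | MIT (wrapper) | ⚠️ Links against the Tesseract C++ library (Apache-2.0); the license is compatible, but the native dependency complicates builds and distribution |
| **opencv-rust** | MIT (wrapper) | ⚠️ Links against OpenCV (Apache-2.0 since 4.5.0); large native dependency |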
### 10.3 ONNX Model Licenses
PaddleOCR models used in ONNX format have **Apache-2.0** license.
**Compatibility**:
- ✅ Apache-2.0 code can be used in MIT-licensed projects
- ✅ ONNX models (weights) are typically considered data, not code
- ✅ Distribution of pre-trained models is permitted
- ⚠️ Derivative works of Apache-2.0 code require patent grant preservation
**Best Practice**:
- Download PaddleOCR ONNX models from official sources
- Include LICENSE file in `models/` directory
- Document model provenance in README
- Do not modify Apache-2.0 code (use as-is via ONNX)
### 10.4 Rust Dual-Licensing Best Practices
**Why Rust Uses MIT OR Apache-2.0**:
- **MIT**: Maximum permissiveness, minimal restrictions
- **Apache-2.0**: Patent protection, better for corporate use
- **Dual License**: Users choose which applies to them
**For ruvector-scipix**:
**Option 1: Keep MIT-only (Current)**
- ✅ Simplest licensing
- ✅ Maximum compatibility
- ✅ Minimal legal overhead
- ✅ All dependencies are MIT-compatible
**Option 2: Adopt Dual MIT/Apache-2.0**
- ✅ Better patent protection
- ✅ Aligns with Rust ecosystem norms
- ✅ More attractive to enterprise users
- ⚠️ Slightly more complex
**Recommendation**: Keep MIT-only for simplicity, unless patent concerns arise.
### 10.5 License Compliance Checklist
**For Production Deployment**:
- [ ] Verify all direct dependencies are MIT or MIT/Apache-2.0
- [ ] Check transitive dependencies for license conflicts
- [ ] Include LICENSE file in repository
- [ ] Document third-party licenses in NOTICE file
- [ ] Include PaddleOCR model license in `models/LICENSE`
- [ ] Add copyright headers to source files (optional for MIT)
- [ ] Review ONNX Runtime's license (MIT, but check binary distribution terms)
- [ ] Ensure no GPL/AGPL dependencies (their copyleft terms would apply to the combined work) and review any LGPL dependency for static-linking obligations
**Automated License Checking**:
```bash
# Use cargo-license to audit dependencies
cargo install cargo-license
cargo license --all-features
# Fail build on incompatible licenses (requires cargo-deny)
cargo install cargo-deny
cargo deny check licenses
```
**`deny.toml` Configuration**:
```toml
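[licenses]
# cargo-deny rejects any license that is not explicitly allowed below,
# so GPL/AGPL variants fail the check without a separate deny list.
# Recent cargo-deny releases no longer accept the `unlicensed` and `deny` keys.
allow = [
    "MIT",
    "Apache-2.0",
    "Apache-2.0 WITH LLVM-exception",
    "BSD-2-Clause",
    "BSD-3-Clause",
    "ISC",
    "Unicode-DFS-2016",
]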
```
### 10.6 Attribution Requirements
**MIT License Requirements**:
- Include copyright notice
- Include permission notice (LICENSE file)
- No obligation to disclose source code modifications
**For PaddleOCR Models (Apache-2.0)**:
- Include NOTICE file if provided
- Preserve copyright and patent notices
- Document significant modifications (if any)
**Recommended NOTICE File**:
```
ruvector-scipix
Copyright 2025 Ruvector Team
This software includes components from:
1. ONNX Runtime
Copyright Microsoft Corporation
Licensed under MIT License
2. PaddleOCR Models
Copyright PaddlePaddle Authors
Licensed under Apache License 2.0
Model files located in models/ directory
3. Candle ML Framework
Copyright Hugging Face, Inc.
Licensed under MIT OR Apache-2.0
Complete license texts available in the LICENSE and models/LICENSE files.
```
### 10.7 License Compatibility Summary
**✅ SAFE TO USE** (Recommended Stack):
- `ort` - MIT/Apache-2.0
- `image` - MIT/Apache-2.0
- `imageproc` - MIT
- `candle` - MIT/Apache-2.0
- `tract` - MIT/Apache-2.0
- PaddleOCR ONNX models - Apache-2.0 (data)
**⚠️ USE WITH CAUTION**:
- `leptess` - Requires Tesseract C++ library (complex licensing)
- `opencv-rust` - Requires OpenCV (large dependency, Apache-2.0)
**❌ AVOID**:
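- GPL/AGPL libraries (copyleft terms would extend to the combined work and prevent MIT-only or proprietary distribution); treat LGPL libraries with care when linking statically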
- Proprietary OCR engines (licensing fees, redistribution restrictions)
**Final Recommendation**: The proposed stack (`ort` + PaddleOCR + `image`/`imageproc`) is **fully compatible** with ruvector's MIT license and follows Rust ecosystem best practices.
---
## 11. Final Recommendations
### 11.1 Optimal Technology Stack
**Primary Recommendation (Production)**:
```toml
[dependencies]
# Inference: Best performance, production-proven
ort = { version = "2.0.0-rc", features = ["cuda", "tensorrt", "half", "load-dynamic"] }
# Image processing: Pure Rust, mature
image = "0.25"
imageproc = "0.25"
# OCR models: PaddleOCR v5 ONNX (download separately)
# - Detection: ch_PP-OCRv5_mobile_det.onnx
# - Recognition: ch_PP-OCRv5_mobile_rec.onnx
# Reuse workspace dependencies
rayon = { workspace = true }
ndarray = { workspace = true }
parking_lot = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
thiserror = { workspace = true }
# Integration
ruvector-core = { path = "../../crates/ruvector-core" }
```
**Rationale**:
1. **Performance**: `ort` + PaddleOCR showed a 73.1% latency reduction for recognition and 40.4% for detection on an NVIDIA T4 (see Section 1)
2. **Ecosystem**: Largest ONNX model selection (PaddleOCR, TrOCR, etc.)
3. **GPU Support**: CUDA, TensorRT, OpenVINO, Metal (via CoreML)
4. **Production Ready**: Used by Twitter, Google, SurrealDB
5. **License**: MIT/Apache-2.0 dual-license (fully compatible)
6. **Maintenance**: `ort` is actively developed by pyke; the underlying ONNX Runtime is maintained by Microsoft
### 11.2 Alternative Stacks by Use Case
**WASM/Browser Deployment**:
```toml
candle-core = { version = "0.8", default-features = false }
candle-onnx = "0.8"
```
- Smallest bundle size (~180KB Brotli)
- WebGPU acceleration
- Fast startup (120ms first token)
**Pure Rust / No External Deps**:
```toml
tract-onnx = "0.22"
```
- No C++ dependencies
- Excellent for embedded/restrictive environments
- Passes about 85% of the ONNX backend tests
**Edge Devices / Raspberry Pi**:
```toml
tract-onnx = "0.22"
```
- Optimized for CPU inference
- Minimal memory footprint
- Proven on RPi (11μs for CNN models)
### 11.3 Implementation Roadmap
**Week 1-2: Core Infrastructure**
- Set up `examples/scipix` crate structure
- Integrate `ort` and `image`/`imageproc`
- Implement model loading (detection + recognition)
- Basic end-to-end pipeline (CPU-only)
**Week 3-4: GPU Acceleration**
- Enable CUDA/TensorRT support
- Implement batching for throughput
- Benchmark performance vs targets
- Memory pooling and optimization
**Week 5-6: Production Hardening**
- Model quantization (INT8)
- Error handling and recovery
- Metrics and monitoring (tracing)
- Integration tests and benchmarks
**Week 7-8: ruvector Integration**
- Store OCR results in ruvector-core
- Implement vector search for documents
- Async API with tokio
- Documentation and examples
**Optional (Week 9-10): WASM Support**
- Port to candle for browser deployment
- WebGPU acceleration
- Client-side OCR demo
### 11.4 Key Metrics to Track
**Performance**:
- Detection latency: Target <50ms (GPU), <200ms (CPU)
- Recognition latency: Target <20ms (GPU), <100ms (CPU)
- End-to-end: Target <100ms (GPU), <500ms (CPU)
- Throughput: Target >100 images/sec (batched, GPU)
**Memory**:
- Model size: ~15-30MB (FP32), ~5-10MB (INT8)
- Runtime memory: Target <500MB
- GPU memory: Monitor for OOM
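The size gap between FP32 and INT8 models comes from storing each weight in one byte instead of four, plus a small amount of scale metadata. The sketch below illustrates the arithmetic with symmetric per-tensor quantization. It is an illustration only: real model quantization would be done offline with ONNX tooling, often per channel and with calibration data, and the function names here are hypothetical.

```rust
/// Symmetric per-tensor INT8 quantization: maps `[-max_abs, max_abs]` onto `[-127, 127]`.
/// Returns the quantized values and the scale needed to recover approximate floats.
pub fn quantize_symmetric(weights: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // An all-zero tensor has no range; use scale 1.0 to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };

    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Recovers approximate `f32` weights from quantized values.
pub fn dequantize(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

Each recovered weight differs from the original by at most half a quantization step (`scale / 2`), which is the source of the accuracy loss that needs to be measured after quantizing.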
**Accuracy**:
- Character accuracy: Target >95% (clean text)
- Word accuracy: Target >90%
- Benchmark against Tesseract and commercial APIs
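Character accuracy is commonly computed as one minus the character error rate, where errors are counted with the Levenshtein edit distance between the reference text and the OCR output. A minimal sketch of that metric follows, assuming this definition is the one intended by the targets above; the function names are hypothetical.

```rust
/// Levenshtein edit distance between two strings, counted in Unicode scalar values.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // `prev[j]` holds the distance between the processed prefix of `a` and `b[..j]`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost) // substitution or match
                .min(prev[j + 1] + 1)   // deletion
                .min(cur[j] + 1);       // insertion
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Character accuracy = 1 - (edit distance / reference length), floored at 0.
/// Returns `None` when the reference is empty.
pub fn char_accuracy(reference: &str, hypothesis: &str) -> Option<f64> {
    let n = reference.chars().count();
    if n == 0 {
        return None;
    }
    let errors = edit_distance(reference, hypothesis) as f64;
    Some((1.0 - errors / n as f64).max(0.0))
}
```

Word accuracy can be measured the same way by running the distance over whitespace-separated tokens instead of characters.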
### 11.5 Risk Mitigation
**Model Availability**:
- ✅ PaddleOCR models freely available
- ✅ Multiple model versions for fallback
- ⚠️ Verify ONNX export quality (may need custom conversion)
**Dependency Stability**:
- ✅ `ort` actively maintained (2.0 rc, stable release expected)
- ✅ `image`/`imageproc` mature, widely used
- ⚠️ Monitor for breaking changes during updates
**Performance Variability**:
- ⚠️ GPU performance depends on driver versions
- ⚠️ WASM performance varies by browser
- ✅ Comprehensive benchmarking before production
**License Compliance**:
- ✅ All recommended dependencies MIT-compatible
- ✅ PaddleOCR Apache-2.0 (compatible for use)
- ⚠️ Review licenses before adding new dependencies
### 11.6 Success Criteria
The ruvector-scipix implementation is successful if:
1. **Performance**: Meets or exceeds latency/throughput targets
2. **Accuracy**: Character accuracy >95% on clean text
3. **Integration**: Seamlessly stores results in ruvector-core
4. **Portability**: Runs on Linux/macOS/Windows, CPU and GPU
5. **Memory**: Operates within <500MB budget
6. **License**: Maintains MIT compatibility
7. **Maintainability**: Uses idiomatic Rust, well-documented
8. **Scalability**: Handles batch processing efficiently
### 11.7 Next Steps
1. **Review this document** with ruvector team for alignment
2. **Download PaddleOCR models** (detection + recognition ONNX)
3. **Set up `examples/scipix` crate** with recommended dependencies
4. **Implement basic OCR pipeline** (end-to-end proof of concept)
5. **Benchmark initial implementation** against targets
6. **Iterate and optimize** based on real-world data
7. **Document API** and usage examples
8. **Integrate with ruvector-core** for vector storage
---
## References and Resources
### Documentation
- [ort Documentation](https://ort.pyke.io/) - ONNX Runtime Rust bindings by pykeio
- [Candle GitHub](https://github.com/huggingface/candle) - Minimalist ML framework for Rust
- [tract GitHub](https://github.com/sonos/tract) - Tiny, no-nonsense ONNX/TF inference
- [PaddleOCR GitHub](https://github.com/PaddlePaddle/PaddleOCR) - OCR models and documentation
- [imageproc Docs](https://docs.rs/imageproc) - Rust image processing library
### Performance Benchmarks
- [Rust at the Metal: GPU Layer Driving Modern AI](https://rustacean.ai/p/issue-2-rust-at-the-metal-the-gpu-layer-driving-modern-ai)
- [Rust for Machine Learning in 2025](https://markaicode.com/rust-machine-learning-framework-comparison-2025/)
- [PaddleOCR 3.0 High-Performance Inference](http://www.paddleocr.ai/main/en/version3.x/deployment/high_performance_inference.html)
### WASM Resources
- [WebAssembly 3.0 Performance: Rust vs C++ Benchmarks](https://markaicode.com/webassembly-3-performance-rust-cpp-benchmarks-2025/)
- [3W for In-Browser AI: WebLLM + WASM + WebWorkers](https://blog.mozilla.ai/3w-for-in-browser-ai-webllm-wasm-webworkers/)
### License Information
- [Rust API Guidelines: Licensing](https://rust-lang.github.io/api-guidelines/necessities.html)
- [PaddleOCR License](https://github.com/PaddlePaddle/PaddleOCR/blob/main/LICENSE) - Apache-2.0
- [ONNX Runtime License](https://github.com/microsoft/onnxruntime/blob/main/LICENSE) - MIT
---
**Document Version**: 1.0
**Last Updated**: 2025-11-28
**Author**: Research and Analysis Agent
**Status**: Complete