# oxify-connect-vision

🔍 Vision/OCR connector for the Oxify workflow automation engine
## Overview
High-performance OCR (Optical Character Recognition) library supporting multiple backends with GPU acceleration, async processing, and comprehensive output formats. Designed for production workflows requiring reliable document processing at scale.
## Features
- 🚀 Multiple OCR Providers: Mock (testing), Tesseract (traditional), Surya (modern), PaddleOCR (multilingual)
- ⚡ GPU Acceleration: CUDA and CoreML support via ONNX Runtime
- 🔄 Async/Await: Non-blocking processing for high throughput
- 💾 Smart Caching: Configurable LRU cache with TTL
- 📊 Rich Output: Text, Markdown, and structured JSON with bounding boxes
- 🌍 Multi-language: 100+ languages supported (provider-dependent)
- 🎯 Layout Analysis: Preserve document structure and hierarchy
- 🛡️ Production Ready: Zero warnings, comprehensive error handling
## Provider Comparison
| Provider | Backend | GPU | Languages | Quality | Setup |
|---|---|---|---|---|---|
| Mock | In-memory | ❌ | Any | Low | None |
| Tesseract | leptess | ❌ | 100+ | Medium | System package |
| Surya | ONNX Runtime | ✅ | 6+ | High | ONNX models |
| PaddleOCR | ONNX Runtime | ✅ | 80+ | High | ONNX models |
## Provider Details
### Mock Provider
- Purpose: Testing and development
- Performance: <1ms per image
- Use Cases: Unit tests, CI/CD pipelines, demos
- Limitations: Returns placeholder text
### Tesseract Provider
- Purpose: General-purpose OCR
- Performance: 200-500ms per page
- Use Cases: Printed documents, forms, simple layouts
- Strengths: Mature, widely used, no GPU required
- Limitations: Struggles with complex layouts
### Surya Provider
- Purpose: Modern document understanding
- Performance: 50-300ms (GPU), 200-500ms (CPU)
- Use Cases: Complex layouts, academic papers, reports
- Strengths: Excellent layout analysis, good quality
- Requirements: ONNX detection & recognition models
### PaddleOCR Provider
- Purpose: Multilingual document processing
- Performance: 60-400ms (GPU), 300-600ms (CPU)
- Use Cases: Asian languages, mixed scripts
- Strengths: 80+ languages, production-proven
- Requirements: ONNX detection, recognition, & classification models
## Installation
### Basic Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
oxify-connect-vision = { path = "../oxify-connect-vision" }
```
### With Specific Providers

```toml
[dependencies]
oxify-connect-vision = { path = "../oxify-connect-vision", features = ["mock", "tesseract"] }
```
### All Features (Development)

```toml
[dependencies]
oxify-connect-vision = { path = "../oxify-connect-vision", features = ["mock", "tesseract", "surya", "paddle", "cuda"] }
```
## Feature Flags

| Feature | Description | Dependencies |
|---|---|---|
| `mock` | Mock provider | None (default) |
| `tesseract` | Tesseract OCR | leptess, tesseract-sys |
| `surya` | Surya ONNX | ort |
| `paddle` | PaddleOCR ONNX | ort, ndarray |
| `onnx` | ONNX Runtime base | ort |
| `cuda` | CUDA GPU support | CUDA toolkit |
| `coreml` | CoreML (macOS) | CoreML |
## Quick Start
### 1. Simple OCR with Mock Provider

```rust
use oxify_connect_vision::MockProvider; // type and method names illustrative

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The mock provider returns placeholder text; useful for tests and demos.
    let provider = MockProvider::new();
    let result = provider.process_image("document.png").await?;
    println!("{}", result.text);
    Ok(())
}
```
### 2. Production OCR with Tesseract

```rust
use oxify_connect_vision::TesseractProvider; // type and method names illustrative

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Tesseract needs no GPU; specify the document language for best results.
    let mut provider = TesseractProvider::new();
    provider.set_language("eng");
    let result = provider.process_image("invoice.png").await?;
    println!("{}", result.text);
    Ok(())
}
```
### 3. GPU-Accelerated OCR with Surya

```rust
use oxify_connect_vision::SuryaProvider; // type and method names illustrative

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the ONNX detection/recognition models once, then reuse the provider.
    let mut provider = SuryaProvider::from_model_dir("models/surya")?;
    provider.load_model().await?;
    let result = provider.process_image("paper.png").await?;
    println!("{}", result.markdown);
    Ok(())
}
```
### 4. Using the Cache

```rust
use oxify_connect_vision::OcrCache; // type name illustrative
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // LRU cache with TTL: repeated images skip the OCR pass entirely.
    let mut cache = OcrCache::new();
    cache.set_max_entries(100);
    cache.set_ttl(Duration::from_secs(3600));
    Ok(())
}
```
## CLI Usage
The Oxify CLI provides convenient commands for OCR operations (subcommands and flags shown are illustrative):

```shell
# List available providers
oxify vision providers

# Process an image with a specific provider
oxify vision ocr document.png --provider tesseract

# Process with language specification
oxify vision ocr scan.png --provider tesseract --lang deu

# Get detailed provider information
oxify vision providers --detailed

# Benchmark multiple providers
oxify vision benchmark document.png

# Extract structured data
oxify vision ocr form.png --format json
```
## Workflow Integration
### Using in JSON Workflows
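An OCR step can be declared as a workflow node. A hedged sketch follows; the node type, config keys, and templating syntax are assumptions, not confirmed by this crate:

```json
{
  "id": "extract_text",
  "type": "vision.ocr",
  "config": {
    "provider": "tesseract",
    "language": "eng",
    "output_format": "text"
  },
  "input": "{{ trigger.file }}"
}
```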
### Chaining with LLM Nodes
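OCR output can feed directly into an LLM node, e.g. OCR → summarize. A sketch under the same assumptions (node types, edge wiring, and templating syntax are illustrative):

```json
{
  "nodes": [
    { "id": "ocr", "type": "vision.ocr", "config": { "provider": "surya", "output_format": "markdown" } },
    { "id": "summarize", "type": "llm.prompt", "config": { "prompt": "Summarize this document:\n\n{{ ocr.output }}" } }
  ],
  "edges": [ { "from": "ocr", "to": "summarize" } ]
}
```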
## Output Formats
### Text Output
Simple text extraction with whitespace preservation.
Suitable for full-text search and basic NLP.
### Markdown Output
```markdown
Regular text content with **formatting** preserved.
```
### JSON Output
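A hedged sample of the structured output with bounding boxes; the field names below are assumptions based on the features listed above, not the crate's confirmed schema:

```json
{
  "text": "Quarterly Report",
  "provider": "surya",
  "language": "eng",
  "blocks": [
    { "text": "Quarterly Report", "bbox": [72, 40, 412, 68], "confidence": 0.97 }
  ]
}
```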
## Provider Setup
### Tesseract Installation

**Ubuntu/Debian:**

```shell
sudo apt-get install tesseract-ocr libtesseract-dev libleptonica-dev
```

**macOS:**

```shell
brew install tesseract
```

**Windows:** Download the installer from https://github.com/UB-Mannheim/tesseract/wiki

**Verify:**

```shell
tesseract --version
```
### Surya Models

1. Download models from the Surya releases
2. Place them in a directory:

   ```
   models/surya/
   ├── detection.onnx
   └── recognition.onnx
   ```

3. Set the path in configuration (key names illustrative):

   ```toml
   [surya]
   model_dir = "models/surya"
   ```
### PaddleOCR Models

1. Download from the PaddlePaddle releases
2. Structure:

   ```
   models/paddle/
   ├── det.onnx   # Detection model
   ├── rec.onnx   # Recognition model
   └── cls.onnx   # Classification model
   ```

3. Configure (key names illustrative):

   ```toml
   [paddle]
   model_dir = "models/paddle"
   ```
## Performance Benchmarks
Tested on: AMD Ryzen 9 5950X, NVIDIA RTX 3090, 1920x1080 images
| Provider | CPU Time | GPU Time | Memory | Accuracy* |
|---|---|---|---|---|
| Mock | <1ms | - | <1MB | N/A |
| Tesseract | 450ms | - | ~200MB | 85% |
| Surya | 320ms | 45ms | ~1.5GB | 92% |
| PaddleOCR | 380ms | 55ms | ~1.8GB | 90% |
*Accuracy measured on a standard document dataset
### Optimization Tips
- Enable Caching: 10-1000x speedup for repeated images
- Use GPU: 5-10x speedup for ONNX providers
- Batch Processing: Process multiple images concurrently
- Image Preprocessing: Resize large images before processing
- Choose the Right Provider: Match provider capabilities to your use case
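The batch-processing tip above can be sketched with plain threads; `run_ocr` below is a stand-in for a blocking provider call, not this crate's API:

```rust
use std::thread;

// Stand-in for a blocking provider call such as Tesseract.
fn run_ocr(path: &str) -> String {
    format!("text from {path}")
}

fn main() {
    let paths = vec!["a.png", "b.png", "c.png"];
    // Spawn one worker per image; real code would bound concurrency
    // (or use the crate's async API with a task pool).
    let handles: Vec<_> = paths
        .into_iter()
        .map(|p| thread::spawn(move || run_ocr(p)))
        .collect();
    let results: Vec<String> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    println!("processed {} images", results.len());
}
```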
## Error Handling
```rust
use oxify_connect_vision::{TesseractProvider, OcrError}; // names illustrative

#[tokio::main]
async fn main() {
    let provider = TesseractProvider::new();
    match provider.process_image("scan.png").await {
        Ok(result) => println!("{}", result.text),
        Err(OcrError::ModelNotLoaded) => eprintln!("call load_model() before processing"),
        Err(e) => eprintln!("OCR failed: {e}"),
    }
}
```
## Testing
Run tests:

```shell
# Unit tests (mock provider)
cargo test

# Integration tests (requires provider setup)
cargo test --features tesseract -- --ignored

# All tests with coverage (requires cargo-llvm-cov)
cargo llvm-cov --all-features
```
Example test:

```rust
#[tokio::test]
async fn mock_provider_extracts_text() {
    // Names illustrative of the crate's API.
    let provider = MockProvider::new();
    let result = provider.process_image("fixture.png").await.unwrap();
    assert!(!result.text.is_empty());
}
```
## Troubleshooting
### "Model not loaded" Error

```rust
// Always call load_model() before processing
provider.load_model().await?;
```
### Poor OCR Quality
- Check image quality (DPI, contrast, noise)
- Try a different provider (e.g., Surya for complex layouts)
- Specify the correct language
- Preprocess the image (denoise, deskew)
### ONNX Runtime Errors

- Verify model files are in a compatible ONNX format
- Check the ONNX Runtime version: `cargo tree | grep ort`
- Ensure GPU drivers are installed (for CUDA/CoreML)
### Memory Issues

- Reduce the cache size: `cache.set_max_entries(100)`
- Process images in batches
- Resize large images before processing
## Contributing
We welcome contributions! Areas of interest:
- Additional provider integrations
- Performance optimizations
- Language-specific improvements
- Documentation and examples
- Bug fixes and tests
See `TODO.md` for planned enhancements.
## License

Apache-2.0. See the `LICENSE` file in the root directory.
## Links
Built with ❤️ for the Oxify workflow automation platform