SciPix - Rust OCR Engine for Scientific Documents & Math Equations
Why SciPix?
SciPix is a blazing-fast, memory-safe OCR (Optical Character Recognition) engine written in pure Rust. Unlike traditional OCR tools, SciPix is purpose-built for scientific documents, mathematical equations, and technical diagrams, making it the ideal choice for researchers, academics, and developers working with STEM content.
Use Cases
- Academic Paper Digitization - Extract text and equations from scanned research papers
- Math Homework Assistance - Convert handwritten equations to LaTeX for AI tutoring apps
- Technical Documentation - Process engineering diagrams and scientific charts
- Research Data Extraction - Batch process journal articles and extract structured data
- AI/LLM Integration - Feed scientific content to language models via the Model Context Protocol (MCP)
Key Features
| Feature | Description |
|---|---|
| ONNX Runtime | GPU-accelerated neural network inference with CUDA, TensorRT, and CoreML support |
| LaTeX Output | Accurate mathematical equation recognition with LaTeX, MathML, and AsciiMath export |
| SIMD Optimized | 4x faster image preprocessing with AVX2, SSE4, and NEON vectorization |
| REST API | Production-ready HTTP server with rate limiting, caching, and authentication |
| CLI Tool | Batch processing, PDF conversion, and watch mode for continuous OCR |
| Pure Rust SDK | Type-safe, async/await native library with zero-copy image processing |
| WebAssembly | Run OCR directly in browsers with full WASM support |
| MCP Server | Integrate with Claude, ChatGPT, and other AI assistants via Model Context Protocol |
| Cross-Platform | Linux, macOS, Windows, and ARM64 support out of the box |
Performance Benchmarks
| Operation | SciPix | Tesseract | Mathpix |
|---|---|---|---|
| Simple Text OCR | 50ms | 120ms | 200ms* |
| Math Equation | 80ms | N/A | 150ms* |
| Batch (100 images) | 2.1s | 8.5s | N/A |
| Memory Usage | 45MB | 180MB | Cloud |
*API latency, not processing time
Installation
From crates.io (Rust SDK)
Or add to your Cargo.toml:
[dependencies]
= "0.1.16"
# With specific features
= { version = "0.1.16", features = ["ocr", "math", "optimize"] }
From Source (CLI & Server)
# Clone the repository
# Build CLI and Server
# Install globally (optional)
Pre-built Binaries
# Download latest release (Linux)
# Download latest release (macOS)
Feature Flags
| Flag | Description | Default |
|---|---|---|
| default | preprocess, cache, optimize | Yes |
| ocr | ONNX-based OCR engine | No |
| math | Math expression parsing | No |
| preprocess | Image preprocessing | Yes |
| cache | Result caching | Yes |
| optimize | SIMD & parallel optimizations | Yes |
| wasm | WebAssembly support | No |
Quick Start
30-Second Setup
# Build and run the server
# In another terminal, test the API
# {"status":"healthy","version":"0.1.16"}
Process Your First Image
# Encode an image to base64
BASE64_IMAGE=
# Send OCR request
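The encoding step above can also be done from Rust before calling the API. The following is a minimal sketch using only the standard library; `base64_encode` is a hypothetical helper written for illustration, not part of the SciPix SDK, and it implements the standard RFC 4648 alphabet with `=` padding.

```rust
/// Standard base64 alphabet (RFC 4648).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode raw bytes (for example, the contents of a PNG file) as base64 text.
/// Hypothetical helper for illustration; not a SciPix API.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit groups; missing input bytes become '=' padding.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 {
            ALPHABET[(n >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[n as usize & 63] as char
        } else {
            '='
        });
    }
    out
}
```

In practice the input would come from something like `std::fs::read("equation.png")`, where the file name is only a placeholder.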
SDK Usage
Basic Usage
use ;
Image Preprocessing
use ;
use open;
OCR Engine (requires ocr feature)
use ;
use OcrConfig;
async
Math Parsing (requires math feature)
use ;
Caching Results
use CacheManager;
use CacheConfig;
Configuration Presets
use ;
CLI Reference
Installation
# Install from source
# Or use pre-built binary
Commands
ocr - Process Single Image
# Basic OCR
# With output file and format
# Specify output formats
Options:
| Flag | Description | Default |
|---|---|---|
| -i, --input | Input image path | Required |
| -o, --output | Output file path | stdout |
| -f, --format | Output format (json, text, latex) | json |
| --formats | OCR formats (text, latex, mathml, html) | text |
| --confidence | Minimum confidence threshold | 0.5 |
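To illustrate what a minimum confidence threshold such as `--confidence 0.5` does conceptually, here is a small sketch. The `Recognized` struct and `filter_by_confidence` function are hypothetical names invented for this example, and treating the threshold as inclusive is an assumption; the actual SciPix result types and comparison may differ.

```rust
/// Hypothetical recognition result; the real SciPix result type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Recognized {
    pub text: String,
    pub confidence: f32,
}

/// Keep only results whose confidence is at or above `min_confidence`.
/// The inclusive comparison is an assumption made for this sketch.
pub fn filter_by_confidence(results: Vec<Recognized>, min_confidence: f32) -> Vec<Recognized> {
    results
        .into_iter()
        .filter(|r| r.confidence >= min_confidence)
        .collect()
}
```

Raising the threshold trades recall for precision: fewer low-quality fragments reach the output, at the cost of dropping some correct but uncertain regions.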
batch - Process Multiple Images
# Process directory
# With parallel processing
# Recursive with specific formats
# Watch mode for continuous processing
Options:
| Flag | Description | Default |
|---|---|---|
| -i, --input-dir | Input directory | Required |
| -o, --output-dir | Output directory | Required |
| -p, --parallel | Parallel workers | CPU cores |
| -r, --recursive | Process subdirectories | false |
| --watch | Watch for new files | false |
| --max-retries | Retry failed files | 3 |
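The `--parallel` default of one worker per CPU core can be pictured with the following sketch, which uses only the standard library. `default_workers` and `process_parallel` are hypothetical names, and splitting the inputs into contiguous chunks is one possible strategy, not necessarily the one SciPix uses.

```rust
use std::thread;

/// Default worker count: the number of available CPU cores, or 1 if unknown.
pub fn default_workers() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

/// Apply `job` to every input using at most `workers` threads and return the
/// results in input order. Hypothetical helper, not a SciPix API.
pub fn process_parallel<T, R, F>(inputs: &[T], workers: usize, job: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Contiguous chunks, one per worker (the last chunk may be shorter).
    let chunk_size = (inputs.len() + workers - 1) / workers;
    let job = &job;

    thread::scope(|s| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(job).collect::<Vec<R>>()))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<R>>()
    })
}
```

Here `job` stands in for per-file OCR; a call might look like `process_parallel(&paths, default_workers(), |p| run_ocr(p))`, where `run_ocr` is a placeholder for whatever processes one file.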
serve - Start API Server
# Start with defaults
# Custom address and port
# With configuration file
# Enable debug logging
RUST_LOG=debug
Options:
| Flag | Description | Default |
|---|---|---|
| -a, --address | Bind address | 127.0.0.1 |
| -p, --port | Port number | 3000 |
| -c, --config | Config file path | None |
| --workers | Worker threads | CPU cores |
config - Manage Configuration
# Show current configuration
# Initialize default config file
# Set specific values
# Validate configuration
doctor - Environment Check
# Run full diagnostics
# Check specific components
# Output as JSON
# Auto-fix issues
Checks performed:
- CPU cores and SIMD capabilities (SSE2, AVX, AVX2, AVX-512, NEON)
- Memory availability
- ONNX Runtime installation
- Model file availability
- Configuration validity
- Network port availability
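The SIMD capability check in the list above can be approximated with the runtime feature-detection macros in the Rust standard library. This is a sketch of the idea only; `detected_simd_features` is a hypothetical function, and how the `doctor` command actually performs its checks is not documented here.

```rust
/// Report which of the SIMD instruction sets named above the current CPU
/// supports. Hypothetical helper; not the actual `doctor` implementation.
pub fn detected_simd_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut found: Vec<&'static str> = Vec::new();

    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("sse2") {
            found.push("SSE2");
        }
        if std::arch::is_x86_feature_detected!("avx") {
            found.push("AVX");
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            found.push("AVX2");
        }
        // AVX-512 is a family of extensions; the foundation subset is checked here.
        if std::arch::is_x86_feature_detected!("avx512f") {
            found.push("AVX-512");
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("NEON");
        }
    }

    found
}
```

On other architectures the function returns an empty list, which a caller could treat as "fall back to scalar code".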
mcp - MCP Server Mode
# Start MCP server for AI integration
# With debug logging
# With custom models directory
Available MCP Tools:
| Tool | Description |
|---|---|
| ocr_image | Process image file with OCR |
| ocr_base64 | Process base64-encoded image |
| batch_ocr | Batch process multiple images |
| preprocess_image | Apply image preprocessing |
| latex_to_mathml | Convert LaTeX to MathML |
| benchmark_performance | Run performance benchmarks |
Claude Code Integration:
Tutorials
Tutorial 1: Basic Image OCR
Learn to extract text from images using the REST API.
# Step 1: Start the server
# Step 2: Encode your image
BASE64=
# Step 3: Send OCR request
Tutorial 2: Mathematical Equation Recognition
Convert math images to LaTeX format.
Response:
Tutorial 3: Batch PDF Processing
Process multi-page PDFs asynchronously.
# Submit PDF job
JOB=
JOB_ID=
# Poll for completion
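The submit-then-poll pattern described in this tutorial can be sketched generically. The job states below are hypothetical, since the status values the API returns are not shown in this README; the `check` closure stands in for whatever request fetches the job status.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical job states; the real API's status values are not shown here.
#[derive(Debug, Clone, PartialEq)]
pub enum JobStatus {
    Pending,
    Done(String),
    Failed(String),
}

/// Call `check` until the job finishes, fails, or `max_attempts` is reached,
/// sleeping `interval` between attempts.
pub fn poll_until_done<F>(
    mut check: F,
    max_attempts: u32,
    interval: Duration,
) -> Result<String, String>
where
    F: FnMut() -> JobStatus,
{
    for attempt in 1..=max_attempts {
        match check() {
            JobStatus::Done(result) => return Ok(result),
            JobStatus::Failed(reason) => return Err(reason),
            // Only wait if another attempt will follow.
            JobStatus::Pending if attempt < max_attempts => sleep(interval),
            JobStatus::Pending => {}
        }
    }
    Err(format!("job still pending after {max_attempts} attempts"))
}
```

Bounding the number of attempts keeps a stuck job from blocking the caller forever; a longer interval reduces load on the server at the cost of slower completion detection.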
Tutorial 4: CLI Batch Processing
# Process entire directory
# Watch mode for continuous processing
Tutorial 5: WebAssembly Integration
# Build WASM module
Tutorial 6: Using as MCP Server
Integrate SciPix with Claude Code or other AI assistants.
# Add to Claude Code
# Or run standalone
Then use tools in your AI conversations:
- "Use the ocr_image tool to extract text from ./screenshot.png"
- "Convert this LaTeX to MathML: \frac{1}{2}"
API Reference
Authentication
All API endpoints (except /health) require authentication:
app_id: your_application_id
app_key: your_secret_key
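As a rough illustration, the sketch below assembles a raw HTTP/1.1 request that carries the two credentials. It assumes that `app_id` and `app_key` are sent as HTTP request headers and that the body is JSON; both are assumptions, and the body schema itself is not specified here, so it is passed through as an opaque string. `build_request` is a hypothetical helper.

```rust
/// Build a raw HTTP/1.1 request carrying `app_id` and `app_key` as headers.
/// Assumption: the credentials are header fields and the body is JSON.
pub fn build_request(
    method: &str,
    path: &str,
    host: &str,
    app_id: &str,
    app_key: &str,
    body: &str,
) -> String {
    format!(
        "{method} {path} HTTP/1.1\r\n\
         Host: {host}\r\n\
         app_id: {app_id}\r\n\
         app_key: {app_key}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {body}",
        len = body.len()
    )
}
```

The resulting string could be written to a `std::net::TcpStream` connected to the server address, or the same headers could be set through any HTTP client library.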
Endpoints
POST /v3/text - Image OCR
POST /v3/strokes - Digital Ink
POST /v3/pdf - PDF Processing
GET /health - Health Check
Configuration
Environment Variables
SERVER_ADDR=127.0.0.1:3000
RUST_LOG=scipix=info
RATE_LIMIT_PER_MINUTE=100
CACHE_MAX_SIZE=1000
MODEL_PATH=./models
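A program embedding SciPix might read these variables along the following lines. This is a sketch only: the `Settings` struct is hypothetical, and using the values listed above as fallbacks when a variable is unset is an assumption, not documented behavior.

```rust
use std::env;

/// Hypothetical settings struct mirroring some of the variables listed above.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub server_addr: String,
    pub rate_limit_per_minute: u32,
    pub cache_max_size: usize,
    pub model_path: String,
}

impl Settings {
    /// Build settings from any key lookup. Falling back to the example
    /// values when a key is missing is an assumption of this sketch.
    pub fn from_lookup<F>(get: F) -> Result<Self, String>
    where
        F: Fn(&str) -> Option<String>,
    {
        let rate = get("RATE_LIMIT_PER_MINUTE").unwrap_or_else(|| "100".to_string());
        let cache = get("CACHE_MAX_SIZE").unwrap_or_else(|| "1000".to_string());
        Ok(Settings {
            server_addr: get("SERVER_ADDR").unwrap_or_else(|| "127.0.0.1:3000".to_string()),
            rate_limit_per_minute: rate
                .parse::<u32>()
                .map_err(|e| format!("RATE_LIMIT_PER_MINUTE: {e}"))?,
            cache_max_size: cache
                .parse::<usize>()
                .map_err(|e| format!("CACHE_MAX_SIZE: {e}"))?,
            model_path: get("MODEL_PATH").unwrap_or_else(|| "./models".to_string()),
        })
    }

    /// Read the settings from the process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|key| env::var(key).ok())
    }
}
```

Taking the lookup as a closure keeps the parsing logic testable without touching the real process environment.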
Configuration File
[]
= "127.0.0.1"
= 3000
= 4
[]
= "./models"
= 0.5
[]
= 1000
= 3600
[]
= 100
= 20
Performance
| Operation | Time (avg) | Speedup / Throughput |
|---|---|---|
| SIMD Grayscale | 101µs | 4.2x faster |
| SIMD Resize | 2.63ms | 1.5x faster |
| Full Pipeline | 0.49ms | 4.4x faster |
| Simple text OCR | ~50ms | 20 img/s |
| Math equation | ~80ms | 12 img/s |
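The throughput figures in the last two rows follow directly from the average latency, assuming images are processed one at a time on a single stream. The sketch below shows the arithmetic; the function names are illustrative only.

```rust
/// Images per second for a given average per-image latency in milliseconds,
/// assuming strictly sequential processing.
pub fn images_per_second(latency_ms: f64) -> f64 {
    1000.0 / latency_ms
}

/// Speedup of an optimized time relative to a baseline time (same units).
pub fn speedup(baseline: f64, optimized: f64) -> f64 {
    baseline / optimized
}
```

With these definitions, 50 ms per image gives 20 img/s, and 80 ms gives 12.5 img/s, which the table rounds down to 12. Applied to the batch row of the benchmark table earlier in this document, `speedup(8.5, 2.1)` is roughly 4.05.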
Troubleshooting
# Check environment
# Enable debug logging
RUST_LOG=debug
# Verify models installed
Contributing
# Run tests
# Run linting
# Format code
License
MIT License - see LICENSE for details.