Exine
Universal Markdown extraction engine for Rust.
37+ formats. Zero external dependencies by default. 10–96× faster than Pandoc and Markitdown on HTML.
Performance
| Compared with | HTML speedup | Text speedup |
|---|---|---|
| Pandoc | 10–55× | 3.6–4.7× |
| Markitdown | 73–96× | 77–114× |
| html2text | 8–17× | — |
Sample timings: HTML (133 KB) in 6.3 ms, DOCX (37 KB) in 25 ms; plain text is near-instant.
Supported Formats
PDF, DOCX, PPTX, XLSX, ODT/ODS/ODP, EPUB, RTF, SVG, HTML, EML, MSG, and plain text, plus URL fetching and Vision AI escalation for images (Gemini, Claude, OpenAI, Mistral).
Installation
CLI
Library
[dependencies]
= "0.1"
Usage
Library
use extract_by_extension;
let bytes = std::fs::read(…).unwrap();
let markdown = extract_by_extension(…).unwrap();
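The snippet above dispatches on a file extension, but the README does not show how an extension selects a format. The following is a minimal sketch of that idea, assuming a simple case-insensitive lookup; `Format`, `format_from_extension`, and `format_from_path` are hypothetical names for illustration, not Exine's actual API.

```rust
use std::path::Path;

/// Hypothetical format tag covering the families listed under Supported Formats.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format {
    Pdf,
    Docx,
    Pptx,
    Xlsx,
    OpenDocument,
    Epub,
    Rtf,
    Svg,
    Html,
    Eml,
    Msg,
    PlainText,
}

/// Map a file extension (case-insensitive) to a format tag, if known.
fn format_from_extension(ext: &str) -> Option<Format> {
    match ext.to_ascii_lowercase().as_str() {
        "pdf" => Some(Format::Pdf),
        "docx" => Some(Format::Docx),
        "pptx" => Some(Format::Pptx),
        "xlsx" => Some(Format::Xlsx),
        "odt" | "ods" | "odp" => Some(Format::OpenDocument),
        "epub" => Some(Format::Epub),
        "rtf" => Some(Format::Rtf),
        "svg" => Some(Format::Svg),
        "html" | "htm" => Some(Format::Html),
        "eml" => Some(Format::Eml),
        "msg" => Some(Format::Msg),
        "txt" => Some(Format::PlainText),
        _ => None,
    }
}

/// Take the extension from a path (if it has one) and look it up.
fn format_from_path(path: &str) -> Option<Format> {
    Path::new(path)
        .extension()
        .and_then(|e| e.to_str())
        .and_then(format_from_extension)
}

fn main() {
    for p in ["report.DOCX", "slides.odp", "notes", "page.htm"] {
        println!("{p}: {:?}", format_from_path(p));
    }
}
```

A path without an extension yields `None`, which a real extractor would presumably handle by sniffing the bytes instead; that fallback is not shown here.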
CLI
Findability Shield
Blocks AI scrapers from your content while still letting search engines crawl and index it.
# Generate robots.txt
# Deploy the content to an S3-compatible CDN
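The README does not show the robots.txt that the shield generates. Here is a sketch of the general technique, assuming it works by disallowing known AI-crawler user agents site-wide while leaving every other agent (including search engines) unrestricted. The function name and the sitemap URL are hypothetical, and the user-agent list is illustrative rather than Exine's. Note that robots.txt is advisory: it only stops crawlers that choose to honor it.

```rust
/// Build a robots.txt that disallows each listed crawler everywhere and
/// allows all other user agents, optionally advertising a sitemap.
fn build_robots_txt(blocked_agents: &[&str], sitemap: Option<&str>) -> String {
    let mut out = String::new();
    for agent in blocked_agents {
        // One group per blocked crawler: "Disallow: /" blocks the whole site.
        out.push_str(&format!("User-agent: {agent}\nDisallow: /\n\n"));
    }
    // Catch-all group: an empty Disallow means nothing is disallowed.
    out.push_str("User-agent: *\nDisallow:\n");
    if let Some(url) = sitemap {
        out.push_str(&format!("\nSitemap: {url}\n"));
    }
    out
}

fn main() {
    // Illustrative AI-crawler user-agent tokens published by their operators.
    let ai_crawlers = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"];
    print!(
        "{}",
        build_robots_txt(&ai_crawlers, Some("https://example.com/sitemap.xml"))
    );
}
```

Because crawlers obey the most specific group that names them, `GPTBot` follows its own `Disallow: /` group, while a search engine such as Googlebot falls through to the permissive `*` group.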
Web Scraping
Stealth scraping with CAPTCHA solving and pagination.
# Crawl with stealth and pagination
# Crawl with CAPTCHA escalation
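Pagination, as described above, generally means following "next page" links until they run out. The sketch below shows that loop with two safety guards, a page budget and repeat-URL detection. It assumes a `fetch` callback that returns a page's items and its next link; `Page`, `crawl_paginated`, and `fake_site` are hypothetical names, and the fake site stands in for real HTTP, stealth, and CAPTCHA handling, none of which is modeled here.

```rust
use std::collections::HashSet;

/// One fetched page: extracted items plus an optional link to the next page.
struct Page {
    items: Vec<String>,
    next: Option<String>,
}

/// Follow `next` links from `start` until there is none, a URL repeats,
/// a fetch fails, or `max_pages` pages have been fetched.
fn crawl_paginated<F>(start: &str, max_pages: usize, mut fetch: F) -> Vec<String>
where
    F: FnMut(&str) -> Option<Page>,
{
    let mut seen = HashSet::new();
    let mut items = Vec::new();
    let mut url = Some(start.to_string());

    while let Some(current) = url.take() {
        // Stop on an exhausted page budget or a pagination cycle.
        if seen.len() >= max_pages || !seen.insert(current.clone()) {
            break;
        }
        match fetch(&current) {
            Some(page) => {
                items.extend(page.items);
                url = page.next;
            }
            None => break, // fetch failed; stop rather than guess
        }
    }
    items
}

/// Fake three-page site: /p1 -> /p2 -> /p3 -> (end).
fn fake_site(u: &str) -> Option<Page> {
    let (item, next) = match u {
        "/p1" => ("a", Some("/p2")),
        "/p2" => ("b", Some("/p3")),
        "/p3" => ("c", None),
        _ => return None,
    };
    Some(Page {
        items: vec![item.to_string()],
        next: next.map(String::from),
    })
}

fn main() {
    println!("{:?}", crawl_paginated("/p1", 10, fake_site));
}
```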
Vision AI (Optional)
For scanned PDFs and images, Exine escalates extraction to a Vision AI provider.
Supported providers: gemini, claude, openai, mistral
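The README names the providers but not the escalation rule. This sketch shows one plausible design: parse a provider name case-insensitively, and escalate only when the ordinary text layer yields almost nothing, as with a scanned page. `Provider`, `should_escalate`, and the character threshold are hypothetical choices for illustration, not Exine's documented behavior.

```rust
use std::str::FromStr;

/// The four provider names listed in the README.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Provider {
    Gemini,
    Claude,
    OpenAi,
    Mistral,
}

impl FromStr for Provider {
    type Err = String;

    /// Accept names case-insensitively, ignoring surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "gemini" => Ok(Provider::Gemini),
            "claude" => Ok(Provider::Claude),
            "openai" => Ok(Provider::OpenAi),
            "mistral" => Ok(Provider::Mistral),
            other => Err(format!("unknown vision provider: {other}")),
        }
    }
}

/// Hypothetical rule: escalate when the extracted text has fewer than
/// `min_chars` non-whitespace characters (typical of image-only pages).
fn should_escalate(extracted_text: &str, min_chars: usize) -> bool {
    extracted_text.chars().filter(|c| !c.is_whitespace()).count() < min_chars
}

fn main() {
    let provider: Provider = "Claude".parse().unwrap();
    let page_text = "   \n  "; // e.g. a scanned page with no text layer
    if should_escalate(page_text, 20) {
        println!("escalating to {provider:?}");
    }
}
```

Gating on the text layer keeps Vision AI calls, which are slower and billed per request, limited to pages that actually need them.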
Feature Flags
| Flag | Default | Description |
|---|---|---|
| `dashboard` | ✅ | Axum web dashboard |
| `ocr` | ❌ | Tesseract OCR (requires libtesseract) |
| `stt` | ❌ | Whisper.cpp STT (requires model file) |
| `vision` | ❌ | Vision AI extraction |
| `headless` | ❌ | Headless Chrome via chromiumoxide |
Contributing
See CONTRIBUTING.md for guidelines.
Built by NMA
Exine powers the FIELD ecosystem (GRID + SCALAR + STRIA): AI-native tools for European and Israeli startup fundraising. https://nma.vc