exine 0.1.1

Universal Markdown extraction engine. 37+ formats, zero external dependencies, 10–96× faster than Pandoc and Markitdown on HTML.

Exine

Universal Markdown extraction engine for Rust.

37+ formats. Zero external dependencies. 10–96× faster than Pandoc and Markitdown on HTML.


Performance

Compared with    HTML speed         Text speed
Pandoc           10–55× faster      3.6–4.7× faster
Markitdown       73–96× faster      77–114× faster
html2text        8–17× faster

Sample timings: HTML (133 KB) in 6.3 ms, DOCX (37 KB) in 25 ms. Plain text is near-instant.


Supported Formats

PDF, DOCX, PPTX, XLSX, ODT/ODS/ODP, EPUB, RTF, SVG, HTML, EML, MSG, plain text — plus URL fetching and Vision AI escalation for images (Gemini, Claude, OpenAI, Mistral).


Installation

CLI

cargo install exine

Library

[dependencies]
exine = "0.1"

Usage

Library

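use exine::extract::extract_by_extension;

// Read the document into memory; extraction operates on raw bytes.
let bytes = std::fs::read("report.pdf").unwrap();
// Pass the file extension alongside the bytes to get Markdown back.
let markdown = extract_by_extension("pdf", &bytes).unwrap();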
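
The example above hard-codes "pdf". When the input path varies, the extension argument can be derived from the path itself. The following is a minimal sketch using only the standard library; `extension_of` is a hypothetical helper, not part of Exine, and lowercasing the result is an assumption (the README only shows a lowercase extension without a leading dot).

```rust
use std::path::Path;

/// Hypothetical helper (not part of Exine): returns the lowercase
/// extension of `path` without the leading dot, or `None` if the
/// path has no extension.
fn extension_of(path: &str) -> Option<String> {
    Path::new(path)
        .extension() // final extension only, e.g. "gz" for "a.tar.gz"
        .and_then(|ext| ext.to_str()) // skip extensions that are not valid UTF-8
        .map(|ext| ext.to_ascii_lowercase())
}
```

The returned string could then be passed as the first argument of `extract_by_extension`, with the `None` case handled explicitly by the caller.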

CLI

exine report.pdf                    # Extract to stdout
exine report.pdf -o output.md       # Extract to file
exine https://example.com           # Fetch URL and extract
exine image.png --vision gemini     # Vision AI for images

Findability Shield

Protects content from AI scrapers while still allowing search-engine crawlers.

# Generate robots.txt
exine shield --robots > robots.txt

# Deploy to an S3-compatible CDN along with the content directory
exine shield \
  --robots \
  --s3-bucket my-bucket \
  --s3-region us-east-1 \
  --content-dir ./output
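
To illustrate the general idea of blocking AI scrapers while allowing search engines, here is a minimal sketch of how such a robots.txt could be assembled. It is not Exine's implementation: `build_robots_txt` is a hypothetical function, and the crawler names used in the usage note are assumptions about which user agents one might block. The file that `exine shield --robots` actually emits may differ.

```rust
/// Hypothetical sketch (not Exine's code): builds a robots.txt body that
/// disallows each listed crawler user agent and allows everyone else.
fn build_robots_txt(blocked_agents: &[&str]) -> String {
    let mut out = String::new();
    for agent in blocked_agents {
        // One group per blocked crawler, denying the whole site.
        out.push_str(&format!("User-agent: {agent}\nDisallow: /\n\n"));
    }
    // Default group: all other crawlers, including search engines, may crawl.
    out.push_str("User-agent: *\nAllow: /\n");
    out
}
```

Calling `build_robots_txt(&["GPTBot", "CCBot"])` yields two `Disallow: /` groups followed by the catch-all `Allow: /` group. Because robots.txt is advisory, this only affects crawlers that choose to honor it.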

Web Scraping

Stealth scraping with CAPTCHA solving and pagination.

# Crawl with stealth and pagination
exine crawl "https://example.com" --stealth --depth 2

# Crawl with CAPTCHA escalation
exine crawl "https://site.com" --captcha --render
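
The README does not define `--depth` precisely. One plausible reading is a breadth-first crawl that follows links at most N hops from the start URL. The sketch below shows that reading over a hypothetical in-memory link graph; `crawl_order` and the `links` map are illustrative names, no network access is involved, and Exine's actual traversal may differ.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical sketch: breadth-first traversal from `start` that visits
/// pages at most `max_depth` link hops away. `links` stands in for the
/// outgoing links a real crawler would discover by fetching each page.
fn crawl_order(links: &HashMap<&str, Vec<&str>>, start: &str, max_depth: usize) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<(String, usize)> = VecDeque::new();
    let mut order = Vec::new();

    seen.insert(start.to_string());
    queue.push_back((start.to_string(), 0));

    while let Some((url, depth)) = queue.pop_front() {
        order.push(url.clone());
        if depth == max_depth {
            continue; // do not follow links beyond the depth limit
        }
        for next in links.get(url.as_str()).into_iter().flatten() {
            // `insert` returns false for URLs already queued or visited.
            if seen.insert(next.to_string()) {
                queue.push_back((next.to_string(), depth + 1));
            }
        }
    }
    order
}
```

Under this reading, a depth of 0 visits only the start page, and a depth of 2 visits the start page, the pages it links to, and the pages those link to.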

Vision AI (Optional)

For scanned PDFs and images, Exine escalates to Vision AI:

export GEMINI_API_KEY=...
exine scanned.pdf --vision auto     # Auto-selects best available provider

Supported providers: gemini, claude, openai, mistral
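
How `--vision auto` chooses a provider is not documented here. A simple approach would be to pick the first provider whose API key is set. The sketch below shows that approach under stated assumptions: only `GEMINI_API_KEY` appears in this README, so the other variable names and the priority order are guesses, and `pick_provider` is a hypothetical function rather than Exine's API.

```rust
/// Assumed priority order, paired with the environment variable assumed
/// to hold each provider's API key. Only GEMINI_API_KEY is confirmed by
/// the README; the rest are illustrative guesses.
const PROVIDERS: [(&str, &str); 4] = [
    ("gemini", "GEMINI_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
];

/// Hypothetical sketch: returns the first provider whose key is set and
/// non-empty. `lookup` abstracts the environment so the logic is testable;
/// pass `|name| std::env::var(name).ok()` to read the real environment.
fn pick_provider(lookup: impl Fn(&str) -> Option<String>) -> Option<&'static str> {
    PROVIDERS
        .iter()
        .find(|&&(_, var)| lookup(var).map_or(false, |key| !key.is_empty()))
        .map(|&(name, _)| name)
}
```

Taking the lookup as a closure keeps the selection logic free of global state, so it can be exercised without modifying the process environment.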


Feature Flags

Flag         Default    Description
dashboard               Axum web dashboard
ocr                     Tesseract OCR (requires libtesseract)
stt                     Whisper.cpp STT (requires model file)
vision                  Vision AI extraction
headless                Headless Chrome via chromiumoxide

Contributing

See CONTRIBUTING.md for guidelines.


Built by NMA

Exine powers the FIELD ecosystem (GRID + SCALAR + STRIA) — AI-native tools for European and Israeli startup fundraising. https://nma.vc


License

MIT