# Exine
Universal Markdown extraction engine for Rust.
**37+ formats. Zero external dependencies. 10–96× faster than Pandoc.**
---
## Performance
| vs Pandoc | 10–55× faster | 3.6–4.7× faster |
| vs Markitdown | 73–96× faster | 77–114× faster |
| vs html2text | 8–17× faster | — |
HTML (133 KB): **6.3ms**. DOCX (37 KB): **25ms**. Text: **near-instant**.
---
## Supported Formats
PDF, DOCX, PPTX, XLSX, ODT/ODS/ODP, EPUB, RTF, SVG,
HTML, EML, MSG, plain text — plus URL fetching and
Vision AI escalation for images (Gemini, Claude, OpenAI, Mistral).
---
## Installation
### CLI
```bash
cargo install exine
```
### Library
```toml
[dependencies]
exine = "0.1"
```
---
## Usage
### Library
```rust
use exine::extract::extract_by_extension;
let bytes = std::fs::read("report.pdf").unwrap();
let markdown = extract_by_extension("pdf", &bytes).unwrap();
```
### CLI
```bash
exine report.pdf # Extract to stdout
exine report.pdf -o output.md # Extract to file
exine https://example.com # Fetch URL and extract
exine image.png --vision gemini # Vision AI for images
```
### Findability Shield
Protects content from AI scrapers while allowing search engines.
```bash
# Generate robots.txt
exine shield --robots > robots.txt
# Deploy to CDN (S3-compatible) with content
exine shield \
--robots \
--s3-bucket my-bucket \
--s3-region us-east-1 \
--content-dir ./output
```
### Web Scraping
Stealth scraping with CAPTCHA solving and pagination.
```bash
# Crawl with stealth and pagination
exine crawl "https://example.com" --stealth --depth 2
# Crawl with CAPTCHA escalation
exine crawl "https://site.com" --captcha --render
```
---
## Vision AI (Optional)
For scanned PDFs and images, Exine escalates to Vision AI:
```bash
export GEMINI_API_KEY=...
exine scanned.pdf --vision auto # Auto-selects best available provider
```
Supported providers: `gemini`, `claude`, `openai`, `mistral`
---
## Feature Flags
| `dashboard` | ✅ | Axum web dashboard |
| `ocr` | ❌ | Tesseract OCR (requires `libtesseract`) |
| `stt` | ❌ | Whisper.cpp STT (requires model file) |
| `vision` | ❌ | Vision AI extraction |
| `headless` | ❌ | Headless Chrome via chromiumoxide |
---
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
---
## Built by NMA
Exine powers the FIELD ecosystem (GRID + SCALAR + STRIA) —
AI-native tools for European and Israeli startup fundraising.
https://nma.vc
---
## License
[MIT](LICENSE)