exine 0.1.1

Universal Markdown extraction engine. 37+ formats, zero external dependencies, 10-96× faster than Pandoc.
Documentation
# Exine

Universal Markdown extraction engine for Rust.

**37+ formats. Zero external dependencies. 10–96× faster than Pandoc.**

---

## Performance

| vs Competitor   | HTML Speed     | Text Speed      |
|-----------------|----------------|-----------------|
| vs Pandoc       | 10–55× faster  | 3.6–4.7× faster |
| vs Markitdown   | 73–96× faster  | 77–114× faster  |
| vs html2text    | 8–17× faster   ||

HTML (133 KB): **6.3ms**. DOCX (37 KB): **25ms**. Text: **near-instant**.

---

## Supported Formats

PDF, DOCX, PPTX, XLSX, ODT/ODS/ODP, EPUB, RTF, SVG,
HTML, EML, MSG, plain text — plus URL fetching and
Vision AI escalation for images (Gemini, Claude, OpenAI, Mistral).

---

## Installation

### CLI

```bash
cargo install exine
```

### Library

```toml
[dependencies]
exine = "0.1"
```

---

## Usage

### Library

```rust
use exine::extract::extract_by_extension;

let bytes = std::fs::read("report.pdf").unwrap();
let markdown = extract_by_extension("pdf", &bytes).unwrap();
```

### CLI

```bash
exine report.pdf                    # Extract to stdout
exine report.pdf -o output.md       # Extract to file
exine https://example.com           # Fetch URL and extract
exine image.png --vision gemini     # Vision AI for images
```

### Findability Shield

Protects content from AI scrapers while allowing search engines.

```bash
# Generate robots.txt
exine shield --robots > robots.txt

# Deploy to CDN (S3-compatible) with content
exine shield \
  --robots \
  --s3-bucket my-bucket \
  --s3-region us-east-1 \
  --content-dir ./output
```

### Web Scraping

Stealth scraping with CAPTCHA solving and pagination.

```bash
# Crawl with stealth and pagination
exine crawl "https://example.com" --stealth --depth 2

# Crawl with CAPTCHA escalation
exine crawl "https://site.com" --captcha --render
```

---

## Vision AI (Optional)

For scanned PDFs and images, Exine escalates to Vision AI:

```bash
export GEMINI_API_KEY=...
exine scanned.pdf --vision auto     # Auto-selects best available provider
```

Supported providers: `gemini`, `claude`, `openai`, `mistral`

---

## Feature Flags

| Flag | Default | Description |
|------|---------|-------------|
| `dashboard` || Axum web dashboard |
| `ocr` || Tesseract OCR (requires `libtesseract`) |
| `stt` || Whisper.cpp STT (requires model file) |
| `vision` || Vision AI extraction |
| `headless` || Headless Chrome via chromiumoxide |

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## Built by NMA

Exine powers the FIELD ecosystem (GRID + SCALAR + STRIA) —
AI-native tools for European and Israeli startup fundraising.
https://nma.vc

---

## License

[MIT](LICENSE)