1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
//! # edgequake-pdf2md
//!
//! Convert PDF documents to Markdown using Vision Language Models (VLMs).
//!
//! ## Why this crate?
//!
//! Traditional PDF-to-text tools (pdftotext, pdf-extract) fail on complex
//! layouts — multi-column text, mathematical symbols, figures, and tables come
//! out garbled or out of reading order. Instead this crate rasterises each page
//! into a PNG and lets a VLM read it as a human would, producing semantically
//! correct Markdown that preserves structure, tables, and formulae.
//!
//! ## Pipeline Overview
//!
//! ```text
//! PDF
//! │
//! ├─ 1. Input resolve local file or download from URL
//! ├─ 2. Render rasterise pages via pdfium (CPU-bound, spawn_blocking)
//! ├─ 3. Encode PNG → base64 ImageData
//! ├─ 4. VLM concurrent calls to gpt-4.1-nano / claude / gemini / …
//! ├─ 5. Polish 10-rule post-processing (fences, tables, whitespace)
//! └─ 6. Output assembled Markdown + per-page stats
//! ```
//!
//! ## Quick Start
//!
//! ```rust,no_run
//! use edgequake_pdf2md::{convert, ConversionConfig};
//!
//! #[tokio::main]
//! async fn main() -> Result<(), Box<dyn std::error::Error>> {
//! // Provider auto-detected from OPENAI_API_KEY / ANTHROPIC_API_KEY / GEMINI_API_KEY
//! let config = ConversionConfig::default();
//! let output = convert("document.pdf", &config).await?;
//! println!("{}", output.markdown);
//! eprintln!("tokens: {} in / {} out",
//! output.stats.total_input_tokens,
//! output.stats.total_output_tokens);
//! Ok(())
//! }
//! ```
//!
//! ## Feature Flags
//!
//! | Feature | Default | Description |
//! |---------|---------|-------------|
//! | `cli` | on | Enables the `pdf2md` binary (clap + anyhow + tracing-subscriber) |
//!
//! Disable `cli` when using only the library to avoid pulling in CLI-only deps:
//! ```toml
//! edgequake-pdf2md = { version = "0.1", default-features = false }
//! ```
//!
//! ## Choosing a Model
//!
//! | Model | $/1M tokens | Quality | Best for |
//! |-------|------------|---------|----------|
//! | `gpt-4.1-nano` | $0.10/$0.40 | ★★★ | Default — fast, cheap |
//! | `gpt-4.1-mini` | $0.40/$1.60 | ★★★★ | Balance |
//! | `gpt-4.1` | $2.00/$8.00 | ★★★★★ | Highest accuracy |
//! | `claude-sonnet-4-20250514` | $3.00/$15.00 | ★★★★★ | Tables, complex layouts |
//! | `gemini-2.0-flash` | $0.10/$0.40 | ★★★ | Alternative cheap option |
//!
//! A 50-page document costs roughly **$0.02** with `gpt-4.1-nano`.
// ── Modules ──────────────────────────────────────────────────────────────
// ── Re-exports ───────────────────────────────────────────────────────────
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;
pub use ;