edgequake-pdf2md is a Rust CLI and library that converts PDF files (local or URL) into well-structured Markdown using vision-capable LLMs. It rasterises each page with pdfium, sends the image to a VLM (GPT-4.1, Claude, Gemini, etc.), and post-processes the result into clean Markdown.
Inspired by pyzerox, rebuilt in Rust for speed and reliability.
Features
- Multi-provider — OpenAI, Anthropic, Google Gemini, Azure, Ollama, or any OpenAI-compatible endpoint
- Fast — concurrent page processing with configurable parallelism
- Accurate — 10-rule post-processing pipeline fixes tables, removes hallucinations, normalises output
- Flexible — page selection, fidelity tiers, custom system prompts, streaming API
- Cross-platform — macOS (ARM/x64), Linux (x64/ARM64/musl), Windows
- Library + CLI — use as a Rust crate or standalone command-line tool
Quick Start
1. Install pdfium
# Auto-detect OS & architecture
# macOS: set library path
# Linux: set library path
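A sketch of those steps, assuming pdfium comes from the pdfium-binaries releases and is unpacked into `~/.local/lib` (paths and asset names are illustrative):

```shell
# Auto-detect OS & architecture to choose the matching pdfium-binaries asset
OS=$(uname -s | tr '[:upper:]' '[:lower:]')     # "darwin" or "linux"
ARCH=$(uname -m)
case "$ARCH" in
  x86_64)        PDFIUM_ARCH=x64   ;;
  arm64|aarch64) PDFIUM_ARCH=arm64 ;;
esac
echo "asset: pdfium-${OS}-${PDFIUM_ARCH}"

# macOS: set library path (assuming the dylib was unpacked into ~/.local/lib)
export DYLD_LIBRARY_PATH="$HOME/.local/lib:${DYLD_LIBRARY_PATH:-}"
# Linux: set library path
export LD_LIBRARY_PATH="$HOME/.local/lib:${LD_LIBRARY_PATH:-}"
```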
See docs/installation.md for other install options (Homebrew, apt, manual download).
2. Set an API key
# OpenAI (recommended)
# or
# Anthropic
# or
# Google Gemini
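For example (the variable names below are the conventional ones for each provider's SDK; confirm the exact names with `pdf2md --help`):

```shell
# OpenAI (recommended)
export OPENAI_API_KEY="sk-..."
# or Anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
# or Google Gemini
export GEMINI_API_KEY="AIza..."
```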
3. Build & run
# Convert a PDF
# Convert from URL
# Inspect metadata (no API key needed)
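For example, assuming the binary is named `pdf2md` as elsewhere in this README (the `-o` flag and the `inspect` subcommand are assumptions):

```shell
cargo build --release

# Convert a PDF
./target/release/pdf2md document.pdf -o document.md
# Convert from URL
./target/release/pdf2md https://example.com/paper.pdf -o paper.md
# Inspect metadata (no API key needed)
./target/release/pdf2md inspect document.pdf
```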
Or install globally:
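From a checkout of the repository:

```shell
# Installs the pdf2md binary into ~/.cargo/bin
cargo install --path .
```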
How It Works
PDF ──▶ pdfium (render) ──▶ per-page PNG images ──▶ base64 encode ──▶ VLM API (concurrent) ──▶ post-process (10 rules) ──▶ assembled Markdown
- Input — resolve local file or download from URL
- Render — rasterise pages to images via pdfium-render
- Encode — base64-encode each page image
- VLM — send images to a vision LLM with a structured system prompt
- Post-process — strip fences, fix tables, remove hallucinated images, normalise whitespace
- Assemble — join pages with optional separators and YAML front-matter
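The Encode stage, for instance, amounts to turning each rendered page into a base64 data URL before it is sent to the VLM. In shell terms, with a stand-in file (the real pipeline does this in Rust):

```shell
# Stand-in for a pdfium-rendered page image
printf 'fake-png-bytes' > page-1.png
# Base64-encode and wrap as a data URL, the form vision APIs accept
B64=$(base64 < page-1.png | tr -d '\n')
echo "data:image/png;base64,${B64}"
```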
See docs/how-it-works.md for the full pipeline walkthrough with diagrams.
Usage
# Basic conversion
# Specific pages
# High fidelity with a better model
# Consistent formatting across pages (sequential mode)
# JSON output with metadata
# Use Anthropic
# Use local Ollama
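Concretely, with flag names taken from the Configuration table below (`-o` and `--json` are assumptions):

```shell
# Basic conversion
pdf2md report.pdf -o report.md
# Specific pages
pdf2md report.pdf --pages 1-5,12 -o excerpt.md
# High fidelity with a better model
pdf2md report.pdf --fidelity tier3 --model gpt-4.1
# Consistent formatting across pages (sequential mode)
pdf2md report.pdf --maintain-format
# JSON output with metadata
pdf2md report.pdf --json
# Use Anthropic
pdf2md report.pdf --provider anthropic --model claude-haiku-4-20250514
# Use local Ollama
pdf2md report.pdf --provider ollama --model llava
```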
Run pdf2md --help for the full reference, including supported models and cost estimates.
See docs/examples.md for more usage patterns.
Supported Providers & Models
| Provider | Model | Input $/1M | Output $/1M | Vision |
|---|---|---|---|---|
| OpenAI | gpt-4.1-nano (default) | $0.10 | $0.40 | ✓ |
| OpenAI | gpt-4.1-mini | $0.40 | $1.60 | ✓ |
| OpenAI | gpt-4.1 | $2.00 | $8.00 | ✓ |
| Anthropic | claude-sonnet-4-20250514 | $3.00 | $15.00 | ✓ |
| Anthropic | claude-haiku-4-20250514 | $0.80 | $4.00 | ✓ |
| Gemini | gemini-2.0-flash | $0.10 | $0.40 | ✓ |
| Gemini | gemini-2.5-pro | $1.25 | $10.00 | ✓ |
| Ollama | llava, llama3.2-vision | free | free | ✓ |
Cost estimate: a 50-page document costs ~$0.02 with gpt-4.1-nano and ~$0.09 with gpt-4.1-mini.
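That figure is easy to reproduce as a back-of-envelope check, assuming roughly 1,500 input tokens per page image and 500 output tokens per page (both assumptions; actual token counts vary with DPI and page density):

```shell
# 50 pages at gpt-4.1-nano pricing: $0.10/1M input, $0.40/1M output
COST=$(awk 'BEGIN {
  pages = 50; in_tok = 1500; out_tok = 500
  in_rate = 0.10 / 1000000; out_rate = 0.40 / 1000000
  printf "%.4f", pages * (in_tok * in_rate + out_tok * out_rate)
}')
echo "estimated cost: \$${COST}"
```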
See docs/providers.md for detailed comparisons, cost calculators, and selection guide.
Library Usage
Add to your Cargo.toml:
[dependencies]
edgequake-pdf2md = "0.1"
tokio = { version = "1", features = ["full"] }
A minimal async example (the exact function name and signature are assumptions; see the API docs on docs.rs):

use edgequake_pdf2md::convert;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let markdown = convert("document.pdf").await?;
    println!("{markdown}");
    Ok(())
}
Also available: streaming API (convert_stream), sync wrapper (convert_sync), metadata inspection (inspect).
See API docs on docs.rs for the full API reference.
Configuration
All options can be set via CLI flags, environment variables, or the builder API:
| Flag | Env Variable | Default | Description |
|---|---|---|---|
| --model | EDGEQUAKE_MODEL | gpt-4.1-nano | VLM model |
| --provider | EDGEQUAKE_PROVIDER | auto-detect | LLM provider |
| --dpi | PDF2MD_DPI | 150 | Rendering resolution (72–400) |
| --pages | PDF2MD_PAGES | all | Page selection |
| --fidelity | PDF2MD_FIDELITY | tier2 | Quality tier (tier1/tier2/tier3) |
| -c, --concurrency | PDF2MD_CONCURRENCY | 10 | Parallel API calls |
| --maintain-format | PDF2MD_MAINTAIN_FORMAT | false | Sequential mode |
| --separator | PDF2MD_SEPARATOR | none | Page separator |
| --temperature | PDF2MD_TEMPERATURE | 0.1 | LLM temperature |
See docs/configuration.md for the complete reference.
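For example, the environment-variable route, using the names from the table above (values illustrative):

```shell
export EDGEQUAKE_MODEL=gpt-4.1-mini
export PDF2MD_DPI=200
export PDF2MD_CONCURRENCY=4
export PDF2MD_FIDELITY=tier3
```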
Development
# Setup
# Build
# Test
# Quality
# Try it
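With a standard Cargo workflow those steps are typically (the sample file name is illustrative):

```shell
# Build
cargo build
# Test
cargo test
# Quality: formatting and lints
cargo fmt --check && cargo clippy --all-targets -- -D warnings
# Try it
cargo run --release -- sample.pdf
```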
Documentation
| Document | Description |
|---|---|
| docs/how-it-works.md | Pipeline architecture with ASCII diagrams |
| docs/installation.md | Setup guide for all platforms |
| docs/providers.md | Supported models, pricing, selection guide |
| docs/configuration.md | All CLI flags and environment variables |
| docs/examples.md | Real-world usage examples |
Dependencies
| Crate | Purpose |
|---|---|
| pdfium-render | PDF rasterisation via Google's pdfium C++ library |
| edgequake-llm | Multi-provider LLM abstraction (OpenAI, Anthropic, Gemini, etc.) |
| tokio | Async runtime |
| image | Image encoding (PNG/JPEG) |
| clap | CLI argument parsing |
External References
- pdfium — Google's open-source PDF rendering engine
- pdfium-binaries — Pre-built pdfium binaries for all platforms
- pyzerox — The Python project that inspired this tool
- OpenAI Vision API — Image understanding with GPT-4.1
- Anthropic Vision — Image understanding with Claude
- Google Gemini — Vision capabilities
License
Copyright 2026 Raphaël MANSUY
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
See LICENSE for the full text.