edgequake-pdf2md 0.4.1

Convert PDF documents to Markdown using Vision Language Models — CLI and library

edgequake-pdf2md is a Rust CLI and library that converts PDF files (local or URL) into well-structured Markdown using vision-capable LLMs. It rasterises each page with pdfium, sends the image to a VLM (GPT-4.1, Claude, Gemini, etc.), and post-processes the result into clean Markdown.

Inspired by pyzerox, rebuilt in Rust for speed and reliability.

Features

  • Multi-provider — OpenAI, Anthropic, Google Gemini, Azure, Ollama, or any OpenAI-compatible endpoint
  • Fast — concurrent page processing with configurable parallelism
  • Accurate — 10-rule post-processing pipeline fixes tables, removes hallucinations, normalises output
  • Flexible — page selection, fidelity tiers, custom system prompts, streaming API
  • Self-contained — pdfium (~5 MB) embedded in the binary by default; no runtime downloads, no env vars
  • Cross-platform — macOS (arm64/x64), Linux (x64/aarch64), Windows (x64/arm64)
  • Library + CLI — use as a Rust crate or standalone command-line tool
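
The configurable parallelism listed under Fast can be pictured as a fixed number of workers claiming page indices from a shared counter. The sketch below uses only std threads for illustration; the crate itself is async, and `process_page` and `convert_pages` are hypothetical names, not part of its API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical stand-in for the per-page VLM call.
fn process_page(page: usize) -> String {
    format!("# Page {page}")
}

/// Process `total` pages with at most `concurrency` workers running at once.
/// Results come back in page order regardless of completion order.
fn convert_pages(total: usize, concurrency: usize) -> Vec<String> {
    let next = AtomicUsize::new(0);
    let results: Mutex<Vec<Option<String>>> = Mutex::new(vec![None; total]);
    let workers = concurrency.max(1).min(total.max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim the next unprocessed page index.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= total {
                    break;
                }
                let md = process_page(i + 1);
                results.lock().unwrap()[i] = Some(md);
            });
        }
    });

    results
        .into_inner()
        .unwrap()
        .into_iter()
        .map(|r| r.expect("every page index is claimed exactly once"))
        .collect()
}
```

Raising `concurrency` shortens wall-clock time until the provider's rate limit becomes the bottleneck, which is presumably why the value is configurable.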

Quick Start

Self-contained binary — zero runtime setup. Starting from v0.4.0, the PDFium engine (~5 MB) is embedded inside the binary at compile time (bundled feature is now the default). No download required at runtime, no DYLD_LIBRARY_PATH, no environment variables needed.

Build-time auto-download: if you don't set PDFIUM_BUNDLE_LIB, the correct pdfium library is downloaded automatically during cargo build and cached in ~/.cargo/pdfium-bundle/. When building without the bundled feature (download mode), set PDFIUM_LIB_PATH to point to an existing copy of the library at runtime.

1. Set an API key

export OPENAI_API_KEY="sk-..."    # OpenAI (recommended)
# or
export ANTHROPIC_API_KEY="sk-ant-..."  # Anthropic
# or
export GEMINI_API_KEY="AI..."          # Google Gemini

2. Build & run

cargo build --release

# Convert a PDF
./target/release/pdf2md document.pdf -o output.md

# Convert from URL
./target/release/pdf2md https://arxiv.org/pdf/1706.03762 -o paper.md

# Inspect metadata (no API key needed)
./target/release/pdf2md --inspect-only document.pdf

Or install globally:

cargo install edgequake-pdf2md
pdf2md document.pdf -o output.md

How It Works

PDF ──▶ pdfium ──▶ PNG images ──▶ base64 ──▶ VLM API ──▶ post-process ──▶ Markdown
        render      per page       encode     (concurrent)   10 rules       assembled
  1. Input — resolve local file or download from URL
  2. Render — rasterise pages to images via pdfium-render
  3. Encode — base64-encode each page image
  4. VLM — send images to a vision LLM with a structured system prompt
  5. Post-process — strip fences, fix tables, remove hallucinated images, normalise whitespace
  6. Assemble — join pages with optional separators and YAML front-matter
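
Step 3 above can be illustrated with a small standard-library-only Base64 encoder. This is a sketch of the encoding itself, not the crate's code; wrapping the result in a `data:image/png;base64,` URL is an assumption about how an image might be attached to a VLM request, and `base64_encode` and `png_data_url` are hypothetical names.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard Base64 (RFC 4648) with `=` padding.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = (chunk[0] as u32) << 16 | (b1 as u32) << 8 | b2 as u32;

        // Emit four 6-bit symbols, padding where input bytes were missing.
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Assumed request format: a data URL carrying the encoded PNG bytes.
fn png_data_url(png_bytes: &[u8]) -> String {
    format!("data:image/png;base64,{}", base64_encode(png_bytes))
}
```

Base64 grows the payload by about one third, so the rendering DPI directly affects request size.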

See docs/how-it-works.md for the full pipeline walkthrough with diagrams.
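
Steps 5 and 6 can be sketched as a few string functions. This is a minimal illustration covering only fence stripping, whitespace normalisation, and page joining; it is not the crate's 10-rule pipeline, and every function name here is hypothetical.

```rust
/// Remove a code fence that wraps the whole response, if present.
/// Simplification: a reply that merely starts and ends with separate
/// fenced blocks would also be stripped.
fn strip_outer_fence(text: &str) -> &str {
    let trimmed = text.trim();
    let Some(rest) = trimmed.strip_prefix("```") else {
        return trimmed;
    };
    let Some(body) = rest.strip_suffix("```") else {
        return trimmed;
    };
    // Drop the info string ("markdown", "md", or empty) on the opening line.
    match body.split_once('\n') {
        Some((_info, inner)) => inner.trim(),
        None => trimmed,
    }
}

/// Trim trailing spaces and collapse runs of blank lines to a single one.
fn normalise_whitespace(text: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in text.lines().map(str::trim_end) {
        let prev_blank = out.last().map_or(true, |l| l.is_empty());
        if line.is_empty() && prev_blank {
            continue; // skip leading and repeated blank lines
        }
        out.push(line);
    }
    while out.last().map_or(false, |l| l.is_empty()) {
        out.pop();
    }
    out.join("\n")
}

/// Clean one page of raw model output.
fn post_process(raw: &str) -> String {
    normalise_whitespace(strip_outer_fence(raw))
}

/// Join per-page Markdown, optionally with a horizontal rule between pages.
fn assemble(pages: &[String], hr_separator: bool) -> String {
    let sep = if hr_separator { "\n\n---\n\n" } else { "\n\n" };
    pages.join(sep)
}
```

Cleaning each page before joining keeps a stray fence on one page from swallowing the pages after it.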

Usage

# Basic conversion
pdf2md document.pdf -o output.md

# Specific pages
pdf2md --pages 1-5 document.pdf -o first_five.md

# High fidelity with a better model
pdf2md --fidelity tier3 --model gpt-4.1 --dpi 200 paper.pdf -o paper.md

# Consistent formatting across pages (sequential mode)
pdf2md --maintain-format --separator hr book.pdf -o book.md

# JSON output with metadata
pdf2md --json --metadata document.pdf > output.json

# Use Anthropic
pdf2md --provider anthropic --model claude-sonnet-4-20250514 document.pdf

# Use local Ollama
pdf2md --provider ollama --model llava document.pdf

Run pdf2md --help for the full reference, including supported models and cost estimates.

See docs/examples.md for more usage patterns.
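
A page specification such as the `1-5` passed to --pages could be parsed along these lines. This is a sketch under assumptions: `PageSpec` and `parse_pages` are hypothetical (not the crate's `PageSelection`), and support for a single page number is assumed rather than taken from the documentation.

```rust
/// Hypothetical page selection type for this sketch.
#[derive(Debug, PartialEq)]
enum PageSpec {
    All,
    Single(usize),
    Range(usize, usize), // inclusive, 1-based
}

/// Parse "all", "N", or "A-B" into a PageSpec.
fn parse_pages(spec: &str) -> Result<PageSpec, String> {
    let spec = spec.trim();
    if spec.eq_ignore_ascii_case("all") {
        return Ok(PageSpec::All);
    }
    // Page numbers are 1-based, so zero is rejected.
    let parse = |s: &str| -> Result<usize, String> {
        match s.trim().parse::<usize>() {
            Ok(0) => Err("page numbers start at 1".to_string()),
            Ok(n) => Ok(n),
            Err(_) => Err(format!("invalid page number: {s:?}")),
        }
    };
    match spec.split_once('-') {
        Some((a, b)) => {
            let (start, end) = (parse(a)?, parse(b)?);
            if start > end {
                return Err(format!("range start {start} exceeds end {end}"));
            }
            Ok(PageSpec::Range(start, end))
        }
        None => Ok(PageSpec::Single(parse(spec)?)),
    }
}
```

Rejecting reversed ranges and page 0 up front means a malformed selection fails before any API call is made.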

Supported Providers & Models

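| Provider  | Model                    | Input $/1M | Output $/1M | Vision |
|-----------|--------------------------|------------|-------------|--------|
| OpenAI    | gpt-4.1-nano (default)   | $0.10      | $0.40       | yes    |
| OpenAI    | gpt-4.1-mini             | $0.40      | $1.60       | yes    |
| OpenAI    | gpt-4.1                  | $2.00      | $8.00       | yes    |
| Anthropic | claude-sonnet-4-20250514 | $3.00      | $15.00      | yes    |
| Anthropic | claude-haiku-4-20250514  | $0.80      | $4.00       | yes    |
| Gemini    | gemini-2.0-flash         | $0.10      | $0.40       | yes    |
| Gemini    | gemini-2.5-pro           | $1.25      | $10.00      | yes    |
| Ollama    | llava, llama3.2-vision   | free       | free        | yes    |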

Cost estimate: A 50-page document costs ~$0.02 with gpt-4.1-nano, ~$0.09 with gpt-4.1-mini.
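
The estimate follows from the per-token prices in the table. The sketch below shows the arithmetic; the per-page token counts used in the note afterwards (about 1,000 input and 750 output tokens) are assumptions chosen for illustration, not figures published by the project, and `estimate_cost` is a hypothetical helper.

```rust
/// Estimated cost in USD for converting `pages` pages.
/// Prices are per one million tokens, as in the provider table.
fn estimate_cost(
    pages: u32,
    input_tokens_per_page: u32,
    output_tokens_per_page: u32,
    input_price_per_million: f64,
    output_price_per_million: f64,
) -> f64 {
    let per_page = (input_tokens_per_page as f64 * input_price_per_million
        + output_tokens_per_page as f64 * output_price_per_million)
        / 1_000_000.0;
    per_page * pages as f64
}
```

Under those assumed token counts, `estimate_cost(50, 1_000, 750, 0.10, 0.40)` comes to about $0.02 for gpt-4.1-nano, and the same call with gpt-4.1-mini prices (0.40, 1.60) comes to about $0.08, in the same range as the figures quoted above. Real costs depend on DPI, page density, and the length of the generated Markdown.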

See docs/providers.md for detailed comparisons, cost calculators, and selection guide.

Library Usage

Add to your Cargo.toml:

[dependencies]
edgequake-pdf2md = "0.4"
tokio = { version = "1", features = ["full"] }

Basic conversion

use edgequake_pdf2md::{convert, ConversionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ConversionConfig::builder()
        .model("gpt-4.1-nano")
        .provider_name("openai")
        .pages(edgequake_pdf2md::PageSelection::Range(1, 5))
        .build()?;

    let output = convert("document.pdf", &config).await?;
    println!("{}", output.markdown);
    println!("Processed {}/{} pages", output.stats.processed_pages, output.stats.total_pages);
    Ok(())
}

Convert PDF bytes in memory (v0.2)

No temp-file management needed — pass raw bytes directly:

use edgequake_pdf2md::{convert_from_bytes, ConversionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let bytes = std::fs::read("document.pdf")?;  // or from DB / network
    let config = ConversionConfig::default();
    let output = convert_from_bytes(&bytes, &config).await?;
    println!("{}", output.markdown);
    Ok(())
}

Per-page progress callbacks (v0.2)

use edgequake_pdf2md::{convert, ConversionConfig, ConversionProgressCallback};
use std::sync::Arc;

struct MyProgress;

impl ConversionProgressCallback for MyProgress {
    fn on_conversion_start(&self, total: usize) {
        eprintln!("Starting conversion of {total} pages");
    }
    fn on_page_complete(&self, page: usize, total: usize, chars: usize) {
        eprintln!("  ✓ Page {page}/{total}: {chars} chars");
    }
    fn on_page_error(&self, page: usize, total: usize, error: &str) {
        eprintln!("  ✗ Page {page}/{total} failed: {error}");
    }
    fn on_conversion_complete(&self, total: usize, success: usize) {
        eprintln!("Done: {success}/{total} pages converted");
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ConversionConfig::builder()
        .progress_callback(Arc::new(MyProgress) as Arc<dyn ConversionProgressCallback>)
        .build()?;
    let output = convert("document.pdf", &config).await?;
    println!("{}", output.markdown);
    Ok(())
}

Strict error on partial failure (v0.2)

By default, page failures are non-fatal. Use into_result() to promote them to errors:

use edgequake_pdf2md::{convert, ConversionConfig};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ConversionConfig::default();
    // into_result() returns Err(PartialFailure) if any pages failed
    let output = convert("document.pdf", &config).await?.into_result()?;
    println!("{}", output.markdown);
    Ok(())
}
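
The lenient-by-default, strict-on-request behaviour can be pictured with a pair of local types. These are stand-ins written for this sketch, not the crate's definitions; only the method name `into_result` and the `PartialFailure` name are taken from the example above, and the fields are assumptions.

```rust
use std::fmt;

/// Local stand-in for a conversion result (not the crate's own type).
#[derive(Debug)]
struct SketchOutput {
    markdown: String,
    /// (page number, error message) for every page that failed.
    failures: Vec<(usize, String)>,
}

/// Error carrying the pages that could not be converted.
#[derive(Debug, PartialEq)]
struct PartialFailure {
    failed_pages: Vec<usize>,
}

impl fmt::Display for PartialFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} page(s) failed: {:?}", self.failed_pages.len(), self.failed_pages)
    }
}

impl std::error::Error for PartialFailure {}

impl SketchOutput {
    /// Lenient by default: callers can read `markdown` even with failures.
    /// Strict on request: any failed page turns the whole result into Err.
    fn into_result(self) -> Result<SketchOutput, PartialFailure> {
        if self.failures.is_empty() {
            Ok(self)
        } else {
            let failed_pages = self.failures.iter().map(|(page, _)| *page).collect();
            Err(PartialFailure { failed_pages })
        }
    }
}
```

Because the error type implements `std::error::Error`, the `?` operator converts it into `Box<dyn std::error::Error>` in a `main` like the one above.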

Provider injection (v0.2)

Pass a pre-built Arc<dyn LLMProvider> directly — useful for sharing providers across multiple conversions and for testing with mocks:

use edgequake_pdf2md::{convert, ConversionConfig};
use edgequake_llm::ProviderFactory;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (provider, _) = ProviderFactory::from_env()?;
    let config = ConversionConfig::builder()
        .provider(Arc::clone(&provider))   // injected; highest priority
        .build()?;
    let output = convert("document.pdf", &config).await?;
    println!("{}", output.markdown);
    Ok(())
}

Provider resolution order (highest-to-lowest priority):

  1. config.provider — explicit Arc<dyn LLMProvider> injection
  2. config.provider_name + config.model — named provider
  3. EDGEQUAKE_LLM_PROVIDER + EDGEQUAKE_MODEL environment variables
  4. Auto-detect from API key env vars (OPENAI_API_KEY, ANTHROPIC_API_KEY, …)
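
The resolution order above amounts to a chain of fallbacks. The sketch below models it with plain strings standing in for real providers; `Source`, `resolve`, and the fixed key-variable order in `KEY_VARS` are assumptions made for illustration, not the crate's implementation.

```rust
/// Where the provider choice came from, in priority order.
#[derive(Debug, PartialEq)]
enum Source {
    Injected,
    Named,
    EnvOverride,
    AutoDetected,
}

/// API key variables checked during auto-detection (assumed order).
const KEY_VARS: [(&str, &str); 2] = [
    ("OPENAI_API_KEY", "openai"),
    ("ANTHROPIC_API_KEY", "anthropic"),
];

/// Pick a provider name: the first source that yields a value wins.
fn resolve(
    injected: Option<&str>,
    named: Option<&str>,
    env_provider: Option<&str>,
    key_vars_set: &[&str],
) -> Option<(Source, String)> {
    injected
        .map(|p| (Source::Injected, p.to_string()))
        .or_else(|| named.map(|p| (Source::Named, p.to_string())))
        .or_else(|| env_provider.map(|p| (Source::EnvOverride, p.to_string())))
        .or_else(|| {
            // Fall back to the first known key variable that is set.
            KEY_VARS
                .iter()
                .find(|(var, _)| key_vars_set.contains(var))
                .map(|(_, name)| (Source::AutoDetected, name.to_string()))
        })
}
```

Taking the environment values as parameters keeps the function testable; a caller could pass `std::env::var("EDGEQUAKE_LLM_PROVIDER").ok().as_deref()` for the third argument.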

Also available: streaming API (convert_stream, convert_stream_from_bytes), sync wrapper (convert_sync), metadata inspection (inspect).

See API docs on docs.rs for the full API reference.

Configuration

All options can be set via CLI flags, environment variables, or the builder API:

| Flag              | Env Variable           | Default      | Description                       |
|-------------------|------------------------|--------------|-----------------------------------|
| --model           | EDGEQUAKE_MODEL        | gpt-4.1-nano | VLM model                         |
| --provider        | EDGEQUAKE_PROVIDER     | auto-detect  | LLM provider                      |
| --dpi             | PDF2MD_DPI             | 150          | Rendering resolution (72–400)     |
| --pages           | PDF2MD_PAGES           | all          | Page selection                    |
| --fidelity        | PDF2MD_FIDELITY        | tier2        | Quality tier (tier1/tier2/tier3)  |
| -c, --concurrency | PDF2MD_CONCURRENCY     | 10           | Parallel API calls                |
| --maintain-format | PDF2MD_MAINTAIN_FORMAT | false        | Sequential mode                   |
| --separator       | PDF2MD_SEPARATOR       | none         | Page separator                    |
| --temperature     | PDF2MD_TEMPERATURE     | 0.1          | LLM temperature                   |

See docs/configuration.md for the complete reference.
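
As an illustration of how one option from the table might be resolved, the sketch below reads an optional raw value (for example the contents of PDF2MD_DPI), falls back to the documented default of 150, and checks the documented 72–400 range. Rejecting out-of-range values rather than clamping them is an assumption, and `resolve_dpi` is a hypothetical helper.

```rust
const DEFAULT_DPI: u32 = 150;
const MIN_DPI: u32 = 72;
const MAX_DPI: u32 = 400;

/// Resolve the rendering DPI from an optional raw setting.
fn resolve_dpi(raw: Option<&str>) -> Result<u32, String> {
    // No value supplied: use the documented default.
    let Some(raw) = raw else {
        return Ok(DEFAULT_DPI);
    };
    let dpi: u32 = raw
        .trim()
        .parse()
        .map_err(|_| format!("invalid DPI: {raw:?}"))?;
    // Assumed policy: reject values outside the documented range.
    if (MIN_DPI..=MAX_DPI).contains(&dpi) {
        Ok(dpi)
    } else {
        Err(format!("DPI {dpi} is outside {MIN_DPI}..={MAX_DPI}"))
    }
}
```

A caller could feed it from the environment with `resolve_dpi(std::env::var("PDF2MD_DPI").ok().as_deref())`.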

Development

# Setup
make setup          # Check pdfium + API key

# Build
make build          # Release binary
make build-dev      # Debug binary

# Test
make test           # Unit tests (no API key needed)
make test-e2e       # Integration tests (needs API key)
make test-all       # All tests

# Quality
make lint           # Clippy
make fmt            # Format code
make ci             # format + lint + unit tests

# Try it
make demo           # Convert sample page
make inspect-all    # Inspect test PDFs

Documentation

| Document              | Description                                |
|-----------------------|--------------------------------------------|
| docs/how-it-works.md  | Pipeline architecture with ASCII diagrams  |
| docs/installation.md  | Setup guide for all platforms              |
| docs/providers.md     | Supported models, pricing, selection guide |
| docs/configuration.md | All CLI flags and environment variables    |
| docs/examples.md      | Real-world usage examples                  |

Dependencies

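| Crate         | Purpose                                                          |
|---------------|------------------------------------------------------------------|
| pdfium-render | PDF rasterisation via Google's pdfium C++ library                |
| edgequake-llm | Multi-provider LLM abstraction (OpenAI, Anthropic, Gemini, etc.) |
| tokio         | Async runtime                                                    |
| image         | Image encoding (PNG/JPEG)                                        |
| clap          | CLI argument parsing                                             |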

License

Copyright 2026 Raphaël MANSUY

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

See LICENSE for the full text.