pdf-oxide-mcp — PDF Extraction MCP Server for AI Assistants
An MCP (Model Context Protocol) server that gives AI assistants like Claude, Cursor, and GitHub Copilot the ability to extract text, markdown, and HTML from PDF files. Powered by pdf_oxide, the fastest Rust PDF library. All processing runs locally — no files leave your machine.
Install
Configure Your AI Assistant
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
Claude Code
Add to your project's .mcp.json or global settings:
Cursor
Add to Cursor's MCP configuration:
npx (no install required)
Tools
The server exposes an extract tool with the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | string | yes | Path to the PDF file |
| output_path | string | yes | Path to write extracted content |
| format | string | no | text (default), markdown, or html |
| pages | string | no | Page range, e.g. "1-3,7,10-12" |
| password | string | no | Password for encrypted PDFs |
| images | boolean | no | Extract images to files alongside output |
| embed_images | boolean | no | Embed images as base64 data URIs (default: true) |
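As a rough illustration of how these parameters map onto an MCP request, the sketch below builds a `tools/call` JSON-RPC message for the extract tool. The helper name `build_extract_request` and the file paths are hypothetical. The envelope shape (`method`, `params.name`, `params.arguments`) follows the MCP specification, and the checks only mirror the parameter table, not the server's actual validation.

```python
import json

ALLOWED_FORMATS = {"text", "markdown", "html"}  # values listed in the parameter table


def build_extract_request(request_id, file_path, output_path, **options):
    """Build a JSON-RPC 2.0 `tools/call` message for the extract tool.

    Hypothetical helper: it mirrors the parameter table only and does not
    reproduce the server's own validation.
    """
    optional = {"format", "pages", "password", "images", "embed_images"}
    unknown = set(options) - optional
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if options.get("format", "text") not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {options['format']!r}")

    arguments = {"file_path": file_path, "output_path": output_path, **options}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract", "arguments": arguments},
    }


extract_request = build_extract_request(
    1,
    "report.pdf",  # placeholder input path
    "report.md",   # placeholder output path
    format="markdown",
    pages="1-3,7,10-12",
)
print(json.dumps(extract_request, indent=2))
```

In normal use the AI assistant constructs this message itself; writing it out by hand is mainly useful for testing the server or debugging a configuration.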
How It Works
pdf-oxide-mcp implements the Model Context Protocol over stdin/stdout using JSON-RPC. When an AI assistant needs to read a PDF, it calls the extract tool with the file path, an output path, and the desired format. The server processes the PDF locally using the pdf_oxide library and writes the extracted content to output_path. Supported outputs:
- Text — plain text extraction preserving reading order
- Markdown — structured output with headings, lists, and column-aware layout
- HTML — formatted HTML output
- Images — optional image extraction as separate files or embedded base64
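The stdin/stdout transport described above can be sketched from the client side. In MCP's stdio transport, each JSON-RPC message is a single line of JSON terminated by a newline. The function names `write_message` and `read_message` are hypothetical, and in-memory streams stand in for the server's pipes, so this sketch does not launch pdf-oxide-mcp itself.

```python
import io
import json


def write_message(stream, message):
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_message(stream):
    """Read one newline-delimited JSON-RPC message; return None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    return json.loads(line)


# In-memory stand-ins for the server's stdin and stdout pipes.
to_server = io.StringIO()
write_message(to_server, {"jsonrpc": "2.0", "id": 1, "method": "ping"})

from_server = io.StringIO('{"jsonrpc": "2.0", "id": 1, "result": {}}\n')
reply = read_message(from_server)
print(to_server.getvalue(), reply)
```

With a real server, the two streams would be the child process's stdin and stdout, for example as opened by `subprocess.Popen` with `stdin=PIPE`, `stdout=PIPE`, and `text=True`.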
Use Cases
- RAG pipelines — Convert PDFs to markdown for retrieval-augmented generation with LangChain, LlamaIndex, or any framework
- Document Q&A — Ask Claude questions about PDF content directly
- Data extraction — Pull text and tables from invoices, reports, and forms
- Academic research — Parse papers and extract content for analysis
- Code documentation — Let AI assistants read PDF specs and documentation
Performance
Built on pdf_oxide, which processes PDFs at a mean of 0.8 ms per document and has a 100% pass rate on 3,830 test PDFs. The MCP server adds minimal overhead: PDF processing uses the same high-performance Rust core as the library and CLI.
Protocol
Implements MCP protocol version 2024-11-05 with:
- initialize — server capability negotiation
- tools/list — tool discovery
- tools/call — tool execution
- ping — health check
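To make the message flow concrete, here is a minimal sketch of the requests a client might send during a session, in order. The `make_request` helper and the `clientInfo` values are hypothetical. The `notifications/initialized` message is not listed above but is what the MCP specification has the client send once `initialize` succeeds.

```python
import itertools
import json

PROTOCOL_VERSION = "2024-11-05"  # version stated in this section
_ids = itertools.count(1)


def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request with a fresh integer id."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


handshake = [
    make_request("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        # clientInfo values are placeholders for illustration.
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    }),
    # Notification (no id, so no response is expected) sent after initialize.
    {"jsonrpc": "2.0", "method": "notifications/initialized"},
    make_request("tools/list"),
    make_request("ping"),
]

for message in handshake:
    print(json.dumps(message))
```

A `tools/call` request for the extract tool would follow `tools/list` once the client has discovered the tool's name and input schema.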
Documentation
- Full Documentation — Getting started and guides
- MCP Setup Guide — Detailed configuration for each AI assistant
- GitHub — Source code and issue tracker
- Model Context Protocol — MCP specification
Related Crates
- pdf_oxide — Rust PDF library (core)
- pdf_oxide_cli — CLI tool with 22 PDF commands
License
MIT OR Apache-2.0