pdf_oxide_mcp 0.3.30

MCP server for PDF extraction — gives Claude, Cursor, and AI assistants the ability to read PDFs locally. Text, markdown, and HTML output. Powered by pdf_oxide.

pdf-oxide-mcp — PDF Extraction MCP Server for AI Assistants

An MCP (Model Context Protocol) server that gives AI assistants like Claude, Cursor, and GitHub Copilot the ability to extract text, markdown, and HTML from PDF files. Powered by pdf_oxide, the fastest Rust PDF library. All processing runs locally — no files leave your machine.


Install

brew install yfedoseev/tap/pdf-oxide    # Homebrew (macOS/Linux) — includes both CLI and MCP
cargo install pdf_oxide_mcp             # Cargo

Configure Your AI Assistant

Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (the macOS location of the Claude Desktop config):

{
  "mcpServers": {
    "pdf-oxide": {
      "command": "pdf-oxide-mcp"
    }
  }
}
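If the config file already lists other MCP servers, the entry needs to be merged into the existing mcpServers object rather than pasted over the whole file. The following is a minimal sketch of that merge in Python; the helper names add_pdf_oxide and update_config_file are hypothetical and not part of pdf-oxide-mcp, and only the "pdf-oxide" entry itself comes from the snippet above.

```python
import json
from pathlib import Path

# The server entry from the snippet above; everything else is illustrative.
PDF_OXIDE_ENTRY = {"command": "pdf-oxide-mcp"}


def add_pdf_oxide(config: dict) -> dict:
    """Return a copy of config with the pdf-oxide server added under mcpServers.

    Other servers and unrelated top-level keys are left as they were.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pdf-oxide"] = dict(PDF_OXIDE_ENTRY)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the JSON config at path (if it exists), merge the entry, write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_pdf_oxide(existing), indent=2) + "\n")
```

The same merge applies to the other assistants described in this section, since they use the same mcpServers layout; only the file location differs.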

Claude Code

Add to your project's .mcp.json or global settings:

{
  "mcpServers": {
    "pdf-oxide": {
      "command": "pdf-oxide-mcp"
    }
  }
}

Cursor

Add to Cursor's MCP configuration:

{
  "mcpServers": {
    "pdf-oxide": {
      "command": "pdf-oxide-mcp"
    }
  }
}

crgx (no install required)

{
  "mcpServers": {
    "pdf-oxide": {
      "command": "crgx",
      "args": ["pdf_oxide_mcp@latest"]
    }
  }
}

Tools

The server exposes an extract tool with the following parameters:

| Parameter    | Type    | Required | Description                                       |
| ------------ | ------- | -------- | ------------------------------------------------- |
| file_path    | string  | yes      | Path to the PDF file                              |
| output_path  | string  | yes      | Path to write extracted content                   |
| format       | string  | no       | text (default), markdown, or html                 |
| pages        | string  | no       | Page range, e.g. "1-3,7,10-12"                    |
| password     | string  | no       | Password for encrypted PDFs                       |
| images       | boolean | no       | Extract images to files alongside output          |
| embed_images | boolean | no       | Embed images as base64 data URIs (default: true)  |
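The pages value reads as a comma-separated list of single pages and inclusive ranges. The sketch below shows one plausible interpretation of that syntax in Python. It is an assumption for illustration only: the function name expand_page_range is hypothetical, and the server's own handling of whitespace, duplicates, and descending ranges is not documented here.

```python
def expand_page_range(spec: str) -> list[int]:
    """Expand a spec such as "1-3,7,10-12" into a list of page numbers.

    Assumes inclusive ranges and keeps the order in which parts appear.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages
```

Under these assumptions, "1-3,7,10-12" selects pages 1, 2, 3, 7, 10, 11, and 12.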

How It Works

pdf-oxide-mcp implements the Model Context Protocol over stdin/stdout using JSON-RPC. When an AI assistant needs to read a PDF, it calls the extract tool with the file path and desired format. The server processes the PDF locally using the pdf_oxide library and returns the extracted content.

  • Text — plain text extraction preserving reading order
  • Markdown — structured output with headings, lists, and column-aware layout
  • HTML — formatted HTML output
  • Images — optional image extraction as separate files or embedded base64
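As a rough illustration of what an assistant sends, the sketch below builds a JSON-RPC tools/call request for the extract tool in Python. The envelope follows the MCP convention of putting the tool name and its arguments under params; the argument names come from the parameters table above. The helper build_extract_call and the example file names are hypothetical.

```python
import json

# Format values listed in the parameters table.
ALLOWED_FORMATS = {"text", "markdown", "html"}


def build_extract_call(request_id, file_path, output_path,
                       fmt="text", pages=None, password=None):
    """Build a JSON-RPC 2.0 tools/call request for the extract tool."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    arguments = {
        "file_path": file_path,
        "output_path": output_path,
        "format": fmt,
    }
    # Optional parameters are only sent when the caller sets them.
    if pages is not None:
        arguments["pages"] = pages
    if password is not None:
        arguments["password"] = password
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract", "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    call = build_extract_call(1, "report.pdf", "report.md",
                              fmt="markdown", pages="1-3")
    print(json.dumps(call, indent=2))
```

In normal use the assistant's MCP client builds this message itself; writing it by hand is mainly useful for debugging the server from a script.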

Use Cases

  • RAG pipelines — Convert PDFs to markdown for retrieval-augmented generation with LangChain, LlamaIndex, or any framework
  • Document Q&A — Ask Claude questions about PDF content directly
  • Data extraction — Pull text and tables from invoices, reports, and forms
  • Academic research — Parse papers and extract content for analysis
  • Code documentation — Let AI assistants read PDF specs and documentation

Performance

Built on pdf_oxide, which processes PDFs at 0.8ms mean per document with a 100% pass rate on 3,830 test PDFs. The MCP server adds minimal overhead — PDF processing is the same high-performance Rust core used by the library and CLI.

Protocol

Implements MCP protocol version 2024-11-05 with:

  • initialize — server capability negotiation
  • tools/list — tool discovery
  • tools/call — tool execution
  • ping — health check
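The methods above can be sketched as the message sequence a client would write to the server's stdin. This Python example only builds and frames the messages; it does not launch the server. It assumes the MCP stdio convention of one JSON message per line, and it includes the notifications/initialized notification that the MCP specification expects after initialize, which the list above does not mention. The client name and version are placeholders.

```python
import json
from itertools import count

PROTOCOL_VERSION = "2024-11-05"


def make_request(ids, method, params=None):
    """Build a JSON-RPC 2.0 request, taking the next id from the ids iterator."""
    message = {"jsonrpc": "2.0", "id": next(ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def handshake_messages(client_name="example-client", client_version="0.0.0"):
    """Return the opening messages of a session, in the order they are sent."""
    ids = count(1)
    return [
        make_request(ids, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": client_version},
        }),
        # Notifications carry no id and receive no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        make_request(ids, "tools/list"),
        make_request(ids, "ping"),
    ]


def frame(message) -> bytes:
    """Serialize one message as a single newline-terminated line of JSON."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


if __name__ == "__main__":
    for message in handshake_messages():
        print(frame(message).decode("utf-8"), end="")
```

To try this against a real server, the framed bytes could be written to the stdin of a pdf-oxide-mcp process started with subprocess.Popen, reading one response line from stdout per request.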

Documentation

Related Crates

License

MIT OR Apache-2.0