pdf-oxide — The Fastest PDF CLI Toolkit
A command-line tool for PDF text extraction, Markdown conversion, search, merge, split, image extraction, and more. Built on pdf_oxide, the fastest Rust PDF library (0.8 ms mean per document, 100% pass rate on 3,830 PDFs). Licensed under MIT OR Apache-2.0.
Install
Quick Start
All Commands
| Command | Description |
|---|---|
| `text` | Extract text from PDF pages |
| `markdown` | Convert PDF to Markdown with headings, lists, and layout |
| `html` | Convert PDF to HTML |
| `search` | Search PDF content with regex patterns |
| `images` | Extract images to files (PNG, JPEG, etc.) |
| `info` | Show PDF metadata, page count, and version |
| `metadata` | Read and write PDF metadata fields |
| `merge` | Combine multiple PDFs into one |
| `split` | Split PDF into individual pages |
| `compress` | Reduce PDF file size |
| `encrypt` | Password-protect a PDF |
| `decrypt` | Remove password from a PDF |
| `rotate` | Rotate pages by 90, 180, or 270 degrees |
| `crop` | Set page crop box dimensions |
| `delete` | Remove specific pages |
| `reorder` | Rearrange page order |
| `watermark` | Add text watermark to pages |
| `flatten` | Flatten form fields and annotations |
| `forms` | Read and fill PDF form fields |
| `bookmarks` | Extract document bookmarks/outline |
| `create` | Create new PDF documents programmatically |
Features
- 22 commands for complete PDF processing from the terminal
- Fast — powered by pdf_oxide, 5x faster than PyMuPDF
- PDF to Markdown — headings, bullet lists, column-aware reading order
- Regex search — full regex pattern matching across pages
- Image extraction — extracts images from content streams, form XObjects, and inline images
- Form filling — read and write PDF form fields from the command line
- Page range support — use `--pages 1-5,10` on any command
- JSON output — add `--json` for machine-readable results
- Interactive REPL — run `pdf-oxide` with no arguments for interactive mode
- Encrypted PDFs — supply `--password` to open protected files
- Cross-platform — Linux, macOS, and Windows
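To make the `--pages 1-5,10` syntax concrete, here is a small shell sketch that expands such a spec into individual page numbers. It assumes the conventional reading (1-based, inclusive ranges, comma-separated parts); `expand_pages` is a hypothetical helper for illustration and is not part of pdf-oxide, whose own parsing may differ.

```shell
# Hypothetical helper: expand a page spec such as "1-5,10" into page numbers.
# Assumption: ranges are 1-based and inclusive; pdf-oxide's parser may differ.
expand_pages() {
  spec=$1
  out=""
  # Split the spec on commas, then handle each part as a range or a single page.
  for part in $(printf '%s' "$spec" | tr ',' ' '); do
    case $part in
      *-*)
        start=${part%%-*}   # text before the dash
        end=${part##*-}     # text after the dash
        i=$start
        while [ "$i" -le "$end" ]; do
          out="$out $i"
          i=$((i + 1))
        done
        ;;
      *)
        out="$out $part"
        ;;
    esac
  done
  # Drop the leading space before printing.
  printf '%s\n' "${out# }"
}

expand_pages "1-5,10"   # prints: 1 2 3 4 5 10
```

Under that reading, `1-5,10` selects six pages: the first five plus page 10.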
Usage Examples
Extract text from specific pages
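A minimal sketch of how this might look in a script, combining the `text` command with the `--pages` flag described under Features. The argument order (input file as a positional argument after the subcommand) is an assumption, and `extract_pages` and the `PDF_OXIDE` variable are hypothetical names introduced here; check `pdf-oxide` itself for the exact syntax.

```shell
# Hypothetical wrapper: extract text from selected pages of one PDF.
# Assumption: the input file is a positional argument after the subcommand.
# Set PDF_OXIDE to another binary path, or to `echo` for a dry run.
extract_pages() {
  input=$1   # path to the PDF
  pages=$2   # page spec, e.g. 1-5,10
  "${PDF_OXIDE:-pdf-oxide}" text "$input" --pages "$pages"
}

# Example call (not executed here):
#   extract_pages report.pdf 1-5,10
```

Setting `PDF_OXIDE=echo` prints the command's arguments instead of running it, which is a convenient way to check a script before pointing it at real files.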
Convert to Markdown for LLM/RAG pipelines
Search across a PDF
Merge and split
Work with forms
Extract images
Performance
pdf_oxide processes PDFs at 0.8ms mean per document — 5x faster than PyMuPDF, 15x faster than pypdf. Text extraction, markdown conversion, and all operations share the same high-performance Rust core.
Documentation
- Full Documentation — Getting started, CLI guide, and API reference
- CLI Guide — Detailed command reference
- GitHub — Source code and issue tracker
Related Crates
- `pdf_oxide` — Rust PDF library (core)
- `pdf_oxide_mcp` — MCP server for AI assistants
License
MIT OR Apache-2.0