oxidize-pdf
A pure Rust PDF generation and manipulation library with zero external PDF dependencies. Generate professional PDFs, parse existing documents, and perform operations like split, merge, and rotate with a clean, safe API.
Features
- 🚀 100% Pure Rust - No C dependencies or external PDF libraries
- 📄 PDF Generation - Create multi-page documents with text, graphics, and images
- 🔍 PDF Parsing - Read and extract content from existing PDFs (97.8% success rate on real-world PDFs)
- ✂️ PDF Operations - Split, merge, and rotate PDFs while preserving content
- 🖼️ Image Support - Embed JPEG images with automatic compression
- 🎨 Rich Graphics - Vector graphics with shapes, paths, colors (RGB/CMYK/Gray)
- 📝 Advanced Text - Multiple fonts, text flow with automatic wrapping, alignment
- 🔍 OCR Support - Extract text from scanned PDFs using Tesseract OCR (v0.1.3+)
- 🗜️ Compression - Built-in FlateDecode compression for smaller files
- 🔒 Type Safe - Leverage Rust's type system for safe PDF manipulation
Quick Start
Add oxidize-pdf to your Cargo.toml:
[dependencies]
oxidize-pdf = "0.1.3"
# For OCR support (optional), use this line instead
oxidize-pdf = { version = "0.1.3", features = ["ocr-tesseract"] }
Basic PDF Generation
use ;
Parse Existing PDF
use ;
Working with Images
use ;
Advanced Text Flow
use ;
PDF Operations
use ;
use Result;
OCR Text Extraction
use ;
use ;
use PageContentAnalyzer;
use PdfReader;
use Result;
OCR Installation
Before using OCR features, install Tesseract on your system:
macOS (Homebrew): brew install tesseract
Ubuntu/Debian: sudo apt-get install tesseract-ocr
Windows: Download the installer from https://github.com/UB-Mannheim/tesseract/wiki
Supported Features
PDF Generation
- ✅ Multi-page documents
- ✅ Vector graphics (rectangles, circles, paths, lines)
- ✅ Text rendering with standard fonts (Helvetica, Times, Courier)
- ✅ JPEG image embedding
- ✅ RGB, CMYK, and Grayscale colors
- ✅ Graphics transformations (translate, rotate, scale)
- ✅ Text flow with automatic line wrapping
- ✅ FlateDecode compression
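As an illustration of what "text flow with automatic line wrapping" involves, here is a minimal greedy wrapping sketch. It is not oxidize-pdf's implementation: `wrap_text` and `max_chars` are hypothetical names, and the sketch measures width in characters, whereas a PDF library would sum per-glyph widths from the font metrics.

```rust
use std::mem;

/// Greedy line wrapping: pack as many whitespace-separated words as fit
/// into each line. Width is counted in characters here, which is a
/// simplifying assumption (real text flow uses font metrics).
fn wrap_text(text: &str, max_chars: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    let mut current_len: usize = 0; // length of `current` in chars

    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        // The +1 accounts for the space that would join the word to the line.
        if !current.is_empty() && current_len + 1 + word_len > max_chars {
            lines.push(mem::take(&mut current));
            current_len = 0;
        }
        if !current.is_empty() {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(word);
        current_len += word_len;
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

A word longer than `max_chars` is placed on a line of its own rather than being split, which is one possible policy among several.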
PDF Parsing
- ✅ PDF 1.0 - 1.7 support
- ✅ Cross-reference table parsing
- ✅ Object and stream parsing
- ✅ Page tree navigation
- ✅ Content stream parsing
- ✅ Text extraction
- ✅ Document metadata extraction
- ✅ Basic filter support (FlateDecode, ASCIIHexDecode, ASCII85Decode)
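Of the filters listed above, ASCIIHexDecode is the simplest to show in full. The sketch below follows the rules in the PDF specification (whitespace is ignored, `>` ends the data, an odd trailing digit is padded with `0`). The function name `ascii_hex_decode` and its error type are assumptions for illustration, not the library's API.

```rust
/// Decode an ASCIIHexDecode stream as described in the PDF specification:
/// whitespace is skipped, `>` marks end of data, and an odd trailing digit
/// is treated as if it were followed by `0`.
fn ascii_hex_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len() / 2);
    let mut high: Option<u8> = None; // pending high nibble, if any

    for &byte in input {
        if byte == b'>' {
            break; // end-of-data marker; anything after it is ignored
        }
        // PDF whitespace also includes NUL, which is_ascii_whitespace omits.
        if byte == 0 || byte.is_ascii_whitespace() {
            continue;
        }
        let nibble = (byte as char)
            .to_digit(16)
            .ok_or_else(|| format!("invalid hex digit: 0x{byte:02X}"))? as u8;
        match high.take() {
            Some(h) => out.push((h << 4) | nibble),
            None => high = Some(nibble),
        }
    }
    // Odd number of digits: pad the final nibble with a zero low nibble.
    if let Some(h) = high {
        out.push(h << 4);
    }
    Ok(out)
}
```

For example, `ascii_hex_decode(b"48 65 6C6c 6F>")` yields the bytes of `Hello`.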
PDF Operations
- ✅ Split by pages, ranges, or size
- ✅ Merge multiple PDFs
- ✅ Rotate pages (90°, 180°, 270°)
- ✅ Basic content preservation
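Page rotation in PDF is stored as a `/Rotate` value that must be a multiple of 90 degrees, so rotating an already-rotated page means adding the angles and normalizing the result. The following sketch shows that arithmetic only; `combine_rotation` and `displayed_size` are hypothetical helpers, not functions from oxidize-pdf.

```rust
/// Combine a page's existing /Rotate value with a requested rotation.
/// Both inputs must be multiples of 90 degrees; the result is normalized
/// to one of 0, 90, 180 or 270.
fn combine_rotation(current: i32, delta: i32) -> Result<i32, String> {
    if current % 90 != 0 || delta % 90 != 0 {
        return Err(format!(
            "rotation must be a multiple of 90, got {current} and {delta}"
        ));
    }
    // rem_euclid keeps the result non-negative even for negative angles.
    Ok((current + delta).rem_euclid(360))
}

/// Width and height as displayed: a quarter turn swaps the two dimensions.
fn displayed_size(width: f64, height: f64, rotation: i32) -> (f64, f64) {
    if rotation.rem_euclid(180) == 90 {
        (height, width)
    } else {
        (width, height)
    }
}
```

Because only the `/Rotate` entry changes, the page's content stream can stay untouched, which is one way a rotate operation can preserve content.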
OCR Support (v0.1.3+)
- ✅ Tesseract OCR integration with feature flag
- ✅ Multi-language support (50+ languages)
- ✅ Page analysis and scanned page detection
- ✅ Configurable preprocessing (denoise, deskew, contrast)
- ✅ Layout preservation with position information
- ✅ Confidence scoring and filtering
- ✅ Multiple page segmentation modes (PSM)
- ✅ Character whitelisting/blacklisting
- ✅ Mock OCR provider for testing
- ✅ Parallel and batch processing
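Confidence scoring and filtering can be pictured as follows. This is a sketch under stated assumptions: the `OcrFragment` struct, its field names, and the 0.0 to 1.0 confidence scale are all invented for illustration and may not match the types oxidize-pdf exposes.

```rust
/// A recognized piece of text with its position (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct OcrFragment {
    text: String,
    confidence: f64, // assumed scale: 0.0 (no confidence) to 1.0 (certain)
    x: f64,          // position on the page, kept for layout preservation
    y: f64,
}

/// Keep only fragments at or above the given confidence threshold.
fn filter_by_confidence(fragments: &[OcrFragment], min_confidence: f64) -> Vec<OcrFragment> {
    fragments
        .iter()
        .filter(|f| f.confidence >= min_confidence)
        .cloned()
        .collect()
}

/// Mean confidence over a set of fragments, or None if there are none.
fn mean_confidence(fragments: &[OcrFragment]) -> Option<f64> {
    if fragments.is_empty() {
        return None;
    }
    let sum: f64 = fragments.iter().map(|f| f.confidence).sum();
    Some(sum / fragments.len() as f64)
}
```

Filtering before computing the mean gives a score for the text that is actually kept, which is usually the more useful number to report.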
Performance
- Parsing: < 50ms for typical PDFs
- Generation: < 20ms for 10-page documents
- Memory efficient: Streaming operations for large files
- Zero-copy: Where possible for optimal performance
Examples
Check out the examples directory for more usage patterns:
- hello_world.rs - Basic PDF creation
- graphics_demo.rs - Vector graphics showcase
- text_formatting.rs - Advanced text features
- jpeg_image.rs - Image embedding
- parse_pdf.rs - PDF parsing and text extraction
- comprehensive_demo.rs - All features demonstration
- tesseract_ocr_demo.rs - OCR text extraction (requires --features ocr-tesseract)
- scanned_pdf_analysis.rs - Analyze PDFs for scanned content
- extract_images.rs - Extract embedded images from PDFs
- create_pdf_with_images.rs - Advanced image embedding examples
Run examples with:
cargo run --example hello_world
# For OCR examples
cargo run --example tesseract_ocr_demo --features ocr-tesseract
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Commercial Licensing
For commercial use cases that require proprietary licensing, please contact us about our PRO and Enterprise editions, which offer:
- Commercial-friendly licensing
- Advanced OCR features (cloud providers, batch processing)
- PDF forms and digital signatures
- Priority support and SLAs
- Custom feature development
Testing
oxidize-pdf includes comprehensive test suites to ensure reliability:
# Run standard test suite (synthetic PDFs)
# Run all tests including performance benchmarks
# Run with local PDF fixtures (if available)
OXIDIZE_PDF_FIXTURES=on
# Run OCR tests (requires Tesseract installation)
Local PDF Fixtures (Optional)
For enhanced testing with real-world PDFs, you can optionally set up local PDF fixtures:
- Create a symbolic link: tests/fixtures -> /path/to/your/pdf/collection
- The test suite will automatically detect and use these PDFs
- Fixtures are never committed to the repository (excluded in .gitignore)
- Tests work fine without fixtures using synthetic PDFs
Note: CI/CD uses only synthetic PDFs, which keeps builds consistent and fast.
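One way a test helper could detect optional fixtures is sketched below. It assumes that fixtures are used only when `OXIDIZE_PDF_FIXTURES` is set to `on` and the `tests/fixtures` directory (or symlink) resolves; the real test suite's logic may differ, and `discover_fixtures` is a hypothetical name.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Return the PDF files directly under `dir` when fixtures are enabled,
/// or an empty list otherwise. `enabled` is the value of the environment
/// variable, passed in so the function is easy to test.
fn discover_fixtures(enabled: Option<&str>, dir: &Path) -> Vec<PathBuf> {
    if enabled != Some("on") {
        return Vec::new();
    }
    // A missing directory or dangling symlink simply means "no fixtures".
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return Vec::new(),
    };
    let mut pdfs: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .filter(|path| {
            path.extension()
                .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"))
        })
        .collect();
    pdfs.sort(); // deterministic order across platforms
    pdfs
}
```

A test could call it as `discover_fixtures(std::env::var("OXIDIZE_PDF_FIXTURES").ok().as_deref(), Path::new("tests/fixtures"))` and fall back to synthetic PDFs when the result is empty.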
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Roadmap
Community Edition (Open Source)
- Basic transparency/opacity support (Q3 2025)
- PNG image support
- XRef stream support (PDF 1.5+)
- TrueType/OpenType font embedding
- Improved text extraction with CMap/ToUnicode
PRO/Enterprise Features
- Advanced transparency (blend modes, groups)
- Cloud OCR providers (Azure, AWS, Google Cloud)
- OCR batch processing and parallel execution
- PDF forms and annotations
- Digital signatures
- PDF/A compliance
- Encryption support
See our detailed roadmap for more information.
Support
Acknowledgments
Built with ❤️ using Rust. Special thanks to the Rust community and all contributors.