oxidize-pdf

A pure Rust PDF generation and manipulation library with zero external PDF dependencies. Generate professional PDFs, parse existing documents, and perform operations like split, merge, and rotate with a clean, safe API.

Features

🚀 100% Pure Rust - No C dependencies or external PDF libraries
📄 PDF Generation - Create multi-page documents with text, graphics, and images
🔍 PDF Parsing - Read and extract content from existing PDFs (97.8% success rate on real-world PDFs)
✂️ PDF Operations - Split, merge, and rotate PDFs while preserving content
🖼️ Image Support - Embed JPEG images with automatic compression
🎨 Rich Graphics - Vector graphics with shapes, paths, colors (RGB/CMYK/Gray)
📝 Advanced Text - Multiple fonts, text flow with automatic wrapping, alignment
🔍 OCR Support - Extract text from scanned PDFs using Tesseract OCR (v0.1.3+)
🗜️ Compression - Built-in FlateDecode compression for smaller files
🔒 Type Safe - Leverage Rust's type system for safe PDF manipulation

Quick Start

Add oxidize-pdf to your Cargo.toml:

[dependencies]
oxidize-pdf = "0.1.3"

# Or, to enable optional OCR support, use this entry instead
# (a dependency can only be declared once per table):
# oxidize-pdf = { version = "0.1.3", features = ["ocr-tesseract"] }

Basic PDF Generation

use oxidize_pdf::{Document, Page, Font, Color, Result};

fn main() -> Result<()> {
    // Create a new document
    let mut doc = Document::new();
    doc.set_title("My First PDF");
    doc.set_author("Rust Developer");
    
    // Create a page
    let mut page = Page::a4();
    
    // Add text
    page.text()
        .set_font(Font::Helvetica, 24.0)
        .at(50.0, 700.0)
        .write("Hello, PDF!")?;
    
    // Add graphics
    page.graphics()
        .set_fill_color(Color::rgb(0.0, 0.5, 1.0))
        .circle(300.0, 400.0, 50.0)
        .fill();
    
    // Add the page and save
    doc.add_page(page);
    doc.save("hello.pdf")?;
    
    Ok(())
}
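The `at(50.0, 700.0)` and `circle(300.0, 400.0, 50.0)` calls use PDF user-space coordinates. In that space the origin is the bottom-left corner of the page, y grows upward, and one unit is one point (1/72 inch). The sketch below shows these conversions using only the standard library. It assumes `Page::a4()` uses the standard A4 size of 210 × 297 mm, which is roughly 595 × 842 pt. The helper names are hypothetical and are not part of the oxidize-pdf API.

```rust
/// Points per inch in PDF user space (1 pt = 1/72 in).
const POINTS_PER_INCH: f64 = 72.0;
/// Millimetres per inch.
const MM_PER_INCH: f64 = 25.4;

/// Convert millimetres to PDF points (hypothetical helper, not an oxidize-pdf API).
fn mm_to_pt(mm: f64) -> f64 {
    mm / MM_PER_INCH * POINTS_PER_INCH
}

/// A4 page size in points, derived from 210 mm x 297 mm.
fn a4_size_pt() -> (f64, f64) {
    (mm_to_pt(210.0), mm_to_pt(297.0))
}

/// Convert a position measured from the top-left corner (as in most screen
/// layouts) into PDF coordinates, whose origin is the bottom-left corner.
fn from_top_left(x: f64, y_from_top: f64, page_height: f64) -> (f64, f64) {
    (x, page_height - y_from_top)
}
```

For example, on a page 842 pt tall, a point 142 pt below the top edge has y = 700. That is roughly where the example above places its heading.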

Parse Existing PDF

use oxidize_pdf::{PdfReader, Result};

fn main() -> Result<()> {
    // Open and parse a PDF
    let mut reader = PdfReader::open("document.pdf")?;
    
    // Get document info
    println!("PDF Version: {}", reader.version());
    println!("Page Count: {}", reader.page_count()?);
    
    // Extract text from all pages
    let document = reader.into_document();
    let text = document.extract_text()?;
    
    for (page_num, page_text) in text.iter().enumerate() {
        println!("Page {}: {}", page_num + 1, page_text.content);
    }
    
    Ok(())
}
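`reader.version()` reports the version a file declares. Every PDF starts with a header comment such as `%PDF-1.7`. The standard-library sketch below reads that header and is not oxidize-pdf's implementation. It makes two simplifying assumptions:

- The header sits at byte 0.
- The document catalog's `/Version` entry can be ignored. In PDF 1.4 and later, that entry can override the header.

```rust
/// Extract the (major, minor) version from a PDF header such as "%PDF-1.7".
/// Assumes the header starts at byte 0. A /Version entry in the document
/// catalog (PDF 1.4+) can override this value and is ignored here.
fn header_version(bytes: &[u8]) -> Option<(u8, u8)> {
    let rest = bytes.strip_prefix(b"%PDF-")?;
    // The version runs up to the first whitespace byte (usually the EOL).
    let end = rest
        .iter()
        .position(|b| b.is_ascii_whitespace())
        .unwrap_or(rest.len());
    let version = std::str::from_utf8(&rest[..end]).ok()?;
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}
```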

Working with Images

use oxidize_pdf::{Document, Page, Image, Result};

fn main() -> Result<()> {
    let mut doc = Document::new();
    let mut page = Page::a4();
    
    // Load a JPEG image
    let image = Image::from_jpeg_file("photo.jpg")?;
    
    // Add image to page
    page.add_image("my_photo", image);
    
    // Draw the image
    page.draw_image("my_photo", 100.0, 300.0, 400.0, 300.0)?;
    
    doc.add_page(page);
    doc.save("image_example.pdf")?;
    
    Ok(())
}
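The arguments to `draw_image` appear to be x, y, width and height; that reading is inferred from the example, not confirmed. Drawing at 400 × 300 only preserves the photo's proportions if the photo is 4:3. The sketch below computes a draw size that fits a box without distortion, plus a lower-left corner that centers the image in the box. The helper names are hypothetical. The pixel size would come from the image itself.

```rust
/// Scale an image of `img_w` x `img_h` pixels to fit inside a box of
/// `box_w` x `box_h` points, preserving its aspect ratio.
/// Returns the (width, height) to draw, in points.
fn fit_in_box(img_w: u32, img_h: u32, box_w: f64, box_h: f64) -> (f64, f64) {
    let (iw, ih) = (img_w as f64, img_h as f64);
    // The tighter of the two ratios limits the scale factor.
    let scale = (box_w / iw).min(box_h / ih);
    (iw * scale, ih * scale)
}

/// Lower-left corner that centers a `w` x `h` image inside a box whose
/// lower-left corner is (`box_x`, `box_y`).
fn centered_origin(box_x: f64, box_y: f64, box_w: f64, box_h: f64, w: f64, h: f64) -> (f64, f64) {
    (box_x + (box_w - w) / 2.0, box_y + (box_h - h) / 2.0)
}
```

For instance, a square 1000 × 1000 px photo in the 400 × 300 box from the example would be drawn at 300 × 300. It would be offset 50 pt to the right to stay centered.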

Advanced Text Flow

use oxidize_pdf::{Document, Page, Font, TextAlign, Result};

fn main() -> Result<()> {
    let mut doc = Document::new();
    let mut page = Page::a4();
    
    // Create text flow with automatic wrapping
    let mut flow = page.text_flow();
    flow.at(50.0, 700.0)
        .set_font(Font::Times, 12.0)
        .set_alignment(TextAlign::Justified)
        .write_wrapped("This is a long paragraph that will automatically wrap \
                       to fit within the page margins. The text is justified, \
                       creating clean edges on both sides.")?;
    
    page.add_text_flow(&flow);
    doc.add_page(page);
    doc.save("text_flow.pdf")?;
    
    Ok(())
}
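Conceptually, wrapping with justification can be done with a greedy line breaker. It adds words until the next one would overflow the available width, then spreads the leftover space across the gaps between words. This sketch is not necessarily what oxidize-pdf's text flow does internally. It uses Courier rather than Times because every glyph in the standard Courier font is 600/1000 em wide, which keeps the width arithmetic exact. The function names are hypothetical.

```rust
/// Width of `text` in points when set in Courier at `font_size`.
/// Every glyph of the standard Courier font is 600/1000 em wide.
fn courier_width(text: &str, font_size: f64) -> f64 {
    text.chars().count() as f64 * 0.6 * font_size
}

/// Greedy line breaking: pack as many words as fit within `max_width`.
/// A word wider than the line is placed on its own line (no hyphenation).
fn wrap(text: &str, font_size: f64, max_width: f64) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let candidate = if current.is_empty() {
            word.to_string()
        } else {
            format!("{current} {word}")
        };
        if current.is_empty() || courier_width(&candidate, font_size) <= max_width {
            current = candidate;
        } else {
            // The word does not fit: close this line and start a new one.
            lines.push(std::mem::take(&mut current));
            current = word.to_string();
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}

/// Extra space to add at each inter-word gap so `line` fills `max_width`.
/// Returns 0.0 for single-word lines, which have no gaps to stretch.
fn justify_gap(line: &str, font_size: f64, max_width: f64) -> f64 {
    let gaps = line.matches(' ').count();
    if gaps == 0 {
        return 0.0;
    }
    (max_width - courier_width(line, font_size)).max(0.0) / gaps as f64
}
```

By convention, the last line of a justified paragraph is left unstretched. A PDF writer could emit the computed gap with the `Tw` (word spacing) operator before showing each line.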

PDF Operations

use oxidize_pdf::operations::{PageRange, PdfMerger, PdfRotator, PdfSplitter, RotationAngle};
use oxidize_pdf::Result;

fn main() -> Result<()> {
    // Split a PDF into one file per page
    let splitter = PdfSplitter::new("input.pdf")?;
    splitter.split_by_pages("page_{}.pdf")?; // page_1.pdf, page_2.pdf, ...
    
    // Merge PDFs
    let mut merger = PdfMerger::new();
    merger.add_pdf("doc1.pdf", PageRange::All)?;
    merger.add_pdf("doc2.pdf", PageRange::Pages(vec![1, 3, 5]))?;
    merger.save("merged.pdf")?;
    
    // Rotate all pages 90° clockwise
    let rotator = PdfRotator::new("input.pdf")?;
    rotator.rotate_all(RotationAngle::Clockwise90, "rotated.pdf")?;
    
    Ok(())
}
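`PageRange::Pages(vec![1, 3, 5])` and `split_by_pages("page_{}.pdf")` suggest two conventions: 1-based page numbers, and a `{}` placeholder in the output name. Both are inferred from the example and not confirmed. The sketch below shows how such a selection might be turned into 0-based page indices with bounds checking, and how per-page file names could be generated. `PageSelection` is a hypothetical stand-in, not the library's `PageRange` type.

```rust
/// Hypothetical stand-in for a page selection such as `PageRange`.
enum PageSelection {
    All,
    /// 1-based page numbers (assumed convention).
    Pages(Vec<usize>),
}

/// Resolve a selection into 0-based page indices for a document with
/// `page_count` pages, rejecting numbers outside 1..=page_count.
fn resolve(selection: &PageSelection, page_count: usize) -> Result<Vec<usize>, String> {
    match selection {
        PageSelection::All => Ok((0..page_count).collect()),
        PageSelection::Pages(pages) => pages
            .iter()
            .map(|&p| {
                if (1..=page_count).contains(&p) {
                    Ok(p - 1)
                } else {
                    Err(format!("page {p} out of range 1..={page_count}"))
                }
            })
            .collect(),
    }
}

/// Expand an output pattern such as "page_{}.pdf" for each 1-based page.
fn output_names(pattern: &str, page_count: usize) -> Vec<String> {
    (1..=page_count)
        .map(|n| pattern.replace("{}", &n.to_string()))
        .collect()
}
```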

OCR Text Extraction

use oxidize_pdf::text::tesseract_provider::{TesseractOcrProvider, TesseractConfig};
use oxidize_pdf::text::ocr::{OcrOptions, OcrProvider};
use oxidize_pdf::operations::page_analysis::PageContentAnalyzer;
use oxidize_pdf::parser::PdfReader;
use oxidize_pdf::Result;

fn main() -> Result<()> {
    // Open a scanned PDF
    let document = PdfReader::open_document("scanned.pdf")?;
    let analyzer = PageContentAnalyzer::new(document);
    
    // Configure OCR provider
    let config = TesseractConfig::for_documents();
    let ocr_provider = TesseractOcrProvider::with_config(config)?;
    
    // Find and process scanned pages
    let scanned_pages = analyzer.find_scanned_pages()?;
    
    for page_num in scanned_pages {
        let result = analyzer.extract_text_from_scanned_page(page_num, &ocr_provider)?;
        println!("Page {}: {} (confidence: {:.1}%)", 
                 page_num, result.text, result.confidence * 100.0);
    }
    
    Ok(())
}
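The loop above prints `result.confidence * 100.0`, which suggests confidence is a fraction between 0.0 and 1.0. When many pages are processed, it can help to accept high-confidence pages and flag the rest for review. The sketch below shows one way to do that. `PageText` is a hypothetical struct that mirrors only the fields used in the example; it is not the library's result type. The 0.7 threshold used in the checks is arbitrary.

```rust
/// Hypothetical mirror of the OCR result fields used in the example above.
#[derive(Debug, Clone, PartialEq)]
struct PageText {
    page: usize,
    text: String,
    /// Confidence in 0.0..=1.0 (assumed from the `* 100.0` in the example).
    confidence: f64,
}

/// Split results into (accepted, needs_review) by a minimum confidence.
fn partition_by_confidence(
    results: Vec<PageText>,
    min_confidence: f64,
) -> (Vec<PageText>, Vec<PageText>) {
    results
        .into_iter()
        .partition(|r| r.confidence >= min_confidence)
}

/// Mean confidence across pages, or None if there are no results.
fn mean_confidence(results: &[PageText]) -> Option<f64> {
    if results.is_empty() {
        None
    } else {
        Some(results.iter().map(|r| r.confidence).sum::<f64>() / results.len() as f64)
    }
}
```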

OCR Installation

Before using OCR features, install Tesseract on your system:

macOS:

brew install tesseract
brew install tesseract-lang  # For additional languages

Ubuntu/Debian:

sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-spa  # For Spanish
sudo apt-get install tesseract-ocr-deu  # For German

Windows: Download an installer from https://github.com/UB-Mannheim/tesseract/wiki
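Language packs such as `spa` and `deu` are identified by the codes that `tesseract --list-langs` prints. The Rust sketch below runs that command and parses its output. It assumes a format seen in recent Tesseract releases: a "List of available languages…" header followed by one code per line. Other versions may differ. `osd` in that list is Tesseract's orientation and script detection data, not a language. The function names are hypothetical.

```rust
use std::process::Command;

/// Parse the output of `tesseract --list-langs`: a header line starting with
/// "List of available languages" followed by one language code per line.
fn parse_list_langs(output: &str) -> Vec<String> {
    output
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with("List of"))
        .map(String::from)
        .collect()
}

/// Run `tesseract --list-langs`; returns None if Tesseract is not on PATH
/// or the command fails.
fn installed_languages() -> Option<Vec<String>> {
    let output = Command::new("tesseract").arg("--list-langs").output().ok()?;
    if !output.status.success() {
        return None;
    }
    // Fall back to stderr in case this Tesseract build prints the list there.
    let text = if output.stdout.is_empty() { &output.stderr } else { &output.stdout };
    Some(parse_list_langs(&String::from_utf8_lossy(text)))
}
```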

Supported Features

PDF Generation

✅ Multi-page documents
✅ Vector graphics (rectangles, circles, paths, lines)
✅ Text rendering with standard fonts (Helvetica, Times, Courier)
✅ JPEG image embedding
✅ RGB, CMYK, and Grayscale colors
✅ Graphics transformations (translate, rotate, scale)
✅ Text flow with automatic line wrapping
✅ FlateDecode compression
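In PDF, translate, rotate and scale are all expressed as an affine matrix `[a b c d e f]`, which the `cm` operator applies. A point (x, y) maps to (a·x + c·y + e, b·x + d·y + f). The sketch below builds these matrices and composes them. It is a standard-library illustration of the PDF math, not oxidize-pdf's graphics API. `Matrix` and its methods are hypothetical names.

```rust
/// A PDF transformation matrix [a b c d e f], as used by the `cm` operator.
/// A point (x, y) maps to (a*x + c*y + e, b*x + d*y + f).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Matrix {
    a: f64, b: f64, c: f64, d: f64, e: f64, f: f64,
}

impl Matrix {
    /// Move by (tx, ty).
    fn translate(tx: f64, ty: f64) -> Self {
        Matrix { a: 1.0, b: 0.0, c: 0.0, d: 1.0, e: tx, f: ty }
    }

    /// Scale by `sx` horizontally and `sy` vertically.
    fn scale(sx: f64, sy: f64) -> Self {
        Matrix { a: sx, b: 0.0, c: 0.0, d: sy, e: 0.0, f: 0.0 }
    }

    /// Rotate counter-clockwise by `degrees` about the origin.
    fn rotate(degrees: f64) -> Self {
        let (sin, cos) = degrees.to_radians().sin_cos();
        Matrix { a: cos, b: sin, c: -sin, d: cos, e: 0.0, f: 0.0 }
    }

    /// Combined transform equivalent to applying `self` first, then `next`.
    fn then(self, next: Matrix) -> Matrix {
        Matrix {
            a: self.a * next.a + self.b * next.c,
            b: self.a * next.b + self.b * next.d,
            c: self.c * next.a + self.d * next.c,
            d: self.c * next.b + self.d * next.d,
            e: self.e * next.a + self.f * next.c + next.e,
            f: self.e * next.b + self.f * next.d + next.f,
        }
    }

    /// Apply the transform to a point.
    fn apply(&self, x: f64, y: f64) -> (f64, f64) {
        (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)
    }
}
```

Order matters. Scaling by (2, 3) and then translating by (1, 1) maps (1, 1) to (3, 4). The reverse order maps it to (4, 6). In a content stream, the six values are written as the operands of `cm`; for example, `1 0 0 1 100 0 cm` translates by 100 points.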

PDF Parsing

✅ PDF 1.0 - 1.7 support (XRef stream support for PDF 1.5+ is listed on the Roadmap)
✅ Cross-reference table parsing
✅ Object and stream parsing
✅ Page tree navigation
✅ Content stream parsing
✅ Text extraction
✅ Document metadata extraction
✅ Basic filter support (FlateDecode, ASCIIHexDecode, ASCII85Decode)
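ASCIIHexDecode is the simplest of the listed filters, and the PDF specification defines it as follows:

- Data is written as pairs of hexadecimal digits.
- Whitespace is ignored.
- `>` marks the end of the data.
- An odd final digit is treated as if it were followed by `0`.

The sketch below implements those rules independently of oxidize-pdf's own decoder.

```rust
/// Decode data encoded with the PDF ASCIIHexDecode filter.
/// PDF whitespace (including NUL) is ignored, '>' marks end-of-data, and an
/// odd final digit is treated as if it were followed by '0'.
fn ascii_hex_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(input.len() / 2);
    // Pending high nibble, waiting for its low-nibble partner.
    let mut high: Option<u8> = None;
    for &byte in input {
        if byte == b'>' {
            break;
        }
        if byte.is_ascii_whitespace() || byte == 0 {
            continue;
        }
        let nibble = (byte as char)
            .to_digit(16)
            .ok_or_else(|| format!("invalid hex digit {:?}", byte as char))? as u8;
        match high.take() {
            Some(h) => out.push(h << 4 | nibble),
            None => high = Some(nibble),
        }
    }
    if let Some(h) = high {
        out.push(h << 4);
    }
    Ok(out)
}
```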

PDF Operations

✅ Split by pages, ranges, or size
✅ Merge multiple PDFs
✅ Rotate pages (90°, 180°, 270°)
✅ Basic content preservation
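In a PDF file, page rotation is stored in the page's `/Rotate` entry. The value is the number of degrees to rotate the page clockwise when it is displayed, and it must be a multiple of 90. The entry can also be inherited from an ancestor Pages node. Rotating a page that is already rotated therefore means adding to its existing value and normalizing the result. The sketch below shows that arithmetic; it is not the implementation behind `PdfRotator`, and the function name is hypothetical.

```rust
/// Combine a page's existing /Rotate value with an additional clockwise
/// rotation, returning a normalized value in {0, 90, 180, 270}.
/// Returns None if either value is not a multiple of 90, which /Rotate forbids.
fn combine_rotation(existing: i32, delta: i32) -> Option<i32> {
    if existing % 90 != 0 || delta % 90 != 0 {
        return None;
    }
    // rem_euclid keeps the result non-negative for counter-clockwise deltas.
    Some((existing + delta).rem_euclid(360))
}
```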

OCR Support (v0.1.3+)

✅ Tesseract OCR integration with feature flag
✅ Multi-language support (50+ languages)
✅ Page analysis and scanned page detection
✅ Configurable preprocessing (denoise, deskew, contrast)
✅ Layout preservation with position information
✅ Confidence scoring and filtering
✅ Multiple page segmentation modes (PSM)
✅ Character whitelisting/blacklisting
✅ Mock OCR provider for testing
✅ Parallel and batch processing

Performance

Parsing: < 50ms for typical PDFs
Generation: < 20ms for 10-page documents
Memory efficient: Streaming operations for large files
Zero-copy: Where possible for optimal performance

Examples

Check out the examples directory for more usage patterns:

hello_world.rs - Basic PDF creation
graphics_demo.rs - Vector graphics showcase
text_formatting.rs - Advanced text features
jpeg_image.rs - Image embedding
parse_pdf.rs - PDF parsing and text extraction
comprehensive_demo.rs - All features demonstration
tesseract_ocr_demo.rs - OCR text extraction (requires --features ocr-tesseract)
scanned_pdf_analysis.rs - Analyze PDFs for scanned content
extract_images.rs - Extract embedded images from PDFs
create_pdf_with_images.rs - Advanced image embedding examples

Run examples with:

cargo run --example hello_world

# For OCR examples
cargo run --example tesseract_ocr_demo --features ocr-tesseract

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Commercial Licensing

For commercial use cases that require proprietary licensing, please contact us about our PRO and Enterprise editions, which offer:

Commercial-friendly licensing
Advanced OCR features (cloud providers, batch processing)
PDF forms and digital signatures
Priority support and SLAs
Custom feature development

Testing

oxidize-pdf includes comprehensive test suites to ensure reliability:

# Run standard test suite (synthetic PDFs)
cargo test

# Run all tests, including ignored ones such as performance benchmarks
cargo test -- --include-ignored

# Run with local PDF fixtures (if available)
OXIDIZE_PDF_FIXTURES=on cargo test

# Run OCR tests (requires Tesseract installation)
cargo test tesseract_ocr_tests --features ocr-tesseract -- --ignored

Local PDF Fixtures (Optional)

For enhanced testing with real-world PDFs, you can optionally set up local PDF fixtures:

Create a symbolic link: tests/fixtures -> /path/to/your/pdf/collection
The test suite will automatically detect and use these PDFs
Fixtures are never committed to the repository (excluded in .gitignore)
Tests work fine without fixtures using synthetic PDFs

Note: CI/CD uses only synthetic PDFs, which keeps builds consistent and fast.
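The sketch below shows how a test might opt into local fixtures. It checks whether `OXIDIZE_PDF_FIXTURES=on` is set and lists the `*.pdf` files under `tests/fixtures`. If either is missing, the test can fall back to synthetic PDFs. The function names are hypothetical, and this is not the project's actual test harness.

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

/// True when the user opted in with OXIDIZE_PDF_FIXTURES=on.
fn fixtures_enabled() -> bool {
    env::var("OXIDIZE_PDF_FIXTURES")
        .map(|v| v == "on")
        .unwrap_or(false)
}

/// Collect `*.pdf` files directly inside `dir` (read_dir follows a symlinked
/// directory), sorted for a stable test order. Returns an empty list if the
/// directory does not exist, so callers can fall back to synthetic PDFs.
fn find_pdf_fixtures(dir: &Path) -> Vec<PathBuf> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(_) => return Vec::new(),
    };
    let mut pdfs: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .filter(|path| {
            path.extension()
                .map_or(false, |ext| ext.eq_ignore_ascii_case("pdf"))
        })
        .collect();
    pdfs.sort();
    pdfs
}
```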

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Roadmap

Community Edition (Open Source)

Basic transparency/opacity support (Q3 2025)
PNG image support
XRef stream support (PDF 1.5+)
TrueType/OpenType font embedding
Improved text extraction with CMap/ToUnicode

PRO/Enterprise Features

Advanced transparency (blend modes, groups)
Cloud OCR providers (Azure, AWS, Google Cloud)
OCR batch processing and parallel execution
PDF forms and annotations
Digital signatures
PDF/A compliance
Encryption support

See our detailed roadmap for more information.

Support

Acknowledgments

Built with ❤️ using Rust. Special thanks to the Rust community and all contributors.

oxidize-pdf 0.1.4
