Crate oxidize_pdf

§oxidize-pdf

A comprehensive, pure Rust PDF library for generation, parsing, and manipulation with zero external PDF dependencies.

§Features

  • PDF Generation: Create multi-page documents with text, graphics, and images
  • PDF Parsing: Complete parser supporting rendering and content extraction
  • PDF Operations: Split, merge, rotate, and extract pages
  • Text Extraction: Extract text with position and formatting information
  • Image Extraction: Extract images in JPEG, PNG, and TIFF formats
  • Font Embedding: TrueType and OpenType font embedding with subsetting support (v1.1.6+)
  • Page Analysis: Detect scanned vs text content with intelligent classification
  • OCR Integration: Pluggable OCR support with Tesseract for processing scanned documents (v0.1.3+)
  • Resource Access: Work with fonts, images, and other PDF resources
  • Pure Rust: No C dependencies or external libraries
  • 100% Native: Complete PDF implementation from scratch
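
The page-analysis feature above separates scanned pages from text pages. One plausible way to do this is to compare how much of the page area is covered by images with how much is covered by text. The sketch below is a minimal illustration of that idea, not oxidize-pdf's actual implementation: `PageKind`, `classify`, and all three thresholds are hypothetical names and values chosen for the example.

```rust
/// Coarse page category. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum PageKind {
    Scanned,
    Text,
    Mixed,
}

// Illustrative thresholds (assumptions, not values taken from oxidize-pdf).
const SCANNED_MIN_IMAGE: f64 = 0.8; // images cover at least 80% of the page
const SCANNED_MAX_TEXT: f64 = 0.1; // text covers at most 10% of the page
const TEXT_MAX_IMAGE: f64 = 0.2; // images cover at most 20% of the page

/// Classify a page from the fraction of its area covered by images and by text.
/// Both ratios are expected to lie in the range 0.0..=1.0.
fn classify(image_ratio: f64, text_ratio: f64) -> PageKind {
    if image_ratio >= SCANNED_MIN_IMAGE && text_ratio <= SCANNED_MAX_TEXT {
        PageKind::Scanned
    } else if image_ratio <= TEXT_MAX_IMAGE {
        PageKind::Text
    } else {
        PageKind::Mixed
    }
}

fn main() {
    // (image_ratio, text_ratio) pairs for three imaginary pages.
    let pages = [(0.95, 0.0), (0.05, 0.6), (0.5, 0.3)];
    for (image_ratio, text_ratio) in pages {
        println!(
            "image {:.2}, text {:.2} -> {:?}",
            image_ratio,
            text_ratio,
            classify(image_ratio, text_ratio)
        );
    }
}
```

A page classified as `Scanned` under such a heuristic would be a natural candidate for OCR, while `Text` pages can go straight to text extraction.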

§Quick Start

§Creating PDFs

use oxidize_pdf::{Color, Document, Font, Page, Result};

fn main() -> Result<()> {
    // Create a new document
    let mut doc = Document::new();
    doc.set_title("My PDF");

    // Create an A4 page
    let mut page = Page::a4();

    // Add text (coordinates are in points, measured from the bottom-left corner)
    page.text()
        .set_font(Font::Helvetica, 24.0)
        .at(50.0, 700.0)
        .write("Hello, PDF!")?;

    // Add graphics: a filled circle centered at (300, 400) with radius 50
    page.graphics()
        .set_fill_color(Color::rgb(0.0, 0.5, 1.0))
        .circle(300.0, 400.0, 50.0)
        .fill();

    // Add the page and save the document
    doc.add_page(page);
    doc.save("output.pdf")?;
    Ok(())
}

§Parsing PDFs

use oxidize_pdf::parser::{PdfDocument, PdfReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open and parse a PDF
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);

    // Get document information
    let page_count = document.page_count()?;
    println!("Pages: {}", page_count);
    println!("Version: {}", document.version()?);

    // Process pages (indices are zero-based)
    for i in 0..page_count {
        let page = document.get_page(i)?;
        println!("Page {} size: {}x{} points", i + 1, page.width(), page.height());
    }

    // Extract text, one entry per page
    let text_pages = document.extract_text()?;
    for (i, page_text) in text_pages.iter().enumerate() {
        println!("Page {} text: {}", i + 1, page_text.text);
    }
    Ok(())
}
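
The page sizes printed above are in PDF points, where one point is 1/72 of an inch in the default user space. A minimal sketch of converting such values to millimetres follows. `points_to_mm` is a hypothetical helper written for this example, not an oxidize-pdf function, and the A4 size used is the commonly rounded 595 x 842 points.

```rust
/// Millimetres per inch.
const MM_PER_INCH: f64 = 25.4;
/// PDF points per inch in the default user space.
const POINTS_PER_INCH: f64 = 72.0;

/// Convert a length in PDF points to millimetres.
/// Hypothetical helper for illustration; not part of oxidize-pdf.
fn points_to_mm(points: f64) -> f64 {
    points * MM_PER_INCH / POINTS_PER_INCH
}

fn main() {
    // A4 is commonly rounded to 595 x 842 points.
    let (width_pt, height_pt) = (595.0, 842.0);
    println!(
        "A4: {:.1} x {:.1} mm",
        points_to_mm(width_pt),
        points_to_mm(height_pt)
    );
}
```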

§Modules

§Generation Modules

  • document - PDF document creation and management
  • page - Page creation and layout
  • graphics - Vector graphics and images
  • text - Text rendering and flow
  • writer - Low-level PDF writing

§Parsing Modules

  • parser - PDF parsing and rendering support
  • parser::content - Content stream parsing (ContentParser, ContentOperation)

§Manipulation Modules

  • operations - PDF manipulation (split, merge, rotate, extract images)
  • operations::page_analysis - Page content analysis and scanned page detection
  • text::extraction - Text extraction with positioning

§OCR Modules (v0.1.3+)

  • text::ocr - OCR trait system, types, and integration for scanned documents
  • text::tesseract_provider - Tesseract OCR provider (requires the ocr-tesseract feature)
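
"Pluggable" OCR generally means that calling code depends on a trait rather than on a concrete engine, so a Tesseract-backed provider and a mock can be swapped freely. The sketch below illustrates that pattern with standard-library types only. `SketchOcrProvider`, `FixedTextProvider`, and `ocr_page` are hypothetical names for this example; the crate's real `OcrProvider` trait may have a different shape.

```rust
/// Hypothetical provider interface, for illustration only.
/// oxidize-pdf's real `OcrProvider` trait may look different.
trait SketchOcrProvider {
    /// Short name identifying the backend.
    fn name(&self) -> &str;
    /// Recognize text in raw image bytes.
    fn recognize(&self, image_bytes: &[u8]) -> Result<String, String>;
}

/// A mock backend that returns fixed text, useful in tests.
struct FixedTextProvider {
    text: String,
}

impl SketchOcrProvider for FixedTextProvider {
    fn name(&self) -> &str {
        "fixed-text"
    }

    fn recognize(&self, image_bytes: &[u8]) -> Result<String, String> {
        // Reject empty input so callers can exercise the error path.
        if image_bytes.is_empty() {
            return Err("empty image".to_string());
        }
        Ok(self.text.clone())
    }
}

/// Calling code depends only on the trait, so backends can be swapped.
fn ocr_page(provider: &dyn SketchOcrProvider, image_bytes: &[u8]) -> Result<String, String> {
    provider.recognize(image_bytes)
}

fn main() {
    let provider = FixedTextProvider {
        text: "scanned text".to_string(),
    };
    // Two placeholder bytes stand in for real image data.
    match ocr_page(&provider, &[0xFF, 0xD8]) {
        Ok(text) => println!("{} recognized: {}", provider.name(), text),
        Err(e) => eprintln!("OCR failed: {}", e),
    }
}
```

The crate re-exports a `MockOcrProvider`, which suggests a similar testing approach, although its exact API is not shown on this page.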

§Examples

§Content Stream Processing

use oxidize_pdf::parser::{PdfDocument, PdfReader};
use oxidize_pdf::parser::content::{ContentParser, ContentOperation};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);
    let page = document.get_page(0)?;

    // Get and parse the page's content streams
    let streams = page.content_streams_with_document(&document)?;
    for stream in streams {
        let operations = ContentParser::parse(&stream)?;

        for op in operations {
            match op {
                ContentOperation::ShowText(text) => {
                    // Text operands are raw bytes; decode lossily for display
                    println!("Text: {:?}", String::from_utf8_lossy(&text));
                }
                ContentOperation::SetFont(name, size) => {
                    println!("Font: {} at {} pt", name, size);
                }
                ContentOperation::MoveTo(x, y) => {
                    println!("Move to ({}, {})", x, y);
                }
                _ => {} // Handle other operations
            }
        }
    }
    Ok(())
}
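
For background, PDF content streams use postfix notation: operands come first and the operator follows, as in `/F1 24 Tf` (set font) or `(Hello) Tj` (show text). The toy sketch below groups tokens into operator/operand sets to show that structure. It is not how `ContentParser` is implemented; `Op`, `tokenize`, `is_operand`, and `group_operations` are hypothetical names, and the tokenizer deliberately ignores nested parentheses, escapes, hex strings, arrays, and dictionaries.

```rust
/// One operator together with the operands that preceded it.
#[derive(Debug, PartialEq)]
struct Op {
    operator: String,
    operands: Vec<String>,
}

/// Split a content stream into tokens, keeping `(...)` literal strings whole.
/// Simplified: no nested parentheses, escapes, hex strings, arrays, or dictionaries.
fn tokenize(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut in_string = false;
    for ch in input.chars() {
        if in_string {
            current.push(ch);
            if ch == ')' {
                in_string = false;
                tokens.push(std::mem::take(&mut current));
            }
        } else if ch == '(' {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            current.push(ch);
            in_string = true;
        } else if ch.is_whitespace() {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Treat numbers, /Names, and (strings) as operands; everything else is an operator.
fn is_operand(token: &str) -> bool {
    token.starts_with('/') || token.starts_with('(') || token.parse::<f64>().is_ok()
}

/// Collect operands until an operator appears, then emit them together.
fn group_operations(input: &str) -> Vec<Op> {
    let mut ops = Vec::new();
    let mut operands = Vec::new();
    for token in tokenize(input) {
        if is_operand(&token) {
            operands.push(token);
        } else {
            ops.push(Op {
                operator: token,
                operands: std::mem::take(&mut operands),
            });
        }
    }
    ops
}

fn main() {
    let stream = "BT /F1 24 Tf 50 700 Td (Hello, PDF!) Tj ET";
    for op in group_operations(stream) {
        println!("{} {:?}", op.operator, op.operands);
    }
}
```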

§Resource Access

use oxidize_pdf::parser::{PdfDocument, PdfReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);
    let page = document.get_page(0)?;

    // Access page resources
    if let Some(resources) = page.get_resources() {
        // List font resources by name
        if let Some(fonts) = resources.get("Font").and_then(|f| f.as_dict()) {
            for (name, _) in &fonts.0 {
                println!("Font resource: {}", name.as_str());
            }
        }

        // List images and other XObjects by name
        if let Some(xobjects) = resources.get("XObject").and_then(|x| x.as_dict()) {
            for (name, _) in &xobjects.0 {
                println!("XObject resource: {}", name.as_str());
            }
        }
    }
    Ok(())
}

Re-exports§

pub use document::Document;
pub use document::DocumentMetadata;
pub use error::OxidizePdfError;
pub use error::PdfError;
pub use error::Result;
pub use geometry::Point;
pub use geometry::Rectangle;
pub use graphics::Color;
pub use graphics::GraphicsContext;
pub use graphics::Image;
pub use graphics::ImageColorSpace;
pub use graphics::ImageFormat;
pub use page::Margins;
pub use page::Page;
pub use page_lists::ListStyle;
pub use page_lists::ListType;
pub use page_lists::PageLists;
pub use page_tables::PageTables;
pub use page_tables::TableStyle;
pub use text::measure_text;
pub use text::split_into_words;
pub use text::AdvancedTable;
pub use text::AdvancedTableCell;
pub use text::AdvancedTableOptions;
pub use text::AlternatingRowColors;
pub use text::BorderLine;
pub use text::BorderStyle as TableBorderStyle;
pub use text::BulletStyle;
pub use text::CellContent;
pub use text::CellPadding;
pub use text::ColumnDefinition;
pub use text::ColumnWidth;
pub use text::Font;
pub use text::FontFamily;
pub use text::FragmentType;
pub use text::HeaderStyle;
pub use text::ImagePreprocessing;
pub use text::LineStyle;
pub use text::ListElement;
pub use text::ListOptions;
pub use text::MockOcrProvider;
pub use text::OcrEngine;
pub use text::OcrError;
pub use text::OcrOptions;
pub use text::OcrProcessingResult;
pub use text::OcrProvider;
pub use text::OcrResult;
pub use text::OcrTextFragment;
pub use text::OrderedList;
pub use text::OrderedListStyle;
pub use text::Table;
pub use text::TableCell;
pub use text::TableOptions;
pub use text::TableRow;
pub use text::TextAlign;
pub use text::TextContext;
pub use text::TextFlowContext;
pub use text::UnorderedList;
pub use text::VerticalAlign;
pub use text::fonts::embedding::EmbeddedFontData;
pub use text::fonts::embedding::EmbeddingOptions;
pub use text::fonts::embedding::EncodingDifference;
pub use text::fonts::embedding::FontDescriptor;
pub use text::fonts::embedding::FontEmbedder;
pub use text::fonts::embedding::FontEncoding;
pub use text::fonts::embedding::FontFlags;
pub use text::fonts::embedding::FontMetrics;
pub use text::fonts::embedding::FontType;
pub use parser::ContentOperation;
pub use parser::ContentParser;
pub use parser::DocumentMetadata as ParsedDocumentMetadata;
pub use parser::ParseOptions;
pub use parser::ParsedPage;
pub use parser::PdfArray;
pub use parser::PdfDictionary;
pub use parser::PdfDocument;
pub use parser::PdfName;
pub use parser::PdfObject;
pub use parser::PdfReader;
pub use parser::PdfStream;
pub use parser::PdfString;
pub use operations::merge_pdfs;
pub use operations::rotate_pdf_pages;
pub use operations::split_pdf;
pub use memory::LazyDocument;
pub use memory::MemoryOptions;
pub use memory::StreamProcessor;
pub use memory::StreamingOptions;
pub use streaming::process_in_chunks;
pub use streaming::stream_text;
pub use streaming::ChunkOptions;
pub use streaming::ChunkProcessor;
pub use streaming::ChunkType;
pub use streaming::ContentChunk;
pub use streaming::IncrementalParser;
pub use streaming::ParseEvent;
pub use streaming::StreamingDocument;
pub use streaming::StreamingOptions as StreamOptions;
pub use streaming::StreamingPage;
pub use streaming::TextChunk;
pub use streaming::TextStreamOptions;
pub use streaming::TextStreamer;
pub use batch::batch_merge_pdfs;
pub use batch::batch_process_files;
pub use batch::batch_split_pdfs;
pub use batch::BatchJob;
pub use batch::BatchOptions;
pub use batch::BatchProcessor;
pub use batch::BatchProgress;
pub use batch::BatchResult;
pub use batch::BatchSummary;
pub use batch::JobResult;
pub use batch::JobStatus;
pub use batch::JobType;
pub use batch::ProgressCallback;
pub use batch::ProgressInfo;
pub use recovery::analyze_corruption;
pub use recovery::detect_corruption;
pub use recovery::quick_recover;
pub use recovery::repair_document;
pub use recovery::validate_pdf;
pub use recovery::CorruptionReport;
pub use recovery::CorruptionType;
pub use recovery::ObjectScanner;
pub use recovery::PartialRecovery;
pub use recovery::PdfRecovery;
pub use recovery::RecoveredPage;
pub use recovery::RecoveryOptions;
pub use recovery::RepairResult;
pub use recovery::RepairStrategy;
pub use recovery::ScanResult;
pub use recovery::ValidationError;
pub use recovery::ValidationResult;
pub use structure::Destination;
pub use structure::DestinationType;
pub use structure::NameTree;
pub use structure::NameTreeNode;
pub use structure::NamedDestinations;
pub use structure::OutlineBuilder;
pub use structure::OutlineItem;
pub use structure::OutlineTree;
pub use structure::PageDestination;
pub use structure::PageTree;
pub use structure::PageTreeBuilder;
pub use structure::PageTreeNode;
pub use actions::Action;
pub use actions::ActionDictionary;
pub use actions::ActionType;
pub use actions::GoToAction;
pub use actions::LaunchAction;
pub use actions::LaunchParameters;
pub use actions::NamedAction;
pub use actions::RemoteGoToAction;
pub use actions::StandardNamedAction;
pub use actions::UriAction;
pub use actions::UriActionFlags;
pub use page_labels::PageLabel;
pub use page_labels::PageLabelBuilder;
pub use page_labels::PageLabelRange;
pub use page_labels::PageLabelStyle;
pub use page_labels::PageLabelTree;

Modules§

  • actions - PDF actions according to ISO 32000-1 Chapter 12.6
  • annotations - Basic PDF annotations support according to ISO 32000-1 Chapter 12.5
  • batch - Batch processing for multiple PDF operations
  • compression - Compression utilities for PDF streams
  • document
  • encryption - PDF encryption support according to ISO 32000-1 Chapter 7.6
  • error
  • fonts - Font loading and embedding functionality for custom fonts
  • forms - Basic PDF forms support according to ISO 32000-1 Chapter 12.7
  • geometry - Basic geometric types for PDF
  • graphics
  • memory - Memory optimization module for efficient PDF handling
  • objects
  • operations - PDF operations module
  • page
  • page_forms - Page-level forms API
  • page_labels - Page labels for custom page numbering according to ISO 32000-1 Section 12.4.2
  • page_lists - Page extension for list rendering
  • page_tables - Page extension for table rendering
  • parser - PDF parser module: complete PDF parsing and rendering support
  • pdf_version - Scanned page analysis and OCR example
  • recovery - PDF error recovery and repair functionality
  • streaming - Streaming support for incremental PDF processing
  • structure - Document structure elements including page trees, name trees, and outlines according to ISO 32000-1
  • text
  • writer - PDF writing functionality

Constants§

  • VERSION - Current version of oxidize-pdf