Crate oxidize_pdf


§oxidize-pdf

A comprehensive, pure Rust PDF library for generation, parsing, and manipulation with zero external PDF dependencies.

§Features

  • PDF Generation: Create multi-page documents with text, graphics, and images
  • PDF Parsing: Complete parser supporting rendering and content extraction
  • PDF Operations: Split, merge, rotate, and extract pages
  • Text Extraction: Extract text with position and formatting information
  • Image Extraction: Extract images in JPEG, PNG, and TIFF formats
  • Font Embedding: TrueType and OpenType font embedding with subsetting support (v1.1.6+)
  • Page Analysis: Detect scanned vs text content with intelligent classification
  • OCR Integration: Pluggable OCR support with Tesseract for processing scanned documents (v0.1.3+)
  • Resource Access: Work with fonts, images, and other PDF resources
  • Pure Rust: No C dependencies or external libraries in the core crate; the Tesseract OCR provider is optional and enabled by the ocr-tesseract feature
  • 100% Native: Complete PDF implementation from scratch

§Quick Start

§Creating PDFs

use oxidize_pdf::{Color, Document, Font, Page, Result};

fn main() -> Result<()> {
    // Create a new document
    let mut doc = Document::new();
    doc.set_title("My PDF");

    // Create a page
    let mut page = Page::a4();

    // Add text: 24 pt Helvetica placed at (50, 700)
    page.text()
        .set_font(Font::Helvetica, 24.0)
        .at(50.0, 700.0)
        .write("Hello, PDF!")?;

    // Add graphics: a filled circle of radius 50 centered at (300, 400)
    page.graphics()
        .set_fill_color(Color::rgb(0.0, 0.5, 1.0))
        .circle(300.0, 400.0, 50.0)
        .fill();

    // Save the document
    doc.add_page(page);
    doc.save("output.pdf")?;

    Ok(())
}
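
The coordinates in the example above are plain numbers, so it can help to see where a value such as 700.0 comes from. The following is a minimal sketch using only the standard library, assuming the default PDF user space (units of 1/72 inch, origin at the bottom-left corner of the page). The names `mm_to_pt`, `y_from_top`, and `A4_HEIGHT_PT` are hypothetical helpers for illustration and are not part of oxidize-pdf.

```rust
/// PDF user space measures lengths in points: 1 pt = 1/72 inch, and 1 inch = 25.4 mm.
const POINTS_PER_MM: f64 = 72.0 / 25.4;

/// Height of an A4 sheet (297 mm) expressed in points, roughly 841.89.
/// Assumption: this is the ISO paper size, not a value read from the library.
const A4_HEIGHT_PT: f64 = 297.0 * POINTS_PER_MM;

/// Convert a length in millimetres to PDF points.
fn mm_to_pt(mm: f64) -> f64 {
    mm * POINTS_PER_MM
}

/// Convert a distance measured down from the top edge of the page into a
/// y coordinate measured up from the bottom edge (the PDF default).
fn y_from_top(page_height_pt: f64, offset_from_top_pt: f64) -> f64 {
    page_height_pt - offset_from_top_pt
}
```

Under these assumptions, `y_from_top(A4_HEIGHT_PT, mm_to_pt(50.0))` is about 700.16, so a y value of 700.0 corresponds to a baseline roughly 50 mm below the top edge of an A4 page.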

§Parsing PDFs

use oxidize_pdf::parser::{PdfDocument, PdfReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open and parse a PDF
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);

    // Get document information
    println!("Pages: {}", document.page_count()?);
    println!("Version: {}", document.version()?);

    // Process pages (indices are zero-based; print them one-based)
    for i in 0..document.page_count()? {
        let page = document.get_page(i)?;
        println!("Page {} size: {}x{} points", i + 1, page.width(), page.height());
    }

    // Extract text, one entry per page
    let text_pages = document.extract_text()?;
    for (i, page_text) in text_pages.iter().enumerate() {
        println!("Page {} text: {}", i + 1, page_text.text);
    }

    Ok(())
}
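
The version printed above ultimately comes from the file header: a PDF file starts with a line of the form `%PDF-M.m`. As a rough illustration of that first parsing step, here is a standard-library-only sketch. The function `parse_pdf_header` is a hypothetical name and does not describe how oxidize-pdf reads the header internally.

```rust
/// Parse the `%PDF-M.m` header at the start of a file into (major, minor).
/// Returns `None` if the bytes do not start with a well-formed header.
/// Hypothetical helper for illustration only.
fn parse_pdf_header(bytes: &[u8]) -> Option<(u8, u8)> {
    let rest = bytes.strip_prefix(b"%PDF-")?;
    // The version ends at the first end-of-line byte (or at the end of input).
    let end = rest
        .iter()
        .position(|&b| b == b'\r' || b == b'\n')
        .unwrap_or(rest.len());
    let version = std::str::from_utf8(&rest[..end]).ok()?;
    let (major, minor) = version.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}
```

For instance, `parse_pdf_header(b"%PDF-1.7\n")` returns `Some((1, 7))`, while input that does not begin with `%PDF-` yields `None`.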

§Modules

§Generation Modules

  • document - PDF document creation and management
  • page - Page creation and layout
  • graphics - Vector graphics and images
  • text - Text rendering and flow
  • writer - Low-level PDF writing

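§Parsing Modules

  • parser - PDF parsing and rendering support
  • parser::content - Content stream parsing (ContentParser, ContentOperation)
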
§Manipulation Modules

  • operations - PDF manipulation (split, merge, rotate, extract images)
  • operations::page_analysis - Page content analysis and scanned page detection
  • text::extraction - Text extraction with positioning
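
Split and rotate operations come down to simple bookkeeping over page indices and angles. The sketch below uses only the standard library and shows two such calculations: cutting a document into fixed-size page ranges, and combining rotations so the result stays a multiple of 90 degrees, as the PDF `Rotate` entry requires. Both `split_ranges` and `combine_rotation` are hypothetical names. The signatures of the library's own `split_pdf` and `rotate_pdf_pages` are not shown on this page and are not assumed here.

```rust
use std::ops::Range;

/// Divide `page_count` zero-based page indices into consecutive ranges of at
/// most `pages_per_chunk` pages. A chunk size of zero yields no ranges.
fn split_ranges(page_count: usize, pages_per_chunk: usize) -> Vec<Range<usize>> {
    if pages_per_chunk == 0 {
        return Vec::new();
    }
    (0..page_count)
        .step_by(pages_per_chunk)
        .map(|start| start..(start + pages_per_chunk).min(page_count))
        .collect()
}

/// Combine a page's current rotation with a requested change, in degrees.
/// Returns an angle in {0, 90, 180, 270}, or `None` if the combined angle
/// is not a multiple of 90.
fn combine_rotation(current: i32, delta: i32) -> Option<i32> {
    let total = (current + delta).rem_euclid(360);
    if total % 90 == 0 {
        Some(total)
    } else {
        None
    }
}
```

With these helpers, `split_ranges(5, 2)` yields the ranges `0..2`, `2..4`, and `4..5`, and `combine_rotation(0, -90)` returns `Some(270)`.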

§OCR Modules (v0.1.3+)

  • text::ocr - OCR trait system, types, and integration for scanned documents
  • text::tesseract_provider - Tesseract OCR provider (requires the ocr-tesseract feature)
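
OCR is only worth running on pages that are actually scans, which is why page analysis and OCR are listed together. The following standard-library sketch shows one plausible way to make that decision from two ratios: the fraction of the page area covered by images and the fraction covered by text. Everything here is an assumption for illustration: the `PageKind` enum, the `classify_page` function, and all thresholds are hypothetical and do not describe the classifier in operations::page_analysis.

```rust
/// Hypothetical page classification, for illustration only.
#[derive(Debug, PartialEq)]
enum PageKind {
    Scanned,
    Text,
    Mixed,
}

/// Classify a page from the fraction of its area covered by images and by text.
/// The thresholds below are illustrative assumptions, not library defaults.
fn classify_page(image_ratio: f64, text_ratio: f64) -> PageKind {
    const SCANNED_IMAGE_MIN: f64 = 0.8; // mostly one large image
    const SCANNED_TEXT_MAX: f64 = 0.1; // almost no real text objects
    const TEXT_IMAGE_MAX: f64 = 0.2; // few or small images

    if image_ratio >= SCANNED_IMAGE_MIN && text_ratio < SCANNED_TEXT_MAX {
        PageKind::Scanned
    } else if image_ratio < TEXT_IMAGE_MAX && text_ratio > 0.0 {
        PageKind::Text
    } else {
        PageKind::Mixed
    }
}
```

A caller could then send only pages classified as `PageKind::Scanned` to an OCR provider and use ordinary text extraction for the rest.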

§Examples

§Content Stream Processing

use oxidize_pdf::parser::{PdfDocument, PdfReader};
use oxidize_pdf::parser::content::{ContentParser, ContentOperation};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);
    let page = document.get_page(0)?;

    // Get and parse content streams
    let streams = page.content_streams_with_document(&document)?;
    for stream in streams {
        let operations = ContentParser::parse(&stream)?;

        for op in operations {
            match op {
                // Text is stored as raw bytes; decode lossily for display
                ContentOperation::ShowText(text) => {
                    println!("Text: {:?}", String::from_utf8_lossy(&text));
                }
                ContentOperation::SetFont(name, size) => {
                    println!("Font: {} at {} pt", name, size);
                }
                ContentOperation::MoveTo(x, y) => {
                    println!("Move to ({}, {})", x, y);
                }
                _ => {} // Handle other operations
            }
        }
    }

    Ok(())
}
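
To make the operations matched above more concrete: a content stream is written in postfix notation, with operands first and the operator last, as in `/F1 24 Tf` or `(Hello) Tj`. The toy parser below, written against the standard library only, turns such a snippet into a list of operations. It is a deliberately simplified stand-in: the `Op` enum, `tokenize`, and `parse_ops` are hypothetical, strings are kept as `String` rather than bytes, and nested parentheses, escapes, arrays, and most operators are not handled. It is not how `ContentParser` is implemented.

```rust
/// Hypothetical, simplified stand-in for a content-stream operation.
#[derive(Debug, PartialEq)]
enum Op {
    SetFont(String, f64),
    ShowText(String),
    MoveTo(f64, f64),
    Other(String),
}

/// Split a content stream into tokens. A literal string `( ... )` is kept as
/// one token; nested parentheses and escapes are not handled in this sketch.
fn tokenize(src: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '(' {
            // Consume up to and including the closing parenthesis.
            let mut s = String::new();
            for ch in chars.by_ref() {
                s.push(ch);
                if ch == ')' {
                    break;
                }
            }
            tokens.push(s);
        } else {
            let mut s = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || ch == '(' {
                    break;
                }
                s.push(ch);
                chars.next();
            }
            tokens.push(s);
        }
    }
    tokens
}

/// Collect operands until an operator arrives, then build one operation.
fn parse_ops(src: &str) -> Vec<Op> {
    let mut operands: Vec<String> = Vec::new();
    let mut ops = Vec::new();
    for tok in tokenize(src) {
        let is_operand =
            tok.starts_with('/') || tok.starts_with('(') || tok.parse::<f64>().is_ok();
        if is_operand {
            operands.push(tok);
            continue;
        }
        let op = match (tok.as_str(), operands.as_slice()) {
            ("Tf", [name, size]) => match size.parse::<f64>() {
                Ok(size) => Op::SetFont(name.trim_start_matches('/').to_string(), size),
                Err(_) => Op::Other(tok.clone()),
            },
            ("Tj", [text]) => {
                Op::ShowText(text.trim_matches(|c: char| c == '(' || c == ')').to_string())
            }
            ("m", [x, y]) => match (x.parse::<f64>(), y.parse::<f64>()) {
                (Ok(x), Ok(y)) => Op::MoveTo(x, y),
                _ => Op::Other(tok.clone()),
            },
            _ => Op::Other(tok.clone()),
        };
        ops.push(op);
        operands.clear();
    }
    ops
}
```

Feeding it `"BT /F1 24 Tf (Hello, PDF!) Tj ET 100 200 m"` produces `Other("BT")`, `SetFont("F1", 24.0)`, `ShowText("Hello, PDF!")`, `Other("ET")`, and `MoveTo(100.0, 200.0)`, which mirrors the shape of the match arms in the example above.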

§Resource Access

use oxidize_pdf::parser::{PdfDocument, PdfReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);
    let page = document.get_page(0)?;

    // Access page resources
    if let Some(resources) = page.get_resources() {
        // Check fonts
        if let Some(fonts) = resources.get("Font").and_then(|f| f.as_dict()) {
            for (name, _) in &fonts.0 {
                println!("Font resource: {}", name.as_str());
            }
        }

        // Check images/XObjects
        if let Some(xobjects) = resources.get("XObject").and_then(|x| x.as_dict()) {
            for (name, _) in &xobjects.0 {
                println!("XObject resource: {}", name.as_str());
            }
        }
    }

    Ok(())
}
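
The lookup pattern above (`get(...)` followed by `and_then(|x| x.as_dict())`) reflects how a resource dictionary is laid out: a category name such as `Font` maps to another dictionary whose keys are the resource names. The standard-library sketch below models that shape with a small stand-in type. `Obj` and `resource_names` are hypothetical and much simpler than the library's `PdfObject` and `PdfDictionary`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal model of a PDF object: either a name or a dictionary.
#[derive(Debug)]
enum Obj {
    Name(String),
    Dict(BTreeMap<String, Obj>),
}

impl Obj {
    /// Return the inner dictionary if this object is one.
    fn as_dict(&self) -> Option<&BTreeMap<String, Obj>> {
        match self {
            Obj::Dict(d) => Some(d),
            _ => None,
        }
    }
}

/// List the names registered under one resource category (for example "Font").
/// A missing category, or one that is not a dictionary, gives an empty list.
fn resource_names<'a>(resources: &'a BTreeMap<String, Obj>, category: &str) -> Vec<&'a str> {
    resources
        .get(category)
        .and_then(Obj::as_dict)
        .map(|d| d.keys().map(String::as_str).collect())
        .unwrap_or_default()
}
```

Chaining `get` with `and_then` keeps both failure cases (category absent, or present but not a dictionary) on the same `None` path, so the caller needs only one check.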

Re-exports§

pub use coordinate_system::CoordinateSystem;
pub use coordinate_system::RenderContext;
pub use coordinate_system::TransformMatrix;
pub use document::Document;
pub use document::DocumentMetadata;
pub use error::OxidizePdfError;
pub use error::PdfError;
pub use error::Result;
pub use geometry::Point;
pub use geometry::Rectangle;
pub use graphics::Color;
pub use graphics::ColorSpace;
pub use graphics::GraphicsContext;
pub use graphics::Image;
pub use graphics::ImageFormat;
pub use graphics::MaskType;
pub use page::Margins;
pub use page::Page;
pub use page_lists::ListStyle;
pub use page_lists::ListType;
pub use page_lists::PageLists;
pub use page_tables::PageTables;
pub use page_tables::TableStyle;
pub use text::measure_text;
pub use text::split_into_words;
pub use text::BulletStyle;
pub use text::Font;
pub use text::FontFamily;
pub use text::FragmentType;
pub use text::HeaderStyle;
pub use text::ImagePreprocessing;
pub use text::ListElement;
pub use text::ListOptions;
pub use text::MockOcrProvider;
pub use text::OcrEngine;
pub use text::OcrError;
pub use text::OcrOptions;
pub use text::OcrProcessingResult;
pub use text::OcrProvider;
pub use text::OcrResult;
pub use text::OcrTextFragment;
pub use text::OrderedList;
pub use text::OrderedListStyle;
pub use text::Table;
pub use text::TableCell;
pub use text::TableOptions;
pub use text::TextAlign;
pub use text::TextContext;
pub use text::TextFlowContext;
pub use text::UnorderedList;
pub use forms::calculations::FieldValue;
pub use forms::field_actions::ActionSettings;
pub use forms::field_actions::FieldAction;
pub use forms::field_actions::FieldActionSystem;
pub use forms::field_actions::FieldActions;
pub use forms::field_actions::FormatActionType;
pub use forms::field_actions::SpecialFormatType;
pub use forms::field_actions::ValidateActionType;
pub use forms::validation::DateFormat;
pub use forms::validation::FieldValidator;
pub use forms::validation::FormValidationSystem;
pub use forms::validation::FormatMask;
pub use forms::validation::PhoneCountry;
pub use forms::validation::RequiredFieldInfo;
pub use forms::validation::RequirementCondition;
pub use forms::validation::TimeFormat;
pub use forms::validation::ValidationRule;
pub use forms::validation::ValidationSettings;
pub use forms::BorderStyle;
pub use forms::FieldType;
pub use forms::TextField;
pub use forms::Widget;
pub use text::fonts::embedding::EmbeddedFontData;
pub use text::fonts::embedding::EmbeddingOptions;
pub use text::fonts::embedding::EncodingDifference;
pub use text::fonts::embedding::FontDescriptor;
pub use text::fonts::embedding::FontEmbedder;
pub use text::fonts::embedding::FontEncoding;
pub use text::fonts::embedding::FontFlags;
pub use text::fonts::embedding::FontMetrics;
pub use text::fonts::embedding::FontType;
pub use text::font_manager::CustomFont;
pub use text::font_manager::FontManager;
pub use parser::ContentOperation;
pub use parser::ContentParser;
pub use parser::DocumentMetadata as ParsedDocumentMetadata;
pub use parser::ParseOptions;
pub use parser::ParsedPage;
pub use parser::PdfArray;
pub use parser::PdfDictionary;
pub use parser::PdfDocument;
pub use parser::PdfName;
pub use parser::PdfObject;
pub use parser::PdfReader;
pub use parser::PdfStream;
pub use parser::PdfString;
pub use operations::merge_pdfs;
pub use operations::rotate_pdf_pages;
pub use operations::split_pdf;
pub use dashboard::Dashboard;
pub use dashboard::DashboardBuilder;
pub use dashboard::DashboardComponent;
pub use dashboard::DashboardConfig;
pub use dashboard::DashboardLayout;
pub use dashboard::DashboardTheme;
pub use dashboard::HeatMap;
pub use dashboard::KpiCard;
pub use dashboard::PivotTable;
pub use dashboard::ScatterPlot;
pub use dashboard::TreeMap;
pub use dashboard::Typography;
pub use memory::LazyDocument;
pub use memory::MemoryOptions;
pub use memory::StreamProcessor;
pub use memory::StreamingOptions;
pub use streaming::process_in_chunks;
pub use streaming::stream_text;
pub use streaming::ChunkOptions;
pub use streaming::ChunkProcessor;
pub use streaming::ChunkType;
pub use streaming::ContentChunk;
pub use streaming::IncrementalParser;
pub use streaming::ParseEvent;
pub use streaming::StreamingDocument;
pub use streaming::StreamingOptions as StreamOptions;
pub use streaming::StreamingPage;
pub use streaming::TextChunk;
pub use streaming::TextStreamOptions;
pub use streaming::TextStreamer;
pub use batch::batch_merge_pdfs;
pub use batch::batch_process_files;
pub use batch::batch_split_pdfs;
pub use batch::BatchJob;
pub use batch::BatchOptions;
pub use batch::BatchProcessor;
pub use batch::BatchProgress;
pub use batch::BatchResult;
pub use batch::BatchSummary;
pub use batch::JobResult;
pub use batch::JobStatus;
pub use batch::JobType;
pub use batch::ProgressCallback;
pub use batch::ProgressInfo;
pub use recovery::analyze_corruption;
pub use recovery::detect_corruption;
pub use recovery::quick_recover;
pub use recovery::repair_document;
pub use recovery::validate_pdf;
pub use recovery::CorruptionReport;
pub use recovery::CorruptionType;
pub use recovery::ObjectScanner;
pub use recovery::PartialRecovery;
pub use recovery::PdfRecovery;
pub use recovery::RecoveredPage;
pub use recovery::RecoveryOptions;
pub use recovery::RepairResult;
pub use recovery::RepairStrategy;
pub use recovery::ScanResult;
pub use recovery::ValidationError;
pub use recovery::ValidationResult;
pub use structure::Destination;
pub use structure::DestinationType;
pub use structure::NameTree;
pub use structure::NameTreeNode;
pub use structure::NamedDestinations;
pub use structure::OutlineBuilder;
pub use structure::OutlineItem;
pub use structure::OutlineTree;
pub use structure::PageDestination;
pub use structure::PageTree;
pub use structure::PageTreeBuilder;
pub use structure::PageTreeNode;
pub use actions::Action;
pub use actions::ActionDictionary;
pub use actions::ActionType;
pub use actions::GoToAction;
pub use actions::LaunchAction;
pub use actions::LaunchParameters;
pub use actions::NamedAction;
pub use actions::RemoteGoToAction;
pub use actions::StandardNamedAction;
pub use actions::UriAction;
pub use actions::UriActionFlags;
pub use page_labels::PageLabel;
pub use page_labels::PageLabelBuilder;
pub use page_labels::PageLabelRange;
pub use page_labels::PageLabelStyle;
pub use page_labels::PageLabelTree;
pub use templates::Template;
pub use templates::TemplateContext;
pub use templates::TemplateError;
pub use templates::TemplateRenderer;
pub use templates::TemplateResult;
pub use templates::TemplateValue;
pub use verification::comparators::compare_pdfs;
pub use verification::comparators::ComparisonResult;
pub use verification::comparators::DifferenceSeverity;
pub use verification::comparators::PdfDifference;
pub use verification::compliance_report::format_report_markdown;
pub use verification::compliance_report::generate_compliance_report;
pub use verification::compliance_report::ComplianceReport;
pub use verification::iso_matrix::load_default_matrix;
pub use verification::iso_matrix::load_matrix;
pub use verification::iso_matrix::ComplianceStats;
pub use verification::iso_matrix::IsoMatrix;
pub use verification::validators::check_available_validators;
pub use verification::validators::validate_external;
pub use verification::validators::validate_with_qpdf;
pub use verification::extract_pdf_differences;
pub use verification::pdfs_structurally_equivalent;
pub use verification::verify_iso_requirement;
pub use verification::ExternalValidationResult;
pub use verification::IsoRequirement;
pub use verification::VerificationLevel;
pub use verification::VerificationResult;

Modules§

  • actions - PDF actions according to ISO 32000-1 Chapter 12.6
  • advanced_tables - Advanced Table System for PDF Generation
  • annotations - Basic PDF annotations support according to ISO 32000-1 Chapter 12.5
  • batch - Batch processing for multiple PDF operations
  • charts - Chart Generation System for PDF Reports
  • compression - Compression utilities for PDF streams
  • coordinate_system - Coordinate system management for PDF rendering
  • dashboard - Dashboard Framework for Professional PDF Reports
  • document
  • encryption - PDF encryption support according to ISO 32000-1 Chapter 7.6
  • error
  • fonts - Font loading and embedding functionality for custom fonts
  • forms - Basic PDF forms support according to ISO 32000-1 Chapter 12.7
  • geometry - Basic geometric types for PDF
  • graphics
  • memory - Memory optimization module for efficient PDF handling
  • objects
  • operations - PDF operations module
  • page
  • page_forms - Page-level forms API
  • page_labels - Page labels for custom page numbering according to ISO 32000-1 Section 12.4.2
  • page_lists - Page extension for list rendering
  • page_tables - Page extension for table rendering
  • page_transitions - Page transitions for presentations in PDF documents
  • page_tree - Page Tree Implementation according to ISO 32000-1 Section 7.7.3
  • parser - PDF Parser Module: complete PDF parsing and rendering support
  • pdf_version - Scanned page analysis and OCR example
  • recovery - PDF error recovery and repair functionality
  • semantic - Semantic marking for AI-Ready PDFs (Community Edition)
  • streaming - Streaming support for incremental PDF processing
  • structure - Document structure elements including page trees, name trees, and outlines according to ISO 32000-1
  • templates - PDF Template System with Variable Substitution
  • text
  • verification - PDF Verification Module
  • viewer_preferences - Viewer preferences control how the PDF document is displayed in the viewer
  • writer - PDF writing functionality

Constants§

  • VERSION - Current version of oxidize-pdf