§oxidize-pdf
A comprehensive, pure Rust PDF library for generation, parsing, and manipulation with zero external PDF dependencies.
§Features
- PDF Generation: Create multi-page documents with text, graphics, and images
- PDF Parsing: Complete parser supporting rendering and content extraction
- PDF Operations: Split, merge, rotate, and extract pages
- Text Extraction: Extract text with position and formatting information
- Image Extraction: Extract images in JPEG, PNG, and TIFF formats
- Font Embedding: TrueType and OpenType font embedding with subsetting support (v1.1.6+)
- Page Analysis: Detect scanned vs text content with intelligent classification
- OCR Integration: Pluggable OCR support with Tesseract for processing scanned documents (v0.1.3+)
- Resource Access: Work with fonts, images, and other PDF resources
- Pure Rust: No C dependencies or external libraries
- 100% Native: Complete PDF implementation from scratch
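Of the page operations listed above, rotation follows a simple rule from the PDF specification: a page's /Rotate value must be a multiple of 90 degrees. The sketch below is a standalone illustration of that rule using only the standard library. It is not the crate's API; `combined_rotation` and `rotated_size` are hypothetical helper names.

```rust
/// Combine a page's current /Rotate value with an additional rotation.
/// Returns `None` if either angle is not a multiple of 90 degrees.
/// (Hypothetical helper, not part of oxidize-pdf.)
fn combined_rotation(current: i32, delta: i32) -> Option<i32> {
    if current % 90 != 0 || delta % 90 != 0 {
        return None;
    }
    // rem_euclid keeps the result in 0..360 even for negative angles.
    Some((current + delta).rem_euclid(360))
}

/// Visible page size after rotation: width and height swap at 90 and 270 degrees.
fn rotated_size(width: f64, height: f64, rotation: i32) -> (f64, f64) {
    match rotation.rem_euclid(360) {
        90 | 270 => (height, width),
        _ => (width, height),
    }
}
```

Normalizing into the 0 to 359 range means a counter-clockwise request such as -90 is stored as 270.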
§Quick Start
§Creating PDFs
use oxidize_pdf::{Document, Page, Font, Color, Result};
// Create a new document
let mut doc = Document::new();
doc.set_title("My PDF");
// Create a page
let mut page = Page::a4();
// Add text
page.text()
    .set_font(Font::Helvetica, 24.0)
    .at(50.0, 700.0)
    .write("Hello, PDF!")?;
// Add graphics
page.graphics()
    .set_fill_color(Color::rgb(0.0, 0.5, 1.0))
    .circle(300.0, 400.0, 50.0)
    .fill();
// Add the page and save the document
doc.add_page(page);
doc.save("output.pdf")?;
§Parsing PDFs
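For context on the version value printed in the example below: a PDF file begins with a header comment such as `%PDF-1.7`. The following is a minimal, standalone sketch of reading that header. `header_version` is a hypothetical helper that assumes single-digit major and minor numbers (which covers 1.0 through 1.7 and 2.0); it does not show how the crate implements `version()`.

```rust
/// Read the major and minor version from a PDF header such as `%PDF-1.7`.
/// (Hypothetical helper, not part of oxidize-pdf.)
fn header_version(bytes: &[u8]) -> Option<(u8, u8)> {
    // The header must start at the first byte of the file in this sketch.
    let rest = bytes.strip_prefix(b"%PDF-")?;
    match rest {
        [major, b'.', minor, ..] if major.is_ascii_digit() && minor.is_ascii_digit() => {
            // Convert ASCII digits to their numeric values.
            Some((major - b'0', minor - b'0'))
        }
        _ => None,
    }
}
```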
use oxidize_pdf::parser::{PdfDocument, PdfReader};
// Open and parse a PDF
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
// Get document information
println!("Pages: {}", document.page_count()?);
println!("Version: {}", document.version()?);
// Process pages
for i in 0..document.page_count()? {
    let page = document.get_page(i)?;
    println!("Page {} size: {}x{} points", i + 1, page.width(), page.height());
}
// Extract text
let text_pages = document.extract_text()?;
for (i, page_text) in text_pages.iter().enumerate() {
    println!("Page {} text: {}", i + 1, page_text.text);
}
§Modules
§Generation Modules
- document - PDF document creation and management
- page - Page creation and layout
- graphics - Vector graphics and images
- text - Text rendering and flow
- writer - Low-level PDF writing
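Page layout values such as the `.at(50.0, 700.0)` call in the quick start are expressed in PDF points (1/72 inch), and PDF's default coordinate system has its origin at the bottom-left corner with y increasing upward. The sketch below shows the two conversions that usually come up when laying out a page. The helper names are hypothetical and independent of the crate.

```rust
/// Points per millimetre: 72 points per inch, 25.4 mm per inch.
const POINTS_PER_MM: f64 = 72.0 / 25.4;

/// Convert a length in millimetres to PDF points.
fn mm_to_points(mm: f64) -> f64 {
    mm * POINTS_PER_MM
}

/// Convert a position measured from the top-left corner of the page
/// into PDF default user space (origin bottom-left, y up).
fn from_top_left(x: f64, y_from_top: f64, page_height: f64) -> (f64, f64) {
    (x, page_height - y_from_top)
}
```

An A4 sheet (210 mm x 297 mm) comes out at roughly 595 x 842 points, so a y value of 700 sits about 142 points below the top edge.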
§Parsing Modules
- parser - Complete PDF parsing and reading
- parser::PdfDocument - High-level document interface
- parser::ParsedPage - Page representation with resources
- parser::ContentParser - Content stream parsing
- parser::PdfObject - Low-level PDF objects
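To make "content stream parsing" concrete: a PDF content stream is written in postfix form, so operands come first and the operator that consumes them comes last (for example `/F1 12 Tf` selects a font and `(Hello) Tj` shows text). The toy grouping routine below illustrates that structure only. It is not the crate's `ContentParser`; all names are hypothetical, and it deliberately ignores arrays, hex strings, dictionaries, and string escapes.

```rust
/// One operator together with the operands that preceded it.
#[derive(Debug, PartialEq)]
struct Operation {
    operator: String,
    operands: Vec<String>,
}

/// Split a content stream into tokens, keeping `( ... )` literal strings whole.
fn tokenize(input: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut depth = 0usize; // nesting depth of parentheses
    for ch in input.chars() {
        if depth > 0 {
            current.push(ch);
            match ch {
                '(' => depth += 1,
                ')' => {
                    depth -= 1;
                    if depth == 0 {
                        tokens.push(std::mem::take(&mut current));
                    }
                }
                _ => {}
            }
        } else if ch == '(' {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            current.push(ch);
            depth = 1;
        } else if ch.is_whitespace() {
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
        } else {
            current.push(ch);
        }
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

/// Names (`/F1`), literal strings, and numbers are operands; anything else is an operator.
fn is_operand(token: &str) -> bool {
    token.starts_with('/') || token.starts_with('(') || token.parse::<f64>().is_ok()
}

/// Attach each run of operands to the operator that follows it.
fn group_operations(input: &str) -> Vec<Operation> {
    let mut operations = Vec::new();
    let mut operands = Vec::new();
    for token in tokenize(input) {
        if is_operand(&token) {
            operands.push(token);
        } else {
            operations.push(Operation {
                operator: token,
                operands: std::mem::take(&mut operands),
            });
        }
    }
    operations
}
```

Calling `group_operations("BT /F1 12 Tf 50 700 Td (Hello, PDF!) Tj ET")` yields five operations, with `Tf` carrying the operands `/F1` and `12`.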
§Manipulation Modules
- operations - PDF manipulation (split, merge, rotate, extract images)
- operations::page_analysis - Page content analysis and scanned page detection
- text::extraction - Text extraction with positioning
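One common way to approach scanned page detection is to compare how much of the page area is covered by images against how much is covered by text. The sketch below illustrates that idea only. The thresholds, type names, and function are assumptions chosen for the example, and nothing here is taken from the `operations::page_analysis` implementation.

```rust
/// Coarse page classification. (Hypothetical type, not part of oxidize-pdf.)
#[derive(Debug, PartialEq)]
enum PageKind {
    Scanned,
    Text,
    Mixed,
}

/// Classify a page from the area covered by images and by text,
/// all measured in the same units (for example square points).
fn classify(image_area: f64, text_area: f64, page_area: f64) -> PageKind {
    // Illustrative thresholds; a real analyzer would tune or expose these.
    const HIGH_IMAGE_RATIO: f64 = 0.8;
    const LOW_RATIO: f64 = 0.1;

    if page_area <= 0.0 {
        return PageKind::Mixed;
    }
    let image_ratio = image_area / page_area;
    let text_ratio = text_area / page_area;

    if image_ratio >= HIGH_IMAGE_RATIO && text_ratio <= LOW_RATIO {
        PageKind::Scanned // one large image, almost no text objects
    } else if image_ratio <= LOW_RATIO && text_ratio > 0.0 {
        PageKind::Text // text objects present, few or no images
    } else {
        PageKind::Mixed
    }
}
```

Pages classified as scanned are the natural candidates to hand to an OCR step.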
§OCR Modules (v0.1.3+)
- text::ocr - OCR trait system, types, and integration for scanned documents
- text::tesseract_provider - Tesseract OCR provider (requires the ocr-tesseract feature)
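A "pluggable" OCR design generally means callers depend on a trait and the concrete engine can be swapped, for example a Tesseract-backed provider in production and a mock in tests. The sketch below shows that pattern in isolation. `TextRecognizer`, `FixedTextRecognizer`, and `ocr_pages` are hypothetical names, and the signatures are assumptions; they do not reproduce the crate's `OcrProvider` or `MockOcrProvider`.

```rust
/// Anything that can turn image bytes into text. (Hypothetical trait.)
trait TextRecognizer {
    fn recognize(&self, image_bytes: &[u8]) -> Result<String, String>;
}

/// A stand-in engine that returns the same text for every non-empty image,
/// useful for exercising the pipeline without a real OCR engine.
struct FixedTextRecognizer {
    text: String,
}

impl TextRecognizer for FixedTextRecognizer {
    fn recognize(&self, image_bytes: &[u8]) -> Result<String, String> {
        if image_bytes.is_empty() {
            return Err("empty image".to_string());
        }
        Ok(self.text.clone())
    }
}

/// Run any recognizer over a set of page images, stopping at the first error.
fn ocr_pages(recognizer: &dyn TextRecognizer, pages: &[Vec<u8>]) -> Result<Vec<String>, String> {
    pages.iter().map(|page| recognizer.recognize(page)).collect()
}
```

Because `ocr_pages` only sees the trait object, swapping in a different engine requires no change to the calling code.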
§Examples
§Content Stream Processing
use oxidize_pdf::parser::{PdfDocument, PdfReader};
use oxidize_pdf::parser::content::{ContentParser, ContentOperation};
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;
// Get and parse content streams
let streams = page.content_streams_with_document(&document)?;
for stream in streams {
    let operations = ContentParser::parse(&stream)?;
    for op in operations {
        match op {
            ContentOperation::ShowText(text) => {
                println!("Text: {:?}", String::from_utf8_lossy(&text));
            }
            ContentOperation::SetFont(name, size) => {
                println!("Font: {} at {} pt", name, size);
            }
            ContentOperation::MoveTo(x, y) => {
                println!("Move to ({}, {})", x, y);
            }
            _ => {} // Handle other operations
        }
    }
}
§Resource Access
use oxidize_pdf::parser::{PdfDocument, PdfReader};
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;
// Access page resources
if let Some(resources) = page.get_resources() {
    // Check fonts
    if let Some(fonts) = resources.get("Font").and_then(|f| f.as_dict()) {
        for (name, _) in &fonts.0 {
            println!("Font resource: {}", name.as_str());
        }
    }
    // Check images/XObjects
    if let Some(xobjects) = resources.get("XObject").and_then(|x| x.as_dict()) {
        for (name, _) in &xobjects.0 {
            println!("XObject resource: {}", name.as_str());
        }
    }
}
Re-exports§
pub use coordinate_system::CoordinateSystem;
pub use coordinate_system::RenderContext;
pub use coordinate_system::TransformMatrix;
pub use document::Document;
pub use document::DocumentMetadata;
pub use error::OxidizePdfError;
pub use error::PdfError;
pub use error::Result;
pub use geometry::Point;
pub use geometry::Rectangle;
pub use graphics::Color;
pub use graphics::ColorSpace;
pub use graphics::GraphicsContext;
pub use graphics::Image;
pub use graphics::ImageFormat;
pub use graphics::MaskType;
pub use page::Margins;
pub use page::Page;
pub use page_lists::ListStyle;
pub use page_lists::ListType;
pub use page_lists::PageLists;
pub use page_tables::PageTables;
pub use page_tables::TableStyle;
pub use text::measure_text;
pub use text::split_into_words;
pub use text::BulletStyle;
pub use text::Font;
pub use text::FontFamily;
pub use text::FragmentType;
pub use text::HeaderStyle;
pub use text::ImagePreprocessing;
pub use text::ListElement;
pub use text::ListOptions;
pub use text::MockOcrProvider;
pub use text::OcrEngine;
pub use text::OcrError;
pub use text::OcrOptions;
pub use text::OcrProcessingResult;
pub use text::OcrProvider;
pub use text::OcrResult;
pub use text::OcrTextFragment;
pub use text::OrderedList;
pub use text::OrderedListStyle;
pub use text::Table;
pub use text::TableCell;
pub use text::TableOptions;
pub use text::TextAlign;
pub use text::TextContext;
pub use text::TextFlowContext;
pub use text::UnorderedList;
pub use forms::calculations::FieldValue;
pub use forms::field_actions::ActionSettings;
pub use forms::field_actions::FieldAction;
pub use forms::field_actions::FieldActionSystem;
pub use forms::field_actions::FieldActions;
pub use forms::field_actions::FormatActionType;
pub use forms::field_actions::SpecialFormatType;
pub use forms::field_actions::ValidateActionType;
pub use forms::validation::DateFormat;
pub use forms::validation::FieldValidator;
pub use forms::validation::FormValidationSystem;
pub use forms::validation::FormatMask;
pub use forms::validation::PhoneCountry;
pub use forms::validation::RequiredFieldInfo;
pub use forms::validation::RequirementCondition;
pub use forms::validation::TimeFormat;
pub use forms::validation::ValidationRule;
pub use forms::validation::ValidationSettings;
pub use forms::BorderStyle;
pub use forms::FieldType;
pub use forms::TextField;
pub use forms::Widget;
pub use text::fonts::embedding::EmbeddedFontData;
pub use text::fonts::embedding::EmbeddingOptions;
pub use text::fonts::embedding::EncodingDifference;
pub use text::fonts::embedding::FontDescriptor;
pub use text::fonts::embedding::FontEmbedder;
pub use text::fonts::embedding::FontEncoding;
pub use text::fonts::embedding::FontFlags;
pub use text::fonts::embedding::FontMetrics;
pub use text::fonts::embedding::FontType;
pub use text::font_manager::CustomFont;
pub use text::font_manager::FontManager;
pub use parser::ContentOperation;
pub use parser::ContentParser;
pub use parser::DocumentMetadata as ParsedDocumentMetadata;
pub use parser::ParseOptions;
pub use parser::ParsedPage;
pub use parser::PdfArray;
pub use parser::PdfDictionary;
pub use parser::PdfDocument;
pub use parser::PdfName;
pub use parser::PdfObject;
pub use parser::PdfReader;
pub use parser::PdfStream;
pub use parser::PdfString;
pub use operations::extract_images_from_pages;
pub use operations::extract_images_from_pdf;
pub use operations::merge_pdfs;
pub use operations::rotate_pdf_pages;
pub use operations::split_pdf;
pub use operations::ExtractImagesOptions;
pub use operations::ExtractedImage;
pub use operations::ImageExtractor;
pub use dashboard::Dashboard;
pub use dashboard::DashboardBuilder;
pub use dashboard::DashboardComponent;
pub use dashboard::DashboardConfig;
pub use dashboard::DashboardLayout;
pub use dashboard::DashboardTheme;
pub use dashboard::HeatMap;
pub use dashboard::KpiCard;
pub use dashboard::PivotTable;
pub use dashboard::ScatterPlot;
pub use dashboard::TreeMap;
pub use dashboard::Typography;
pub use memory::LazyDocument;
pub use memory::MemoryOptions;
pub use memory::StreamProcessor;
pub use memory::StreamingOptions;
pub use streaming::process_in_chunks;
pub use streaming::stream_text;
pub use streaming::ChunkOptions;
pub use streaming::ChunkProcessor;
pub use streaming::ChunkType;
pub use streaming::ContentChunk;
pub use streaming::IncrementalParser;
pub use streaming::ParseEvent;
pub use streaming::StreamingDocument;
pub use streaming::StreamingOptions as StreamOptions;
pub use streaming::StreamingPage;
pub use streaming::TextChunk;
pub use streaming::TextStreamOptions;
pub use streaming::TextStreamer;
pub use batch::batch_merge_pdfs;
pub use batch::batch_process_files;
pub use batch::batch_split_pdfs;
pub use batch::BatchJob;
pub use batch::BatchOptions;
pub use batch::BatchProcessor;
pub use batch::BatchProgress;
pub use batch::BatchResult;
pub use batch::BatchSummary;
pub use batch::JobResult;
pub use batch::JobStatus;
pub use batch::JobType;
pub use batch::ProgressCallback;
pub use batch::ProgressInfo;
pub use recovery::analyze_corruption;
pub use recovery::detect_corruption;
pub use recovery::quick_recover;
pub use recovery::repair_document;
pub use recovery::validate_pdf;
pub use recovery::CorruptionReport;
pub use recovery::CorruptionType;
pub use recovery::ObjectScanner;
pub use recovery::PartialRecovery;
pub use recovery::PdfRecovery;
pub use recovery::RecoveredPage;
pub use recovery::RecoveryOptions;
pub use recovery::RepairResult;
pub use recovery::RepairStrategy;
pub use recovery::ScanResult;
pub use recovery::ValidationError;
pub use recovery::ValidationResult;
pub use structure::Destination;
pub use structure::DestinationType;
pub use structure::NameTree;
pub use structure::NameTreeNode;
pub use structure::NamedDestinations;
pub use structure::OutlineBuilder;
pub use structure::OutlineItem;
pub use structure::OutlineTree;
pub use structure::PageDestination;
pub use structure::PageTree;
pub use structure::PageTreeBuilder;
pub use structure::PageTreeNode;
pub use actions::Action;
pub use actions::ActionDictionary;
pub use actions::ActionType;
pub use actions::GoToAction;
pub use actions::LaunchAction;
pub use actions::LaunchParameters;
pub use actions::NamedAction;
pub use actions::RemoteGoToAction;
pub use actions::StandardNamedAction;
pub use actions::UriAction;
pub use actions::UriActionFlags;
pub use page_labels::PageLabel;
pub use page_labels::PageLabelBuilder;
pub use page_labels::PageLabelRange;
pub use page_labels::PageLabelStyle;
pub use page_labels::PageLabelTree;
pub use templates::Template;
pub use templates::TemplateContext;
pub use templates::TemplateError;
pub use templates::TemplateRenderer;
pub use templates::TemplateResult;
pub use templates::TemplateValue;
pub use semantic::BoundingBox;
pub use semantic::Entity;
pub use semantic::EntityMap;
pub use semantic::EntityMetadata;
pub use semantic::EntityRelation;
pub use semantic::EntityType;
pub use semantic::ExportFormat;
pub use semantic::RelationType;
pub use semantic::SemanticEntity;
pub use semantic::SemanticMarking;
pub use verification::comparators::compare_pdfs;
pub use verification::comparators::ComparisonResult;
pub use verification::comparators::DifferenceSeverity;
pub use verification::comparators::PdfDifference;
pub use verification::compliance_report::format_report_markdown;
pub use verification::compliance_report::generate_compliance_report;
pub use verification::compliance_report::ComplianceReport;
pub use verification::iso_matrix::load_default_matrix;
pub use verification::iso_matrix::load_matrix;
pub use verification::iso_matrix::ComplianceStats;
pub use verification::iso_matrix::IsoMatrix;
pub use verification::validators::check_available_validators;
pub use verification::validators::validate_external;
pub use verification::validators::validate_with_qpdf;
pub use verification::extract_pdf_differences;
pub use verification::pdfs_structurally_equivalent;
pub use verification::verify_iso_requirement;
pub use verification::ExternalValidationResult;
pub use verification::IsoRequirement;
pub use verification::VerificationLevel;
pub use verification::VerificationResult;
Modules§
- actions - PDF actions according to ISO 32000-1 Chapter 12.6
- advanced_tables - Advanced Table System for PDF Generation
- ai - AI/ML integration utilities for PDF processing
- annotations - Basic PDF annotations support according to ISO 32000-1 Chapter 12.5
- batch - Batch processing for multiple PDF operations
- charts - Chart Generation System for PDF Reports
- compression - Compression utilities for PDF streams
- coordinate_system - Coordinate system management for PDF rendering
- dashboard - Dashboard Framework for Professional PDF Reports
- document
- encryption - PDF encryption support according to ISO 32000-1 Chapter 7.6
- error
- fonts - Font loading and embedding functionality for custom fonts
- forms - Basic PDF forms support according to ISO 32000-1 Chapter 12.7
- geometry - Basic geometric types for PDF
- graphics
- memory - Memory optimization module for efficient PDF handling
- metadata
- objects - PDF Object Types (Writer Module)
- operations - PDF operations module
- page
- page_forms - Page-level forms API
- page_labels - Page labels for custom page numbering according to ISO 32000-1 Section 12.4.2
- page_lists - Page extension for list rendering
- page_tables - Page extension for table rendering
- page_transitions - Page transitions for presentations in PDF documents
- page_tree - Page Tree Implementation according to ISO 32000-1 Section 7.7.3
- parser - PDF Parser Module - Complete PDF parsing and rendering support
- pdf_objects - Unified PDF Object Types
- pdf_version - Scanned page analysis and OCR example
- recovery - PDF error recovery and repair functionality
- semantic - Semantic marking for AI-Ready PDFs (Community Edition)
- streaming - Streaming support for incremental PDF processing
- structure - Document structure elements including page trees, name trees, and outlines according to ISO 32000-1
- templates - PDF Template System with Variable Substitution
- text
- verification - PDF Verification Module
- viewer_preferences - Viewer preferences control how the PDF document is displayed in the viewer
- writer - PDF writing functionality
Constants§
- VERSION - Current version of oxidize-pdf