PDF Document wrapper - High-level interface for PDF parsing and manipulation
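This module provides a robust, high-level interface for working with PDF documents. A document has to parse and cache objects lazily while callers access it through shared references. That pattern normally conflicts with Rust's borrow checker, and the module resolves the conflict with interior mutability (RefCell) and a clear separation of concerns between parsing, caching, and page access.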
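To illustrate the interior-mutability pattern described above, here is a minimal sketch in plain Rust. `Document`, `Object`, `get_object`, and the parse counter are hypothetical names, not oxidize_pdf's API. A method that takes `&self` parses an object on its first request, stores it in a `RefCell<HashMap>`, and hands out `Rc` handles so that repeated lookups share one parsed copy.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for a parsed PDF object.
#[derive(Debug, PartialEq)]
enum Object {
    Integer(i64),
    Name(String),
}

/// Hypothetical document type: callers only hold `&Document`,
/// yet the cache and the parse counter can still be updated.
struct Document {
    cache: RefCell<HashMap<u32, Rc<Object>>>,
    parses: Cell<usize>, // counts simulated parses, to make cache hits visible
}

impl Document {
    fn new() -> Self {
        Document { cache: RefCell::new(HashMap::new()), parses: Cell::new(0) }
    }

    /// Simulates the expensive step of parsing object `id` from the file.
    fn parse(&self, id: u32) -> Object {
        self.parses.set(self.parses.get() + 1);
        if id % 2 == 0 {
            Object::Integer(id as i64)
        } else {
            Object::Name(format!("Obj{id}"))
        }
    }

    /// Returns a shared handle to object `id`, parsing it only on first access.
    fn get_object(&self, id: u32) -> Rc<Object> {
        if let Some(obj) = self.cache.borrow().get(&id) {
            return Rc::clone(obj); // cache hit
        }
        // The shared borrow above has ended, so a mutable borrow is safe here.
        let obj = Rc::new(self.parse(id));
        self.cache.borrow_mut().insert(id, Rc::clone(&obj));
        obj
    }
}

fn main() {
    let doc = Document::new();
    let first = doc.get_object(4);
    let second = doc.get_object(4); // served from the cache, no second parse
    assert!(Rc::ptr_eq(&first, &second));
    let third = doc.get_object(7);
    println!("{:?}, parses so far: {}", third, doc.parses.get());
}
```

The short-lived `borrow()` is released before `borrow_mut()` is called. If both were held at once, `RefCell` would panic at runtime instead of the compiler rejecting the code, so keeping borrows narrow is the main discipline this pattern requires.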
§Architecture
The module uses a layered architecture:
- PdfDocument: Main entry point with RefCell-based state management
- ResourceManager: Centralized object caching with interior mutability
- PdfReader: Low-level file access (wrapped in RefCell)
- PageTree: Lazy-loaded page navigation
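The layering can be sketched as follows, assuming a simplified reader that returns page sizes instead of parsing real page dictionaries. All types here (`Reader`, `PageTree`, `Document`) are hypothetical stand-ins for the roles listed above. The reader needs `&mut self`, so it sits in a `RefCell`. The page tree is built lazily, the first time any page is requested. This sketch uses `std::cell::OnceCell` (stable since Rust 1.70) for that lazy step; the actual module may use a `RefCell<Option<...>>` or another mechanism.

```rust
use std::cell::{OnceCell, RefCell};

/// Hypothetical low-level reader. Reading moves a cursor, so it needs `&mut self`.
struct Reader {
    raw_pages: Vec<(f64, f64)>, // stand-in for the page dictionaries in the file
    reads: usize,               // how many times the page tree was read
}

impl Reader {
    fn read_page_tree(&mut self) -> PageTree {
        self.reads += 1;
        PageTree { sizes: self.raw_pages.clone() }
    }
}

/// Hypothetical page tree: built once, then used for navigation.
struct PageTree {
    sizes: Vec<(f64, f64)>, // (width, height) per page
}

/// Hypothetical top-level document: the reader lives in a RefCell and the
/// page tree in a OnceCell, so every public method can take `&self`.
struct Document {
    reader: RefCell<Reader>,
    page_tree: OnceCell<PageTree>,
}

impl Document {
    fn new(reader: Reader) -> Self {
        Document { reader: RefCell::new(reader), page_tree: OnceCell::new() }
    }

    /// Loads the page tree on the first call; later calls reuse it.
    fn page_tree(&self) -> &PageTree {
        self.page_tree
            .get_or_init(|| self.reader.borrow_mut().read_page_tree())
    }

    fn page_count(&self) -> usize {
        self.page_tree().sizes.len()
    }

    /// Zero-based page lookup; returns None for an out-of-range index.
    fn page_size(&self, index: usize) -> Option<(f64, f64)> {
        self.page_tree().sizes.get(index).copied()
    }
}

fn main() {
    let reader = Reader { raw_pages: vec![(612.0, 792.0), (595.0, 842.0)], reads: 0 };
    let doc = Document::new(reader);
    println!("pages: {}", doc.page_count());
    println!("page 2: {:?}", doc.page_size(1));
    println!("reads: {}", doc.reader.borrow().reads); // prints "reads: 1"
}
```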
§Key Features
- Automatic caching: Objects are cached after first access
- Resource management: Shared resources are handled efficiently
- Page navigation: Fast access to any page in the document
- Reference resolution: Automatic resolution of indirect references
- Text extraction: Built-in support for extracting text from pages
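Reference resolution means following `N G R` indirect references through the cross-reference table until a direct object is reached. The sketch below shows one way to do that with a cycle guard. `Object`, `resolve`, and the `HashMap` standing in for the cross-reference table are hypothetical, not the crate's types. It returns `None` for a dangling reference as a simplification; the PDF specification instead treats a reference to an undefined object as the null object.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, heavily reduced PDF object model.
#[derive(Clone, Debug, PartialEq)]
enum Object {
    Integer(i64),
    Name(String),
    Reference(u32, u16), // (object number, generation number)
}

/// Follows chains of indirect references until a direct object is reached.
/// Returns None for a dangling reference or a reference cycle.
fn resolve(table: &HashMap<(u32, u16), Object>, obj: &Object) -> Option<Object> {
    let mut seen = HashSet::new();
    let mut current = obj.clone();
    while let Object::Reference(num, generation) = current {
        if !seen.insert((num, generation)) {
            return None; // this reference was already visited: cycle
        }
        current = table.get(&(num, generation))?.clone(); // None if dangling
    }
    Some(current)
}

fn main() {
    let mut table = HashMap::new();
    table.insert((1, 0), Object::Reference(2, 0)); // 1 0 R points at another reference
    table.insert((2, 0), Object::Integer(612));
    table.insert((3, 0), Object::Reference(4, 0)); // 3 0 R and 4 0 R form a cycle
    table.insert((4, 0), Object::Reference(3, 0));

    println!("{:?}", resolve(&table, &Object::Reference(1, 0)));
    println!("{:?}", resolve(&table, &Object::Reference(3, 0)));
    println!("{:?}", resolve(&table, &Object::Reference(9, 0)));
    println!("{:?}", resolve(&table, &Object::Name("Page".into())));
}
```

Direct objects pass through unchanged, so callers can run every value through `resolve` without first checking whether it is a reference.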
§Example
use oxidize_pdf::parser::{PdfDocument, PdfReader};

// Assumes the crate's error type implements std::error::Error,
// so `?` can convert it into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open a PDF document
    let reader = PdfReader::open("document.pdf")?;
    let document = PdfDocument::new(reader);

    // Get document information
    let page_count = document.page_count()?;
    let metadata = document.metadata()?;
    println!("Title: {:?}", metadata.title);
    println!("Pages: {}", page_count);

    // Access a specific page (index 0 is the first page)
    let page = document.get_page(0)?;
    println!("Page size: {}x{}", page.width(), page.height());

    // Extract text from all pages
    let extracted_text = document.extract_text()?;
    for (i, page_text) in extracted_text.iter().enumerate() {
        println!("Page {}: {}", i + 1, page_text.text);
    }
    Ok(())
}
§Structs
- PdfDocument: High-level PDF document interface for parsing and manipulation.
- ResourceManager: Resource manager for efficient PDF object caching.