Module document

PDF Document wrapper - High-level interface for PDF parsing and manipulation

This module provides a high-level interface for working with PDF documents. It stays within Rust’s borrow checker rules by using interior mutability (RefCell) and by keeping parsing, caching, and page access in separate components.

§Architecture

The module uses a layered architecture:

PdfDocument: Main entry point with RefCell-based state management
ResourceManager: Centralized object caching with interior mutability
PdfReader: Low-level file access (wrapped in RefCell)
PageTree: Lazy-loaded page navigation
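
The RefCell-based layering above can be illustrated with a minimal sketch. It is not the crate's actual implementation: `Object`, `Reader`, `Document`, `parse_object`, and `get_object` are hypothetical names chosen for illustration. It only shows how a reader and a cache held in `RefCell` allow lookups through a shared `&self`.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed PDF object.
#[derive(Debug, Clone, PartialEq)]
pub enum Object {
    Integer(i64),
    Name(String),
}

/// Hypothetical low-level reader that counts how often it parses.
pub struct Reader {
    pub parse_calls: u32,
}

impl Reader {
    /// Pretends to parse object `id`. Takes `&mut self`, as a reader
    /// that seeks within a file typically would.
    pub fn parse_object(&mut self, id: u32) -> Object {
        self.parse_calls += 1;
        Object::Integer(i64::from(id) * 10)
    }
}

/// Hypothetical document wrapper: reader and cache both sit behind
/// `RefCell`, so callers only ever need `&self`.
pub struct Document {
    reader: RefCell<Reader>,
    cache: RefCell<HashMap<u32, Object>>,
}

impl Document {
    pub fn new(reader: Reader) -> Self {
        Document {
            reader: RefCell::new(reader),
            cache: RefCell::new(HashMap::new()),
        }
    }

    /// Returns a clone of the object, parsing it only on first access.
    pub fn get_object(&self, id: u32) -> Object {
        // The shared borrow ends with this statement, so the mutable
        // borrows below cannot conflict with it at runtime.
        let cached = self.cache.borrow().get(&id).cloned();
        if let Some(obj) = cached {
            return obj;
        }
        let obj = self.reader.borrow_mut().parse_object(id);
        self.cache.borrow_mut().insert(id, obj.clone());
        obj
    }

    /// Number of times the underlying reader was asked to parse.
    pub fn parse_calls(&self) -> u32 {
        self.reader.borrow().parse_calls
    }
}
```

Keeping each `borrow()` and `borrow_mut()` confined to a single statement is what avoids a runtime `BorrowMutError` in this sketch. Requesting the same object twice leaves the reader's parse count at one.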

§Key Features

Automatic caching: Objects are cached after first access
Resource management: Shared resources are handled efficiently
Page navigation: Fast access to any page in the document
Reference resolution: Automatic resolution of indirect references
Text extraction: Built-in support for extracting text from pages
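
As a rough illustration of what resolving indirect references involves, the sketch below follows a chain of references through an object table until it reaches a direct object. The `PdfObject` enum, the `resolve` function, and the `max_depth` limit are assumptions made for this example, not the crate's API. Generation numbers are omitted for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified PDF object model.
#[derive(Debug, Clone, PartialEq)]
pub enum PdfObject {
    Integer(i64),
    /// Indirect reference to another object by object number.
    Reference(u32),
}

/// Follows references until a direct object is reached.
/// Returns `None` for a dangling reference, or when more than
/// `max_depth` hops would be needed (which also stops cycles).
pub fn resolve<'a>(
    objects: &'a HashMap<u32, PdfObject>,
    start: &'a PdfObject,
    max_depth: usize,
) -> Option<&'a PdfObject> {
    let mut current = start;
    // max_depth hops need max_depth + 1 iterations: one per hop,
    // plus one to return the direct object at the end of the chain.
    for _ in 0..=max_depth {
        match current {
            PdfObject::Reference(id) => current = objects.get(id)?,
            direct => return Some(direct),
        }
    }
    None
}
```

The hop limit is one simple way to guard against reference cycles in malformed files. Tracking visited object numbers would be an alternative.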

§Example

use oxidize_pdf::parser::{PdfDocument, PdfReader};

// Open a PDF document
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);

// Get document information
let page_count = document.page_count()?;
let metadata = document.metadata()?;
println!("Title: {:?}", metadata.title);
println!("Pages: {}", page_count);

// Access a specific page
let page = document.get_page(0)?;
println!("Page size: {}x{}", page.width(), page.height());

// Extract text from all pages
let extracted_text = document.extract_text()?;
for (i, page_text) in extracted_text.iter().enumerate() {
    println!("Page {}: {}", i + 1, page_text.text);
}

Structs§

PdfDocument: High-level PDF document interface for parsing and manipulation.
ResourceManager: Resource manager for efficient PDF object caching.
