Module document


PDF Document wrapper - High-level interface for PDF parsing and manipulation

This module provides a high-level interface for working with PDF documents. It works around Rust's borrow-checker constraints in two ways: interior mutability (RefCell) lets read-style methods update internal state through a shared reference, and parsing, caching, and page access are kept in separate components.

§Architecture

The module uses a layered architecture:

  • PdfDocument: Main entry point with RefCell-based state management
  • ResourceManager: Centralized object caching with interior mutability
  • PdfReader: Low-level file access (wrapped in RefCell)
  • PageTree: Lazy-loaded page navigation
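The RefCell-based layering above can be illustrated with a minimal sketch. It is not the crate's actual implementation: `Object`, `Reader`, `Document`, `get_object`, and `read_count` are hypothetical names invented for this example. It only shows how a method taking `&self` can still drive a reader that needs `&mut self` and fill a cache along the way.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed PDF object.
#[derive(Debug, Clone, PartialEq)]
pub struct Object(pub String);

/// Hypothetical low-level reader; reading requires `&mut self`.
pub struct Reader {
    pub reads: usize,
}

impl Reader {
    pub fn read_object(&mut self, id: u32) -> Object {
        self.reads += 1;
        Object(format!("object {}", id))
    }
}

/// Hypothetical document wrapper: both the reader and the cache sit
/// behind `RefCell`, so lookups only need a shared reference.
pub struct Document {
    reader: RefCell<Reader>,
    cache: RefCell<HashMap<u32, Object>>,
}

impl Document {
    pub fn new(reader: Reader) -> Self {
        Self {
            reader: RefCell::new(reader),
            cache: RefCell::new(HashMap::new()),
        }
    }

    /// Returns the cached object if present, otherwise reads and caches it.
    pub fn get_object(&self, id: u32) -> Object {
        // The shared borrow of the cache ends with this statement,
        // before any mutable borrow is taken below.
        let cached = self.cache.borrow().get(&id).cloned();
        if let Some(obj) = cached {
            return obj;
        }
        let obj = self.reader.borrow_mut().read_object(id);
        self.cache.borrow_mut().insert(id, obj.clone());
        obj
    }

    /// Number of reads that actually reached the underlying reader.
    pub fn read_count(&self) -> usize {
        self.reader.borrow().reads
    }
}
```

Each borrow in this sketch is released before the next one is taken, which avoids the runtime panic that `RefCell` raises when a mutable borrow overlaps another borrow.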

§Key Features

  • Automatic caching: Objects are cached after first access
  • Resource management: Shared resources are handled efficiently
  • Page navigation: Fast access to any page in the document
  • Reference resolution: Automatic resolution of indirect references
  • Text extraction: Built-in support for extracting text from pages
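As an illustration of reference resolution, the sketch below follows a chain of indirect references until it reaches a direct value. It is an assumption-laden toy, not the crate's API: `PdfValue`, `resolve`, `MAX_HOPS`, and the object table keyed by a bare object number are all hypothetical, and real PDF references also carry a generation number that is omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified PDF value type.
#[derive(Debug, Clone, PartialEq)]
pub enum PdfValue {
    Integer(i64),
    Name(String),
    /// Indirect reference to another object, by object number only.
    Reference(u32),
}

/// Hypothetical upper bound on hops, guarding against reference cycles.
pub const MAX_HOPS: usize = 32;

/// Follows indirect references until a direct value is found.
/// Returns `None` for a dangling reference or when more than
/// `MAX_HOPS` references would have to be followed.
pub fn resolve<'a>(
    objects: &'a HashMap<u32, PdfValue>,
    value: &'a PdfValue,
) -> Option<&'a PdfValue> {
    let mut current = value;
    for _ in 0..=MAX_HOPS {
        match current {
            PdfValue::Reference(id) => current = objects.get(id)?,
            other => return Some(other),
        }
    }
    None
}
```

A caller that wants transparent resolution would pass every value it reads through a helper like `resolve` before using it. The hop limit is one simple way to survive a malformed file whose references form a loop.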

§Example

use oxidize_pdf::parser::{PdfDocument, PdfReader};

// Open a PDF document
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);

// Get document information
let page_count = document.page_count()?;
let metadata = document.metadata()?;
println!("Title: {:?}", metadata.title);
println!("Pages: {}", page_count);

// Access a specific page
let page = document.get_page(0)?;
println!("Page size: {}x{}", page.width(), page.height());

// Extract text from all pages
let extracted_text = document.extract_text()?;
for (i, page_text) in extracted_text.iter().enumerate() {
    println!("Page {}: {}", i + 1, page_text.text);
}

Structs§

PdfDocument
High-level PDF document interface for parsing and manipulation.
ResourceManager
Resource manager for efficient PDF object caching.