Struct ParsedPage

pub struct ParsedPage {
    pub obj_ref: (u32, u16),
    pub dict: PdfDictionary,
    pub inherited_resources: Option<PdfDictionary>,
    pub media_box: [f64; 4],
    pub crop_box: Option<[f64; 4]>,
    pub rotation: i32,
    pub annotations: Option<PdfArray>,
}

Represents a single page in the PDF with all its properties and resources.

A ParsedPage contains all the information needed to render or analyze a PDF page, including its dimensions, content streams, resources, and inherited properties from parent page tree nodes.

§Fields

obj_ref - Object reference (object number, generation number) pointing to this page in the PDF
dict - Complete page dictionary containing all page-specific entries
inherited_resources - Resources inherited from parent page tree nodes
media_box - Page dimensions in PDF units [llx, lly, urx, ury]
crop_box - Optional visible area of the page
rotation - Page rotation in degrees (0, 90, 180, or 270)
annotations - Optional annotations array parsed from the page's /Annots entry

§Example

use oxidize_pdf::parser::{PdfDocument, PdfReader};

let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;

// Access page properties
let (obj_num, gen_num) = page.obj_ref;
println!("Page object: {} {} R", obj_num, gen_num);

// Get page dimensions
let [llx, lly, urx, ury] = page.media_box;
println!("MediaBox: ({}, {}) to ({}, {})", llx, lly, urx, ury);

// Check for content
if page.dict.contains_key("Contents") {
    println!("Page has content streams");
}

Fields§

§obj_ref: (u32, u16)

Object reference to this page in the form (object_number, generation_number). This uniquely identifies the page object in the PDF file.

§dict: PdfDictionary

Page dictionary containing all page-specific entries like Contents, Resources, etc. This is the raw PDF dictionary for the page object.

§inherited_resources: Option<PdfDictionary>

Resources inherited from parent page tree nodes. These are automatically merged during page tree traversal.

§media_box: [f64; 4]

MediaBox defining the page dimensions in PDF units (typically points). Format: [lower_left_x, lower_left_y, upper_right_x, upper_right_y]

§crop_box: Option<[f64; 4]>

CropBox defining the visible area of the page. If None, the entire MediaBox is visible.
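
The fallback described above can be sketched as follows. This is a minimal illustration on plain arrays rather than on ParsedPage itself; the `Rect` alias and the `visible_box` helper are hypothetical names, not part of the crate.

```rust
/// Rectangle in PDF units: [llx, lly, urx, ury].
type Rect = [f64; 4];

/// Hypothetical helper: the box that bounds the visible page area.
/// Falls back to the MediaBox when no CropBox is set.
fn visible_box(media_box: Rect, crop_box: Option<Rect>) -> Rect {
    crop_box.unwrap_or(media_box)
}
```

With a real page, the call would look like `visible_box(page.media_box, page.crop_box)`, since both fields are public and `[f64; 4]` is `Copy`.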

§rotation: i32

Page rotation in degrees. Valid values are 0, 90, 180, or 270. The rotation is applied clockwise.

§annotations: Option<PdfArray>

Annotations array containing references to annotation objects. This is parsed from the page’s /Annots entry.

Implementations§

impl ParsedPage

pub fn width(&self) -> f64

Get the effective page width accounting for rotation.

The width is calculated from the MediaBox and adjusted based on the page rotation. For 90° or 270° rotations, the width and height are swapped.

§Returns

The page width in PDF units (typically points, where 1 point = 1/72 inch)

§Example

let page = document.get_page(0)?;
let width_pts = page.width();
let width_inches = width_pts / 72.0;
let width_mm = width_pts * 25.4 / 72.0;
println!("Page width: {} points ({:.2} inches, {:.2} mm)", width_pts, width_inches, width_mm);

pub fn height(&self) -> f64

Get the effective page height accounting for rotation.

The height is calculated from the MediaBox and adjusted based on the page rotation. For 90° or 270° rotations, the width and height are swapped.

§Returns

The page height in PDF units (typically points, where 1 point = 1/72 inch)

§Example

let page = document.get_page(0)?;
println!("Page dimensions: {}x{} points", page.width(), page.height());
if page.rotation != 0 {
    println!("Page is rotated {} degrees", page.rotation);
}
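
To make the rotation handling concrete, here is a rough sketch of how the effective width and height could be derived from a MediaBox. It is not the crate's implementation: `effective_size` is a hypothetical helper, and both the use of absolute differences and the normalisation of out-of-range rotation values are assumptions.

```rust
/// Hypothetical helper mirroring the documented behaviour of `width()` and
/// `height()`: returns (width, height) in PDF units after rotation.
fn effective_size(media_box: [f64; 4], rotation: i32) -> (f64, f64) {
    let [llx, lly, urx, ury] = media_box;
    // Assumption: tolerate boxes whose corners are given in reverse order.
    let width = (urx - llx).abs();
    let height = (ury - lly).abs();
    // Assumption: normalise so that -90 and 450 behave like 270 and 90.
    match rotation.rem_euclid(360) {
        90 | 270 => (height, width),
        _ => (width, height),
    }
}
```

For a US Letter MediaBox of [0, 0, 612, 792], a rotation of 0 or 180 leaves the size at 612 x 792 points, while 90 or 270 swaps it to 792 x 612.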

pub fn content_streams<R: Read + Seek>( &self, reader: &mut PdfReader<R>, ) -> ParseResult<Vec<Vec<u8>>>

Get the content streams for this page using a PdfReader.

Content streams contain the actual drawing instructions (operators) that render text, graphics, and images on the page. A page may have multiple content streams, which are concatenated during rendering.

§Arguments

reader - Mutable reference to the PDF reader

§Returns

A vector of decompressed content stream data. Each inner vector holds the raw bytes of one content stream, ready for parsing.

§Errors

Returns an error if:

The Contents entry is malformed
Stream decompression fails
Referenced objects cannot be resolved

§Example

let streams = page.content_streams(reader)?;
for (i, stream) in streams.iter().enumerate() {
    println!("Content stream {}: {} bytes", i, stream.len());
}
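
Because multiple content streams are treated as one logical stream during rendering, it can be convenient to join them before parsing. The sketch below shows one way to do that; `join_content_streams` is a hypothetical helper, and the newline separator is an assumption chosen so that tokens at a stream boundary cannot run together.

```rust
/// Hypothetical helper: joins a page's decoded content streams into one buffer.
fn join_content_streams(streams: &[Vec<u8>]) -> Vec<u8> {
    let capacity = streams.iter().map(|s| s.len() + 1).sum::<usize>();
    let mut joined = Vec::with_capacity(capacity);
    for stream in streams {
        joined.extend_from_slice(stream);
        // Separate streams with whitespace so operators at the boundary don't fuse.
        joined.push(b'\n');
    }
    joined
}
```

The result of `page.content_streams(reader)?` could be passed in as `join_content_streams(&streams)`.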

pub fn content_streams_with_document<R: Read + Seek>( &self, document: &PdfDocument<R>, ) -> ParseResult<Vec<Vec<u8>>>

Get content streams using PdfDocument (recommended method).

This is the preferred method for accessing content streams as it uses the document’s caching and resource management capabilities.

§Arguments

document - Reference to the PDF document

§Returns

A vector of decompressed content stream data ready for parsing with ContentParser.

§Example

let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;

// Get content streams
let streams = page.content_streams_with_document(&document)?;

// Parse each stream
for stream_data in streams {
    let operations = ContentParser::parse_content(&stream_data)?;
    println!("Stream has {} operations", operations.len());
}

pub fn get_contents(&self) -> Option<&PdfObject>

Get the raw Contents entry of the page dictionary, if present. To obtain decoded stream data, use content_streams_with_document instead.

pub fn get_resources(&self) -> Option<&PdfDictionary>

Get the effective resources for this page (including inherited).

Resources include fonts, images (XObjects), color spaces, patterns, and other assets needed to render the page. This method returns page-specific resources if present, otherwise falls back to inherited resources from parent nodes.

§Returns

The Resources dictionary if available, or None if the page has no resources.

§Resource Categories

The Resources dictionary may contain:

Font - Font definitions used by text operators
XObject - External objects (images, form XObjects)
ColorSpace - Color space definitions
Pattern - Pattern definitions for fills
Shading - Shading dictionaries
ExtGState - Graphics state parameter dictionaries
Properties - Property list dictionaries

§Example

if let Some(resources) = page.get_resources() {
    // Check for fonts
    if let Some(fonts) = resources.get("Font").and_then(|f| f.as_dict()) {
        println!("Page uses {} fonts", fonts.0.len());
    }

    // Check for images
    if let Some(xobjects) = resources.get("XObject").and_then(|x| x.as_dict()) {
        println!("Page has {} XObjects", xobjects.0.len());
    }
}
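
The lookup order for get_resources (page-specific resources first, inherited resources as a fallback) can be modelled in a few lines. This is only a sketch under simplifying assumptions: `Dict` stands in for PdfDictionary, and `PageModel` with its `own_resources` field is a hypothetical type, not the crate's ParsedPage.

```rust
use std::collections::HashMap;

/// Stand-in for a PDF dictionary; the real `PdfDictionary` type is richer.
type Dict = HashMap<String, String>;

/// Hypothetical model of a page carrying its own and inherited resources.
struct PageModel {
    own_resources: Option<Dict>,
    inherited_resources: Option<Dict>,
}

impl PageModel {
    /// Page-level resources win; otherwise fall back to the inherited ones.
    fn effective_resources(&self) -> Option<&Dict> {
        self.own_resources
            .as_ref()
            .or(self.inherited_resources.as_ref())
    }
}
```

Note that in this model the fallback is all-or-nothing: when the page has its own Resources dictionary, the inherited one is not consulted at all.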

pub fn clone_with_resources(&self) -> Self

Clone this page with all inherited resources merged into the page dictionary.

This is useful when extracting a page for separate processing or when you need a self-contained page object with all resources explicitly included.

§Returns

A cloned page with inherited resources merged into the Resources entry of the page dictionary.

§Example

// Get a self-contained page with all resources
let standalone_page = page.clone_with_resources();

// The cloned page now has all resources in its dictionary
assert!(standalone_page.dict.contains_key("Resources"));
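
As an illustration of what merging inherited resources into a page dictionary might involve, the sketch below combines two flat dictionaries. The precedence rule (the page's own entry wins on a key conflict) is an assumption, not something the documentation states, and `Dict` and `merge_resources` are hypothetical names.

```rust
use std::collections::HashMap;

/// Stand-in for a PDF dictionary; the real `PdfDictionary` type is richer.
type Dict = HashMap<String, String>;

/// Hypothetical merge: copy inherited entries into the page's resources,
/// keeping the page's own entry whenever both define the same key (assumption).
fn merge_resources(own: &Dict, inherited: &Dict) -> Dict {
    let mut merged = own.clone();
    for (key, value) in inherited {
        merged.entry(key.clone()).or_insert_with(|| value.clone());
    }
    merged
}
```

A real merge would likely need to recurse into sub-dictionaries such as Font and XObject; this flat version only shows the precedence idea.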

pub fn get_annotations(&self) -> Option<&PdfArray>

Get the annotations array for this page.

Returns a reference to the annotations array if present. Each element in the array is typically a reference to an annotation dictionary.

§Example

if let Some(annots) = page.get_annotations() {
    println!("Page has {} annotations", annots.len());
}

pub fn has_annotations(&self) -> bool

Check if the page has annotations.

§Returns

true if the page has an annotations array with at least one annotation, false otherwise.

§Example

if page.has_annotations() {
    println!("This page contains annotations");
}
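
The documented return value distinguishes a missing array from an empty one only in that both yield false. A minimal sketch of that check, using a plain slice in place of PdfArray (the free function here is a hypothetical stand-in, not the method itself):

```rust
/// Hypothetical stand-in for the documented check, using a slice of strings
/// in place of `PdfArray`.
fn has_annotations(annotations: Option<&[String]>) -> bool {
    // None and Some(empty) both count as "no annotations".
    annotations.map_or(false, |annots| !annots.is_empty())
}
```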

pub fn get_referenced_objects<R: Read + Seek>( &self, reader: &mut PdfReader<R>, ) -> ParseResult<HashMap<(u32, u16), PdfObject>>

Get all objects referenced by this page (for extraction or analysis).

This method recursively collects all objects referenced by the page, including:

Content streams
Resources (fonts, images, etc.)
Nested objects within resources

This is useful for extracting a complete page with all its dependencies or for analyzing the object graph of a page.

§Arguments

reader - Mutable reference to the PDF reader

§Returns

A HashMap mapping object references (obj_num, gen_num) to their resolved objects.

§Example

let referenced_objects = page.get_referenced_objects(reader)?;

println!("Page references {} objects", referenced_objects.len());
for ((obj_num, gen_num), obj) in &referenced_objects {
    println!("  {} {} R: {:?}", obj_num, gen_num, obj);
}
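
The recursive collection described for get_referenced_objects can be sketched with a small self-contained model. Everything here is an assumption made for illustration: the `Obj` enum is a simplified stand-in for PdfObject, the `store` map plays the role of the reader that resolves references, and `collect_referenced` is a hypothetical function rather than the crate's implementation. Unresolvable references are skipped silently in this sketch.

```rust
use std::collections::HashMap;

/// Object identifier: (object number, generation number).
type ObjRef = (u32, u16);

/// Minimal stand-in for a PDF object; the real `PdfObject` has more variants.
#[derive(Clone, Debug, PartialEq)]
enum Obj {
    Integer(i64),
    Reference(ObjRef),
    Array(Vec<Obj>),
    Dict(Vec<(String, Obj)>),
}

/// Hypothetical traversal: resolve every reference reachable from `root`.
fn collect_referenced(root: &Obj, store: &HashMap<ObjRef, Obj>) -> HashMap<ObjRef, Obj> {
    let mut found = HashMap::new();
    let mut pending = vec![root];
    while let Some(obj) = pending.pop() {
        match obj {
            Obj::Reference(id) => {
                // Skip objects already collected so reference cycles terminate.
                if found.contains_key(id) {
                    continue;
                }
                if let Some(target) = store.get(id) {
                    found.insert(*id, target.clone());
                    pending.push(target);
                }
            }
            Obj::Array(items) => pending.extend(items.iter()),
            Obj::Dict(entries) => pending.extend(entries.iter().map(|(_, value)| value)),
            Obj::Integer(_) => {}
        }
    }
    found
}
```

The visited check matters in practice because PDF object graphs commonly contain cycles, for example a page pointing to its parent node, which in turn lists the page among its kids.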