Struct ParsedPage

pub struct ParsedPage {
    pub obj_ref: (u32, u16),
    pub dict: PdfDictionary,
    pub inherited_resources: Option<PdfDictionary>,
    pub media_box: [f64; 4],
    pub crop_box: Option<[f64; 4]>,
    pub rotation: i32,
    pub annotations: Option<PdfArray>,
}

Represents a single page in the PDF with all its properties and resources.

A ParsedPage contains all the information needed to render or analyze a PDF page: its dimensions, its page dictionary (which references the content streams), its resources, and the properties it inherits from parent page tree nodes.

§Fields

  • obj_ref - Object reference (object number, generation number) pointing to this page in the PDF
  • dict - Complete page dictionary containing all page-specific entries
  • inherited_resources - Resources inherited from parent page tree nodes
  • media_box - Page dimensions in PDF units [llx, lly, urx, ury]
  • crop_box - Optional visible area of the page
  • rotation - Page rotation in degrees (0, 90, 180, or 270)
  • annotations - Optional array of annotation references, parsed from the /Annots entry

§Example

use oxidize_pdf::parser::{PdfDocument, PdfReader};

let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;

// Access page properties
let (obj_num, gen_num) = page.obj_ref;
println!("Page object: {} {} R", obj_num, gen_num);

// Get page dimensions
let [llx, lly, urx, ury] = page.media_box;
println!("MediaBox: ({}, {}) to ({}, {})", llx, lly, urx, ury);

// Check for content
if page.dict.get("Contents").is_some() {
    println!("Page has content streams");
}

Fields§

§obj_ref: (u32, u16)

Object reference to this page in the form (object_number, generation_number). This uniquely identifies the page object in the PDF file.

§dict: PdfDictionary

Page dictionary containing all page-specific entries, such as Contents and Resources. This is the raw PDF dictionary for the page object.

§inherited_resources: Option<PdfDictionary>

Resources inherited from parent page tree nodes. These are automatically merged during page tree traversal.

§media_box: [f64; 4]

MediaBox defining the page dimensions in PDF units (typically points). Format: [lower_left_x, lower_left_y, upper_right_x, upper_right_y]
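For illustration, the width and height encoded by this array can be recovered as below. This is a sketch, not the crate's code: the `Rect` alias and the `normalize` and `raw_size` helpers are hypothetical, and the normalization step reflects the PDF specification's allowance for a rectangle to be written with any two diagonally opposite corners.

```rust
/// Hypothetical alias with the same layout as `ParsedPage::media_box`:
/// [lower_left_x, lower_left_y, upper_right_x, upper_right_y].
type Rect = [f64; 4];

/// Reorders the corners so that x0 <= x1 and y0 <= y1.
/// Defensive step: a PDF rectangle may list any two opposite corners.
fn normalize(r: Rect) -> Rect {
    [r[0].min(r[2]), r[1].min(r[3]), r[0].max(r[2]), r[1].max(r[3])]
}

/// Unrotated (width, height) of a rectangle, in PDF units.
fn raw_size(r: Rect) -> (f64, f64) {
    let [llx, lly, urx, ury] = normalize(r);
    (urx - llx, ury - lly)
}
```

A US Letter MediaBox of [0 0 612 792] yields (612.0, 792.0), whichever corner is listed first.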

§crop_box: Option<[f64; 4]>

CropBox defining the visible area of the page. If None, the entire MediaBox is visible.
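A minimal sketch of how the visible area could be derived from media_box and crop_box, assuming the PDF specification's rule that a CropBox extending beyond the MediaBox is reduced to their intersection. `visible_box` is a hypothetical helper, not part of oxidize_pdf.

```rust
type Rect = [f64; 4];

/// Visible region of a page: the CropBox clipped to the MediaBox, or the
/// MediaBox itself when no CropBox is present. Both inputs are assumed to be
/// normalized (lower-left corner first). Disjoint boxes would produce an
/// inverted rectangle; this sketch does not handle that case.
fn visible_box(media_box: Rect, crop_box: Option<Rect>) -> Rect {
    match crop_box {
        None => media_box,
        Some(c) => [
            c[0].max(media_box[0]),
            c[1].max(media_box[1]),
            c[2].min(media_box[2]),
            c[3].min(media_box[3]),
        ],
    }
}
```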

§rotation: i32

Page rotation in degrees. Valid values are 0, 90, 180, or 270. The rotation is applied clockwise.
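Raw /Rotate values in real files are not always in this canonical set: the PDF specification only requires a multiple of 90, so values such as -90 or 450 can occur. Whether ParsedPage normalizes them is not stated here; the hypothetical `normalize_rotation` below sketches one way a caller could do so.

```rust
/// Maps a raw /Rotate value onto 0, 90, 180 or 270.
/// Returns None if the value is not a multiple of 90, which the PDF
/// specification does not allow.
fn normalize_rotation(raw: i32) -> Option<i32> {
    if raw % 90 != 0 {
        return None;
    }
    // rem_euclid always yields a non-negative result, so -90 becomes 270.
    Some(raw.rem_euclid(360))
}
```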

§annotations: Option<PdfArray>

Annotations array containing references to annotation objects. This is parsed from the page’s /Annots entry.

Implementations§

impl ParsedPage

pub fn width(&self) -> f64

Get the effective page width accounting for rotation.

The width is calculated from the MediaBox and adjusted based on the page rotation. For 90° or 270° rotations, the width and height are swapped.

§Returns

The page width in PDF units (typically points, where 1 point = 1/72 inch)

§Example
let page = document.get_page(0)?;
let width_pts = page.width();
let width_inches = width_pts / 72.0;
let width_mm = width_pts * 25.4 / 72.0;
println!("Page width: {} points ({:.2} inches, {:.2} mm)", width_pts, width_inches, width_mm);
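The dimension swap described for width() and height() can be sketched as follows. `effective_size` is a hypothetical stand-in that mirrors the documented behavior (MediaBox dimensions, swapped for quarter turns); it is not the crate's implementation.

```rust
/// Effective (width, height) of a page after applying its rotation.
/// `rotation` is assumed to be a multiple of 90; a quarter turn
/// (90 or 270, including equivalents such as -90) swaps the dimensions.
fn effective_size(media_box: [f64; 4], rotation: i32) -> (f64, f64) {
    let w = (media_box[2] - media_box[0]).abs();
    let h = (media_box[3] - media_box[1]).abs();
    if rotation.rem_euclid(180) == 90 {
        (h, w)
    } else {
        (w, h)
    }
}
```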

pub fn height(&self) -> f64

Get the effective page height accounting for rotation.

The height is calculated from the MediaBox and adjusted based on the page rotation. For 90° or 270° rotations, the width and height are swapped.

§Returns

The page height in PDF units (typically points, where 1 point = 1/72 inch)

§Example
let page = document.get_page(0)?;
println!("Page dimensions: {}x{} points", page.width(), page.height());
if page.rotation != 0 {
    println!("Page is rotated {} degrees", page.rotation);
}

pub fn content_streams<R: Read + Seek>(&self, reader: &mut PdfReader<R>) -> ParseResult<Vec<Vec<u8>>>

Get the content streams for this page using a PdfReader.

Content streams contain the actual drawing instructions (operators) that render text, graphics, and images on the page. A page may have multiple content streams which are concatenated during rendering.

§Arguments
  • reader - Mutable reference to the PDF reader
§Returns

A vector of decompressed content streams. Each inner vector holds the raw bytes of one content stream, ready for parsing.

§Errors

Returns an error if:

  • The Contents entry is malformed
  • Stream decompression fails
  • Referenced objects cannot be resolved
§Example
// `reader` is a mutable PdfReader for the same file
let streams = page.content_streams(&mut reader)?;
for (i, stream) in streams.iter().enumerate() {
    println!("Content stream {}: {} bytes", i, stream.len());
}
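Because a page may split its drawing operators across several streams, a caller that wants a single buffer has to join them. The sketch below uses a hypothetical `concat_streams` helper. It inserts a newline between streams defensively: the PDF specification lets a stream boundary fall between any two lexical tokens, and without a separator the last token of one stream could fuse with the first token of the next.

```rust
/// Joins a page's content streams into one buffer, separating them with a
/// newline so tokens at the boundaries stay distinct.
fn concat_streams(streams: &[Vec<u8>]) -> Vec<u8> {
    // Upper bound on the final size: every stream plus one separator byte.
    let capacity: usize = streams.iter().map(|s| s.len() + 1).sum();
    let mut out = Vec::with_capacity(capacity);
    for (i, s) in streams.iter().enumerate() {
        if i > 0 {
            out.push(b'\n');
        }
        out.extend_from_slice(s);
    }
    out
}
```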

pub fn content_streams_with_document<R: Read + Seek>(&self, document: &PdfDocument<R>) -> ParseResult<Vec<Vec<u8>>>

Get content streams using PdfDocument (recommended method).

This is the preferred method for accessing content streams as it uses the document’s caching and resource management capabilities.

§Arguments
  • document - Reference to the PDF document
§Returns

A vector of decompressed content stream data ready for parsing with ContentParser.

§Example
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;

// Get content streams
let streams = page.content_streams_with_document(&document)?;

// Parse each stream
for stream_data in streams {
    let operations = ContentParser::parse_content(&stream_data)?;
    println!("Stream has {} operations", operations.len());
}

pub fn get_contents(&self) -> Option<&PdfObject>

Get the /Contents entry of the page dictionary, if present.

Per the PDF specification, this entry is either a single content stream or an array of content streams, usually given as indirect references. To obtain the decoded stream bytes, use content_streams_with_document or content_streams.

pub fn get_resources(&self) -> Option<&PdfDictionary>

Get the effective resources for this page (including inherited).

Resources include fonts, images (XObjects), color spaces, patterns, and other assets needed to render the page. This method returns the page's own Resources dictionary if present; otherwise it falls back to the resources inherited from parent page tree nodes.

§Returns

The Resources dictionary if available, or None if the page has neither its own nor inherited resources.

§Resource Categories

The Resources dictionary may contain:

  • Font - Font definitions used by text operators
  • XObject - External objects (images, form XObjects)
  • ColorSpace - Color space definitions
  • Pattern - Pattern definitions for fills
  • Shading - Shading dictionaries
  • ExtGState - Graphics state parameter dictionaries
  • Properties - Property list dictionaries
§Example
if let Some(resources) = page.get_resources() {
    // Check for fonts
    if let Some(fonts) = resources.get("Font").and_then(|f| f.as_dict()) {
        println!("Page uses {} fonts", fonts.0.len());
    }

    // Check for images
    if let Some(xobjects) = resources.get("XObject").and_then(|x| x.as_dict()) {
        println!("Page has {} XObjects", xobjects.0.len());
    }
}
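In the PDF model, an inheritable attribute such as /Resources is taken from the nearest ancestor that defines it. The sketch below illustrates that lookup over a page-to-root path. `Node` and `resolve_resources` are hypothetical, with a `HashMap` standing in for `PdfDictionary`; get_resources itself is documented only as checking the page first and then inherited_resources.

```rust
use std::collections::HashMap;

/// Hypothetical page tree node; only the Resources entry is modeled.
struct Node {
    resources: Option<HashMap<String, String>>,
}

/// Walks from the page (index 0) towards the root and returns the first
/// Resources entry found, i.e. the one defined closest to the page.
fn resolve_resources(path: &[Node]) -> Option<&HashMap<String, String>> {
    path.iter().find_map(|n| n.resources.as_ref())
}
```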


pub fn clone_with_resources(&self) -> Self

Clone this page with all inherited resources merged into the page dictionary.

This is useful when extracting a page for separate processing or when you need a self-contained page object with all resources explicitly included.

§Returns

A cloned page with inherited resources merged into the Resources entry of the page dictionary.

§Example
// Get a self-contained page with all resources
let standalone_page = page.clone_with_resources();

// The cloned page now has all resources in its dictionary
assert!(standalone_page.dict.contains_key("Resources"));
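The precedence clone_with_resources applies when a key exists both on the page and in the inherited resources is not documented here. The sketch below assumes page entries win and only missing keys are filled in from the inherited dictionary; `merge_resources` and the `Resources` alias are hypothetical, with `String` values standing in for PDF objects.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a resource dictionary: category name -> entry.
type Resources = HashMap<String, String>;

/// Returns a copy of `page` extended with every key from `inherited` that the
/// page does not define itself. Keys already present on the page are kept.
fn merge_resources(page: &Resources, inherited: &Resources) -> Resources {
    let mut merged = page.clone();
    for (k, v) in inherited {
        merged.entry(k.clone()).or_insert_with(|| v.clone());
    }
    merged
}
```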

pub fn get_annotations(&self) -> Option<&PdfArray>

Get the annotations array for this page.

Returns a reference to the annotations array if present. Each element in the array is typically a reference to an annotation dictionary.

§Example
if let Some(annots) = page.get_annotations() {
    println!("Page has {} annotations", annots.len());
}

pub fn has_annotations(&self) -> bool

Check if the page has annotations.

§Returns

true if the page has an annotations array with at least one annotation, false otherwise.

§Example
if page.has_annotations() {
    println!("This page contains annotations");
}
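Note that an /Annots array that is present but empty yields false. A minimal sketch of that check, using a slice of object numbers as a hypothetical simplification of PdfArray:

```rust
/// True only when an annotations array is present and non-empty,
/// matching the documented behavior of has_annotations.
fn has_annotations(annots: Option<&[u32]>) -> bool {
    annots.map_or(false, |a| !a.is_empty())
}
```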

pub fn get_referenced_objects<R: Read + Seek>(&self, reader: &mut PdfReader<R>) -> ParseResult<HashMap<(u32, u16), PdfObject>>

Get all objects referenced by this page (for extraction or analysis).

This method recursively collects all objects referenced by the page, including:

  • Content streams
  • Resources (fonts, images, etc.)
  • Nested objects within resources

This is useful for extracting a complete page with all its dependencies or for analyzing the object graph of a page.

§Arguments
  • reader - Mutable reference to the PDF reader
§Returns

A HashMap mapping object references (obj_num, gen_num) to their resolved objects.

§Example
// `reader` is a mutable PdfReader for the same file
let referenced_objects = page.get_referenced_objects(&mut reader)?;

println!("Page references {} objects", referenced_objects.len());
for ((obj_num, gen_num), obj) in &referenced_objects {
    println!("  {} {} R: {:?}", obj_num, gen_num, obj);
}
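The recursive collection described above is essentially a graph walk over indirect references with a visited set. The sketch below uses a hypothetical `Obj` enum and a `HashMap` in place of the reader's object lookup; it is not oxidize_pdf's implementation, and real code would also have to decide whether to follow links such as /Parent that lead out of the page.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical object model with just enough variants to show the walk.
#[derive(Debug, Clone)]
enum Obj {
    Null,
    Ref(u32, u16),
    Array(Vec<Obj>),
    Dict(HashMap<String, Obj>),
}

/// Collects every object reachable from `root` through indirect references.
/// `table` plays the role of the reader's object lookup; the `seen` set keeps
/// the walk from looping forever when objects reference each other.
fn collect_refs(root: &Obj, table: &HashMap<(u32, u16), Obj>) -> HashMap<(u32, u16), Obj> {
    let mut found = HashMap::new();
    let mut seen = HashSet::new();
    let mut stack = vec![root.clone()];
    while let Some(obj) = stack.pop() {
        match obj {
            Obj::Ref(num, gen_num) => {
                // Only resolve each reference the first time it is seen.
                if seen.insert((num, gen_num)) {
                    if let Some(target) = table.get(&(num, gen_num)) {
                        found.insert((num, gen_num), target.clone());
                        stack.push(target.clone());
                    }
                }
            }
            Obj::Array(items) => stack.extend(items),
            Obj::Dict(map) => stack.extend(map.into_values()),
            Obj::Null => {}
        }
    }
    found
}
```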

Trait Implementations§

impl Clone for ParsedPage

fn clone(&self) -> ParsedPage

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for ParsedPage

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.


Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.