pub struct ParsedPage {
    pub obj_ref: (u32, u16),
    pub dict: PdfDictionary,
    pub inherited_resources: Option<PdfDictionary>,
    pub media_box: [f64; 4],
    pub crop_box: Option<[f64; 4]>,
    pub rotation: i32,
    pub annotations: Option<PdfArray>,
}
Represents a single page in the PDF with all its properties and resources.
A ParsedPage contains all the information needed to render or analyze a PDF page, including its dimensions, content streams, resources, and inherited properties from parent page tree nodes.
§Fields
- obj_ref - Object reference (object number, generation number) pointing to this page in the PDF
- dict - Complete page dictionary containing all page-specific entries
- inherited_resources - Resources inherited from parent page tree nodes
- media_box - Page dimensions in PDF units [llx, lly, urx, ury]
- crop_box - Optional visible area of the page
- rotation - Page rotation in degrees (0, 90, 180, or 270)
§Example
use oxidize_pdf::parser::{PdfDocument, PdfReader};
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;
// Access page properties
let (obj_num, gen_num) = page.obj_ref;
println!("Page object: {} {} R", obj_num, gen_num);
// Get page dimensions
let [llx, lly, urx, ury] = page.media_box;
println!("MediaBox: ({}, {}) to ({}, {})", llx, lly, urx, ury);
// Check for content
if page.dict.get("Contents").is_some() {
    println!("Page has content streams");
}
Fields§
obj_ref: (u32, u16)
Object reference to this page in the form (object_number, generation_number). This uniquely identifies the page object in the PDF file.
dict: PdfDictionary
Page dictionary containing all page-specific entries like Contents, Resources, etc. This is the raw PDF dictionary for the page object.
inherited_resources: Option<PdfDictionary>
Resources inherited from parent page tree nodes. These are automatically merged during page tree traversal.
media_box: [f64; 4]
MediaBox defining the page dimensions in PDF units (typically points). Format: [lower_left_x, lower_left_y, upper_right_x, upper_right_y]
crop_box: Option<[f64; 4]>
CropBox defining the visible area of the page. If None, the entire MediaBox is visible.
rotation: i32
Page rotation in degrees. Valid values are 0, 90, 180, or 270. The rotation is applied clockwise.
annotations: Option<PdfArray>
Annotations array containing references to annotation objects. This is parsed from the page’s /Annots entry.
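The fallback described for crop_box (no CropBox means the whole MediaBox is visible) can be sketched without the library. The helpers below, visible_box and box_size, are hypothetical names introduced only for illustration and are not part of oxidize_pdf.

```rust
/// Hypothetical helper (not part of oxidize_pdf): returns the box that bounds
/// the visible page area, falling back to the MediaBox when no CropBox is set.
fn visible_box(media_box: [f64; 4], crop_box: Option<[f64; 4]>) -> [f64; 4] {
    crop_box.unwrap_or(media_box)
}

/// Hypothetical helper: unrotated (width, height) of a box given as
/// [lower_left_x, lower_left_y, upper_right_x, upper_right_y].
fn box_size(b: [f64; 4]) -> (f64, f64) {
    ((b[2] - b[0]).abs(), (b[3] - b[1]).abs())
}
```

With a real page, the inputs would be page.media_box and page.crop_box; the result is in PDF units and does not account for rotation.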
Implementations§
impl ParsedPage
pub fn width(&self) -> f64
Get the effective page width accounting for rotation.
The width is calculated from the MediaBox and adjusted based on the page rotation. For 90° or 270° rotations, the width and height are swapped.
§Returns
The page width in PDF units (typically points, where 1 point = 1/72 inch)
§Example
let page = document.get_page(0)?;
let width_pts = page.width();
let width_inches = width_pts / 72.0;
let width_mm = width_pts * 25.4 / 72.0;
println!("Page width: {} points ({:.2} inches, {:.2} mm)", width_pts, width_inches, width_mm);
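The width/height swap for rotated pages can be illustrated with a small standalone function. This is a sketch of the behavior described above, not the library's actual code: effective_size is a hypothetical name, and the normalization of out-of-range rotation values is an assumption.

```rust
/// Hypothetical stand-in for the rotation-aware size calculation; this is an
/// assumption about the behavior, not code taken from oxidize_pdf.
fn effective_size(media_box: [f64; 4], rotation: i32) -> (f64, f64) {
    let width = (media_box[2] - media_box[0]).abs();
    let height = (media_box[3] - media_box[1]).abs();
    // Normalize so that negative or >= 360 values still map into 0..360.
    match rotation.rem_euclid(360) {
        // Quarter turns swap the two dimensions.
        90 | 270 => (height, width),
        // 0 and 180 keep the original orientation.
        _ => (width, height),
    }
}
```

For a US Letter page (612 x 792 points), a rotation of 90 or 270 yields a landscape 792 x 612 size, while 0 and 180 leave the size unchanged.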
pub fn height(&self) -> f64
Get the effective page height accounting for rotation.
The height is calculated from the MediaBox and adjusted based on the page rotation. For 90° or 270° rotations, the width and height are swapped.
§Returns
The page height in PDF units (typically points, where 1 point = 1/72 inch)
§Example
let page = document.get_page(0)?;
println!("Page dimensions: {}x{} points", page.width(), page.height());
if page.rotation != 0 {
    println!("Page is rotated {} degrees", page.rotation);
}
pub fn content_streams<R: Read + Seek>(
    &self,
    reader: &mut PdfReader<R>,
) -> ParseResult<Vec<Vec<u8>>>
Get the content streams for this page using a PdfReader.
Content streams contain the actual drawing instructions (operators) that render text, graphics, and images on the page. A page may have multiple content streams which are concatenated during rendering.
§Arguments
- reader - Mutable reference to the PDF reader
§Returns
A vector of decompressed content streams. Each element holds the raw bytes of one content stream, ready for parsing.
§Errors
Returns an error if:
- The Contents entry is malformed
- Stream decompression fails
- Referenced objects cannot be resolved
§Example
let streams = page.content_streams(reader)?;
for (i, stream) in streams.iter().enumerate() {
    println!("Content stream {}: {} bytes", i, stream.len());
}
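When a page has several content streams, a consumer that wants a single buffer has to join them. The sketch below shows one way to do that; concat_streams is a hypothetical helper, not part of oxidize_pdf, and inserting a newline between streams is an assumption chosen so that the last token of one stream cannot fuse with the first token of the next.

```rust
/// Hypothetical helper (not part of oxidize_pdf): joins decoded content
/// streams into one buffer, appending a newline after each stream so that
/// tokens from adjacent streams stay separated.
fn concat_streams(streams: &[Vec<u8>]) -> Vec<u8> {
    // Reserve room for every stream plus one separator byte each.
    let total: usize = streams.iter().map(|s| s.len() + 1).sum();
    let mut joined = Vec::with_capacity(total);
    for stream in streams {
        joined.extend_from_slice(stream);
        joined.push(b'\n');
    }
    joined
}
```

The input would be the Vec<Vec<u8>> returned by content_streams; parsing each stream separately, as in the example above, avoids the copy when a single buffer is not needed.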
pub fn content_streams_with_document<R: Read + Seek>(
    &self,
    document: &PdfDocument<R>,
) -> ParseResult<Vec<Vec<u8>>>
Get content streams using PdfDocument (recommended method).
This is the preferred method for accessing content streams as it uses the document’s caching and resource management capabilities.
§Arguments
- document - Reference to the PDF document
§Returns
A vector of decompressed content stream data ready for parsing with ContentParser.
§Example
let reader = PdfReader::open("document.pdf")?;
let document = PdfDocument::new(reader);
let page = document.get_page(0)?;
// Get content streams
let streams = page.content_streams_with_document(&document)?;
// Parse each stream
for stream_data in streams {
    let operations = ContentParser::parse_content(&stream_data)?;
    println!("Stream has {} operations", operations.len());
}
pub fn get_contents(&self) -> Option<&PdfObject>
pub fn get_resources(&self) -> Option<&PdfDictionary>
Get the effective resources for this page (including inherited).
Resources include fonts, images (XObjects), color spaces, patterns, and other assets needed to render the page. This method returns page-specific resources if present, otherwise falls back to inherited resources from parent nodes.
§Returns
The Resources dictionary if available, or None if the page has neither its own nor inherited resources.
§Resource Categories
The Resources dictionary may contain:
- Font - Font definitions used by text operators
- XObject - External objects (images, form XObjects)
- ColorSpace - Color space definitions
- Pattern - Pattern definitions for fills
- Shading - Shading dictionaries
- ExtGState - Graphics state parameter dictionaries
- Properties - Property list dictionaries
§Example
if let Some(resources) = page.get_resources() {
    // Check for fonts
    if let Some(fonts) = resources.get("Font").and_then(|f| f.as_dict()) {
        println!("Page uses {} fonts", fonts.0.len());
    }
    // Check for images
    if let Some(xobjects) = resources.get("XObject").and_then(|x| x.as_dict()) {
        println!("Page has {} XObjects", xobjects.0.len());
    }
}
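The fallback rule for get_resources (page-specific resources first, inherited resources otherwise) can be modeled with plain standard-library types. Everything in this sketch is hypothetical: Dict stands in for PdfDictionary, and PageModel and effective_resources are illustrative names, not the library's implementation.

```rust
use std::collections::HashMap;

/// Stand-in for PdfDictionary, used only for this sketch.
type Dict = HashMap<String, String>;

/// Hypothetical model of a page: its own Resources entry (if any) plus
/// whatever was inherited from parent page tree nodes.
struct PageModel {
    own_resources: Option<Dict>,
    inherited_resources: Option<Dict>,
}

impl PageModel {
    /// Page-specific resources win; inherited ones are only a fallback.
    fn effective_resources(&self) -> Option<&Dict> {
        self.own_resources
            .as_ref()
            .or(self.inherited_resources.as_ref())
    }
}
```

Note that this models a whole-dictionary fallback, as the description states; it does not merge the two dictionaries entry by entry.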
pub fn clone_with_resources(&self) -> Self
Clone this page with all inherited resources merged into the page dictionary.
This is useful when extracting a page for separate processing or when you need a self-contained page object with all resources explicitly included.
§Returns
A cloned page with inherited resources merged into the Resources entry of the page dictionary.
§Example
// Get a self-contained page with all resources
let standalone_page = page.clone_with_resources();
// The cloned page now has all resources in its dictionary
assert!(standalone_page.dict.contains_key("Resources"));
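To make the idea of merging inherited resources into a page concrete, here is a minimal sketch over standard-library maps. The Resources alias and merge_resources function are hypothetical, and the conflict rule (the page's own entry overrides an inherited entry with the same name) is an assumption; the source does not say how clone_with_resources resolves conflicts.

```rust
use std::collections::BTreeMap;

/// Stand-in for a Resources dictionary: category (e.g. "Font") -> name -> value.
type Resources = BTreeMap<String, BTreeMap<String, String>>;

/// Hypothetical merge: start from the inherited resources and overlay the
/// page's own entries, so page-level names win on conflict. Whether the
/// library resolves conflicts this way is an assumption.
fn merge_resources(inherited: &Resources, own: &Resources) -> Resources {
    let mut merged = inherited.clone();
    for (category, entries) in own {
        // Create the category if only the page defines it.
        let slot = merged.entry(category.clone()).or_default();
        for (name, value) in entries {
            slot.insert(name.clone(), value.clone());
        }
    }
    merged
}
```

Merging per category rather than replacing whole categories keeps inherited fonts available even when the page defines additional fonts of its own.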
pub fn get_annotations(&self) -> Option<&PdfArray>
Get the annotations array for this page.
Returns a reference to the annotations array if present. Each element in the array is typically a reference to an annotation dictionary.
§Example
if let Some(annots) = page.get_annotations() {
    println!("Page has {} annotations", annots.len());
}
pub fn has_annotations(&self) -> bool
pub fn get_referenced_objects<R: Read + Seek>(
    &self,
    reader: &mut PdfReader<R>,
) -> ParseResult<HashMap<(u32, u16), PdfObject>>
Get all objects referenced by this page (for extraction or analysis).
This method recursively collects all objects referenced by the page, including:
- Content streams
- Resources (fonts, images, etc.)
- Nested objects within resources
This is useful for extracting a complete page with all its dependencies or for analyzing the object graph of a page.
§Arguments
- reader - Mutable reference to the PDF reader
§Returns
A HashMap mapping object references (obj_num, gen_num) to their resolved objects.
§Example
let referenced_objects = page.get_referenced_objects(reader)?;
println!("Page references {} objects", referenced_objects.len());
for ((obj_num, gen_num), obj) in &referenced_objects {
    println!(" {} {} R: {:?}", obj_num, gen_num, obj);
}
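Collecting every object a page depends on amounts to a graph traversal with a visited set. The sketch below uses a deliberately simplified, hypothetical object model (ObjectGraph maps each object id to the ids it references) and an explicit stack instead of recursion; it illustrates the technique and is not the library's implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Object id in the form (object_number, generation_number).
type ObjId = (u32, u16);

/// Hypothetical, simplified object model: each object just lists the ids it
/// references. Real PDF objects would be dictionaries, arrays, and streams.
type ObjectGraph = HashMap<ObjId, Vec<ObjId>>;

/// Collects every object reachable from `root` (including `root` itself),
/// visiting each id once so reference cycles cannot loop forever.
fn collect_referenced(graph: &ObjectGraph, root: ObjId) -> HashSet<ObjId> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue; // already visited
        }
        if let Some(children) = graph.get(&id) {
            stack.extend(children.iter().copied());
        }
    }
    seen
}
```

The visited set matters in practice because PDF object graphs commonly contain cycles, for example a page pointing to its parent node, which in turn lists the page among its kids.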
Trait Implementations§
impl Clone for ParsedPage
fn clone(&self) -> ParsedPage
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.