Function parse_structure_tree 

pub fn parse_structure_tree(
    document: &mut PdfDocument,
) -> Result<Option<StructTreeRoot>, Error>

Parse the structure tree from a PDF document.

Reads the StructTreeRoot from the document catalog and recursively parses all structure elements. Parsing runs under a time budget so that documents with very large structure trees (50K+ elements) do not take seconds to process. When the budget is exceeded, the function returns Ok(None) so the caller can fall back to content-stream order (extract_spans).
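
The time-budget idea described above can be illustrated with a minimal sketch. It is not the crate's implementation: the Elem type, the walk and walk_with_budget functions, and the choice of checking the clock once per element are all assumptions made for illustration. The sketch only shows how a recursive walk can give up and return None once a deadline passes.

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for a structure element; not the crate's real type.
struct Elem {
    children: Vec<Elem>,
}

// Visits the tree depth-first and returns None as soon as the deadline has passed.
fn walk(elem: &Elem, deadline: Instant, visited: &mut usize) -> Option<()> {
    if Instant::now() >= deadline {
        return None; // budget exhausted, abandon the walk
    }
    *visited += 1;
    for child in &elem.children {
        walk(child, deadline, visited)?; // propagate the timeout upward
    }
    Some(())
}

// Returns Some(count) if every element was visited within the budget,
// or None if the budget ran out first.
fn walk_with_budget(root: &Elem, budget: Duration) -> Option<usize> {
    let deadline = Instant::now() + budget;
    let mut visited = 0;
    walk(root, deadline, &mut visited)?;
    Some(visited)
}
```

Checking the clock on every element keeps the sketch short. A real parser might check less often to reduce overhead, but the source does not say how often the budget is tested.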

§Arguments

  • document - The PDF document to read the structure tree from

§Returns

  • Ok(Some(StructTreeRoot)) - If the document has a structure tree and it was parsed within the time budget
  • Ok(None) - If the document is not tagged, or the tree is too large to parse within the time budget
  • Err(Error) - If parsing fails
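
A caller will typically need to handle all three outcomes listed above. The following is a hedged sketch of that dispatch using stand-in types so that it compiles on its own: StructTreeRoot and Error here are placeholders rather than the crate's definitions, and ReadingOrder and choose_reading_order are hypothetical names introduced only for this example.

```rust
// Stand-in types so the sketch compiles on its own; the real crate
// defines its own StructTreeRoot and Error.
#[derive(Debug, PartialEq)]
struct StructTreeRoot;

#[derive(Debug, PartialEq)]
struct Error(String);

// Hypothetical enum naming the two extraction strategies.
#[derive(Debug, PartialEq)]
enum ReadingOrder {
    // Use the parsed structure tree to order the text.
    StructureTree(StructTreeRoot),
    // Fall back to content-stream order (what extract_spans provides).
    ContentStream,
}

// Maps a result of the same shape as parse_structure_tree's return value
// onto an extraction strategy, passing real parse errors through unchanged.
fn choose_reading_order(
    parsed: Result<Option<StructTreeRoot>, Error>,
) -> Result<ReadingOrder, Error> {
    match parsed {
        Ok(Some(root)) => Ok(ReadingOrder::StructureTree(root)),
        Ok(None) => Ok(ReadingOrder::ContentStream),
        Err(e) => Err(e),
    }
}
```

Note that Ok(None) covers both an untagged document and a tree that exceeded the time budget, so a caller cannot tell the two cases apart from the return value alone.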