Function parse_structure_tree

pub fn parse_structure_tree(
    document: &mut PdfDocument,
) -> Result<Option<StructTreeRoot>, Error>

Parse the structure tree from a PDF document.

Reads the StructTreeRoot from the document catalog and recursively parses all structure elements. Parsing runs under a time budget so that documents with very large structure trees (50K+ elements) do not take seconds to process. If the budget is exceeded, the function returns Ok(None) so that the caller can fall back to content-stream order (extract_spans).
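The time-budget behavior described above can be illustrated with a minimal sketch. It is not this crate's implementation: `Node`, `count_within_budget`, and `parse_with_budget` are hypothetical names, the error type is a plain `String`, and the walk only counts elements instead of building a tree. It assumes the budget is checked once per element against a fixed deadline.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a structure element; not the library's type.
struct Node {
    children: Vec<Node>,
}

/// Counts the elements under `node`, or returns `None` once `deadline` has passed.
fn count_within_budget(node: &Node, deadline: Instant) -> Option<usize> {
    if Instant::now() >= deadline {
        // Budget exhausted: abandon the whole walk rather than return a partial result.
        return None;
    }
    let mut total = 1; // this element
    for child in &node.children {
        // `?` propagates the "out of budget" signal up through the recursion.
        total += count_within_budget(child, deadline)?;
    }
    Some(total)
}

/// Wraps the walk in the same `Ok(Some(..))` / `Ok(None)` shape as the documented function.
fn parse_with_budget(root: &Node, budget: Duration) -> Result<Option<usize>, String> {
    let deadline = Instant::now() + budget;
    Ok(count_within_budget(root, deadline))
}
```

Computing the deadline once and passing it down keeps the per-element check to a single clock read. Returning `None` for the whole walk, rather than a partial tree, matches the documented contract that the caller falls back to content-stream order.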

§Arguments

document - The PDF document

§Returns

Ok(Some(StructTreeRoot)) - If the document has a structure tree and it parsed in time
Ok(None) - If the document is not tagged or the tree is too large to parse within the time budget
Err(Error) - If parsing fails
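A caller typically has to distinguish all three outcomes listed above. The sketch below shows one way to do that. It uses hypothetical stand-in types (`StructTreeRoot` and `Error` here are empty placeholders, not the crate's real definitions) so that it compiles on its own, and `ReadingOrder` and `choose_reading_order` are names invented for illustration.

```rust
/// Placeholder for the crate's structure-tree type (assumption, for illustration only).
#[derive(Debug)]
struct StructTreeRoot;

/// Placeholder for the crate's error type (assumption, for illustration only).
#[derive(Debug)]
struct Error;

/// Hypothetical enum naming the two extraction strategies a caller might choose between.
#[derive(Debug, PartialEq)]
enum ReadingOrder {
    StructureTree,
    ContentStream,
}

/// Maps the three documented outcomes onto a reading-order decision.
fn choose_reading_order(
    outcome: Result<Option<StructTreeRoot>, Error>,
) -> Result<ReadingOrder, Error> {
    match outcome {
        // Tagged document whose tree parsed in time: follow the tree.
        Ok(Some(_tree)) => Ok(ReadingOrder::StructureTree),
        // Untagged document, or budget exceeded: fall back to content-stream order.
        Ok(None) => Ok(ReadingOrder::ContentStream),
        // A genuine parse failure is passed on to the caller.
        Err(e) => Err(e),
    }
}
```

Note that `Ok(None)` is not an error: it covers both the untagged case and the over-budget case, and the return value alone does not tell the caller which of the two occurred.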
