
Function parse_content_stream_text_only 

pub fn parse_content_stream_text_only(data: &[u8]) -> Result<Vec<Operator>>

Parse a content stream for text extraction, skipping pure graphics operators.

This is a performance-optimized variant of parse_content_stream that avoids constructing Object operands for operators that only affect paths, clipping, and non-text graphics state. Inside BT/ET text blocks, parsing is identical to the full parser.

§Performance

For graphics-heavy pages (e.g., 1–12 MB of path data), this can be 3–5x faster than full parsing while producing identical text extraction results. The speedup comes from byte-level operand skipping (no f64 parsing, no heap allocation) and discarding path/clipping operators entirely.
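The skipping strategy can be illustrated with a small, self-contained sketch. It is not the crate's implementation: `text_tokens` is a hypothetical helper, it splits only on ASCII whitespace, and it ignores strings containing spaces, arrays, dictionaries, comments, and inline images, all of which a real content stream parser has to handle. It shows only the core idea: outside BT/ET, tokens are stepped over as raw bytes with no numeric parsing and no per-token allocation, while inside BT/ET every token is kept.

```rust
/// Hypothetical helper (not part of the documented API): returns the tokens
/// found inside BT/ET blocks and steps over everything else as raw bytes.
fn text_tokens(data: &[u8]) -> Vec<&[u8]> {
    let mut kept = Vec::new();
    let mut in_text = false;

    // `split` yields borrowed sub-slices of `data`, so skipped tokens are
    // never copied, never parsed as numbers, and never heap-allocated.
    let tokens = data
        .split(|b| b.is_ascii_whitespace())
        .filter(|t| !t.is_empty());

    for tok in tokens {
        match tok {
            b"BT" => {
                in_text = true;
                kept.push(tok);
            }
            b"ET" => {
                in_text = false;
                kept.push(tok);
            }
            // Inside a text block every token is retained for later parsing.
            _ if in_text => kept.push(tok),
            // Outside a text block the token is discarded unexamined.
            _ => {}
        }
    }
    kept
}
```

Given a stream such as `0 0 m 10 10 l S BT /F1 12 Tf (Hi) Tj ET 1 0 0 RG`, this sketch keeps only the tokens from `BT` through `ET`; the path construction and stroke-color tokens on either side are dropped without being interpreted.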

§Safety limits

The same limits apply as in parse_content_stream: parsing stops after MAX_OPERATORS operators or MAX_CONSECUTIVE_ERRORS consecutive parse failures.
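The two limits can be pictured as a pair of counters around the parse loop. The sketch below is an assumption-laden illustration, not the crate's code: `DEMO_MAX_OPERATORS`, `DEMO_MAX_CONSECUTIVE_ERRORS`, `Stop`, and `collect_with_limits` are hypothetical names, the limit values are placeholders, and whether the real function returns an error or a partial result when a limit is hit is not stated on this page.

```rust
// Placeholder values for illustration only; the real MAX_OPERATORS and
// MAX_CONSECUTIVE_ERRORS values are not given in this documentation.
const DEMO_MAX_OPERATORS: usize = 4;
const DEMO_MAX_CONSECUTIVE_ERRORS: usize = 2;

/// Hypothetical reason for leaving the loop.
#[derive(Debug, PartialEq)]
enum Stop {
    EndOfInput,
    OperatorLimit,
    ErrorLimit,
}

/// Hypothetical driver: consumes parse results until the input ends or one
/// of the two limits is reached, returning whatever was collected so far.
fn collect_with_limits<I>(items: I) -> (Vec<String>, Stop)
where
    I: IntoIterator<Item = Result<String, ()>>,
{
    let mut ops = Vec::new();
    let mut consecutive_errors = 0;

    for item in items {
        match item {
            Ok(op) => {
                // A successful parse resets the error streak.
                consecutive_errors = 0;
                ops.push(op);
                if ops.len() >= DEMO_MAX_OPERATORS {
                    return (ops, Stop::OperatorLimit);
                }
            }
            Err(()) => {
                consecutive_errors += 1;
                if consecutive_errors >= DEMO_MAX_CONSECUTIVE_ERRORS {
                    return (ops, Stop::ErrorLimit);
                }
            }
        }
    }
    (ops, Stop::EndOfInput)
}
```

Counting only consecutive failures means that isolated malformed tokens do not end parsing, while a long run of garbage does; the operator cap bounds memory use on pathological or hostile streams.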