pub fn parse_content_stream_text_only(data: &[u8]) -> Result<Vec<Operator>>
Parse a content stream for text extraction, skipping pure graphics operators.
This is a performance-optimized variant of parse_content_stream that
avoids constructing Object operands for operators that only affect paths,
clipping, and non-text graphics state. Inside BT/ET text blocks, parsing is
identical to the full parser.
§Performance
For graphics-heavy pages (e.g., 1–12 MB of path data), this can be 3–5x
faster than full parsing while producing identical text extraction results.
The speedup comes from byte-level operand skipping (no f64 parsing, no
heap allocation) and discarding path/clipping operators entirely.
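The idea of byte-level operand skipping can be illustrated with a small, self-contained sketch. This is not the crate's implementation: next_token, is_operator, is_path_or_clip_op, and filter_tokens are hypothetical names, and the sketch ignores comments, hex strings, dictionaries, and inline images. It only shows how operands can be carried as raw byte slices (never parsed into numbers) and dropped together with their operator when that operator is a path or clipping operator outside a BT/ET block.

```rust
/// PDF whitespace bytes: NUL, tab, LF, FF, CR, space.
fn is_ws(b: u8) -> bool {
    matches!(b, 0 | b'\t' | b'\n' | 0x0C | b'\r' | b' ')
}

/// Returns the next token as a raw slice plus the position after it.
/// A literal string `( ... )` is returned as one token; nothing is decoded.
fn next_token(data: &[u8], mut pos: usize) -> Option<(&[u8], usize)> {
    while pos < data.len() && is_ws(data[pos]) {
        pos += 1;
    }
    if pos >= data.len() {
        return None;
    }
    let start = pos;
    if data[pos] == b'(' {
        let mut depth = 0usize;
        while pos < data.len() {
            match data[pos] {
                b'\\' => pos += 1, // skip the escaped byte
                b'(' => depth += 1,
                b')' => {
                    depth -= 1;
                    if depth == 0 {
                        pos += 1;
                        break;
                    }
                }
                _ => {}
            }
            pos += 1;
        }
    } else {
        while pos < data.len() && !is_ws(data[pos]) && data[pos] != b'(' {
            pos += 1;
        }
    }
    let end = pos.min(data.len());
    Some((&data[start..end], end))
}

/// Simplified test: operators start with a letter, `'`, or `"`.
fn is_operator(tok: &[u8]) -> bool {
    let starts_like_op = tok
        .first()
        .map_or(false, |&b| b.is_ascii_alphabetic() || b == b'\'' || b == b'"');
    starts_like_op && !matches!(tok, b"true" | b"false" | b"null")
}

/// Path construction, path painting, and clipping operators.
fn is_path_or_clip_op(tok: &[u8]) -> bool {
    matches!(
        tok,
        b"m" | b"l" | b"c" | b"v" | b"y" | b"h" | b"re" | b"S" | b"s" | b"f" | b"F"
            | b"f*" | b"B" | b"B*" | b"b" | b"b*" | b"n" | b"W" | b"W*"
    )
}

/// Keeps every token except path/clipping operators (and their operands)
/// that occur outside BT/ET. Operands stay as unparsed byte slices.
fn filter_tokens(data: &[u8]) -> Vec<&[u8]> {
    let mut kept: Vec<&[u8]> = Vec::new();
    let mut operands_start = 0; // index in `kept` where pending operands begin
    let mut in_text = false;
    let mut pos = 0;
    while let Some((tok, next)) = next_token(data, pos) {
        pos = next;
        if !is_operator(tok) {
            kept.push(tok); // raw operand bytes, no f64 parsing
            continue;
        }
        match tok {
            b"BT" => in_text = true,
            b"ET" => in_text = false,
            _ => {}
        }
        if !in_text && is_path_or_clip_op(tok) {
            kept.truncate(operands_start); // drop the pending operands
        } else {
            kept.push(tok);
        }
        operands_start = kept.len();
    }
    kept
}
```

In this sketch the path operands are pushed as borrowed slices and truncated away again, so skipping them costs a byte scan rather than a number parse. A real parser would presumably avoid even the temporary push; the structure here is chosen for readability.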
§Safety limits
Same as parse_content_stream: bails out after MAX_OPERATORS
operators or MAX_CONSECUTIVE_ERRORS consecutive parse failures.
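A minimal sketch of how two such limits might interact in a parse loop follows. The constant values, the Stop enum, and bounded_parse are hypothetical stand-ins for illustration; the crate's actual limits, and whether it returns an error or partial results when a limit is hit, are not stated here. The sketch assumes that a successful parse resets the consecutive-error counter.

```rust
// Hypothetical values, chosen only for the example.
const MAX_OPERATORS: usize = 1_000;
const MAX_CONSECUTIVE_ERRORS: usize = 3;

/// Why the loop stopped.
#[derive(Debug, PartialEq)]
enum Stop {
    EndOfInput,
    TooManyOperators,
    TooManyErrors,
}

/// Consumes a stream of per-operator parse outcomes and returns the number
/// of operators accepted together with the reason for stopping.
fn bounded_parse<I>(outcomes: I) -> (usize, Stop)
where
    I: IntoIterator<Item = Result<(), ()>>,
{
    let mut operators = 0;
    let mut consecutive_errors = 0;
    for outcome in outcomes {
        match outcome {
            Ok(()) => {
                operators += 1;
                consecutive_errors = 0; // a success resets the error streak
                if operators >= MAX_OPERATORS {
                    return (operators, Stop::TooManyOperators);
                }
            }
            Err(()) => {
                consecutive_errors += 1;
                if consecutive_errors >= MAX_CONSECUTIVE_ERRORS {
                    return (operators, Stop::TooManyErrors);
                }
            }
        }
    }
    (operators, Stop::EndOfInput)
}
```

The operator cap bounds total work on very large or looping streams, while the consecutive-error cap stops early on input that is corrupt from some point onward without penalizing isolated malformed operators.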