Module parser

Content stream parser.

This module parses PDF content streams into a sequence of operators. Content streams differ from the main PDF object structure: they use postfix notation, so operands come before the operator that consumes them. For example, in `/F1 12 Tf` the font name and size precede the `Tf` operator.

Example content stream:

BT
  /F1 12 Tf
  100 700 Td
  (Hello, World!) Tj
ET
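
The postfix rule in the example above can be sketched as a loop that collects operands until an operator token arrives. This is a minimal illustration, not this module's implementation: `Operand`, `Operation`, `parse_number`, and `parse_postfix` are hypothetical names, and the sketch ignores arrays, dictionaries, hex strings, comments, and inline images.

```rust
/// Hypothetical operand type for illustration; not this module's real API.
#[derive(Debug, Clone, PartialEq)]
enum Operand {
    Name(String),
    Number(f64),
    Text(String),
}

/// Hypothetical operator record: the operator keyword plus the operands
/// that preceded it in the stream.
#[derive(Debug, Clone, PartialEq)]
struct Operation {
    operator: String,
    operands: Vec<Operand>,
}

/// Treat a token as a number only if it consists of digits, signs, and dots.
fn parse_number(word: &str) -> Option<f64> {
    let numeric = word
        .chars()
        .all(|c| c.is_ascii_digit() || matches!(c, '+' | '-' | '.'));
    if numeric { word.parse().ok() } else { None }
}

/// Collect operands until an operator keyword appears, then emit both together.
fn parse_postfix(input: &str) -> Vec<Operation> {
    let mut ops = Vec::new();
    let mut pending: Vec<Operand> = Vec::new();
    let mut chars = input.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '(' {
            // Literal string: parentheses may nest; a backslash escapes the next char.
            // Simplification: escapes such as \n are kept as the bare character.
            chars.next();
            let mut depth = 1;
            let mut text = String::new();
            while let Some(ch) = chars.next() {
                match ch {
                    '\\' => {
                        if let Some(escaped) = chars.next() {
                            text.push(escaped);
                        }
                    }
                    '(' => {
                        depth += 1;
                        text.push(ch);
                    }
                    ')' => {
                        depth -= 1;
                        if depth == 0 {
                            break;
                        }
                        text.push(ch);
                    }
                    _ => text.push(ch),
                }
            }
            pending.push(Operand::Text(text));
        } else {
            // Read one whitespace-delimited token.
            let mut word = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || ch == '(' {
                    break;
                }
                word.push(ch);
                chars.next();
            }
            if let Some(name) = word.strip_prefix('/') {
                pending.push(Operand::Name(name.to_string()));
            } else if let Some(n) = parse_number(&word) {
                pending.push(Operand::Number(n));
            } else {
                // Anything else is an operator: it consumes the pending operands.
                ops.push(Operation {
                    operator: word,
                    operands: std::mem::take(&mut pending),
                });
            }
        }
    }
    ops
}
```

Calling `parse_postfix` on the example stream yields five operations in order: `BT` and `ET` with no operands, `Tf` with a name and a number, `Td` with two numbers, and `Tj` with one string.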

Functions

parse_and_execute_text_only
Streaming text-only parser: parse operators and call handler immediately.
parse_content_stream
Parse a content stream into a sequence of operators.
parse_content_stream_images_only
Image-only content stream parser: skips BT/ET text blocks entirely.
parse_content_stream_paths_only
Parse a content stream for path extraction, skipping BT/ET text blocks.
parse_content_stream_text_only
Parse a content stream for text extraction, skipping pure graphics operators.
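
Several of the functions above are described as skipping BT/ET text blocks. The sketch below shows that idea as a filter over an already parsed operator list. It is an assumption-laden illustration only: `RawOp` and `drop_text_blocks` are hypothetical names, and the real functions may well skip the blocks during parsing instead of afterwards.

```rust
/// Hypothetical parsed operator: keyword plus raw operand tokens.
/// Not this module's real operator type.
#[derive(Debug, Clone, PartialEq)]
struct RawOp {
    name: String,
    operands: Vec<String>,
}

/// Return only the operators that lie outside BT ... ET text blocks.
/// The BT and ET markers themselves are dropped as well.
fn drop_text_blocks(ops: &[RawOp]) -> Vec<RawOp> {
    let mut in_text = false;
    let mut kept = Vec::new();
    for op in ops {
        match op.name.as_str() {
            "BT" => in_text = true,
            "ET" => in_text = false,
            _ if !in_text => kept.push(op.clone()),
            _ => {} // inside a text block: skip
        }
    }
    kept
}
```

Given the operator names `q BT Tf Tj ET re f Q`, this filter keeps `q re f Q`, which is the subset a path-only or image-only pass would need to inspect.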