PDF Content Stream Parser - Complete support for PDF graphics operators
This module implements comprehensive parsing of PDF content streams according to the PDF specification. Content streams contain the actual drawing instructions (operators) that render text, graphics, and images on PDF pages.
§Overview
Content streams are sequences of PDF operators that describe:
- Text positioning and rendering
- Path construction and painting
- Color and graphics state management
- Image and XObject placement
- Coordinate transformations
§Architecture
The parser is divided into two main components:
- ContentTokenizer: Low-level tokenization of content stream bytes
- ContentParser: High-level parsing of tokens into structured operations
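To make the tokenization stage concrete, here is a minimal, self-contained sketch of how content stream bytes might be split into tokens. The `Token` enum and the `tokenize` and `is_boundary` functions are hypothetical names invented for this illustration; they are not the crate's `ContentTokenizer` API, and the sketch deliberately ignores string escapes, nested parentheses, hex strings, arrays, dictionaries, and comments.

```rust
// Hypothetical token type for illustration only; the crate's real
// tokenizer output may look quite different.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Name(String),     // e.g. /F1 (stored without the leading slash)
    Str(Vec<u8>),     // literal string, e.g. (Hello)
    Operator(String), // e.g. BT, Tf, Tj
}

// Bytes that end a name, number, or operator in this simplified sketch.
fn is_boundary(b: u8) -> bool {
    b.is_ascii_whitespace() || matches!(b, b'/' | b'(' | b')' | b'[' | b']' | b'<' | b'>')
}

fn tokenize(input: &[u8]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if b.is_ascii_whitespace() {
            i += 1;
        } else if b == b'/' {
            // Name: read up to the next boundary byte.
            let start = i + 1;
            i = start;
            while i < input.len() && !is_boundary(input[i]) {
                i += 1;
            }
            tokens.push(Token::Name(String::from_utf8_lossy(&input[start..i]).into_owned()));
        } else if b == b'(' {
            // Literal string: simplified, no escapes or nested parentheses.
            let start = i + 1;
            i = start;
            while i < input.len() && input[i] != b')' {
                i += 1;
            }
            tokens.push(Token::Str(input[start..i].to_vec()));
            i += 1; // skip the closing ')'
        } else if is_boundary(b) {
            // Arrays, hex strings, and dictionaries are out of scope here.
            i += 1;
        } else {
            // Anything else is either a number or an operator keyword.
            let start = i;
            while i < input.len() && !is_boundary(input[i]) {
                i += 1;
            }
            let word = String::from_utf8_lossy(&input[start..i]).into_owned();
            match word.parse::<f64>() {
                Ok(n) => tokens.push(Token::Number(n)),
                Err(_) => tokens.push(Token::Operator(word)),
            }
        }
    }
    tokens
}
```

Calling `tokenize(b"BT /F1 12 Tf (Hello World) Tj ET")` yields seven tokens under these assumptions; a higher-level parser would then combine them into structured operations.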
§Example
    use oxidize_pdf::parser::content::{ContentParser, ContentOperation};

    // Parse a content stream.
    // Note: `?` requires an enclosing function that returns a compatible `Result`.
    let content_stream = b"BT /F1 12 Tf 100 200 Td (Hello World) Tj ET";
    let operations = ContentParser::parse_content(content_stream)?;

    // Process operations
    for op in operations {
        match op {
            ContentOperation::BeginText => println!("Start text object"),
            ContentOperation::SetFont(name, size) => println!("Font: {} at {}", name, size),
            ContentOperation::ShowText(text) => println!("Text: {:?}", text),
            _ => {}
        }
    }
§Supported Operators
This parser supports all standard PDF operators including:
- Text operators (BT, ET, Tj, TJ, Tf, Td, etc.)
- Graphics state operators (q, Q, cm, w, J, etc.)
- Path construction operators (m, l, c, re, h)
- Path painting operators (S, f, B, n, etc.)
- Color operators (g, rg, k, cs, scn, etc.)
- XObject operators (Do)
- Marked content operators (BMC, BDC, EMC, etc.)
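All of the operators listed above use postfix notation: operands come first and the operator keyword follows. The sketch below illustrates that grouping for purely numeric operands. The `Op` struct and `group_operations` function are hypothetical names for this example only and are not part of the crate; names, strings, and arrays are left out to keep it short.

```rust
/// Hypothetical record: an operator keyword together with the numeric
/// operands that preceded it in the stream.
#[derive(Debug, PartialEq)]
struct Op {
    operator: String,
    operands: Vec<f64>,
}

/// Groups a whitespace-separated, numbers-only content stream into operations.
fn group_operations(stream: &str) -> Vec<Op> {
    let mut ops = Vec::new();
    let mut operands = Vec::new();
    for word in stream.split_whitespace() {
        match word.parse::<f64>() {
            // Numbers accumulate until an operator keyword arrives.
            Ok(n) => operands.push(n),
            // An operator consumes every operand collected so far.
            Err(_) => ops.push(Op {
                operator: word.to_string(),
                operands: std::mem::take(&mut operands),
            }),
        }
    }
    ops
}
```

For the stream `q 1 0 0 1 50 50 cm 0 0 100 100 re f Q`, this produces five operations: `q` and `Q` with no operands, `cm` with six, `re` with four, and `f` with none.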
Structs§
- ContentParser - High-level content stream parser.
- ContentTokenizer - Content stream tokenizer.
Enums§
- ContentOperation - Represents a single operator in a PDF content stream.
- TextElement - Represents a text element in a TJ array for ShowTextArray operations.
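As background for TJ arrays: in PDF, the operand of `TJ` is an array that mixes strings with numeric position adjustments, expressed in thousandths of a text space unit and subtracted from the current position, so a negative number widens the gap in horizontal writing. The sketch below shows one way such an array could be flattened to plain text. `TjItem` and `tj_to_string` are hypothetical names, not the crate's `TextElement` API, and the gap threshold is a heuristic assumed for illustration rather than anything the PDF specification defines.

```rust
// Hypothetical stand-in for an element of a TJ array.
#[derive(Debug, Clone, PartialEq)]
enum TjItem {
    Text(Vec<u8>),
    // Adjustment in thousandths of a text space unit.
    Adjust(f64),
}

/// Concatenates the strings of a TJ array, inserting a space wherever an
/// adjustment widens the gap by at least `gap_threshold` (a heuristic).
fn tj_to_string(items: &[TjItem], gap_threshold: f64) -> String {
    let mut out = String::new();
    for item in items {
        match item {
            TjItem::Text(bytes) => out.push_str(&String::from_utf8_lossy(bytes)),
            // Large negative adjustments are treated as word gaps.
            TjItem::Adjust(n) if *n <= -gap_threshold => out.push(' '),
            // Small adjustments are kerning and add no characters.
            TjItem::Adjust(_) => {}
        }
    }
    out
}
```

With a threshold of 200, the array `[(Hello) -250 (W) 40 (orld)]` flattens to `Hello World`: the `-250` adjustment becomes a space and the `40` kerning value is ignored.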