PDF parsing backend and content stream interpreter for pdfplumber-rs.
This crate implements Layer 1 (PDF parsing via pluggable backends) and Layer 2 (content stream interpretation) of the pdfplumber-rs architecture. It depends on pdfplumber-core for shared data types.
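The two-layer split can be pictured with a small, self-contained sketch. Every name in it (`Backend`, `Handler`, `MemoryBackend`, `interpret`, `run`, `Recorder`) is hypothetical. These names stand in for the crate's `PdfBackend` and `ContentHandler` but do not reproduce them. The interpreter is deliberately naive: it only splits on whitespace.

```rust
/// Layer 1 stand-in (hypothetical, not the crate's `PdfBackend`):
/// exposes decoded content-stream bytes per page.
trait Backend {
    fn page_count(&self) -> usize;
    fn content_stream(&self, page: usize) -> Option<&[u8]>;
}

/// Event sink stand-in (hypothetical, not the crate's `ContentHandler`).
trait Handler {
    fn on_operator(&mut self, op: &str, operands: &[&str]);
}

/// In-memory backend used in place of a real PDF parser.
struct MemoryBackend {
    pages: Vec<Vec<u8>>,
}

impl Backend for MemoryBackend {
    fn page_count(&self) -> usize {
        self.pages.len()
    }
    fn content_stream(&self, page: usize) -> Option<&[u8]> {
        self.pages.get(page).map(|p| p.as_slice())
    }
}

/// Layer 2 stand-in: splits on whitespace (so it cannot handle strings
/// containing spaces, arrays, or comments). Operands are buffered until
/// an operator token arrives, then handed to the handler together.
fn interpret(content: &[u8], handler: &mut dyn Handler) {
    let text = String::from_utf8_lossy(content);
    let mut operands: Vec<&str> = Vec::new();
    for token in text.split_whitespace() {
        let is_keyword_operand = matches!(token, "true" | "false" | "null");
        let looks_like_operator =
            token.starts_with(|c: char| c.is_ascii_alphabetic() || c == '\'' || c == '"');
        if looks_like_operator && !is_keyword_operand {
            handler.on_operator(token, &operands);
            operands.clear();
        } else {
            operands.push(token);
        }
    }
}

/// The layers meet only through traits, so any backend can drive any handler.
fn run<B: Backend, H: Handler>(backend: &B, handler: &mut H) {
    for page in 0..backend.page_count() {
        if let Some(bytes) = backend.content_stream(page) {
            interpret(bytes, &mut *handler);
        }
    }
}

/// A handler that records (operator, operand count) pairs.
#[derive(Default)]
struct Recorder {
    events: Vec<(String, usize)>,
}

impl Handler for Recorder {
    fn on_operator(&mut self, op: &str, operands: &[&str]) {
        self.events.push((op.to_string(), operands.len()));
    }
}
```

In this sketch, Layer 2 only ever sees bytes and a handler. Swapping the in-memory backend for a real parser would therefore leave `interpret` untouched.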
Key types
- PdfBackend - Trait for pluggable PDF parsing backends
- LopdfBackend - Default backend using the lopdf crate
- ContentHandler - Trait for receiving events from content stream interpretation
- TextState - PDF text state machine (fonts, matrices, positioning)
- CMap - Character code to Unicode mapping (ToUnicode CMaps)
- FontMetrics - Font width metrics for character positioning
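A text state and font metrics work together to position each character. The sketch below illustrates this with the horizontal glyph-advance rule from the PDF specification (ISO 32000-1, section 9.4.4). After each glyph, tx = (w0 × Tfs + Tc + Tw) × Th, where w0 is the glyph width from the font's metrics divided by 1000. A number n in a TJ array instead contributes tx = -(n / 1000) × Tfs × Th. `SimpleTextState` and its methods are hypothetical and much simpler than the crate's `TextState`. The sketch assumes a horizontal (non-vertical) font and stores Th as a fraction (the `Tz` operand divided by 100).

```rust
/// Hypothetical, simplified text state holding only the parameters that
/// affect horizontal glyph advance (ISO 32000-1, section 9.4.4).
#[derive(Debug, Clone)]
struct SimpleTextState {
    font_size: f64,    // Tfs, set by the Tf operator
    char_spacing: f64, // Tc, set by the Tc operator
    word_spacing: f64, // Tw, set by the Tw operator
    horiz_scale: f64,  // Th as a fraction: the Tz operand divided by 100
    tm: [f64; 6],      // text matrix [a b c d e f]
}

impl SimpleTextState {
    /// Tm' = [1 0 0 1 tx 0] x Tm, which only changes the translation (e, f).
    fn translate_x(&mut self, tx: f64) {
        self.tm[4] += tx * self.tm[0];
        self.tm[5] += tx * self.tm[1];
    }

    /// Advance after showing one glyph. `width` is in thousandths of text
    /// space (as in a simple font's /Widths). `is_space` must be true only for
    /// the single-byte code 32, the only code to which Tw applies.
    fn advance_glyph(&mut self, width: f64, is_space: bool) -> f64 {
        let tw = if is_space { self.word_spacing } else { 0.0 };
        let tx = (width / 1000.0 * self.font_size + self.char_spacing + tw) * self.horiz_scale;
        self.translate_x(tx);
        tx
    }

    /// Apply a number from a TJ array: positive values move the next glyph left.
    fn apply_tj_number(&mut self, n: f64) -> f64 {
        let tx = -n / 1000.0 * self.font_size * self.horiz_scale;
        self.translate_x(tx);
        tx
    }
}
```

For example, at 12 pt a glyph of width 500 advances the pen by 6 units. A space of width 250 with Tw = 2 advances it by 3 + 2 = 5 units.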
Re-exports
pub use backend::PdfBackend;
pub use char_extraction::char_from_event;
pub use cid_font::CidFontMetrics;
pub use cid_font::CidFontType;
pub use cid_font::CidSystemInfo;
pub use cid_font::CidToGidMap;
pub use cid_font::PredefinedCMapInfo;
pub use cid_font::extract_cid_font_metrics;
pub use cid_font::get_descendant_font;
pub use cid_font::get_type0_encoding;
pub use cid_font::is_subset_font;
pub use cid_font::is_type0_font;
pub use cid_font::parse_predefined_cmap_name;
pub use cid_font::parse_w_array;
pub use cid_font::strip_subset_prefix;
pub use cmap::CMap;
pub use cmap::CidCMap;
pub use error::BackendError;
pub use font_metrics::FontMetrics;
pub use font_metrics::extract_font_metrics;
pub use handler::CharEvent;
pub use handler::ContentHandler;
pub use handler::ImageEvent;
pub use handler::PaintOp;
pub use handler::PathEvent;
pub use interpreter_state::InterpreterState;
pub use lopdf_backend::LopdfBackend;
pub use lopdf_backend::LopdfDocument;
pub use lopdf_backend::LopdfPage;
pub use page_geometry::PageGeometry;
pub use text_renderer::RawChar;
pub use text_renderer::TjElement;
pub use text_renderer::double_quote_show_string;
pub use text_renderer::quote_show_string;
pub use text_renderer::show_string;
pub use text_renderer::show_string_cid;
pub use text_renderer::show_string_with_positioning;
pub use text_renderer::show_string_with_positioning_mode;
pub use text_state::TextRenderMode;
pub use text_state::TextState;
pub use tokenizer::Operand;
pub use tokenizer::Operator;
pub use tokenizer::tokenize;
pub use pdfplumber_core;
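The re-exported `tokenize`, `Operand`, and `Operator` point to a tokenizer that splits a content stream into operands and operators. The following is an independent sketch of that idea, not the crate's implementation. `Token` and `tokenize_sketch` are hypothetical names. The sketch omits dictionaries (`<< >>`), octal and line-continuation string escapes, and inline image data.

```rust
/// Hypothetical token type for this sketch (not the crate's `Operand`/`Operator`).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Name(String),
    Str(Vec<u8>),
    ArrayStart,
    ArrayEnd,
    Op(String),
}

fn is_ws(b: u8) -> bool {
    matches!(b, b' ' | b'\t' | b'\r' | b'\n' | 0x0c | 0x00)
}

fn is_delim(b: u8) -> bool {
    matches!(b, b'(' | b')' | b'<' | b'>' | b'[' | b']' | b'{' | b'}' | b'/' | b'%')
}

fn tokenize_sketch(input: &[u8]) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        if is_ws(b) {
            i += 1;
            continue;
        }
        match b {
            b'%' => {
                // Comment: skip to end of line.
                while i < input.len() && input[i] != b'\n' && input[i] != b'\r' {
                    i += 1;
                }
            }
            b'[' => { tokens.push(Token::ArrayStart); i += 1; }
            b']' => { tokens.push(Token::ArrayEnd); i += 1; }
            b'/' => {
                // Name: runs until whitespace or a delimiter.
                let start = i + 1;
                i = start;
                while i < input.len() && !is_ws(input[i]) && !is_delim(input[i]) {
                    i += 1;
                }
                tokens.push(Token::Name(String::from_utf8_lossy(&input[start..i]).into_owned()));
            }
            b'(' => {
                // Literal string: balanced parentheses plus simple backslash escapes.
                let mut depth = 1;
                let mut out = Vec::new();
                i += 1;
                while i < input.len() {
                    let c = input[i];
                    i += 1;
                    match c {
                        b'\\' if i < input.len() => {
                            let e = input[i];
                            i += 1;
                            // \n, \r, \t are translated; \(, \) and \\ keep the escaped byte.
                            out.push(match e { b'n' => b'\n', b'r' => b'\r', b't' => b'\t', other => other });
                        }
                        b'(' => { depth += 1; out.push(c); }
                        b')' => {
                            depth -= 1;
                            if depth == 0 { break; }
                            out.push(c);
                        }
                        _ => out.push(c),
                    }
                }
                tokens.push(Token::Str(out));
            }
            b'<' => {
                // Hex string: whitespace ignored, an odd digit count is padded with 0.
                i += 1;
                let mut hex = Vec::new();
                while i < input.len() && input[i] != b'>' {
                    if input[i].is_ascii_hexdigit() { hex.push(input[i]); }
                    i += 1;
                }
                i += 1; // skip '>'
                if hex.len() % 2 == 1 { hex.push(b'0'); }
                let bytes: Vec<u8> = hex
                    .chunks(2)
                    .map(|p| u8::from_str_radix(std::str::from_utf8(p).unwrap(), 16).unwrap())
                    .collect();
                tokens.push(Token::Str(bytes));
            }
            _ => {
                // Regular token: a number if it parses as one, else an operator
                // (true/false/null would also land here in this sketch).
                let start = i;
                while i < input.len() && !is_ws(input[i]) && !is_delim(input[i]) {
                    i += 1;
                }
                if i == start {
                    i += 1; // stray delimiter such as ')' or '{': skip it
                    continue;
                }
                let word = &input[start..i];
                match std::str::from_utf8(word).ok().and_then(|s| s.parse::<f64>().ok()) {
                    Some(n) => tokens.push(Token::Number(n)),
                    None => tokens.push(Token::Op(String::from_utf8_lossy(word).into_owned())),
                }
            }
        }
    }
    tokens
}
```

For example, the TJ array in `BT /F1 12 Tf [(A\)) -120 <4243>] TJ ET` comes out as an array start, the literal string `A)`, the number -120, the hex string `BC`, and an array end. Each of these precedes the `TJ` operator token.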
Modules
- backend - PDF parsing backend trait.
- char_extraction - Character bounding box calculation from content stream events.
- cid_font - CID font support for CJK text extraction.
- cmap - ToUnicode CMap parser for mapping character codes to Unicode strings.
- error - Error types for the parsing and interpreter layers.
- font_metrics - Font metrics extraction from PDF font dictionaries.
- handler - Content handler callback trait for content stream interpretation.
- interpreter - Content stream interpreter.
- interpreter_state - Graphics state stack for the content stream interpreter.
- lopdf_backend - lopdf-based PDF parsing backend.
- page_geometry - Page coordinate normalization: rotation and CropBox transforms.
- text_renderer - Text rendering operators (Tj, TJ, ', ") for the content stream interpreter.
- text_state - Text state machine for the content stream interpreter.
- tokenizer - Content stream tokenizer for PDF operator/operand parsing.
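The `cmap` module parses ToUnicode CMaps. These map character codes to UTF-16BE strings through `bfchar` entries (one code, one string) and `bfrange` entries (a code range and a starting string). The sketch below illustrates this under stated assumptions. `parse_tounicode_sketch` and its helpers are hypothetical. The sketch handles only the string form of `bfrange`, not the array form, and it ignores `codespacerange`, so each code is treated as a plain integer.

```rust
use std::collections::HashMap;

/// Tokens this sketch cares about: hex strings and bare keywords.
enum Tok {
    Hex(Vec<u8>),
    Word(String),
}

fn hex_bytes(digits: &str) -> Vec<u8> {
    let mut d = digits.to_string();
    if d.len() % 2 == 1 {
        d.push('0'); // odd digit count: pad with a trailing zero
    }
    (0..d.len()).step_by(2).map(|i| u8::from_str_radix(&d[i..i + 2], 16).unwrap()).collect()
}

fn lex(src: &str) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '<' {
            if chars.peek() == Some(&'<') {
                chars.next(); // dictionary start "<<", not a hex string
                continue;
            }
            let mut digits = String::new();
            while let Some(d) = chars.next() {
                if d == '>' { break; }
                if d.is_ascii_hexdigit() { digits.push(d); }
            }
            out.push(Tok::Hex(hex_bytes(&digits)));
        } else if c.is_ascii_alphabetic() {
            let mut word = String::from(c);
            while let Some(&d) = chars.peek() {
                if !d.is_ascii_alphanumeric() { break; }
                word.push(d);
                chars.next();
            }
            out.push(Tok::Word(word));
        }
        // Everything else (numbers, names, punctuation) is ignored here.
    }
    out
}

fn code_of(bytes: &[u8]) -> u32 {
    bytes.iter().fold(0u32, |acc, &b| (acc << 8) | u32::from(b))
}

fn utf16be(bytes: &[u8]) -> String {
    let units: Vec<u16> = bytes
        .chunks(2)
        .map(|p| if p.len() == 2 { u16::from_be_bytes([p[0], p[1]]) } else { u16::from(p[0]) })
        .collect();
    String::from_utf16_lossy(&units)
}

fn parse_tounicode_sketch(src: &str) -> HashMap<u32, String> {
    let mut map = HashMap::new();
    let mut mode = ""; // "", "char", or "range"
    let mut pending: Vec<Vec<u8>> = Vec::new();
    for tok in lex(src) {
        match tok {
            Tok::Word(w) => {
                mode = match w.as_str() {
                    "beginbfchar" => "char",
                    "beginbfrange" => "range",
                    "endbfchar" | "endbfrange" => "",
                    _ => mode,
                };
                pending.clear();
            }
            Tok::Hex(h) if !mode.is_empty() => {
                pending.push(h);
                if mode == "char" && pending.len() == 2 {
                    map.insert(code_of(&pending[0]), utf16be(&pending[1]));
                    pending.clear();
                } else if mode == "range" && pending.len() == 3 {
                    let (lo, hi) = (code_of(&pending[0]), code_of(&pending[1]));
                    for code in lo..=hi {
                        // Each later code increments the last byte of the destination.
                        let mut dst = pending[2].clone();
                        if let Some(last) = dst.last_mut() {
                            *last = last.wrapping_add((code - lo) as u8);
                        }
                        map.insert(code, utf16be(&dst));
                    }
                    pending.clear();
                }
            }
            Tok::Hex(_) => {} // outside bfchar/bfrange, e.g. codespacerange
        }
    }
    map
}
```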