Crate pdfplumber_parse

PDF parsing backend and content stream interpreter for pdfplumber-rs.

This crate implements Layer 1 (PDF parsing via pluggable backends) and Layer 2 (content stream interpretation) of the pdfplumber-rs architecture. It depends on pdfplumber-core for shared data types.

Key types

  • PdfBackend — Trait for pluggable PDF parsing backends
  • LopdfBackend — Default backend using the lopdf crate
  • ContentHandler — Trait for receiving events from content stream interpretation
  • TextState — PDF text state machine (fonts, matrices, positioning)
  • CMap — Character code to Unicode mapping (ToUnicode CMaps)
  • FontMetrics — Font width metrics for character positioning
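
The two-layer split described above can be pictured with a small sketch. Every name here (`SketchBackend`, `SketchHandler`, `InMemoryBackend`, `CollectText`, `interpret`) is hypothetical and chosen for illustration; the real `PdfBackend` and `ContentHandler` traits are likely to have different methods and signatures. The sketch assumes only what the description states: a pluggable backend supplies page data, and an interpreter reports events to a handler.

```rust
// Hypothetical names throughout; this is not pdfplumber_parse's real API.

/// Layer 1 stand-in: something that can hand back a page's raw content stream.
trait SketchBackend {
    fn page_content(&self, page_index: usize) -> Option<Vec<u8>>;
}

/// Layer 2 stand-in: receives events while the stream is interpreted.
trait SketchHandler {
    fn on_text(&mut self, text: &str);
}

/// Toy backend holding pre-extracted content streams in memory.
struct InMemoryBackend {
    pages: Vec<Vec<u8>>,
}

impl SketchBackend for InMemoryBackend {
    fn page_content(&self, page_index: usize) -> Option<Vec<u8>> {
        self.pages.get(page_index).cloned()
    }
}

/// Handler that collects every shown string in order.
#[derive(Default)]
struct CollectText {
    strings: Vec<String>,
}

impl SketchHandler for CollectText {
    fn on_text(&mut self, text: &str) {
        self.strings.push(text.to_string());
    }
}

/// Toy interpreter: reports each `(literal) Tj` pair to the handler.
/// It ignores escapes, nested parentheses, and every other operator.
/// Returns false when the backend has no such page.
fn interpret<B: SketchBackend, H: SketchHandler>(
    backend: &B,
    page_index: usize,
    handler: &mut H,
) -> bool {
    let Some(bytes) = backend.page_content(page_index) else {
        return false;
    };
    let content = String::from_utf8_lossy(&bytes);
    let mut rest: &str = &content;
    while let Some(open) = rest.find('(') {
        let after_open = &rest[open + 1..];
        let Some(close) = after_open.find(')') else {
            break;
        };
        let literal = &after_open[..close];
        // Only a string immediately followed by `Tj` counts as shown text.
        if after_open[close + 1..].trim_start().starts_with("Tj") {
            handler.on_text(literal);
        }
        rest = &after_open[close + 1..];
    }
    true
}
```

Because the interpreter is generic over both traits, swapping the in-memory backend for one built on a real PDF library would not change the handler side, which is the kind of decoupling a pluggable backend is meant to give.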

Re-exports

pub use backend::PdfBackend;
pub use char_extraction::char_from_event;
pub use cid_font::CidFontMetrics;
pub use cid_font::CidFontType;
pub use cid_font::CidSystemInfo;
pub use cid_font::CidToGidMap;
pub use cid_font::PredefinedCMapInfo;
pub use cid_font::extract_cid_font_metrics;
pub use cid_font::get_descendant_font;
pub use cid_font::get_type0_encoding;
pub use cid_font::is_subset_font;
pub use cid_font::is_type0_font;
pub use cid_font::parse_predefined_cmap_name;
pub use cid_font::parse_w_array;
pub use cid_font::strip_subset_prefix;
pub use cmap::CMap;
pub use cmap::CidCMap;
pub use error::BackendError;
pub use font_metrics::FontMetrics;
pub use font_metrics::extract_font_metrics;
pub use handler::CharEvent;
pub use handler::ContentHandler;
pub use handler::ImageEvent;
pub use handler::PaintOp;
pub use handler::PathEvent;
pub use interpreter_state::InterpreterState;
pub use lopdf_backend::LopdfBackend;
pub use lopdf_backend::LopdfDocument;
pub use lopdf_backend::LopdfPage;
pub use page_geometry::PageGeometry;
pub use text_renderer::RawChar;
pub use text_renderer::TjElement;
pub use text_renderer::double_quote_show_string;
pub use text_renderer::quote_show_string;
pub use text_renderer::show_string;
pub use text_renderer::show_string_cid;
pub use text_renderer::show_string_with_positioning;
pub use text_renderer::show_string_with_positioning_mode;
pub use text_state::TextRenderMode;
pub use text_state::TextState;
pub use tokenizer::Operand;
pub use tokenizer::Operator;
pub use tokenizer::tokenize;
pub use pdfplumber_core;

Modules

backend
PDF parsing backend trait.
char_extraction
Character bounding box calculation from content stream events.
cid_font
CID font support for CJK text extraction.
cmap
ToUnicode CMap parser for mapping character codes to Unicode strings.
error
Error types for the parsing and interpreter layers.
font_metrics
Font metrics extraction from PDF font dictionaries.
handler
Content handler callback trait for content stream interpretation.
interpreter
Content stream interpreter.
interpreter_state
Graphics state stack for the content stream interpreter.
lopdf_backend
lopdf-based PDF parsing backend.
page_geometry
Page coordinate normalization — rotation and CropBox transforms.
text_renderer
Text rendering operators (Tj, TJ, ', ") for the content stream interpreter.
text_state
Text state machine for the content stream interpreter.
tokenizer
Content stream tokenizer for PDF operator/operand parsing.
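
To give a feel for what tokenizing a content stream into operands and operators involves, here is a minimal sketch. The `Token` enum and `sketch_tokenize` function are hypothetical and are not the crate's `Operand`, `Operator`, or `tokenize` items, whose shapes are not shown on this page. The sketch covers only numbers, `/Name` objects, simple literal strings, and operator keywords.

```rust
/// Hypothetical token type for illustration; the crate's own `Operand` and
/// `Operator` types may be shaped quite differently.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Number(f64),
    Name(String),
    Literal(String),
    Operator(String),
}

/// Splits a content stream into tokens. Handles numbers, `/Name` objects,
/// unnested `(literal)` strings without escapes, and bare operator keywords.
/// Arrays, dictionaries, hex strings, comments, and inline images are not
/// handled in this sketch.
fn sketch_tokenize(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '(' {
            chars.next(); // consume the opening parenthesis
            let mut s = String::new();
            for ch in chars.by_ref() {
                if ch == ')' {
                    break;
                }
                s.push(ch);
            }
            tokens.push(Token::Literal(s));
        } else {
            // Read one whitespace-delimited word, stopping before a string.
            let mut word = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() || ch == '(' {
                    break;
                }
                word.push(ch);
                chars.next();
            }
            if let Some(name) = word.strip_prefix('/') {
                tokens.push(Token::Name(name.to_string()));
            } else if let Ok(n) = word.parse::<f64>() {
                tokens.push(Token::Number(n));
            } else {
                tokens.push(Token::Operator(word));
            }
        }
    }
    tokens
}
```

In a postfix format like this, operands precede the operator that consumes them, so an interpreter would typically buffer `Number`, `Name`, and `Literal` tokens until an `Operator` arrives. The single-character text operators `'` and `"` fall out of the same word-reading branch as `Tj` and `TJ`.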