Crate pdfplumber_core

Backend-independent data types and algorithms for pdfplumber-rs.

This crate provides the foundational types (BBox, Char, Word, Line, Rect, Table, etc.) and algorithms (text grouping, table detection) used by pdfplumber-rs. It has no required external dependencies; all functionality is pure Rust.

Re-exports

pub use annotation::Annotation;
pub use annotation::AnnotationType;
pub use bookmark::Bookmark;
pub use dedupe::DedupeOptions;
pub use dedupe::dedupe_chars;
pub use edges::Edge;
pub use edges::EdgeSource;
pub use edges::derive_edges;
pub use edges::edge_from_curve;
pub use edges::edge_from_line;
pub use edges::edges_from_rect;
pub use encoding::EncodingResolver;
pub use encoding::FontEncoding;
pub use encoding::StandardEncoding;
pub use error::ExtractOptions;
pub use error::ExtractResult;
pub use error::ExtractWarning;
pub use error::PdfError;
pub use form_field::FieldType;
pub use form_field::FormField;
pub use geometry::BBox;
pub use geometry::Ctm;
pub use geometry::Orientation;
pub use geometry::Point;
pub use html::HtmlOptions;
pub use html::HtmlRenderer;
pub use images::Image;
pub use images::ImageContent;
pub use images::ImageFormat;
pub use images::ImageMetadata;
pub use images::image_from_ctm;
pub use layout::TextBlock;
pub use layout::TextLine;
pub use layout::TextOptions;
pub use layout::blocks_to_text;
pub use layout::cluster_lines_into_blocks;
pub use layout::cluster_words_into_lines;
pub use layout::sort_blocks_reading_order;
pub use layout::split_lines_at_columns;
pub use layout::words_to_text;
pub use markdown::MarkdownOptions;
pub use markdown::MarkdownRenderer;
pub use metadata::DocumentMetadata;
pub use page_object::PageObject;
pub use painting::Color;
pub use painting::DashPattern;
pub use painting::ExtGState;
pub use painting::FillRule;
pub use painting::GraphicsState;
pub use painting::PaintedPath;
pub use path::Path;
pub use path::PathBuilder;
pub use path::PathSegment;
pub use repair::RepairOptions;
pub use repair::RepairResult;
pub use search::SearchMatch;
pub use search::SearchOptions;
pub use search::search_chars;
pub use shapes::Curve;
pub use shapes::Line;
pub use shapes::LineOrientation;
pub use shapes::Rect;
pub use shapes::extract_shapes;
pub use signature::SignatureInfo;
pub use struct_tree::StructElement;
pub use svg::DrawStyle;
pub use svg::SvgDebugOptions;
pub use svg::SvgOptions;
pub use svg::SvgRenderer;
pub use table::Cell;
pub use table::ExplicitLines;
pub use table::Intersection;
pub use table::Strategy;
pub use table::Table;
pub use table::TableFinder;
pub use table::TableFinderDebug;
pub use table::TableQuality;
pub use table::TableSettings;
pub use table::cells_to_tables;
pub use table::edges_to_intersections;
pub use table::explicit_lines_to_edges;
pub use table::extract_text_for_cells;
pub use table::intersections_to_cells;
pub use table::join_edge_group;
pub use table::snap_edges;
pub use table::words_to_edges_stream;
pub use text::Char;
pub use text::TextDirection;
pub use text::is_cjk;
pub use text::is_cjk_text;
pub use unicode_norm::UnicodeNorm;
pub use unicode_norm::normalize_chars;
pub use validation::Severity;
pub use validation::ValidationIssue;
pub use words::Word;
pub use words::WordExtractor;
pub use words::WordOptions;
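
The table re-exports above (snap_edges, edges_to_intersections, intersections_to_cells, cells_to_tables) suggest a staged lattice pipeline, but this page does not show their signatures. The following is only a minimal sketch of the intersection step under assumed definitions: SketchEdge, intersections, and the tolerance parameter are hypothetical names for illustration and are not the crate's API.

```rust
/// Hypothetical axis-aligned edge for illustration; NOT the crate's `Edge` type.
#[derive(Debug, Clone, Copy)]
enum SketchEdge {
    /// Horizontal segment at height `y`, spanning `x0..=x1`.
    Horizontal { y: f64, x0: f64, x1: f64 },
    /// Vertical segment at position `x`, spanning `y0..=y1`.
    Vertical { x: f64, y0: f64, y1: f64 },
}

/// Return every point where a horizontal edge meets a vertical edge.
/// `tolerance` (an assumed parameter) lets edges that stop slightly
/// short of each other still count as touching.
fn intersections(edges: &[SketchEdge], tolerance: f64) -> Vec<(f64, f64)> {
    let mut points = Vec::new();
    for h in edges {
        // Outer loop only considers horizontal edges.
        if let SketchEdge::Horizontal { y, x0, x1 } = *h {
            for v in edges {
                // Inner loop only considers vertical edges.
                if let SketchEdge::Vertical { x, y0, y1 } = *v {
                    let x_hit = x >= x0 - tolerance && x <= x1 + tolerance;
                    let y_hit = y >= y0 - tolerance && y <= y1 + tolerance;
                    if x_hit && y_hit {
                        points.push((x, y));
                    }
                }
            }
        }
    }
    points
}
```

In this sketch, the four edges of a closed rectangle yield its four corners, which a later stage could pair up into cells. The quadratic scan is chosen for clarity; a real implementation would likely sort or index the edges first.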

Modules

annotation
PDF annotation types.
bookmark
PDF bookmark / outline / table of contents types.
dedupe
Duplicate character deduplication.
edges
Edge derivation from geometric primitives for table detection.
encoding
Font encoding mapping (Standard, Windows, Mac, Custom) and encoding resolution for standard PDF text encodings.
error
Error and warning types for PDF processing in pdfplumber-rs.
form_field
PDF form field types for AcroForm extraction.
geometry
Geometric primitives: Point, BBox, CTM, Orientation.
html
HTML rendering for PDF page content.
hyperlink
PDF hyperlink types.
images
Image extraction (from the XObject Do operator) and image metadata.
layout
Text layout: words → lines → blocks, reading order, text output.
markdown
Markdown rendering for PDF page content.
metadata
Document-level metadata types.
page_object
PageObject enum for custom object filtering.
painting
Path painting operators, graphics state (including ExtGState), colors, dash patterns, and painted paths.
path
PDF path construction (MoveTo, LineTo, CurveTo, ClosePath).
repair
PDF repair types for best-effort fixing of common PDF issues.
search
Text search with position: find text patterns and return matches with bounding boxes.
shapes
Shape extraction: Lines, Rects, Curves from painted paths.
signature
PDF digital signature information types.
struct_tree
PDF structure tree types for tagged PDF access.
svg
SVG rendering for visual debugging of PDF pages.
table
Table detection types and pipeline: lattice, stream, and explicit strategies.
text
Character data types and CJK detection.
unicode_norm
Unicode normalization for extracted text.
validation
PDF validation types for detecting specification violations.
words
Word extraction from characters based on spatial proximity.
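
The words module is described as grouping characters by spatial proximity, and the layout module as continuing from words to lines to blocks. As a rough illustration of the first step only, here is a minimal sketch under assumed definitions: SketchChar, group_into_words, and x_tolerance are hypothetical names, and this is not the crate's WordExtractor or WordOptions API. It also assumes the characters already sit on one line and are sorted left to right.

```rust
/// Hypothetical positioned character for illustration; NOT the crate's `Char`.
#[derive(Debug, Clone)]
struct SketchChar {
    text: char,
    x0: f64, // left edge
    x1: f64, // right edge
}

/// Group characters into words. A word ends at a whitespace character or
/// when the horizontal gap to the previous character exceeds `x_tolerance`
/// (an assumed parameter name).
fn group_into_words(chars: &[SketchChar], x_tolerance: f64) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_x1: Option<f64> = None;

    for c in chars {
        // Gap between this character's left edge and the previous right edge.
        let gap_too_wide = prev_x1.map_or(false, |x1| c.x0 - x1 > x_tolerance);
        if (c.text.is_whitespace() || gap_too_wide) && !current.is_empty() {
            // Close the word in progress and start a fresh one.
            words.push(std::mem::take(&mut current));
        }
        if !c.text.is_whitespace() {
            current.push(c.text);
        }
        prev_x1 = Some(c.x1);
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}
```

A full extractor would also compare vertical positions before grouping and would carry a bounding box for each word; both are omitted here to keep the proximity rule visible.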