Backend-independent data types and algorithms for pdfplumber-rs.
This crate provides the foundational types (BBox, Char, Word,
Line, Rect, Table, etc.) and algorithms (text grouping, table
detection) used by pdfplumber-rs. It has no required external dependencies —
all functionality is pure Rust.
Modules
- geometry: Geometric primitives (Point, BBox, Ctm, Orientation)
- text: Character data (Char, TextDirection, CJK detection)
- words: Word extraction (Word, WordExtractor, WordOptions)
- layout: Text layout (TextLine, TextBlock, TextOptions)
- shapes: Shapes from painted paths (Line, Rect, Curve)
- edges: Edge derivation for table detection (Edge, EdgeSource)
- table: Table detection (Table, TableFinder, TableSettings)
- images: Image extraction (Image, ImageMetadata)
- painting: Graphics state (Color, GraphicsState, PaintedPath)
- path: Path construction (Path, PathBuilder, PathSegment)
- encoding: Font encoding (FontEncoding, EncodingResolver)
- error: Errors and warnings (PdfError, ExtractWarning, ExtractOptions)
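As an illustration of the kind of geometric primitive listed above, here is a minimal sketch of a bounding box with a union operation. The type `SketchBBox`, its field names (`x0`, `top`, `x1`, `bottom`), and its methods are assumptions made for this example; they are not taken from the crate, whose actual `BBox` API may differ.

```rust
/// Hypothetical stand-in for a bounding box; NOT the crate's `BBox`.
/// Assumes a top-left origin, so `top <= bottom` and `x0 <= x1`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SketchBBox {
    pub x0: f64,
    pub top: f64,
    pub x1: f64,
    pub bottom: f64,
}

impl SketchBBox {
    /// Smallest box that contains both `self` and `other`.
    pub fn union(&self, other: &SketchBBox) -> SketchBBox {
        SketchBBox {
            x0: self.x0.min(other.x0),
            top: self.top.min(other.top),
            x1: self.x1.max(other.x1),
            bottom: self.bottom.max(other.bottom),
        }
    }

    /// Horizontal extent of the box.
    pub fn width(&self) -> f64 {
        self.x1 - self.x0
    }

    /// Vertical extent of the box.
    pub fn height(&self) -> f64 {
        self.bottom - self.top
    }
}
```

A union like this is typically what lets the bounding box of a word be built up from the boxes of its characters.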
Re-exports
pub use edges::Edge;
pub use edges::EdgeSource;
pub use edges::derive_edges;
pub use edges::edge_from_curve;
pub use edges::edge_from_line;
pub use edges::edges_from_rect;
pub use encoding::EncodingResolver;
pub use encoding::FontEncoding;
pub use encoding::StandardEncoding;
pub use error::ExtractOptions;
pub use error::ExtractResult;
pub use error::ExtractWarning;
pub use error::PdfError;
pub use geometry::BBox;
pub use geometry::Ctm;
pub use geometry::Orientation;
pub use geometry::Point;
pub use images::Image;
pub use images::ImageMetadata;
pub use images::image_from_ctm;
pub use layout::TextBlock;
pub use layout::TextLine;
pub use layout::TextOptions;
pub use layout::blocks_to_text;
pub use layout::cluster_lines_into_blocks;
pub use layout::cluster_words_into_lines;
pub use layout::sort_blocks_reading_order;
pub use layout::split_lines_at_columns;
pub use layout::words_to_text;
pub use painting::Color;
pub use painting::DashPattern;
pub use painting::ExtGState;
pub use painting::FillRule;
pub use painting::GraphicsState;
pub use painting::PaintedPath;
pub use path::Path;
pub use path::PathBuilder;
pub use path::PathSegment;
pub use shapes::Curve;
pub use shapes::Line;
pub use shapes::LineOrientation;
pub use shapes::Rect;
pub use shapes::extract_shapes;
pub use table::Cell;
pub use table::ExplicitLines;
pub use table::Intersection;
pub use table::Strategy;
pub use table::Table;
pub use table::TableFinder;
pub use table::TableSettings;
pub use table::cells_to_tables;
pub use table::edges_to_intersections;
pub use table::explicit_lines_to_edges;
pub use table::extract_text_for_cells;
pub use table::intersections_to_cells;
pub use table::join_edge_group;
pub use table::snap_edges;
pub use table::words_to_edges_stream;
pub use text::Char;
pub use text::TextDirection;
pub use text::is_cjk;
pub use text::is_cjk_text;
pub use words::Word;
pub use words::WordExtractor;
pub use words::WordOptions;
Modules
- edges
- Edge derivation from geometric primitives for table detection.
- encoding
- Standard PDF text encodings (Standard, Windows, Mac, Custom) and encoding resolution.
- error
- Error and warning types for PDF processing in pdfplumber-rs.
- geometry
- Geometric primitives: Point, BBox, CTM, Orientation.
- images
- Image extraction and metadata, based on the XObject Do operator.
- layout
- Text layout: words → lines → blocks, reading order, text output.
- painting
- Path painting operators, graphics state (including ExtGState types), colors, dash patterns, and painted paths.
- path
- PDF path construction (MoveTo, LineTo, CurveTo, ClosePath).
- shapes
- Shape extraction: Lines, Rects, and Curves from painted paths.
- table
- Table detection types and pipeline: lattice, stream, and explicit strategies.
- text
- Character data types and CJK detection.
- words
- Word extraction from characters based on spatial proximity.
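To make "word extraction from characters based on spatial proximity" concrete, the following is a minimal sketch of the general idea, not the crate's implementation. The names `SketchChar`, `group_chars_into_words`, and `x_tolerance` are hypothetical, and the sketch assumes the characters already lie on one line and are sorted left to right; the real `WordExtractor` and `WordOptions` may work differently.

```rust
/// Hypothetical positioned character; NOT the crate's `Char`.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchChar {
    pub text: char,
    /// Left edge of the glyph.
    pub x0: f64,
    /// Right edge of the glyph.
    pub x1: f64,
}

/// Groups characters into words. A new word starts at whitespace or when the
/// horizontal gap to the previous character exceeds `x_tolerance`.
/// Assumes `chars` is a single line sorted by `x0` (an assumption of this sketch).
pub fn group_chars_into_words(chars: &[SketchChar], x_tolerance: f64) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_x1: Option<f64> = None;

    for c in chars {
        // Gap between this glyph's left edge and the previous glyph's right edge.
        let gap_too_wide = prev_x1.map_or(false, |x1| c.x0 - x1 > x_tolerance);

        // Close the word in progress on whitespace or on a wide gap.
        if (c.text.is_whitespace() || gap_too_wide) && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        // Whitespace only separates words; it is never part of one.
        if !c.text.is_whitespace() {
            current.push(c.text);
        }
        prev_x1 = Some(c.x1);
    }

    if !current.is_empty() {
        words.push(current);
    }
    words
}
```

A single horizontal tolerance keeps the sketch short. A fuller version would also compare vertical positions and handle other text directions.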