Unified PDF rendering engine.
pdf-engine is the main public-facing API for reading and rendering PDF
documents. It wraps the lower-level pdf-syntax / pdf-interpret /
pdf-render stack and exposes a single PdfDocument handle for all
common operations: page rendering, text extraction, thumbnails, metadata,
bookmarks, and full-text search.
§Quick Start
use std::sync::Arc;
use pdf_engine::{PdfDocument, RenderOptions};
// Load from bytes (accepts Arc<Vec<u8>>, Vec<u8>, or any Into<PdfData>).
let data = Arc::new(std::fs::read("invoice.pdf").unwrap());
let doc = PdfDocument::open(data).unwrap();
println!("{} pages — {:?}", doc.page_count(), doc.info().title);
// Render page 0 at 150 DPI → raw RGBA pixel data.
let opts = RenderOptions { dpi: 150.0, ..Default::default() };
let rendered = doc.render_page(0, &opts).unwrap();
println!("{}×{} px", rendered.width, rendered.height);
// Plain-text extraction.
let text = doc.extract_text(0).unwrap();
println!("{text}");
// Structured text with per-span positions.
for block in doc.extract_text_blocks(0).unwrap() {
    for span in &block.spans {
        println!(" [{:.0}, {:.0}] {}", span.x, span.y, span.text);
    }
}
// Full-text search — returns 0-based page indices.
let hits = doc.search_text("total");
println!("'total' found on {} page(s)", hits.len());
§Key Types
| Type | Description |
|---|---|
| BatchConfig / BatchResult | Worker-pool processing for many PDFs |
| PdfDocument | Main document handle |
| RenderConfig / RenderOptions | DPI, color mode, background colour, optional forced width/height |
| RenderedPage | RGBA or CMYK pixel data (row-major, 4 bytes per pixel) |
| PageGeometry | MediaBox, CropBox, TrimBox, BleedBox, rotation |
| PageBox | A rectangle in PDF user-space points |
| DocumentInfo | Title, author, subject, creator, producer |
| TextBlock / TextSpan | Structured text with position and font size |
| BookmarkItem | Outline node: title, target page, nested children |
| ThumbnailOptions | Max-dimension constraint for thumbnail rendering |
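
The RenderedPage row describes a row-major buffer with 4 bytes per pixel. As a rough illustration of what that layout implies, the sketch below computes byte offsets into such a buffer. It does not use the crate: `pixel_offset` and `pixel_at` are hypothetical helpers, and the code assumes rows are tightly packed with no padding, which this page does not confirm.

```rust
/// Bytes per pixel for the 4-channel layouts named in the table (RGBA or CMYK).
const BYTES_PER_PIXEL: usize = 4;

/// Byte offset of pixel (x, y) in a row-major buffer.
/// Assumption: rows are tightly packed (stride == width * 4).
fn pixel_offset(x: usize, y: usize, width: usize) -> usize {
    (y * width + x) * BYTES_PER_PIXEL
}

/// Returns the four channel bytes of pixel (x, y), or None when the
/// coordinates fall outside the image or the buffer is too short.
fn pixel_at(data: &[u8], width: usize, height: usize, x: usize, y: usize) -> Option<[u8; 4]> {
    if x >= width || y >= height {
        return None;
    }
    let start = pixel_offset(x, y, width);
    let bytes = data.get(start..start + BYTES_PER_PIXEL)?;
    Some([bytes[0], bytes[1], bytes[2], bytes[3]])
}
```

With a real render, the width and height would come from the rendered page (the Quick Start reads `rendered.width` and `rendered.height`). The name of the field holding the pixel bytes is not shown on this page, so check the RenderedPage documentation before adapting this.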
Re-exports§
pub use batch::process_batch;
pub use batch::BatchConfig;
pub use batch::BatchResult;
pub use batch::ErrorStrategy;
pub use batch::PdfBatch;
pub use color::preserve_device_cmyk;
pub use document::BookmarkItem;
pub use document::DocumentInfo;
pub use document::PdfDocument;
pub use error::EngineError;
pub use error::Result;
pub use geometry::PageBox;
pub use geometry::PageGeometry;
pub use geometry::PageRotation;
pub use limits::LimitError;
pub use limits::ProcessingLimits;
pub use ocr::OcrBackend;
pub use ocr::OcrError;
pub use ocr::OcrResult;
pub use ocr::OcrWord;
pub use render::ColorMode;
pub use render::PixelFormat;
pub use render::RenderConfig;
pub use render::RenderOptions;
pub use render::RenderedPage;
pub use text::TextBlock;
pub use text::TextSpan;
pub use thumbnail::ThumbnailOptions;
pub use ocr::best_available_backend;
Modules§
- api
- The ideal top-level API facade for the PDFluent SDK.
- api_error
- batch
- Batch helpers for processing many PDFs with a bounded worker pool.
- color
- Color helpers for render output conversion.
- document
- Unified document facade — multi-page rendering, text extraction, metadata, bookmarks, and thumbnails.
- error
- Error types for the rendering engine.
- geometry
- Page geometry: boxes (MediaBox, CropBox, TrimBox, BleedBox, ArtBox), rotation, and DPI-based pixel conversions.
- limits
- Resource limits for PDF processing.
- ocr
- OCR backend trait and implementations.
- render
- Page rendering with z-order compositing.
- text
- Text extraction via a custom Device implementation.
- thumbnail
- Thumbnail generation options.
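
The geometry module mentions DPI-based pixel conversions, and the Quick Start renders at 150 DPI. The sketch below shows the standard arithmetic behind such a conversion, given that one default PDF user-space point is 1/72 inch. It is not the crate's implementation: `points_to_pixels` and `page_pixel_size` are hypothetical names, and rounding to the nearest pixel is an assumption (the engine may round differently, and this ignores rotation and any custom user unit).

```rust
/// Default PDF user space defines one point as 1/72 inch.
const POINTS_PER_INCH: f64 = 72.0;

/// Converts a length in points to whole pixels at the given DPI.
/// Assumption: round to the nearest pixel; the crate may use ceil or floor.
fn points_to_pixels(points: f64, dpi: f64) -> u32 {
    (points * dpi / POINTS_PER_INCH).round() as u32
}

/// Pixel dimensions of a page box with the given size in points.
fn page_pixel_size(width_pt: f64, height_pt: f64, dpi: f64) -> (u32, u32) {
    (points_to_pixels(width_pt, dpi), points_to_pixels(height_pt, dpi))
}
```

Under these assumptions a US Letter page (612 × 792 points) at 150 DPI comes out at 1275 × 1650 pixels, since 612 × 150 / 72 = 1275 and 792 × 150 / 72 = 1650.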