
Crate pdf_engine


Unified PDF rendering engine.

pdf-engine is the main public-facing API for reading and rendering PDF documents. It wraps the lower-level pdf-syntax / pdf-interpret / pdf-render stack and exposes a single PdfDocument handle for all common operations: page rendering, text extraction, thumbnails, metadata, bookmarks, and full-text search.

§Quick Start

use std::sync::Arc;
use pdf_engine::{PdfDocument, RenderOptions};

// Load from bytes (accepts Arc<Vec<u8>>, Vec<u8>, or any Into<PdfData>).
let data = Arc::new(std::fs::read("invoice.pdf").unwrap());
let doc = PdfDocument::open(data).unwrap();

println!("{} pages — {:?}", doc.page_count(), doc.info().title);

// Render page 0 at 150 DPI → raw RGBA pixel data.
let opts = RenderOptions { dpi: 150.0, ..Default::default() };
let rendered = doc.render_page(0, &opts).unwrap();
println!("{}×{} px", rendered.width, rendered.height);

// Plain-text extraction.
let text = doc.extract_text(0).unwrap();
println!("{text}");

// Structured text with per-span positions.
for block in doc.extract_text_blocks(0).unwrap() {
    for span in &block.spans {
        println!("  [{:.0}, {:.0}] {}", span.x, span.y, span.text);
    }
}

// Full-text search — returns 0-based page indices.
let hits = doc.search_text("total");
println!("'total' found on {} page(s)", hits.len());
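The pixel size printed in the Quick Start follows from the requested DPI. A minimal sketch of that arithmetic, assuming the default PDF user-space unit of 1/72 inch and round-to-nearest; `points_to_pixels` is a hypothetical helper for illustration, not part of pdf_engine, and the crate's own rounding may differ:

```rust
/// Default PDF user space has 72 units (points) per inch.
const POINTS_PER_INCH: f64 = 72.0;

/// Converts a length in PDF points to device pixels at the given DPI.
/// Rounding to nearest is an assumption made for this sketch.
fn points_to_pixels(points: f64, dpi: f64) -> u32 {
    (points * dpi / POINTS_PER_INCH).round() as u32
}

fn main() {
    // A US Letter page is 612 x 792 points.
    let width = points_to_pixels(612.0, 150.0);
    let height = points_to_pixels(792.0, 150.0);
    println!("{width}x{height} px"); // prints "1275x1650 px"
}
```

Under these assumptions, a US Letter page rendered at 150 DPI comes out at 1275 by 1650 pixels.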

§Key Types

| Type | Description |
|---|---|
| BatchConfig / BatchResult | Worker-pool processing for many PDFs |
| PdfDocument | Main document handle |
| RenderConfig / RenderOptions | DPI, color mode, background color, optional forced width/height |
| RenderedPage | RGBA or CMYK pixel data (row-major, 4 bytes per pixel) |
| PageGeometry | MediaBox, CropBox, TrimBox, BleedBox, rotation |
| PageBox | A rectangle in PDF user-space points |
| DocumentInfo | Title, author, subject, creator, producer |
| TextBlock / TextSpan | Structured text with position and font size |
| BookmarkItem | Outline node: title, target page, nested children |
| ThumbnailOptions | Max-dimension constraint for thumbnail rendering |
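RenderedPage is described as row-major pixel data with 4 bytes per pixel. The sketch below shows how a single pixel could be located in a buffer with that layout. `pixel_at` is a hypothetical helper, and because the name of the buffer field on RenderedPage is not shown on this page, it takes a plain byte slice:

```rust
/// Both RGBA and CMYK output use 4 bytes per pixel, per the description above.
const BYTES_PER_PIXEL: usize = 4;

/// Returns the four channel bytes of pixel (x, y) in a row-major buffer,
/// or None if the coordinates are out of bounds or the buffer is too short.
fn pixel_at(data: &[u8], width: usize, height: usize, x: usize, y: usize) -> Option<[u8; 4]> {
    if x >= width || y >= height {
        return None;
    }
    // Row-major: skip y full rows, then x pixels within the row.
    let start = (y * width + x) * BYTES_PER_PIXEL;
    let s = data.get(start..start + BYTES_PER_PIXEL)?;
    Some([s[0], s[1], s[2], s[3]])
}

fn main() {
    // A 2x2 image whose bytes are simply 0..16, for easy inspection.
    let data: Vec<u8> = (0..16).collect();
    println!("{:?}", pixel_at(&data, 2, 2, 1, 0)); // prints "Some([4, 5, 6, 7])"
    println!("{:?}", pixel_at(&data, 2, 2, 2, 0)); // prints "None"
}
```

Whether the four bytes mean R, G, B, A or C, M, Y, K depends on the pixel format of the rendered page.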

Re-exports§

pub use batch::process_batch;
pub use batch::BatchConfig;
pub use batch::BatchResult;
pub use batch::ErrorStrategy;
pub use batch::PdfBatch;
pub use color::preserve_device_cmyk;
pub use document::BookmarkItem;
pub use document::DocumentInfo;
pub use document::PdfDocument;
pub use error::EngineError;
pub use error::Result;
pub use geometry::PageBox;
pub use geometry::PageGeometry;
pub use geometry::PageRotation;
pub use limits::LimitError;
pub use limits::ProcessingLimits;
pub use ocr::OcrBackend;
pub use ocr::OcrError;
pub use ocr::OcrResult;
pub use ocr::OcrWord;
pub use render::ColorMode;
pub use render::PixelFormat;
pub use render::RenderConfig;
pub use render::RenderOptions;
pub use render::RenderedPage;
pub use text::TextBlock;
pub use text::TextSpan;
pub use thumbnail::ThumbnailOptions;
pub use ocr::best_available_backend;
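The batch re-exports above are described as worker-pool processing for many PDFs. As a rough illustration of the general bounded-worker-pool technique, using only the standard library, the sketch below runs a closure over a list of jobs with a fixed number of threads. `run_bounded` is a hypothetical function and does not reflect how `process_batch` or `BatchConfig` are implemented:

```rust
use std::sync::{mpsc, Mutex};
use std::thread;

/// Runs `work` over `jobs` on at most `workers` threads and returns the
/// results in the original job order.
fn run_bounded<T, R, F>(jobs: Vec<T>, workers: usize, work: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    // Shared queue of (index, job) pairs; each worker pulls the next one.
    let queue = Mutex::new(jobs.into_iter().enumerate());
    let (tx, rx) = mpsc::channel();

    thread::scope(|scope| {
        for _ in 0..workers.max(1) {
            let tx = tx.clone();
            let queue = &queue;
            let work = &work;
            scope.spawn(move || loop {
                // Hold the lock only long enough to take the next job.
                let next = queue.lock().unwrap().next();
                match next {
                    Some((index, job)) => tx.send((index, work(job))).unwrap(),
                    None => break,
                }
            });
        }
    });
    drop(tx); // close the channel so the collection below terminates

    let mut results: Vec<(usize, R)> = rx.into_iter().collect();
    results.sort_by_key(|(index, _)| *index);
    results.into_iter().map(|(_, result)| result).collect()
}

fn main() {
    let squares = run_bounded(vec![1u32, 2, 3, 4, 5], 2, |n| n * n);
    println!("{squares:?}"); // prints "[1, 4, 9, 16, 25]"
}
```

In a real batch run the closure would open and process one PDF per job and would likely return a `Result`, so that one failing file does not abort the others.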

Modules§

api
The ideal top-level API facade for the PDFluent SDK.
api_error
batch
Batch helpers for processing many PDFs with a bounded worker pool.
color
Color helpers for render output conversion.
document
Unified document facade — multi-page rendering, text extraction, metadata, bookmarks, and thumbnails.
error
Error types for the rendering engine.
geometry
Page geometry: boxes (MediaBox, CropBox, TrimBox, BleedBox, ArtBox), rotation, and DPI-based pixel conversions.
limits
Resource limits for PDF processing.
ocr
OCR backend trait and implementations.
render
Page rendering with z-order compositing.
text
Text extraction via a custom Device implementation.
thumbnail
Thumbnail generation options.
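A max-dimension constraint, as ThumbnailOptions is described in the Key Types table, usually means scaling the page so that its longer side fits the limit while preserving aspect ratio. A minimal sketch of that calculation follows. `fit_within` is a hypothetical helper, and both the no-upscaling rule and the rounding are assumptions rather than documented crate behaviour:

```rust
/// Scales (width, height) so the longer side is at most `max_dim`,
/// preserving aspect ratio. Images that already fit are returned unchanged
/// (an assumption: this sketch never upscales).
fn fit_within(width: u32, height: u32, max_dim: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest == 0 || longest <= max_dim {
        return (width, height);
    }
    let scale = max_dim as f64 / longest as f64;
    // Clamp to at least 1 pixel so extreme aspect ratios stay renderable.
    let w = ((width as f64 * scale).round() as u32).max(1);
    let h = ((height as f64 * scale).round() as u32).max(1);
    (w, h)
}

fn main() {
    // A 1275x1650 px page (US Letter at 150 DPI) limited to 256 px.
    println!("{:?}", fit_within(1275, 1650, 256)); // prints "(198, 256)"
}
```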