pdf-ocr 1.0.0-beta.5

OCR integration for scanned PDFs with pluggable engine support

pdf-ocr

OCR integration for PDFluent — Tesseract and PaddleOCR backends.

This crate is part of the PDFluent commercial Rust PDF SDK.

Free for evaluation. Production use requires a valid license.

What it does

Extracts text from scanned/raster PDF pages using OCR. Two pluggable backends:

Tesseract — broadly used, good general accuracy, mature
PaddleOCR — better for non-Latin scripts (CJK, Arabic, etc.)
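
To illustrate what "pluggable" can mean here, the sketch below shows one common way to put interchangeable OCR engines behind a single trait. It is not this crate's API: `PageImage`, `OcrError`, `OcrEngine`, `StubEngine`, and `extract_text` are all hypothetical names invented for the example, and the stub engine returns fixed text instead of calling Tesseract or PaddleOCR.

```rust
use std::fmt;

/// Hypothetical raster page: 8-bit grayscale pixels in row-major order.
/// (Assumption for illustration; not a type from pdf-ocr.)
pub struct PageImage {
    pub width: u32,
    pub height: u32,
    pub pixels: Vec<u8>,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum OcrError {
    EmptyPage,
    Engine(String),
}

impl fmt::Display for OcrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OcrError::EmptyPage => write!(f, "page has no pixels"),
            OcrError::Engine(msg) => write!(f, "engine error: {msg}"),
        }
    }
}

impl std::error::Error for OcrError {}

/// Hypothetical backend interface: each engine turns one page image into text.
pub trait OcrEngine {
    fn name(&self) -> &str;
    fn recognize(&self, page: &PageImage) -> Result<String, OcrError>;
}

/// Stand-in engine that returns fixed text; a real backend would invoke
/// its OCR library here instead.
pub struct StubEngine {
    pub label: &'static str,
    pub text: &'static str,
}

impl OcrEngine for StubEngine {
    fn name(&self) -> &str {
        self.label
    }

    fn recognize(&self, page: &PageImage) -> Result<String, OcrError> {
        // Reject pages that carry no image data.
        if page.width == 0 || page.height == 0 || page.pixels.is_empty() {
            return Err(OcrError::EmptyPage);
        }
        Ok(self.text.to_string())
    }
}

/// Runs the chosen engine over every page, stopping at the first error.
pub fn extract_text(
    engine: &dyn OcrEngine,
    pages: &[PageImage],
) -> Result<Vec<String>, OcrError> {
    pages.iter().map(|page| engine.recognize(page)).collect()
}
```

Because the caller only depends on the trait object, swapping one backend for another in a design like this means passing a different `&dyn OcrEngine`; the page loop itself does not change.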

Status

Beta. Recognition quality depends on the backend, and the crate has not yet been broadly tested at scale.

Usage

Most users do not depend on this crate directly. Use the pdfluent facade with ocr-tesseract or ocr-paddle:

use pdfluent::prelude::*;

For low-level access, see https://pdfluent.com/docs.

Licensing

Free for evaluation, development, and testing
Production use requires a valid PDFluent commercial license
Redistribution requires the OEM Redistribution add-on

See LICENSE for full terms, or visit https://pdfluent.com/terms.

Links

Main crate: https://crates.io/crates/pdfluent
Documentation: https://pdfluent.com/docs
Trial: https://pdfluent.com/trial
Pricing: https://pdfluent.com/pricing