pdf-ocr
OCR integration for PDFluent — Tesseract and PaddleOCR backends.
This crate is part of the PDFluent commercial Rust PDF SDK.
Free for evaluation. Production use requires a valid license.
What it does
Extracts text from scanned or rasterized PDF pages using OCR. Two pluggable backends are available:
- Tesseract — broadly used, good general accuracy, mature
- PaddleOCR — better for non-Latin scripts (CJK, Arabic, etc.)
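As a rough illustration of what "pluggable" can mean here, the sketch below puts two stub backends behind one trait and picks one by script family. Every name in it (`OcrBackend`, `RasterPage`, `Script`, `TesseractStub`, `PaddleStub`, `pick_backend`) is hypothetical and invented for this example. It is not the pdf-ocr or pdfluent API, and the stubs do not run any real OCR.

```rust
// Hypothetical sketch only: none of these names come from the pdf-ocr API.

/// A rasterized page handed to an OCR engine (illustrative type).
struct RasterPage {
    width: u32,
    height: u32,
    pixels: Vec<u8>, // 8-bit grayscale, row-major
}

/// Script family, used here only to choose a backend.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Script {
    Latin,
    Cjk,
    Arabic,
}

#[derive(Debug, PartialEq)]
enum OcrError {
    EmptyPage,
}

/// The common interface every backend implements.
trait OcrBackend {
    fn name(&self) -> &'static str;
    fn recognize(&self, page: &RasterPage) -> Result<String, OcrError>;
}

struct TesseractStub;
struct PaddleStub;

/// Shared stub logic: reject empty pages, otherwise return placeholder text.
fn stub_recognize(name: &str, page: &RasterPage) -> Result<String, OcrError> {
    if page.pixels.is_empty() {
        return Err(OcrError::EmptyPage);
    }
    Ok(format!("[{} stub: {}x{} page]", name, page.width, page.height))
}

impl OcrBackend for TesseractStub {
    fn name(&self) -> &'static str {
        "tesseract"
    }
    fn recognize(&self, page: &RasterPage) -> Result<String, OcrError> {
        stub_recognize(self.name(), page)
    }
}

impl OcrBackend for PaddleStub {
    fn name(&self) -> &'static str {
        "paddle"
    }
    fn recognize(&self, page: &RasterPage) -> Result<String, OcrError> {
        stub_recognize(self.name(), page)
    }
}

/// Mirrors the guidance in the list above: the Tesseract stub for Latin text,
/// the Paddle stub for non-Latin scripts.
fn pick_backend(script: Script) -> Box<dyn OcrBackend> {
    match script {
        Script::Latin => Box::new(TesseractStub),
        Script::Cjk | Script::Arabic => Box::new(PaddleStub),
    }
}

fn main() {
    let page = RasterPage { width: 2, height: 2, pixels: vec![0; 4] };
    let backend = pick_backend(Script::Cjk);
    match backend.recognize(&page) {
        Ok(text) => println!("{}: {}", backend.name(), text),
        Err(err) => println!("{} failed: {:?}", backend.name(), err),
    }
}
```

Because callers only see the trait, swapping one backend for the other does not change the calling code; only the selection step differs.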
Status
Beta. Recognition quality depends on the chosen backend, and the crate has not yet been tested broadly at scale.
Usage
Most users do not need to depend on this crate directly. Instead, use the pdfluent facade crate with the ocr-tesseract or ocr-paddle feature enabled:
use *;
For low-level access, see https://pdfluent.com/docs.
Licensing
- Free for evaluation, development, and testing
- Production use requires a valid PDFluent commercial license
- Redistribution requires the OEM Redistribution add-on
See LICENSE for full terms, or visit https://pdfluent.com/terms.
Links
- Main crate: https://crates.io/crates/pdfluent
- Documentation: https://pdfluent.com/docs
- Trial: https://pdfluent.com/trial
- Pricing: https://pdfluent.com/pricing