pdf-ocr 1.0.0-beta.5

OCR integration for scanned PDFs with pluggable engine support

pdf-ocr

OCR integration for PDFluent — Tesseract and PaddleOCR backends.

This crate is part of the PDFluent commercial Rust PDF SDK.

Free for evaluation. Production use requires a valid license.

What it does

Extracts text from scanned/raster PDF pages using OCR. Two pluggable backends:

  • Tesseract — broadly used, good general accuracy, mature
  • PaddleOCR — better for non-Latin scripts (CJK, Arabic, etc.)
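The "pluggable" design can be pictured as a shared interface that each engine implements. The sketch below is a rough illustration of that idea only. It defines a common trait and a selector that routes Latin text to Tesseract and CJK or Arabic text to PaddleOCR, following the guidance in the list above. Every name here (OcrEngine, RasterPage, OcrError, TesseractStub, PaddleStub, Script, pick_engine) is hypothetical and is not the pdf-ocr API. The stubs return placeholder strings instead of calling either engine.

```rust
/// Hypothetical stand-in for a rendered page (not pdf-ocr's real page type).
struct RasterPage {
    width: u32,
    height: u32,
    pixels: Vec<u8>, // 8-bit grayscale, row-major
}

/// Hypothetical error type for the sketch.
#[derive(Debug)]
enum OcrError {
    EmptyPage,
}

/// Hypothetical backend interface: each engine turns a raster page into text.
trait OcrEngine {
    fn name(&self) -> &'static str;
    fn recognize(&self, page: &RasterPage) -> Result<String, OcrError>;
}

/// Placeholder for a Tesseract-backed engine; does not call Tesseract.
struct TesseractStub;

/// Placeholder for a PaddleOCR-backed engine; does not call PaddleOCR.
struct PaddleStub;

impl OcrEngine for TesseractStub {
    fn name(&self) -> &'static str {
        "tesseract"
    }
    fn recognize(&self, page: &RasterPage) -> Result<String, OcrError> {
        if page.pixels.is_empty() {
            return Err(OcrError::EmptyPage);
        }
        // A real backend would run recognition here; the stub only reports the page size.
        Ok(format!("[tesseract] {}x{} page", page.width, page.height))
    }
}

impl OcrEngine for PaddleStub {
    fn name(&self) -> &'static str {
        "paddleocr"
    }
    fn recognize(&self, page: &RasterPage) -> Result<String, OcrError> {
        if page.pixels.is_empty() {
            return Err(OcrError::EmptyPage);
        }
        Ok(format!("[paddleocr] {}x{} page", page.width, page.height))
    }
}

/// Hypothetical script hint used to choose a backend.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Script {
    Latin,
    Cjk,
    Arabic,
}

/// Route Latin text to Tesseract and non-Latin scripts to PaddleOCR.
fn pick_engine(script: Script) -> Box<dyn OcrEngine> {
    match script {
        Script::Latin => Box::new(TesseractStub),
        Script::Cjk | Script::Arabic => Box::new(PaddleStub),
    }
}
```

For example, pick_engine(Script::Cjk).recognize(&page) dispatches to the PaddleOCR stub. Returning a trait object keeps the calling code independent of which backend is selected.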

Status

Beta. Recognition quality depends on the chosen backend. Not yet broadly tested at scale.

Usage

Most users should not depend on this crate directly. Instead, use the pdfluent facade crate with the ocr-tesseract or ocr-paddle feature enabled:

use pdfluent::prelude::*;

For low-level access, see https://pdfluent.com/docs.

Licensing

  • Free for evaluation, development, and testing
  • Production use requires a valid PDFluent commercial license
  • Redistribution requires the OEM Redistribution add-on

See LICENSE for full terms, or visit https://pdfluent.com/terms.
