Module reader

Document reader traits and registry for unified format ingestion.

Structs

DocxReader
PassthroughReader
Basic reader that proxies to the global DocumentProcessor for formats we already support via Extractous/lopdf.
PdfReader
Primary PDF reader. Uses Pdfium when enabled, with a graceful fallback to the shared document processor.
PptxReader
ReaderDiagnostics
Metadata about a reader attempt used for observability and surfacing warnings.
ReaderHint
Hint provided to readers before probing/extraction.
ReaderOutput
Structured text and metadata extracted from a document, plus routing diagnostics.
ReaderRegistry
Registry of document readers used by the ingestion router.
XlsReader
Reader for legacy Excel 97-2003 (.xls) files using calamine.
XlsxReader

Enums

DocumentFormat
Soft classification of document formats used by the ingestion router.

Traits

DocumentReader
Trait implemented by document readers that can extract text from supported formats.
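
Example (illustrative sketch)

This page lists the items but not their signatures, so the following is only a minimal sketch of how a reader trait, a hint, and a registry with graceful fallback might fit together. Every name in it (`SketchReader`, `SketchHint`, `SketchFormat`, `SketchOutput`, `SketchRegistry`, `PdfSketch`, `PassthroughSketch`) is hypothetical and stands in for the roles described above; none of it is the crate's actual API.

```rust
use std::path::Path;

// All types below are hypothetical stand-ins, not this crate's real API.

/// Soft format classification (plays the role of `DocumentFormat`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchFormat {
    Pdf,
    Docx,
    Xls,
    Unknown,
}

/// Hint handed to readers before probing (plays the role of `ReaderHint`).
struct SketchHint<'a> {
    file_name: &'a str,
}

impl SketchHint<'_> {
    /// Classify by file extension, case-insensitively.
    fn format(&self) -> SketchFormat {
        let ext = Path::new(self.file_name)
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| e.to_ascii_lowercase());
        match ext.as_deref() {
            Some("pdf") => SketchFormat::Pdf,
            Some("docx") => SketchFormat::Docx,
            Some("xls") => SketchFormat::Xls,
            _ => SketchFormat::Unknown,
        }
    }
}

/// Extracted text plus routing diagnostics (roles of `ReaderOutput` and `ReaderDiagnostics`).
struct SketchOutput {
    text: String,
    reader: &'static str,
    warnings: Vec<String>,
}

/// Assumed shape of a reader trait: identify, probe, extract.
trait SketchReader {
    fn name(&self) -> &'static str;
    fn supports(&self, hint: &SketchHint) -> bool;
    fn read(&self, bytes: &[u8]) -> Result<String, String>;
}

/// Toy PDF reader: only checks the `%PDF` magic bytes and returns placeholder text.
struct PdfSketch;

impl SketchReader for PdfSketch {
    fn name(&self) -> &'static str {
        "pdf"
    }
    fn supports(&self, hint: &SketchHint) -> bool {
        hint.format() == SketchFormat::Pdf
    }
    fn read(&self, bytes: &[u8]) -> Result<String, String> {
        if bytes.starts_with(b"%PDF") {
            Ok("placeholder pdf text".to_string())
        } else {
            Err("missing %PDF header".to_string())
        }
    }
}

/// Fallback reader: accepts anything and decodes the bytes as lossy UTF-8.
struct PassthroughSketch;

impl SketchReader for PassthroughSketch {
    fn name(&self) -> &'static str {
        "passthrough"
    }
    fn supports(&self, _hint: &SketchHint) -> bool {
        true
    }
    fn read(&self, bytes: &[u8]) -> Result<String, String> {
        Ok(String::from_utf8_lossy(bytes).into_owned())
    }
}

/// Tries readers in registration order; a failed attempt becomes a warning.
struct SketchRegistry {
    readers: Vec<Box<dyn SketchReader>>,
}

impl SketchRegistry {
    fn new() -> Self {
        SketchRegistry { readers: Vec::new() }
    }

    fn register(&mut self, reader: Box<dyn SketchReader>) {
        self.readers.push(reader);
    }

    fn read(&self, hint: &SketchHint, bytes: &[u8]) -> Option<SketchOutput> {
        let mut warnings = Vec::new();
        for reader in self.readers.iter().filter(|r| r.supports(hint)) {
            match reader.read(bytes) {
                Ok(text) => {
                    return Some(SketchOutput { text, reader: reader.name(), warnings });
                }
                Err(e) => warnings.push(format!("{}: {}", reader.name(), e)),
            }
        }
        None
    }
}
```

In this sketch, registering `PdfSketch` before `PassthroughSketch` and reading a `.pdf` file that lacks the `%PDF` header yields output from the passthrough reader, with the failed PDF attempt recorded as a warning. Whether the real registry orders or falls back this way is not stated on this page.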