Module reader

Document reader traits and registry for unified format ingestion.

Re-exports§

pub use xlsx_chunker::XlsxChunkingOptions;
pub use xlsx_table_detect::DetectedTable;

Modules§

xlsx_chunker
Row-aligned semantic chunking for XLSX spreadsheets.
xlsx_ooxml
OOXML metadata parser for XLSX files.
xlsx_table_detect
Table structure detection for XLSX sheets.
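To make "row-aligned" chunking concrete, here is a minimal sketch of the general idea: rows are packed into chunks up to a size budget, and a row is never split across two chunks. This is an illustration under stated assumptions, not the xlsx_chunker implementation. `ChunkOptions`, `max_bytes`, and `chunk_rows` are hypothetical names, and the real `XlsxChunkingOptions` fields are not shown on this page.

```rust
/// Hypothetical options for this sketch; not the crate's `XlsxChunkingOptions`.
pub struct ChunkOptions {
    /// Soft upper bound on the byte length of each chunk.
    pub max_bytes: usize,
}

/// Packs rows into chunks without ever splitting a row across chunks.
/// Each row is rendered as tab-separated cells; rows are joined with '\n'.
pub fn chunk_rows(rows: &[Vec<String>], opts: &ChunkOptions) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();

    for row in rows {
        let line = row.join("\t");
        if line.is_empty() {
            // Skip rows that render to nothing.
            continue;
        }
        // Close the current chunk if appending this row would exceed the budget.
        // A single oversized row still becomes its own chunk rather than being cut.
        if !current.is_empty() && current.len() + 1 + line.len() > opts.max_bytes {
            chunks.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push('\n');
        }
        current.push_str(&line);
    }

    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Because chunk boundaries always fall between rows, every chunk can be read back as a set of complete records, which is the property the module description emphasizes.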

Structs§

DocxReader
PassthroughReader
Basic reader that proxies to the global DocumentProcessor for formats we already support via Extractous/lopdf.
PdfReader
Primary PDF reader. Uses Pdfium when enabled, with a graceful fallback to the shared document processor.
PptxReader
ReaderDiagnostics
Metadata about a reader attempt, used for observability and for surfacing warnings.
ReaderHint
Hint provided to readers before probing/extraction.
ReaderOutput
Structured text and metadata extracted from a document, plus routing diagnostics.
ReaderRegistry
Registry of document readers used by the ingestion router.
XlsReader
Reader for legacy Excel 97-2003 (.xls) files using calamine.
XlsxReader
XlsxStructuredDiagnostics
Diagnostics from structured extraction.
XlsxStructuredResult
Result of the structured XLSX extraction pipeline.

Enums§

DocumentFormat
Soft classification of document formats used by the ingestion router.

Traits§

DocumentReader
Trait implemented by document readers that can extract text from supported formats.
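The items above suggest a common pattern: a reader trait, a registry that routes input to readers, and a fallback when the preferred reader fails (as the PdfReader description mentions). The sketch below shows one way that pattern can look. It is an assumption-laden illustration, not this module's API: `SketchReader`, `SketchRegistry`, `AlwaysFails`, `Utf8Passthrough`, and every method signature are hypothetical, since the real `DocumentReader` and `ReaderRegistry` signatures are not shown on this page.

```rust
/// Hypothetical stand-in for a reader trait; not the real `DocumentReader`.
pub trait SketchReader {
    fn name(&self) -> &'static str;
    /// Cheap check: does this reader want to handle files with this extension?
    fn supports(&self, extension: &str) -> bool;
    /// Extracts text from the raw bytes, or returns an error message.
    fn extract(&self, bytes: &[u8]) -> Result<String, String>;
}

/// Hypothetical registry; readers are tried in registration order.
pub struct SketchRegistry {
    readers: Vec<Box<dyn SketchReader>>,
}

impl SketchRegistry {
    pub fn new() -> Self {
        Self { readers: Vec::new() }
    }

    pub fn register(&mut self, reader: Box<dyn SketchReader>) {
        self.readers.push(reader);
    }

    /// Returns (reader name, extracted text, warnings from failed attempts),
    /// or None if no registered reader could handle the input.
    pub fn read(&self, extension: &str, bytes: &[u8]) -> Option<(&'static str, String, Vec<String>)> {
        let mut warnings = Vec::new();
        for reader in &self.readers {
            if !reader.supports(extension) {
                continue;
            }
            match reader.extract(bytes) {
                Ok(text) => return Some((reader.name(), text, warnings)),
                // Record the failure and fall through to the next reader.
                Err(e) => warnings.push(format!("{} failed: {}", reader.name(), e)),
            }
        }
        None
    }
}

/// Simulates a preferred backend that is unavailable at runtime.
pub struct AlwaysFails;

impl SketchReader for AlwaysFails {
    fn name(&self) -> &'static str {
        "always-fails"
    }
    fn supports(&self, extension: &str) -> bool {
        extension == "pdf"
    }
    fn extract(&self, _bytes: &[u8]) -> Result<String, String> {
        Err("backend unavailable".to_string())
    }
}

/// Fallback that accepts any extension and treats the input as UTF-8 text.
pub struct Utf8Passthrough;

impl SketchReader for Utf8Passthrough {
    fn name(&self) -> &'static str {
        "utf8-passthrough"
    }
    fn supports(&self, _extension: &str) -> bool {
        true
    }
    fn extract(&self, bytes: &[u8]) -> Result<String, String> {
        String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
    }
}
```

Collecting the failure messages alongside the successful result mirrors the role the page assigns to diagnostics: the caller gets the text and can still see which readers were attempted and why they were skipped.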