Document reader traits and registry for unified format ingestion.
Structs§
- DocxReader
- PassthroughReader - Basic reader that proxies to the global DocumentProcessor for formats we already support via Extractous/lopdf.
- PdfReader - Primary PDF reader. Uses Pdfium when enabled, with a graceful fallback to the shared document processor.
- PptxReader
- ReaderDiagnostics - Metadata about a reader attempt used for observability and surfacing warnings.
- ReaderHint - Hint provided to readers before probing/extraction.
- ReaderOutput - Structured text and metadata extracted from a document, plus routing diagnostics.
- ReaderRegistry - Registry of document readers used by the ingestion router.
- XlsReader - Reader for legacy Excel 97-2003 (.xls) files using calamine.
- XlsxReader
Enums§
- DocumentFormat - Soft classification of document formats used by the ingestion router.
Traits§
- DocumentReader - Trait implemented by document readers that can extract text from supported formats.
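
The items above suggest a trait-plus-registry design: readers implement a common trait, and a registry picks one based on a format hint. The following is a minimal sketch of that pattern only, not this module's actual API. Every field, method signature, and the `EchoReader` type are hypothetical; the real `DocumentReader`, `ReaderHint`, `ReaderOutput`, and `ReaderRegistry` definitions may look quite different.

```rust
use std::path::Path;

// NOTE: all definitions below are hypothetical stand-ins that reuse the
// item names from this page; the real signatures are not shown here.

/// Soft classification of a document (variants are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocumentFormat {
    Pdf,
    Docx,
    Unknown,
}

/// Hint handed to readers before probing (field is an assumption).
#[derive(Debug, Clone)]
pub struct ReaderHint {
    pub format: DocumentFormat,
}

impl ReaderHint {
    /// Classify from the file extension alone, case-insensitively.
    pub fn from_path(path: &Path) -> Self {
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| e.to_ascii_lowercase());
        let format = match ext.as_deref() {
            Some("pdf") => DocumentFormat::Pdf,
            Some("docx") => DocumentFormat::Docx,
            _ => DocumentFormat::Unknown,
        };
        ReaderHint { format }
    }
}

/// Extracted text plus the name of the reader that produced it.
#[derive(Debug, Clone, PartialEq)]
pub struct ReaderOutput {
    pub text: String,
    pub reader_name: &'static str,
}

/// Hypothetical shape of a reader trait.
pub trait DocumentReader {
    fn name(&self) -> &'static str;
    fn supports(&self, hint: &ReaderHint) -> bool;
    fn extract(&self, bytes: &[u8], hint: &ReaderHint) -> Result<ReaderOutput, String>;
}

/// Toy reader for illustration: treats the input bytes as UTF-8 text.
pub struct EchoReader {
    pub format: DocumentFormat,
}

impl DocumentReader for EchoReader {
    fn name(&self) -> &'static str {
        "echo"
    }

    fn supports(&self, hint: &ReaderHint) -> bool {
        hint.format == self.format
    }

    fn extract(&self, bytes: &[u8], _hint: &ReaderHint) -> Result<ReaderOutput, String> {
        let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        Ok(ReaderOutput {
            text: text.to_owned(),
            reader_name: self.name(),
        })
    }
}

/// Registry that tries each supporting reader in registration order.
#[derive(Default)]
pub struct ReaderRegistry {
    readers: Vec<Box<dyn DocumentReader>>,
}

impl ReaderRegistry {
    pub fn register(&mut self, reader: Box<dyn DocumentReader>) {
        self.readers.push(reader);
    }

    /// Return the first successful extraction, or the last error seen.
    pub fn read(&self, bytes: &[u8], hint: &ReaderHint) -> Result<ReaderOutput, String> {
        let mut last_err = String::from("no reader supports this format");
        for reader in self.readers.iter().filter(|r| r.supports(hint)) {
            match reader.extract(bytes, hint) {
                Ok(out) => return Ok(out),
                Err(e) => last_err = format!("{}: {}", reader.name(), e),
            }
        }
        Err(last_err)
    }
}
```

In this sketch, registering several readers for the same format gives a simple fallback chain: the registry moves on to the next supporting reader when one fails, which is one plausible way to get the kind of graceful fallback described for PdfReader.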