Document loading abstraction for pluggable file format support.
The DocumentLoader trait decouples file format handling from the
RAG pipeline. Built-in loaders handle text (.txt, .md) and
subtitle (.srt, .vtt) formats. Third parties can implement
DocumentLoader for any format.
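As a rough illustration of what a third-party loader might look like, the sketch below defines its own stand-in `Document` struct and `DocumentLoader` trait. The trait's real method names and signatures are not shown on this page, so `extensions`, `load_bytes`, the `Document` fields, and `CsvRowLoader` are all assumptions made for the example, not the crate's API.

```rust
use std::collections::HashMap;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for the crate's `Document` type.
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub content: String,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical stand-in for the `DocumentLoader` trait; the real trait's
/// methods and signatures may differ.
pub trait DocumentLoader {
    /// File extensions (without the leading dot) this loader handles.
    fn extensions(&self) -> &[&str];

    /// Parse raw file bytes into zero or more documents.
    fn load_bytes(&self, path: &Path, bytes: &[u8]) -> io::Result<Vec<Document>>;
}

/// Example third-party loader: one document per non-empty CSV row.
pub struct CsvRowLoader;

impl DocumentLoader for CsvRowLoader {
    fn extensions(&self) -> &[&str] {
        &["csv"]
    }

    fn load_bytes(&self, path: &Path, bytes: &[u8]) -> io::Result<Vec<Document>> {
        // Reject input that is not valid UTF-8 instead of guessing an encoding.
        let text = std::str::from_utf8(bytes)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

        let docs = text
            .lines()
            .enumerate()
            .filter(|(_, line)| !line.trim().is_empty())
            .map(|(i, line)| {
                let mut metadata = HashMap::new();
                metadata.insert("source".to_string(), path.display().to_string());
                // Row numbers are 1-based and count blank lines too.
                metadata.insert("row".to_string(), (i + 1).to_string());
                Document {
                    content: line.trim().to_string(),
                    metadata,
                }
            })
            .collect();

        Ok(docs)
    }
}
```

Taking bytes rather than opening the file inside the loader keeps the sketch testable without touching the filesystem; whether the real trait makes the same choice is not stated here.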
The LoaderRegistry dispatches loading to the appropriate loader
based on file extension, with support for sidecar subtitle files
adjacent to media files.
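Extension-based dispatch and sidecar lookup can be sketched with the standard library alone. This is not the `LoaderRegistry` implementation: `LoaderKind`, `loader_for`, `sidecar_candidates`, and `find_sidecar` are hypothetical names, and both the case-insensitive matching and the "same directory, same file stem" sidecar rule are assumptions about how such a registry might behave.

```rust
use std::path::{Path, PathBuf};

/// Subtitle extensions probed when looking for a sidecar file
/// (the built-in subtitle formats named above).
const SUBTITLE_EXTENSIONS: [&str; 2] = ["srt", "vtt"];

/// Hypothetical tag for which built-in loader would handle a file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoaderKind {
    Text,
    Subtitle,
}

/// Lowercased extension of `path`, if it has one.
pub fn normalized_extension(path: &Path) -> Option<String> {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map(|ext| ext.to_ascii_lowercase())
}

/// Pick a loader kind from the file extension (assumed case-insensitive).
pub fn loader_for(path: &Path) -> Option<LoaderKind> {
    match normalized_extension(path)?.as_str() {
        "txt" | "md" => Some(LoaderKind::Text),
        "srt" | "vtt" => Some(LoaderKind::Subtitle),
        _ => None,
    }
}

/// Candidate sidecar paths for a media file: same directory and stem,
/// with each subtitle extension swapped in.
pub fn sidecar_candidates(media: &Path) -> Vec<PathBuf> {
    SUBTITLE_EXTENSIONS
        .iter()
        .map(|&ext| media.with_extension(ext))
        .collect()
}

/// First sidecar candidate that exists on disk, if any.
pub fn find_sidecar(media: &Path) -> Option<PathBuf> {
    sidecar_candidates(media).into_iter().find(|p| p.is_file())
}
```

Under these assumptions, a path such as `talk.mp4` has no loader of its own, so a caller would fall back to `find_sidecar` and hand any `talk.srt` or `talk.vtt` it returns to the subtitle loader.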
§Example
use aprender_rag::loader::LoaderRegistry;
use std::path::Path;
let registry = LoaderRegistry::new();
let extensions = registry.supported_extensions();
assert!(extensions.contains(&"txt"));
assert!(extensions.contains(&"srt"));
Re-exports§
pub use transcription::TranscriptionLoader;
Modules§
- transcription - Feature-gated transcription loader using whisper-apr for speech-to-text.
Structs§
- ImageLoader - Loads image files by extracting text via Tesseract OCR.
- LoaderRegistry - Registry that dispatches file loading to the appropriate DocumentLoader.
- SubtitleLoader - Loads subtitle files and produces Documents with timestamp metadata.
- TextLoader - Loads plain text and Markdown files.
Traits§
- DocumentLoader - Abstraction for loading files of any format into Documents.