pub trait DocumentParser: Send + Sync {
    // Required methods
    fn name(&self) -> &str;
    fn supported_extensions(&self) -> &[&str];
    fn parse(&self, path: &Path) -> Result<String>;

    // Provided methods
    fn parse_document(&self, path: &Path) -> Result<ParsedDocument> { ... }
    fn can_parse(&self, path: &Path) -> bool { ... }
    fn max_file_size(&self) -> u64 { ... }
}
Extension point for custom file format parsing.
Implement this trait to add support for formats that cannot be read as plain text (PDF, Excel, Word, images with OCR, etc.) without modifying any core tool logic.
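A minimal sketch of an implementation follows. It assumes a local stand-in for the trait that reproduces only the three required methods, and a boxed-error `Result` alias, since the crate's actual `Result` type is not shown here. `HexDumpParser` and its extensions are hypothetical names chosen for illustration.

```rust
use std::fs;
use std::path::Path;

// Assumption: the crate's real `Result` alias is not shown in these docs;
// a boxed error stands in for it here.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error + Send + Sync>>;

// Local stand-in mirroring only the required methods of the trait.
pub trait DocumentParser: Send + Sync {
    fn name(&self) -> &str;
    fn supported_extensions(&self) -> &[&str];
    fn parse(&self, path: &Path) -> Result<String>;
}

/// Hypothetical parser that renders a binary file as space-separated hex bytes.
pub struct HexDumpParser;

impl DocumentParser for HexDumpParser {
    fn name(&self) -> &str {
        "hex-dump"
    }

    fn supported_extensions(&self) -> &[&str] {
        // No leading dot, as the trait documentation requires.
        &["bin", "dat"]
    }

    fn parse(&self, path: &Path) -> Result<String> {
        // I/O errors are converted into the boxed error by `?`.
        let bytes = fs::read(path)?;
        let hex: Vec<String> = bytes.iter().map(|b| format!("{:02x}", b)).collect();
        Ok(hex.join(" "))
    }
}
```

Because the provided methods have defaults, a parser like this one would only need the three methods above; how it is registered with the core tool is not covered by this page.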
Required Methods
fn supported_extensions(&self) -> &[&str]
File extensions this parser handles (case-insensitive, no leading dot).
Example: &["pdf", "PDF"]
Provided Methods
fn parse_document(&self, path: &Path) -> Result<ParsedDocument>
Extract a structured document from path.
The default implementation wraps DocumentParser::parse into a
single raw-text block so existing parsers remain source-compatible.
fn can_parse(&self, path: &Path) -> bool
Override to control whether this parser will attempt a file before the
extension lookup. The default checks the file's extension against
supported_extensions().
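The default body of `can_parse` is not shown on this page. The sketch below assumes it compares the path's extension against `supported_extensions()` while ignoring ASCII case, and shows one way an override might additionally claim an extensionless file. The trait here is a local stand-in, and `PdfParser` and `MakefileParser` are hypothetical.

```rust
use std::path::Path;

// Local stand-in: only the methods needed to illustrate `can_parse`.
pub trait DocumentParser: Send + Sync {
    fn supported_extensions(&self) -> &[&str];

    // Assumed default: match the path's extension, ignoring ASCII case.
    fn can_parse(&self, path: &Path) -> bool {
        match path.extension().and_then(|e| e.to_str()) {
            Some(ext) => self
                .supported_extensions()
                .iter()
                .any(|s| s.eq_ignore_ascii_case(ext)),
            None => false,
        }
    }
}

/// Hypothetical parser that relies on the default extension check.
pub struct PdfParser;

impl DocumentParser for PdfParser {
    fn supported_extensions(&self) -> &[&str] {
        &["pdf"]
    }
}

/// Hypothetical parser that also claims extensionless files named `Makefile`.
pub struct MakefileParser;

impl DocumentParser for MakefileParser {
    fn supported_extensions(&self) -> &[&str] {
        &["mk"]
    }

    fn can_parse(&self, path: &Path) -> bool {
        // Accept by exact file name first, then fall back to the extension.
        let by_name = path.file_name().and_then(|n| n.to_str()) == Some("Makefile");
        let by_ext = path
            .extension()
            .and_then(|e| e.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("mk"));
        by_name || by_ext
    }
}
```

With this sketch, `PdfParser` accepts `report.PDF` but rejects `Makefile`, while `MakefileParser` accepts both `src/Makefile` and `rules.MK`.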
fn max_file_size(&self) -> u64
Maximum file size (bytes) this parser accepts. Files larger than this limit are silently skipped. Default: 10 MiB.
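As a rough illustration of the size limit, the sketch below uses a local stand-in trait with the documented 10 MiB default and a hypothetical caller-side helper, `within_limit`. How the core tool actually performs the check is not shown on this page; the helper only mirrors the stated rule that files larger than the limit are skipped.

```rust
// Local stand-in: only the size limit is reproduced here.
pub trait DocumentParser: Send + Sync {
    // Default stated in the documentation: 10 MiB.
    fn max_file_size(&self) -> u64 {
        10 * 1024 * 1024
    }
}

/// Hypothetical parser that keeps the default limit.
pub struct DefaultLimitParser;

impl DocumentParser for DefaultLimitParser {}

/// Hypothetical parser that raises the limit to 100 MiB.
pub struct LargeFileParser;

impl DocumentParser for LargeFileParser {
    fn max_file_size(&self) -> u64 {
        100 * 1024 * 1024
    }
}

/// Hypothetical caller-side check: a file exactly at the limit is accepted,
/// anything larger would be skipped.
pub fn within_limit(parser: &dyn DocumentParser, file_len: u64) -> bool {
    file_len <= parser.max_file_size()
}
```

In a real caller, `file_len` would typically come from `std::fs::metadata(path)?.len()`.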