Document parser types used by A3S Code’s context acquisition pipeline.
These types exist so agentic_search, agentic_parse, and session wiring
can register a small set of document parsers when better context
extraction is needed.
They are not intended to turn a3s-code-core into a general-purpose
document processing framework.
§Architecture
- Contracts: parser trait and registry live in crate::doc
- Core defaults: PlainTextParser plus the internal composite parser factory live here
- Built-in tools: agentic_search and agentic_parse consume this registry via ToolContext
- Goal: recover better model context from non-plaintext project files
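The split between a parser contract and a registry can be illustrated with a small standalone sketch. It does not use a3s-code-core: the trait below only mirrors the three methods shown in the example on this page, while `ToyRegistry`, its `find` method, the extension-keyed map, the `io::Result` return type, and `TextParser` are all hypothetical names and choices made for illustration. The real `DocumentParserRegistry` may resolve parsers differently.

```rust
use std::collections::HashMap;
use std::io;
use std::path::Path;
use std::sync::Arc;

/// Hypothetical stand-in for the crate's parser trait; it mirrors the
/// three methods used in the example on this page.
trait DocumentParser: Send + Sync {
    fn name(&self) -> &str;
    fn supported_extensions(&self) -> &[&str];
    fn parse(&self, path: &Path) -> io::Result<String>;
}

/// Toy registry (an assumption, not the real DocumentParserRegistry):
/// maps a lowercase file extension to the parser registered for it.
#[derive(Default)]
struct ToyRegistry {
    by_extension: HashMap<String, Arc<dyn DocumentParser>>,
}

impl ToyRegistry {
    /// Register a parser under every extension it claims to support.
    /// A later registration for the same extension replaces the earlier one.
    fn register(&mut self, parser: Arc<dyn DocumentParser>) {
        for ext in parser.supported_extensions() {
            self.by_extension
                .insert(ext.to_ascii_lowercase(), Arc::clone(&parser));
        }
    }

    /// Look up a parser by the path's extension, ignoring ASCII case.
    /// Returns None for paths without an extension or with an unknown one.
    fn find(&self, path: &Path) -> Option<Arc<dyn DocumentParser>> {
        let ext = path.extension()?.to_str()?.to_ascii_lowercase();
        self.by_extension.get(&ext).cloned()
    }
}

/// Hypothetical plain-text parser that reads the file as UTF-8.
struct TextParser;

impl DocumentParser for TextParser {
    fn name(&self) -> &str {
        "text"
    }

    fn supported_extensions(&self) -> &[&str] {
        &["txt", "md"]
    }

    fn parse(&self, path: &Path) -> io::Result<String> {
        std::fs::read_to_string(path)
    }
}
```

Under these assumptions, registering `Arc::new(TextParser)` and calling `find` with a path such as `notes/README.MD` yields the text parser, while a path ending in `.pdf` yields `None` until a PDF parser is registered. Keying the map by extension keeps lookup cheap and lets a tool fall back to another strategy when no parser matches.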
§Example
use a3s_code_core::document_parser::{DocumentParser, DocumentParserRegistry};
use std::path::Path;
use anyhow::Result;

struct PdfParser;

impl DocumentParser for PdfParser {
    fn name(&self) -> &str { "pdf" }
    fn supported_extensions(&self) -> &[&str] { &["pdf"] }
    fn parse(&self, path: &Path) -> Result<String> {
        todo!()
    }
}

let mut registry = DocumentParserRegistry::empty();
registry.register(std::sync::Arc::new(PdfParser));

Structs§
- DocumentBlock
- DocumentBlockLocation
- DocumentConfidence
- DocumentMetadata
- DocumentParserRegistry
- DocumentProvenance
- ParsedDocument
- PlainTextParser: Built-in parser for all common text, code, and config formats.
Enums§
Traits§
Functions§
- default_document_parser_registry: Build the default document parser registry using the default parser config.
- document_parser_registry_with_config: Build the default document parser registry using an explicit parser config.
- document_parser_registry_with_config_and_ocr: Build the default document parser registry using an explicit parser config and OCR provider.