
Module document_parser



Document parser types used by A3S Code’s context acquisition pipeline.

These types exist so that agentic_search, agentic_parse, and session wiring can register a small set of document parsers when better context extraction is needed.

They are not intended to turn a3s-code-core into a general-purpose document processing framework.

§Architecture

  • Contracts: parser trait and registry live in crate::doc
  • Core defaults: PlainTextParser plus the internal composite parser factory live here
  • Built-in tools: agentic_search and agentic_parse consume this registry via ToolContext
  • Goal: recover better model context from non-plaintext project files
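The page does not say how the registry chooses a parser for a given file. The following self-contained sketch assumes dispatch by file extension, which the `supported_extensions` method in the example below suggests. `Parser`, `PlainText`, `Registry`, and `find` are hypothetical stand-ins written for illustration. They are not the crate's `DocumentParser` or `DocumentParserRegistry` API, and errors are simplified to `String` instead of `anyhow::Result`.

```rust
use std::collections::HashMap;
use std::path::Path;
use std::sync::Arc;

// Hypothetical stand-in for `DocumentParser`, mirroring the method shapes
// shown in this page's example (errors simplified to `String`).
trait Parser: Send + Sync {
    fn name(&self) -> &str;
    fn supported_extensions(&self) -> &[&str];
    fn parse(&self, path: &Path) -> Result<String, String>;
}

// Hypothetical plain-text parser: reads the file as UTF-8 text.
struct PlainText;

impl Parser for PlainText {
    fn name(&self) -> &str { "plaintext" }
    fn supported_extensions(&self) -> &[&str] { &["txt", "md", "rs", "toml"] }
    fn parse(&self, path: &Path) -> Result<String, String> {
        std::fs::read_to_string(path).map_err(|e| e.to_string())
    }
}

// Hypothetical registry: maps a lowercase extension to the parser that claimed it.
#[derive(Default)]
struct Registry {
    by_ext: HashMap<String, Arc<dyn Parser>>,
}

impl Registry {
    fn register(&mut self, parser: Arc<dyn Parser>) {
        for ext in parser.supported_extensions() {
            // A later registration replaces an earlier one for the same extension.
            self.by_ext.insert(ext.to_ascii_lowercase(), Arc::clone(&parser));
        }
    }

    // Look up the parser for `path` by its (case-insensitive) extension.
    fn find(&self, path: &Path) -> Option<&Arc<dyn Parser>> {
        let ext = path.extension()?.to_str()?.to_ascii_lowercase();
        self.by_ext.get(&ext)
    }
}

fn main() {
    let mut registry = Registry::default();
    registry.register(Arc::new(PlainText));

    // `.MD` matches the lowercase "md" entry.
    let chosen = registry.find(Path::new("README.MD")).map(|p| p.name().to_string());
    println!("{:?}", chosen);

    // No parser claimed `.pdf`, so lookup yields `None`.
    assert!(registry.find(Path::new("scan.pdf")).is_none());

    // Parsing a file that does not exist surfaces an error instead of panicking.
    let parser = registry.find(Path::new("x.txt")).unwrap();
    assert!(parser.parse(Path::new("no-such-dir/missing-file.txt")).is_err());
}
```

Keying the map by extension keeps lookups O(1) per file. Normalizing to lowercase lets `README.MD` and `readme.md` resolve to the same parser.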

§Example

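```rust
use a3s_code_core::document_parser::{DocumentParser, DocumentParserRegistry};
use anyhow::Result;
use std::path::Path;
use std::sync::Arc;

/// A custom parser that claims `.pdf` files.
struct PdfParser;

impl DocumentParser for PdfParser {
    fn name(&self) -> &str { "pdf" }

    // Extensions are listed without the leading dot.
    fn supported_extensions(&self) -> &[&str] { &["pdf"] }

    fn parse(&self, _path: &Path) -> Result<String> {
        // Extract model-readable text from the PDF here.
        todo!()
    }
}

fn main() {
    // Start from an empty registry and add the custom parser.
    let mut registry = DocumentParserRegistry::empty();
    registry.register(Arc::new(PdfParser));
}
```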

Structs§

DocumentBlock
DocumentBlockLocation
DocumentConfidence
DocumentMetadata
DocumentParserRegistry
DocumentProvenance
ParsedDocument
PlainTextParser
Built-in parser for all common text, code, and config formats.

Enums§

DocumentBlockKind

Traits§

DocumentParser

Functions§

default_document_parser_registry
Build the default document parser registry using the default parser config.
document_parser_registry_with_config
Build the default document parser registry using an explicit parser config.
document_parser_registry_with_config_and_ocr
Build the default document parser registry using an explicit parser config and OCR provider.