Module document_parser

Document Parser Extension Point

DocumentParser is a core extension point that lets users add custom file format support to agentic tools (agentic_search, agentic_parse, etc.), covering binary and structured formats such as PDF, Excel, and Word.

§Architecture

  • Core: DocumentParser trait + DocumentParserRegistry live here
  • Default: PlainTextParser covers all common text/code formats
  • Built-in tools: agentic-search and agentic-parse use this registry via ToolContext
  • Custom: Users register additional parsers via SessionOptions

§Example

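use a3s_code_core::document_parser::{DocumentParser, DocumentParserRegistry};
use anyhow::Result;
use std::path::Path;
use std::sync::Arc;

// A custom parser that handles PDF files.
struct PdfParser;

impl DocumentParser for PdfParser {
    // Identifier for this parser.
    fn name(&self) -> &str { "pdf" }
    // File extensions this parser handles.
    fn supported_extensions(&self) -> &[&str] { &["pdf"] }
    // Extracts the document's text content.
    fn parse(&self, path: &Path) -> Result<String> {
        // e.g. pdf_extract::extract_text(path)
        todo!()
    }
}

// Register the parser with the registry, which maps file extensions to parsers.
let mut registry = DocumentParserRegistry::new();
registry.register(Arc::new(PdfParser));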
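
To illustrate how a registry of this kind might route a file to a parser, here is a minimal, self-contained sketch that uses only the standard library. It is not the crate's implementation: SketchParser, SketchRegistry, TextParser, find, and build_registry are hypothetical stand-ins, and the case-insensitive extension match and the extensions listed for TextParser are assumptions made for the sketch.

```rust
use std::collections::HashMap;
use std::io;
use std::path::Path;
use std::sync::Arc;

/// Hypothetical stand-in for the `DocumentParser` trait (not the crate's definition).
trait SketchParser: Send + Sync {
    fn name(&self) -> &str;
    fn supported_extensions(&self) -> &[&str];
    fn parse(&self, path: &Path) -> io::Result<String>;
}

/// Hypothetical stand-in for `DocumentParserRegistry`: extension -> parser.
#[derive(Default)]
struct SketchRegistry {
    by_extension: HashMap<String, Arc<dyn SketchParser>>,
}

impl SketchRegistry {
    /// Index the parser under every extension it declares.
    /// A later registration for the same extension replaces the earlier one.
    fn register(&mut self, parser: Arc<dyn SketchParser>) {
        for ext in parser.supported_extensions() {
            // Assumption: extensions are matched case-insensitively.
            self.by_extension
                .insert(ext.to_ascii_lowercase(), Arc::clone(&parser));
        }
    }

    /// Look up the parser for a path by its file extension, if one is registered.
    fn find(&self, path: &Path) -> Option<Arc<dyn SketchParser>> {
        let ext = path.extension()?.to_str()?.to_ascii_lowercase();
        self.by_extension.get(&ext).cloned()
    }
}

/// Plays the role of a default plain-text parser in this sketch.
struct TextParser;

impl SketchParser for TextParser {
    fn name(&self) -> &str { "plain-text" }
    fn supported_extensions(&self) -> &[&str] { &["txt", "md", "rs"] }
    fn parse(&self, path: &Path) -> io::Result<String> {
        std::fs::read_to_string(path)
    }
}

/// Plays the role of a user-supplied parser for a binary format.
struct PdfParser;

impl SketchParser for PdfParser {
    fn name(&self) -> &str { "pdf" }
    fn supported_extensions(&self) -> &[&str] { &["pdf"] }
    fn parse(&self, _path: &Path) -> io::Result<String> {
        // Real text extraction is out of scope for this sketch.
        Err(io::Error::new(
            io::ErrorKind::Unsupported,
            "PDF extraction is not implemented in this sketch",
        ))
    }
}

/// Build a registry holding the default parser plus one custom parser.
fn build_registry() -> SketchRegistry {
    let mut registry = SketchRegistry::default();
    registry.register(Arc::new(TextParser));
    registry.register(Arc::new(PdfParser));
    registry
}
```

With this sketch, build_registry().find(Path::new("report.PDF")) yields the parser named "pdf", while a path with an unregistered extension or no extension yields None. Storing parsers behind Arc<dyn SketchParser> lets one parser instance be shared across all the extensions it declares.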

Structs§

DocumentBlock
DocumentBlockLocation
DocumentParserRegistry
Registry that maps file extensions to DocumentParser implementations.
ParsedDocument
PlainTextParser
Built-in parser for all common text, code, and config formats.

Enums§

DocumentBlockKind

Traits§

DocumentParser
Extension point for custom file format parsing.