Trait DocumentParser 

pub trait DocumentParser: Send + Sync {
    // Required methods
    fn format(&self) -> DocumentFormat;
    fn parse<'life0, 'life1, 'async_trait>(
        &'life0 self,
        content: &'life1 str,
    ) -> Pin<Box<dyn Future<Output = Result<ParseResult>> + Send + 'async_trait>>
       where Self: 'async_trait,
             'life0: 'async_trait,
             'life1: 'async_trait;

    // Provided method
    fn parse_file<'life0, 'life1, 'async_trait>(
        &'life0 self,
        path: &'life1 Path,
    ) -> Pin<Box<dyn Future<Output = Result<ParseResult>> + Send + 'async_trait>>
       where Self: 'async_trait,
             'life0: 'async_trait,
             'life1: 'async_trait { ... }
}

A parser for extracting content from documents.

Implementations parse different document formats and produce a sequence of raw nodes that can be organized into a tree.

§Example

use vectorless::parser::{DocumentParser, MarkdownParser};

// `.await` and `?` require an enclosing async function that returns a `Result`.
let parser = MarkdownParser::new();
let content = "# Title\n\nContent here.";
let result = parser.parse(content).await?;
println!("Found {} nodes", result.node_count());

Required Methods§

fn format(&self) -> DocumentFormat

Returns the document format this parser handles.
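
Because the trait requires `Send + Sync` and `format` takes only `&self`, parsers can be stored as trait objects and selected by the format they report. The following is a minimal sketch of that idea, not the crate's own API: the `DocumentFormat` variants, the `HtmlParser` type, and the `find_parser` and `registry` helpers are hypothetical stand-ins, and the trait is reduced to `format` alone.

```rust
// Hypothetical stand-in: the real `DocumentFormat` is defined by the crate
// and its variants may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocumentFormat {
    Markdown,
    Html,
}

// Reduced stand-in for the trait; only `format` is modelled here.
trait DocumentParser: Send + Sync {
    fn format(&self) -> DocumentFormat;
}

struct MarkdownParser;
struct HtmlParser; // hypothetical second implementation

impl DocumentParser for MarkdownParser {
    fn format(&self) -> DocumentFormat {
        DocumentFormat::Markdown
    }
}

impl DocumentParser for HtmlParser {
    fn format(&self) -> DocumentFormat {
        DocumentFormat::Html
    }
}

/// Return the first registered parser that reports the wanted format.
fn find_parser(
    parsers: &[Box<dyn DocumentParser>],
    wanted: DocumentFormat,
) -> Option<&dyn DocumentParser> {
    parsers
        .iter()
        .find(|p| p.format() == wanted)
        .map(|p| &**p) // &Box<dyn DocumentParser> -> &dyn DocumentParser
}

/// Build a small registry holding one parser per format.
fn registry() -> Vec<Box<dyn DocumentParser>> {
    vec![Box::new(MarkdownParser), Box::new(HtmlParser)]
}
```

With this shape, `find_parser(&registry(), DocumentFormat::Html)` yields the HTML parser, and an empty registry yields `None`.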

fn parse<'life0, 'life1, 'async_trait>(
    &'life0 self,
    content: &'life1 str,
) -> Pin<Box<dyn Future<Output = Result<ParseResult>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,

Parse content from a string.

§Arguments
  • content - The document content as a string
§Returns

A ParseResult containing extracted nodes and metadata.
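
The lifetime names in the signature (`'life0`, `'async_trait`) suggest it was generated by the `async-trait` macro, so an implementor would normally just write `async fn parse`. To show what the expanded form means, here is a sketch that implements the same shape by hand using only the standard library. Everything in it is an assumption made for illustration: the `Parser` trait, `ParseResult` fields, the `String` error type, `LineParser`, `block_on`, and `count_nodes` are hypothetical, and the three lifetimes are collapsed into one `'a`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for the crate's `Result` and `ParseResult`.
type Result<T> = std::result::Result<T, String>;

#[derive(Debug)]
struct ParseResult {
    nodes: Vec<String>,
}

impl ParseResult {
    fn node_count(&self) -> usize {
        self.nodes.len()
    }
}

// Same shape as the expanded `parse` signature, with a single lifetime.
trait Parser: Send + Sync {
    fn parse<'a>(
        &'a self,
        content: &'a str,
    ) -> Pin<Box<dyn Future<Output = Result<ParseResult>> + Send + 'a>>;
}

/// Toy parser: every non-empty line becomes one raw node.
struct LineParser;

impl Parser for LineParser {
    fn parse<'a>(
        &'a self,
        content: &'a str,
    ) -> Pin<Box<dyn Future<Output = Result<ParseResult>> + Send + 'a>> {
        // The async block borrows `content`, hence the `'a` bound on the future.
        Box::pin(async move {
            let nodes = content
                .lines()
                .map(str::trim)
                .filter(|line| !line.is_empty())
                .map(String::from)
                .collect();
            Ok(ParseResult { nodes })
        })
    }
}

// Waker that does nothing; the executor below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal polling executor, for demonstration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

/// Drive `parse` to completion and report how many nodes were found.
fn count_nodes(content: &str) -> Result<usize> {
    let result = block_on(LineParser.parse(content))?;
    Ok(result.node_count())
}
```

In real code an async runtime would replace `block_on`; the hand-rolled executor is here only so the sketch has no external dependencies.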

Provided Methods§

fn parse_file<'life0, 'life1, 'async_trait>(
    &'life0 self,
    path: &'life1 Path,
) -> Pin<Box<dyn Future<Output = Result<ParseResult>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
    'life1: 'async_trait,

Parse content from a file.

The default implementation reads the file at path and passes its contents to parse.

§Arguments
  • path - Path to the file
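
The read-then-delegate behaviour described for the default implementation can be sketched as a provided trait method. This is an illustration under stated assumptions, not the crate's source: the `Parser` trait, `BoxFuture` alias, `ParseResult` field, `String` error type, `LineParser`, and `block_on` are hypothetical, and the file is read with blocking `std::fs::read_to_string`, whereas the real default may use asynchronous I/O.

```rust
use std::future::Future;
use std::path::Path;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-ins for the crate's types.
type Result<T> = std::result::Result<T, String>;
type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;

#[derive(Debug, PartialEq)]
struct ParseResult {
    node_count: usize,
}

trait Parser: Send + Sync {
    // Required: parse content that is already in memory.
    fn parse<'a>(&'a self, content: &'a str) -> BoxFuture<'a, Result<ParseResult>>;

    // Provided: read the whole file, then delegate to `parse`.
    fn parse_file<'a>(&'a self, path: &'a Path) -> BoxFuture<'a, Result<ParseResult>> {
        Box::pin(async move {
            // Assumption: blocking std I/O keeps the sketch dependency-free.
            let content = std::fs::read_to_string(path).map_err(|e| e.to_string())?;
            self.parse(&content).await
        })
    }
}

/// Toy parser: counts non-empty lines as nodes.
struct LineParser;

impl Parser for LineParser {
    fn parse<'a>(&'a self, content: &'a str) -> BoxFuture<'a, Result<ParseResult>> {
        Box::pin(async move {
            let node_count = content
                .lines()
                .filter(|line| !line.trim().is_empty())
                .count();
            Ok(ParseResult { node_count })
        })
    }
}

// Waker that does nothing; the executor below simply polls again.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal polling executor, for demonstration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

An implementor that only defines `parse` gets `parse_file` for free; overriding it remains possible for formats that are better read in a streaming or binary fashion.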

Implementors§