Module parser

Document parsing module.

This module provides parsers for different document formats. Each parser extracts RawNode values from a document; those nodes can then be organized into a DocumentTree.

§Supported Formats

Markdown - Full support via MarkdownParser
PDF - Full support via PdfParser with table of contents (TOC) extraction
DOCX - Full support via DocxParser with heading detection
HTML - Full support via HtmlParser with heading hierarchy

§Example

use vectorless::parser::{DocumentParser, MarkdownParser};

// Create a parser
let parser = MarkdownParser::new();

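// Parse content. `parse` is awaited and uses `?`, so this snippet
// must run inside an async function that returns a Result.
let content = "# Title\n\nContent here.";
let result = parser.parse(content).await?;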

println!("Extracted {} nodes", result.node_count());
for node in &result.nodes {
    println!("  - {} (level {})", node.title, node.level);
}
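
The example above prints each node's title and level. As a rough illustration of how such a flat, document-ordered list could be nested into a tree, here is a standalone sketch. `FlatNode`, `TreeNode`, and `build_tree` are hypothetical names invented for this illustration; they are not the crate's RawNode or DocumentTree types, and the crate's own tree construction may work differently. The sketch assumes heading levels start at 1.

```rust
/// Hypothetical stand-in for a parsed heading (not the crate's RawNode).
#[derive(Debug, Clone)]
pub struct FlatNode {
    pub title: String,
    pub level: usize,
}

/// Hypothetical tree node produced by nesting flat headings.
#[derive(Debug)]
pub struct TreeNode {
    pub title: String,
    pub level: usize,
    pub children: Vec<TreeNode>,
}

/// Nest a flat list of headings by level using a stack of open ancestors.
pub fn build_tree(nodes: &[FlatNode]) -> TreeNode {
    // The synthetic root sits at level 0, above every real heading.
    let mut stack = vec![TreeNode {
        title: "root".to_string(),
        level: 0,
        children: Vec::new(),
    }];

    for node in nodes {
        // Close every open node at the same or a deeper level,
        // attaching each one to its parent. The root is never closed here.
        while stack.len() > 1 && stack.last().unwrap().level >= node.level {
            let done = stack.pop().unwrap();
            stack.last_mut().unwrap().children.push(done);
        }
        stack.push(TreeNode {
            title: node.title.clone(),
            level: node.level,
            children: Vec::new(),
        });
    }

    // Attach whatever is still open, deepest first, until only the root remains.
    while stack.len() > 1 {
        let done = stack.pop().unwrap();
        stack.last_mut().unwrap().children.push(done);
    }
    stack.pop().unwrap()
}
```

Given headings at levels 1, 2, 2, 1, the root ends up with two children, and the first of them holds the two level-2 headings.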

Re-exports§

pub use docx::DocxParser;
pub use html::HtmlConfig;
pub use html::HtmlParser;
pub use markdown::MarkdownConfig;
pub use markdown::MarkdownParser;
pub use pdf::PdfParser;

Modules§

docx: DOCX document parsing module.
html: HTML document parser.
markdown: Production-ready Markdown parser module.
pdf: PDF document parsing module.
toc: Table of Contents (TOC) processing module.

Structs§

DocumentMeta: Document metadata.
ParseResult: Result of parsing a document.
ParserRegistry: Registry for document parsers.
RawNode: A raw node extracted from a document.

Enums§

DocumentFormat: Supported document formats.

Traits§

DocumentParser: A parser for extracting content from documents.

Functions§

get_parser: Get a parser for the given format.
get_parser_for_file: Get a parser for a file based on its extension.
parse_content: Parse a document from content using the appropriate parser.
parse_file: Parse a document from a file.
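
The signatures of these functions are not shown on this page. To illustrate the idea behind choosing a parser from a file extension, here is a standalone sketch. The `Format` enum and `format_for_path` function are hypothetical and only mirror the four formats listed above; the extensions that the crate's get_parser_for_file actually recognizes are an assumption.

```rust
use std::path::Path;

/// Hypothetical format enum mirroring the four supported formats;
/// the crate's DocumentFormat may have different variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Markdown,
    Pdf,
    Docx,
    Html,
}

/// Map a file path to a format by its extension, case-insensitively.
/// Returns None when the extension is missing or unrecognized.
pub fn format_for_path(path: &Path) -> Option<Format> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        // The extension lists below are assumptions for illustration.
        "md" | "markdown" => Some(Format::Markdown),
        "pdf" => Some(Format::Pdf),
        "docx" => Some(Format::Docx),
        "html" | "htm" => Some(Format::Html),
        _ => None,
    }
}
```

Returning an Option keeps the "unknown format" case explicit, so a caller can decide whether to fall back to a default parser or report an error.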
