Module parser


Tree-sitter based Markdown parsing for structured content analysis.

This module parses Markdown with tree-sitter. Working from the syntax tree rather than from raw text allows precise syntax analysis and structured extraction of headings, content blocks, and table-of-contents information.

§Features

Hierarchical Structure: Builds nested heading structures matching document organization
Error Resilience: Continues parsing even with malformed markdown syntax
Diagnostics: Reports issues found during parsing for quality assurance
Performance: Efficiently handles large documents (< 150ms per MB)
Unicode Support: Full Unicode support including complex scripts and emoji

§Architecture

The parser uses tree-sitter for tokenization and syntax analysis, then builds structured representations in four steps:

1. Tokenization: tree-sitter parses the markdown into a syntax tree
2. Structure Extraction: the parser traverses the tree to identify headings and content blocks
3. Hierarchy Building: the parser constructs the nested TOC and heading block structures
4. Validation: the parser generates diagnostics for quality issues
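
The hierarchy-building step can be illustrated with a small standalone sketch. It is not the crate's implementation: `Heading`, `TocNode`, and `build_toc` are hypothetical names, and the input is assumed to be a flat list of headings in document order, as a tree walk might produce. A stack of open headings is enough to turn that list into a nested structure.

```rust
/// Hypothetical flat heading record, as a syntax-tree walk might emit it.
#[derive(Debug, Clone, PartialEq)]
struct Heading {
    level: usize, // 1 for `#`, 2 for `##`, and so on
    title: String,
}

/// Hypothetical nested TOC node (not the real `blz_core::TocEntry`).
#[derive(Debug, Clone, PartialEq)]
struct TocNode {
    heading_path: Vec<String>,
    children: Vec<TocNode>,
}

/// Pop the innermost open node and attach it to its parent, or to the
/// root list when no parent is open.
fn close_top(stack: &mut Vec<(usize, TocNode)>, roots: &mut Vec<TocNode>) {
    if let Some((_, node)) = stack.pop() {
        match stack.last_mut() {
            Some((_, parent)) => parent.children.push(node),
            None => roots.push(node),
        }
    }
}

/// Build a nested TOC from headings given in document order.
fn build_toc(headings: &[Heading]) -> Vec<TocNode> {
    let mut roots: Vec<TocNode> = Vec::new();
    // Open headings from outermost to innermost, paired with their level.
    let mut stack: Vec<(usize, TocNode)> = Vec::new();

    for h in headings {
        // Close every open heading that cannot be an ancestor of `h`.
        while stack.last().map_or(false, |(lvl, _)| *lvl >= h.level) {
            close_top(&mut stack, &mut roots);
        }
        // The new path is the parent's path plus this heading's title.
        let mut path: Vec<String> = stack
            .last()
            .map(|(_, parent)| parent.heading_path.clone())
            .unwrap_or_default();
        path.push(h.title.clone());
        stack.push((h.level, TocNode { heading_path: path, children: Vec::new() }));
    }

    // Close whatever is still open at the end of the document.
    while !stack.is_empty() {
        close_top(&mut stack, &mut roots);
    }
    roots
}
```

Given the headings `# Main`, `## Sub`, `## Sub2`, `# Other`, this sketch returns two root nodes, with `Sub` and `Sub2` nested under `Main` and carrying the paths `["Main", "Sub"]` and `["Main", "Sub2"]`.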

§Examples

§Basic parsing:

use blz_core::{MarkdownParser, Result};

fn main() -> Result<()> {
    let mut parser = MarkdownParser::new()?;
    let result = parser.parse(r#"

Welcome to the documentation.

# Installation

Run the following command:
cargo install blz

# Usage

Basic usage example.
"#)?;

    // Summarize what the parser extracted.
    println!("Found {} heading blocks", result.heading_blocks.len());
    println!("TOC has {} entries", result.toc.len());
    println!("Total lines: {}", result.line_count);

    // Report each diagnostic, labeled by its severity.
    for diagnostic in &result.diagnostics {
        match diagnostic.severity {
            blz_core::DiagnosticSeverity::Warn => {
                println!("Warning: {}", diagnostic.message);
            }
            blz_core::DiagnosticSeverity::Error => {
                println!("Error: {}", diagnostic.message);
            }
            blz_core::DiagnosticSeverity::Info => {
                println!("Info: {}", diagnostic.message);
            }
        }
    }

    Ok(())
}

§Working with structured results:

use blz_core::{MarkdownParser, Result};

// Print the table of contents recursively, indenting each nesting level.
fn print_toc(entries: &[blz_core::TocEntry], indent: usize) {
    for entry in entries {
        // The last path segment is the entry's own heading text.
        let title = entry.heading_path.last().map(String::as_str).unwrap_or("Unknown");
        println!("{}{} ({})", "  ".repeat(indent), title, entry.lines);
        print_toc(&entry.children, indent + 1);
    }
}

fn main() -> Result<()> {
    let mut parser = MarkdownParser::new()?;
    let result = parser.parse("# Main\n\nMain content\n\n## Sub\n\nSub content here.")?;

    // Examine heading blocks
    for block in &result.heading_blocks {
        println!("Section: {} (lines {}-{})",
            block.path.join(" > "),
            block.start_line,
            block.end_line);
    }

    // Examine table of contents
    print_toc(&result.toc, 0);

    Ok(())
}

§Performance Characteristics

Parse Time: < 150ms per MB of markdown content
Memory Usage: ~2x source document size during parsing
Large Documents: Efficiently handles documents up to 100MB
Complex Structure: Handles deeply nested headings (tested up to 50 levels)
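
To check the parse-time figure against your own documents, a small timing helper along these lines may be useful. It is only a sketch: `ms_per_mb` and `time_within_budget` are hypothetical names, 1 MB is assumed to mean 1,048,576 bytes (the text above does not say which definition it uses), and the closure stands in for whatever parsing call you want to measure.

```rust
use std::time::{Duration, Instant};

/// Budget taken from the documented figure: 150 ms per MB of input.
const BUDGET_MS_PER_MB: f64 = 150.0;

/// Convert a measured duration into milliseconds per MB.
/// Assumes 1 MB = 1024 * 1024 bytes; returns 0.0 for empty input.
fn ms_per_mb(bytes: usize, elapsed: Duration) -> f64 {
    if bytes == 0 {
        return 0.0;
    }
    let mb = bytes as f64 / (1024.0 * 1024.0);
    elapsed.as_secs_f64() * 1000.0 / mb
}

/// Time an arbitrary piece of work over `bytes` bytes of input and report
/// the measured rate together with whether it stayed under the budget.
fn time_within_budget<F: FnOnce()>(bytes: usize, work: F) -> (f64, bool) {
    let start = Instant::now();
    work();
    let rate = ms_per_mb(bytes, start.elapsed());
    (rate, rate < BUDGET_MS_PER_MB)
}
```

A caller would pass the document length in bytes and a closure that runs the parse, for example `time_within_budget(doc.len(), || { /* parse doc here */ })`. A single run is noisy, so repeating the measurement and building in release mode gives a more meaningful number.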

§Error Handling

The parser is designed to be resilient to malformed input:

Syntax Errors: tree-sitter handles most malformed markdown gracefully
Missing Headings: Creates a default “Document” block for content without structure
Encoding Issues: Handles various text encodings and invalid UTF-8 sequences
Memory Limits: Prevents excessive memory usage on pathological inputs
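
The "Missing Headings" fallback can be pictured with the following standalone sketch. It is not how the crate does it: `Block`, `atx_title`, and `split_blocks` are hypothetical, the scan is a naive line-based one rather than a tree-sitter walk (so a `#` line inside a code fence would be misread as a heading), and content before the first heading is ignored for brevity.

```rust
/// Hypothetical block type for illustration only.
#[derive(Debug, PartialEq)]
struct Block {
    path: Vec<String>,
    start_line: usize, // 1-based, inclusive
    end_line: usize,   // 1-based, inclusive
}

/// Return the heading text if `line` looks like an ATX heading
/// (one to six `#` characters followed by a space or end of line).
fn atx_title(line: &str) -> Option<String> {
    let trimmed = line.trim_start();
    let hashes = trimmed.chars().take_while(|c| *c == '#').count();
    if hashes == 0 || hashes > 6 {
        return None;
    }
    let rest = &trimmed[hashes..];
    if rest.is_empty() || rest.starts_with(' ') {
        Some(rest.trim().to_string())
    } else {
        None
    }
}

/// Split text into one block per heading. When no heading is found,
/// fall back to a single block named "Document" covering all lines.
fn split_blocks(text: &str) -> Vec<Block> {
    let line_count = text.lines().count();
    let mut blocks: Vec<Block> = Vec::new();

    for (idx, line) in text.lines().enumerate() {
        if let Some(title) = atx_title(line) {
            // The previous block ends on the line before this heading.
            if let Some(prev) = blocks.last_mut() {
                prev.end_line = idx;
            }
            blocks.push(Block {
                path: vec![title],
                start_line: idx + 1,
                end_line: line_count,
            });
        }
    }

    if blocks.is_empty() {
        blocks.push(Block {
            path: vec!["Document".to_string()],
            start_line: 1,
            end_line: line_count.max(1),
        });
    }
    blocks
}
```

With this sketch, `split_blocks("just text\nmore text")` yields a single `Document` block spanning lines 1 to 2, while input containing headings yields one block per heading.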

§Thread Safety

MarkdownParser is not thread-safe due to internal mutable state in tree-sitter. Create separate parser instances for concurrent parsing:

use blz_core::{MarkdownParser, Result};
use std::thread;

fn parse_concurrently(documents: Vec<String>) -> Vec<Result<blz_core::ParseResult>> {
    documents
        .into_iter()
        .map(|doc| {
            thread::spawn(move || {
                let mut parser = MarkdownParser::new()?;
                parser.parse(&doc)
            })
        })
        .collect::<Vec<_>>()
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .collect()
}
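
The example above spawns one thread per document, which can be wasteful for large batches. A possible variant, sketched here with the standard library only, bounds the number of threads and creates one parser per thread. `parse_in_chunks` and its parameters are hypothetical: `make_parser` stands in for a closure that builds a fresh parser and returns a function that parses one document with it.

```rust
use std::thread;

/// Process `documents` on at most `workers` threads. Each thread calls
/// `make_parser` once, so no parser state is shared between threads.
/// Results come back in the same order as the input documents.
fn parse_in_chunks<P, W, T>(documents: &[String], workers: usize, make_parser: P) -> Vec<T>
where
    P: Fn() -> W + Sync,
    W: FnMut(&str) -> T,
    T: Send,
{
    let workers = workers.max(1);
    // Ceiling division, with a minimum of 1 so `chunks` never sees zero.
    let chunk_size = ((documents.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn every worker first so the chunks run in parallel.
        let handles: Vec<_> = documents
            .chunks(chunk_size)
            .map(|chunk| {
                let make_parser = &make_parser;
                scope.spawn(move || {
                    let mut parse = make_parser(); // one parser per thread
                    chunk
                        .iter()
                        .map(|doc| parse(doc.as_str()))
                        .collect::<Vec<T>>()
                })
            })
            .collect();

        // Join in spawn order to preserve document order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because the parser is created inside each worker, it never crosses a thread boundary and does not need to be `Send`. Scoped threads also let the workers borrow `documents` instead of taking ownership of each string.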

§Structs

MarkdownParser: A tree-sitter based markdown parser.
ParseResult: The result of parsing a markdown document.
