Struct MarkdownParser 

pub struct MarkdownParser { /* private fields */ }

A tree-sitter based markdown parser.

Provides structured parsing of markdown documents with heading hierarchy extraction, content block identification, and diagnostic reporting. The parser is designed to be resilient to malformed input while providing detailed structural information.

§Parsing Strategy

The parser uses tree-sitter’s markdown grammar to:

  1. Build a complete syntax tree of the document
  2. Walk the tree to identify heading nodes and their levels
  3. Extract content blocks between headings
  4. Build a hierarchical table of contents
  5. Generate diagnostics for quality issues
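
Steps 2 and 3 can be illustrated without tree-sitter. The sketch below is not the crate's implementation: it swaps the syntax-tree walk for a plain line scan over ATX (`#`) headings, and `Block`, `atx_heading`, and `split_into_blocks` are hypothetical names chosen for illustration. It ignores setext headings and treats any line that starts with three backticks as a fence toggle, so it only approximates what a real grammar recognizes.

```rust
/// One heading plus the lines that follow it, up to the next heading.
/// Hypothetical type for illustration; not the crate's own struct.
#[derive(Debug, PartialEq)]
struct Block {
    level: usize,      // 1 for `#`, 2 for `##`, ...
    title: String,
    start_line: usize, // 1-based line of the heading
    end_line: usize,   // 1-based, inclusive
}

/// Returns `(level, title)` if `line` looks like an ATX heading such as `## Title`.
fn atx_heading(line: &str) -> Option<(usize, &str)> {
    let trimmed = line.trim_start();
    let level = trimmed.chars().take_while(|&c| c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = &trimmed[level..];
    // The marker must be followed by whitespace or the end of the line.
    if !rest.is_empty() && !rest.starts_with(' ') && !rest.starts_with('\t') {
        return None;
    }
    Some((level, rest.trim()))
}

/// Line-scan stand-in for "walk the tree, then extract content between headings".
fn split_into_blocks(text: &str) -> Vec<Block> {
    let fence = "`".repeat(3);
    let mut blocks: Vec<Block> = Vec::new();
    let mut in_fence = false;
    let mut total_lines = 0;
    for (idx, line) in text.lines().enumerate() {
        let line_no = idx + 1;
        total_lines = line_no;
        if line.trim_start().starts_with(fence.as_str()) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue; // `#` inside a code fence is not a heading
        }
        if let Some((level, title)) = atx_heading(line) {
            // Close the previous block on the line before this heading.
            if let Some(prev) = blocks.last_mut() {
                prev.end_line = line_no - 1;
            }
            blocks.push(Block {
                level,
                title: title.to_string(),
                start_line: line_no,
                end_line: line_no,
            });
        }
    }
    // The final block runs to the end of the document.
    if let Some(last) = blocks.last_mut() {
        last.end_line = total_lines;
    }
    blocks
}
```

Calling `split_into_blocks` on a document yields one `Block` per heading with an inclusive line range, which is the shape a table of contents can then be built from.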

§Reusability

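Parser instances can be reused for multiple documents but are not thread-safe. The internal tree-sitter parser maintains mutable state across parse operations, which is why parse() takes &mut self; concurrent callers need one parser per thread or external synchronization.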
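
One way to share a single parser between threads is to put it behind a mutex. The sketch below uses a hypothetical `StubParser` stand-in (it only counts lines) rather than the real type, and it assumes the real parser can be moved between threads, which this page does not state. Keeping one parser per thread avoids the lock entirely.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a parser whose `parse` needs `&mut self`.
/// It only counts lines and records how many parses have run.
struct StubParser {
    parses_run: usize,
}

impl StubParser {
    fn parse(&mut self, text: &str) -> usize {
        self.parses_run += 1;
        text.lines().count()
    }
}

/// Parses each document on its own thread, serializing access with a mutex.
/// Returns the per-document line counts and the total number of parses.
fn parse_all(docs: Vec<String>) -> (Vec<usize>, usize) {
    let parser = Arc::new(Mutex::new(StubParser { parses_run: 0 }));

    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| {
            let parser = Arc::clone(&parser);
            thread::spawn(move || {
                // Only the lock holder gets `&mut` access to the parser.
                let mut guard = parser.lock().expect("parser mutex poisoned");
                guard.parse(&doc)
            })
        })
        .collect();

    // Joining in spawn order keeps results aligned with the input order.
    let counts: Vec<usize> = handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect();

    let total = parser.lock().expect("parser mutex poisoned").parses_run;
    (counts, total)
}
```

Because every parse holds the lock, this serializes the work; it trades throughput for sharing one set of parser state.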

§Memory Management

The parser automatically manages memory for syntax trees and intermediate structures. Large documents may temporarily use significant memory during parsing, but this is released after the parse() method returns.

Implementations§

impl MarkdownParser

pub fn new() -> Result<Self>

Create a new markdown parser instance.

Initializes the tree-sitter parser with the markdown grammar. This operation may fail if the tree-sitter language cannot be loaded properly.

§Returns

Returns a new parser instance ready for use.

§Errors

Returns an error if:

  • The tree-sitter markdown language cannot be loaded
  • The parser cannot be initialized with the markdown grammar
  • System resources are insufficient for parser creation
§Examples
use blz_core::{MarkdownParser, Result};

fn main() -> Result<()> {
    // Create a new parser
    let mut parser = MarkdownParser::new()?;

    // The parser is now ready to parse markdown content
    let result = parser.parse("# Hello World\n\nContent here.")?;
    assert!(!result.heading_blocks.is_empty());
    Ok(())
}
§Resource Usage

Creating a parser allocates approximately 1-2MB of memory for the grammar and internal structures. This overhead is amortized across multiple parse operations.

pub fn parse(&mut self, text: &str) -> Result<ParseResult>

Parse markdown text into structured components.

Performs complete analysis of the markdown document, extracting heading hierarchy, content blocks, table of contents, and generating diagnostics for any issues found.

§Arguments
  • text - The markdown content to parse (UTF-8 string)
§Returns

Returns a ParseResult containing:

  • Structured heading blocks with content and line ranges
  • Hierarchical table of contents
  • Diagnostic messages for any issues found
  • Line count and other metadata
§Errors

Returns an error if:

  • The text cannot be parsed by tree-sitter (very rare)
  • Memory is exhausted during parsing of extremely large documents
  • Internal parsing structures cannot be built

Note: Most malformed markdown will not cause errors but will generate diagnostics.
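
The split between hard errors and diagnostics suggests handling them on two separate paths. The sketch below shows one way to do that with hypothetical stand-in types: `Severity`, `Diagnostic`, `Outcome`, and `summarize` are invented for illustration, and the error is a plain `String`. Only the `severity` and `message` field names come from this page.

```rust
/// Hypothetical severity levels; the real enum's variants are not listed here.
#[derive(Debug)]
enum Severity {
    Warn,
    Error,
}

/// Stand-in for one diagnostic entry.
struct Diagnostic {
    severity: Severity,
    message: String,
}

/// Stand-in for a successful parse result.
struct Outcome {
    diagnostics: Vec<Diagnostic>,
}

/// Reports a failed parse, a clean parse, and a parse with diagnostics
/// as three distinct cases.
fn summarize(result: Result<Outcome, String>) -> String {
    match result {
        // Hard failure: no structure was produced at all.
        Err(e) => format!("parse failed: {}", e),
        // Success with nothing to report.
        Ok(outcome) if outcome.diagnostics.is_empty() => "parsed cleanly".to_string(),
        // Success, but the document had issues worth surfacing.
        Ok(outcome) => {
            let lines: Vec<String> = outcome
                .diagnostics
                .iter()
                .map(|d| format!("{:?}: {}", d.severity, d.message))
                .collect();
            format!("parsed with {} diagnostic(s): {}", lines.len(), lines.join("; "))
        }
    }
}
```

The point of the third arm is that a document with problems still yields usable structure, so callers can index it and report the diagnostics separately.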

§Examples
use blz_core::{MarkdownParser, Result};

fn main() -> Result<()> {
    let mut parser = MarkdownParser::new()?;

    // Parse simple markdown
    let result = parser.parse(r#"
# Introduction

This is an introduction section.

# Getting Started

Here's how to get started:

1. First step
2. Second step

## Prerequisites

You'll need these tools.
"#)?;

    // Check the results
    // The parser creates one block per heading, with content up to the next heading
    assert!(result.heading_blocks.len() >= 2); // At least Introduction and Getting Started
    assert!(!result.toc.is_empty());
    // Line count represents total lines in the document
    assert!(result.line_count > 0);

    // Look for any parsing issues
    for diagnostic in &result.diagnostics {
        println!("{:?}: {}", diagnostic.severity, diagnostic.message);
    }
    Ok(())
}
§Performance Guidelines
  • Documents up to 1MB: Parse in under 50ms
  • Documents up to 10MB: Parse in under 500ms
  • Very large documents: Consider streaming or chunking for better UX
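
For the chunking suggestion, one possible approach is to split a very large document at top-level headings and parse each piece separately. The sketch below is an assumption about how that could be done, not something the crate provides: `chunk_at_top_level_headings` and `max_bytes` are hypothetical names, and only lines that begin with `# ` outside a code fence count as split points.

```rust
/// Splits `text` into chunks that each begin at a top-level (`# `) heading,
/// merging consecutive sections while a chunk stays within `max_bytes`.
/// A single section larger than `max_bytes` is kept whole.
fn chunk_at_top_level_headings(text: &str, max_bytes: usize) -> Vec<&str> {
    let fence = "`".repeat(3);

    // Byte offsets where a new top-level section begins.
    let mut starts = vec![0usize];
    let mut offset = 0usize;
    let mut in_fence = false;
    for line in text.split_inclusive('\n') {
        if line.trim_start().starts_with(fence.as_str()) {
            in_fence = !in_fence;
        } else if !in_fence && offset > 0 && line.starts_with("# ") {
            starts.push(offset);
        }
        offset += line.len();
    }
    starts.push(text.len());

    let mut chunks = Vec::new();
    let mut chunk_start = 0usize;
    for window in starts.windows(2) {
        let (section_start, section_end) = (window[0], window[1]);
        // Flush the current chunk if adding this section would exceed the budget.
        if section_end - chunk_start > max_bytes && section_start > chunk_start {
            chunks.push(&text[chunk_start..section_start]);
            chunk_start = section_start;
        }
    }
    if chunk_start < text.len() {
        chunks.push(&text[chunk_start..]);
    }
    chunks
}
```

Each chunk can then be handed to the parser in turn. Line ranges and diagnostics from a chunk would be relative to that chunk, so a caller would need to add the chunk's starting line to map them back to the full document.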
§Memory Usage

Memory usage during parsing is approximately:

  • Small documents (< 100KB): ~2x document size
  • Large documents (> 1MB): ~1.5x document size
  • Peak usage occurs during tree traversal and structure building
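
The ratios above can be turned into a rough budget check before parsing. This is a back-of-the-envelope sketch only: `estimate_peak_bytes` is a hypothetical helper, binary units (1KB = 1024 bytes) are an assumption, and the figures above do not cover documents between 100KB and 1MB, so the sketch conservatively applies the 2x factor there.

```rust
const KB: usize = 1024;
const MB: usize = 1024 * KB;

/// Rough peak-memory estimate, in bytes, for parsing a document of `doc_bytes`.
fn estimate_peak_bytes(doc_bytes: usize) -> usize {
    if doc_bytes > MB {
        // About 1.5x for large documents, computed with integer arithmetic.
        doc_bytes.saturating_add(doc_bytes / 2)
    } else {
        // About 2x for small documents. Sizes from 100KB up to 1MB are not
        // covered by the stated figures; using 2x there is an assumption.
        doc_bytes.saturating_mul(2)
    }
}
```

For example, a 4MB document would be estimated at about 6MB of peak usage during the parse, in addition to the fixed cost of the parser instance itself.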

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> Downcast for T
where T: Any,

fn into_any(self: Box<T>) -> Box<dyn Any>

Convert Box<dyn Trait> (where Trait: Downcast) to Box<dyn Any>. Box<dyn Any> can then be further downcast into Box<ConcreteType> where ConcreteType implements Trait.

fn into_any_rc(self: Rc<T>) -> Rc<dyn Any>

Convert Rc<Trait> (where Trait: Downcast) to Rc<Any>. Rc<Any> can then be further downcast into Rc<ConcreteType> where ConcreteType implements Trait.

fn as_any(&self) -> &(dyn Any + 'static)

Convert &Trait (where Trait: Downcast) to &Any. This is needed since Rust cannot generate &Any’s vtable from &Trait’s.

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

Convert &mut Trait (where Trait: Downcast) to &mut Any. This is needed since Rust cannot generate &mut Any’s vtable from &mut Trait’s.

impl<T> DowncastSync for T
where T: Any + Send + Sync,

fn into_any_arc(self: Arc<T>) -> Arc<dyn Any + Sync + Send>

Convert Arc<Trait> (where Trait: Downcast) to Arc<Any>. Arc<Any> can then be further downcast into Arc<ConcreteType> where ConcreteType implements Trait.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of the pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a value with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> Fruit for T
where T: Send + Downcast,