Crate turbovault_parser


§TurboVault Parser

Obsidian Flavored Markdown (OFM) parser built on pulldown-cmark.

This crate provides:

  • Fast markdown parsing via pulldown-cmark (CommonMark foundation)
  • Frontmatter extraction (YAML via pulldown-cmark metadata blocks)
  • Obsidian-specific syntax: wikilinks, embeds, callouts, tags
  • Code block awareness: patterns inside code blocks/inline code are excluded
  • Link extraction and resolution
  • Standalone parsing without vault context (for tools like treemd)

§Architecture

The parser uses a hybrid two-phase approach, coordinated by a unified ParseEngine:

§Phase 1: pulldown-cmark pass

  • Extracts CommonMark elements: headings, markdown links, tasks, frontmatter
  • Builds excluded ranges (code blocks, inline code, HTML) for Phase 2

§Phase 2: Regex pass (OFM extensions)

  • Parses Obsidian-specific syntax: wikilinks [[...]], embeds ![[...]], tags #tag, and callouts > [!TYPE]
  • Skips the excluded ranges from Phase 1 so that patterns inside code blocks, inline code, and HTML are not matched
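To make the range-skipping step concrete, here is a minimal sketch using only the standard library. It assumes Phase 1 hands Phase 2 a sorted, non-overlapping list of byte ranges; `is_excluded` and `wikilink_starts_outside_code` are hypothetical names, not part of turbovault_parser, and a plain `match_indices` scan stands in for the crate's regex pass.

```rust
use std::ops::Range;

/// Returns true if byte offset `pos` lies inside one of the excluded ranges.
/// Assumes `excluded` is sorted by start and its ranges do not overlap.
fn is_excluded(excluded: &[Range<usize>], pos: usize) -> bool {
    // Count the ranges that start at or before `pos` (binary search).
    let idx = excluded.partition_point(|r| r.start <= pos);
    // Only the last such range can possibly contain `pos`.
    idx > 0 && excluded[idx - 1].contains(&pos)
}

/// Phase 2 stand-in: byte offsets of every `[[` that falls outside the excluded ranges.
fn wikilink_starts_outside_code(content: &str, excluded: &[Range<usize>]) -> Vec<usize> {
    content
        .match_indices("[[")
        .map(|(pos, _)| pos)
        .filter(|&pos| !is_excluded(excluded, pos))
        .collect()
}
```

Keeping the ranges sorted lets each candidate match be checked in O(log n) instead of scanning every code span.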

§Performance optimizations

  • Builds a LineIndex once for O(log n) position lookups
  • Uses fast pre-filters to skip regex when patterns aren’t present
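A line index of this kind can be sketched in a few lines: record the byte offset where every line starts, then answer each lookup with a binary search. `SimpleLineIndex` is a hypothetical stand-in for the crate's `LineIndex`, whose real fields and methods may differ; the 1-based, byte-counted columns are an assumption of this sketch.

```rust
/// Hypothetical line index: byte offsets of every line start, computed once.
struct SimpleLineIndex {
    line_starts: Vec<usize>,
}

impl SimpleLineIndex {
    /// O(n) build: line 1 starts at 0, every later line starts after a '\n'.
    fn new(content: &str) -> Self {
        let mut line_starts = vec![0];
        for (i, b) in content.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        SimpleLineIndex { line_starts }
    }

    /// O(log n) lookup: returns a 1-based (line, column) for a byte offset.
    /// Columns count bytes, not characters.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // The number of line starts at or before `offset` is the 1-based line number.
        let line = self.line_starts.partition_point(|&start| start <= offset);
        let col = offset - self.line_starts[line - 1] + 1;
        (line, col)
    }
}
```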

§Quick Start

§With Vault Context

use turbovault_parser::Parser;
use std::path::PathBuf;

let content = r#"---
title: My Note
tags: [important, review]
---


[[WikiLink]] and [[Other Note#Heading]].

- [x] Completed task
- [ ] Pending task
"#;

let vault_path = PathBuf::from("/vault");
let parser = Parser::new(vault_path);

let path = PathBuf::from("my-note.md");
if let Ok(result) = parser.parse_file(&path, content) {
    // Access parsed components
    if let Some(frontmatter) = &result.frontmatter {
        println!("Frontmatter data: {:?}", frontmatter.data);
    }
    println!("Links: {}", result.links.len());
    println!("Tasks: {}", result.tasks.len());
}

§Standalone Parsing (No Vault Required)

use turbovault_parser::{ParsedContent, ParseOptions};

let content = "# Title\n\n[[WikiLink]] and [markdown](url) with #tag";

// Parse everything
let parsed = ParsedContent::parse(content);
assert_eq!(parsed.wikilinks.len(), 1);
assert_eq!(parsed.markdown_links.len(), 1);
assert_eq!(parsed.tags.len(), 1);

// Or parse selectively for better performance
let parsed = ParsedContent::parse_with_options(content, ParseOptions::links_only());

§Individual Parsers (Granular Control)

use turbovault_parser::{parse_wikilinks, parse_tags, parse_callouts};

let content = "[[Link]] with #tag\n\n> [!NOTE] callout";

let wikilinks = parse_wikilinks(content);
let tags = parse_tags(content);
let callouts = parse_callouts(content);

§Supported OFM Features

  • Wikilinks: [[Note]]
  • Aliases: [[Note|Alias]]
  • Block references: [[Note#^blockid]]
  • Heading references: [[Note#Heading]]
  • Embeds: ![[Note]]
  • Markdown links: [text](url)
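The wikilink forms above differ only in their separators: `!` marks an embed, `|` introduces an alias, `#` introduces a heading, and `#^` a block reference. The sketch below splits a single, already-isolated wikilink along those separators. `parse_wikilink`, `WikiTarget`, and `Anchor` are hypothetical names for illustration, not turbovault_parser's API, and edge cases such as escaped pipes are ignored.

```rust
/// Hypothetical anchor kinds: `#Heading` or `#^blockid`.
#[derive(Debug, PartialEq)]
enum Anchor {
    Heading(String),
    Block(String),
}

/// Hypothetical parsed form of one wikilink or embed.
#[derive(Debug, PartialEq)]
struct WikiTarget {
    note: String,
    anchor: Option<Anchor>,
    alias: Option<String>,
    is_embed: bool,
}

/// Parses text such as `[[Note#Heading|Alias]]` or `![[Note]]`.
/// Returns None if the text is not wrapped in `[[` ... `]]`.
fn parse_wikilink(raw: &str) -> Option<WikiTarget> {
    // A leading `!` turns a link into an embed.
    let (is_embed, rest) = match raw.strip_prefix('!') {
        Some(r) => (true, r),
        None => (false, raw),
    };
    let inner = rest.strip_prefix("[[")?.strip_suffix("]]")?;

    // Everything after the first `|` is the display alias.
    let (target, alias) = match inner.split_once('|') {
        Some((t, a)) => (t, Some(a.to_string())),
        None => (inner, None),
    };

    // Everything after the first `#` is the anchor; a leading `^` marks a block reference.
    let (note, anchor) = match target.split_once('#') {
        Some((n, a)) => match a.strip_prefix('^') {
            Some(id) => (n, Some(Anchor::Block(id.to_string()))),
            None => (n, Some(Anchor::Heading(a.to_string()))),
        },
        None => (target, None),
    };

    Some(WikiTarget { note: note.to_string(), anchor, alias, is_embed })
}
```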

§Frontmatter

YAML frontmatter between --- delimiters at the very start of a note is extracted and parsed.
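As a rough illustration of where the frontmatter boundary lies, this sketch splits a note into its raw YAML block and its body. It assumes LF line endings and that the opening `---` is the very first line; `split_frontmatter` is a hypothetical helper, and the YAML text is returned unparsed (the crate itself relies on pulldown-cmark metadata blocks for this step).

```rust
/// Hypothetical helper: splits a note into (raw frontmatter YAML, body).
/// Returns (None, content) if there is no opening `---` line or no closing one.
fn split_frontmatter(content: &str) -> (Option<&str>, &str) {
    // Frontmatter must open on the very first line.
    let Some(rest) = content.strip_prefix("---\n") else {
        return (None, content);
    };
    // Walk line by line, tracking the byte offset where each line starts.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            let yaml = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return (Some(yaml), body);
        }
        offset += line.len();
    }
    // Unclosed block: treat the whole note as body.
    (None, content)
}
```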

§Elements

  • Headings: H1-H6 with level tracking
  • Tasks: Markdown checkboxes with completion status
  • Tags: Inline tags like #important
  • Callouts: Obsidian callout syntax > [!TYPE] with multi-line content
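Task detection can be sketched line by line: a list bullet followed by `[ ]` or `[x]`. The hypothetical `parse_tasks_sketch` below assumes `-`, `*`, or `+` bullets and single-line tasks; unlike the crate, it does not skip code blocks or track nesting, so treat it as an outline of the idea rather than the real `parse_tasks`.

```rust
/// Hypothetical task record; not turbovault_parser's task type.
#[derive(Debug, PartialEq)]
struct TaskSketch<'a> {
    completed: bool,
    text: &'a str,
}

/// Recognizes one checkbox line such as `- [x] Done` or `  * [ ] Todo`.
fn parse_task_line(line: &str) -> Option<TaskSketch<'_>> {
    let trimmed = line.trim_start();
    // Require a list bullet followed by a space.
    let rest = trimmed
        .strip_prefix("- ")
        .or_else(|| trimmed.strip_prefix("* "))
        .or_else(|| trimmed.strip_prefix("+ "))?;
    // `[ ]` is pending; `[x]` or `[X]` is completed.
    let (completed, text) = if let Some(t) = rest.strip_prefix("[ ] ") {
        (false, t)
    } else if let Some(t) = rest.strip_prefix("[x] ").or_else(|| rest.strip_prefix("[X] ")) {
        (true, t)
    } else {
        return None;
    };
    Some(TaskSketch { completed, text: text.trim_end() })
}

/// Collects every task line in the document.
fn parse_tasks_sketch(content: &str) -> Vec<TaskSketch<'_>> {
    content.lines().filter_map(parse_task_line).collect()
}
```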

§Performance

The parser uses:

  • pulldown-cmark for CommonMark parsing and code block detection in linear O(n) time
  • std::sync::LazyLock so each regex pattern is compiled once, on first use (Rust 1.80+)
  • LineIndex for O(log n) position lookups via binary search
  • Fast pre-filters to skip regex when patterns aren’t present
  • Excluded range tracking to avoid parsing inside code blocks
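A pre-filter is simply a cheap check that rules out a pass before any pattern matching happens. In this sketch, `might_contain_wikilinks` and `wikilink_targets` are hypothetical names, and a hand-written scanner stands in for the regex that the crate compiles through `LazyLock`.

```rust
/// Hypothetical pre-filter: a single substring scan.
fn might_contain_wikilinks(content: &str) -> bool {
    content.contains("[[")
}

/// Stand-in for the expensive pass: collects the inner text of each `[[...]]`.
fn extract_wikilink_targets(content: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = content;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        match after.find("]]") {
            Some(end) => {
                out.push(&after[..end]);
                rest = &after[end + 2..];
            }
            None => break, // unclosed link: stop scanning
        }
    }
    out
}

/// Runs the extraction only when the pre-filter says a match is possible.
fn wikilink_targets(content: &str) -> Vec<&str> {
    if !might_contain_wikilinks(content) {
        return Vec::new(); // fast path: no `[[`, so nothing can match
    }
    extract_wikilink_targets(content)
}
```

The pre-filter pays off on notes that contain no links at all, which then cost one substring search instead of a full pattern pass.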

Re-exports§

pub use parsers::Parser;
pub use parsers::frontmatter_parser::extract_frontmatter;

Modules§

parsers
OFM parser implementation using unified ParseEngine.
prelude
Convenient prelude for common imports.

Structs§

LineIndex
Pre-computed line starts for O(log n) line/column lookup.
ListItem
A list item with optional checkbox and nested content.
ParseOptions
Options for selective parsing.
ParsedContent
Parsed markdown content without vault context.
SourcePosition
Position in source text (line, column, byte offset).

Enums§

ContentBlock
A parsed content block in a markdown document.
InlineElement
An inline element within a block.
LinkType
Type of link in Obsidian content.
TableAlignment
Table column alignment.

Functions§

parse_blocks
Parse markdown content into structured blocks.
parse_blocks_from_line
Parse markdown content into structured blocks, starting from a specific line.
parse_callouts
Parse callouts from content (header only, no multi-line content).
parse_callouts_full
Parse callouts with full multi-line content extraction.
parse_embeds
Parse embeds from content.
parse_headings
Parse headings from content.
parse_markdown_links
Parse markdown links from content.
parse_tags
Parse tags from content.
parse_tasks
Parse tasks from content.
parse_wikilinks
Parse wikilinks from content.
slugify
Generate URL-friendly slug from heading text.
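The crate's exact slug rules are not documented here, so the following is only a plausible sketch of a heading slugifier: lowercase the alphanumerics, collapse runs of whitespace, `-`, and `_` into one hyphen, and drop other punctuation. `slugify_sketch` is a hypothetical name; the real `slugify` may treat punctuation or Unicode differently.

```rust
/// Hypothetical slug rule, not necessarily the one `slugify` uses.
fn slugify_sketch(heading: &str) -> String {
    let mut slug = String::with_capacity(heading.len());
    // Set when a separator was seen; emitted lazily so no leading/trailing '-'.
    let mut pending_dash = false;
    for c in heading.chars() {
        if c.is_alphanumeric() {
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.extend(c.to_lowercase());
        } else if c.is_whitespace() || c == '-' || c == '_' {
            pending_dash = true;
        }
        // Any other character (punctuation, symbols) is dropped.
    }
    slug
}
```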