§TurboVault Parser
Obsidian Flavored Markdown (OFM) parser built on pulldown-cmark.
This crate provides:
- Fast markdown parsing via `pulldown-cmark` (CommonMark foundation)
- Frontmatter extraction (YAML via pulldown-cmark metadata blocks)
- Obsidian-specific syntax: wikilinks, embeds, callouts, tags
- Code block awareness: patterns inside code blocks and inline code are excluded
- Link extraction and resolution
- Standalone parsing without vault context (for tools like treemd)
§Architecture
The parser uses a hybrid two-phase approach via a unified `ParseEngine`:
§Phase 1: pulldown-cmark pass
- Extracts CommonMark elements: headings, markdown links, tasks, frontmatter
- Builds excluded ranges (code blocks, inline code, HTML) for Phase 2
§Phase 2: Regex pass (OFM extensions)
- Parses Obsidian-specific syntax: wikilinks `[[]]`, embeds `![[]]`, tags `#tag`, and callouts
- Skips excluded ranges to avoid matching inside code blocks
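The Phase 2 skip can be pictured with a small std-only sketch. It is not the crate's code. The sketch assumes the excluded ranges are sorted, non-overlapping byte ranges like the ones Phase 1 collects. `is_excluded` and `scan_wikilinks` are hypothetical names, and a plain substring scan stands in for the real regex.

```rust
use std::ops::Range;

/// Hypothetical helper: true if `offset` lies inside any of the sorted,
/// non-overlapping `excluded` ranges (code blocks, inline code, HTML).
fn is_excluded(offset: usize, excluded: &[Range<usize>]) -> bool {
    // Binary search for the first range that ends after `offset`.
    let i = excluded.partition_point(|r| r.end <= offset);
    excluded.get(i).map_or(false, |r| r.start <= offset)
}

/// Hypothetical Phase 2 scan: collect the inner text of every `[[...]]`
/// whose opening brackets are not inside an excluded range.
fn scan_wikilinks<'a>(content: &'a str, excluded: &[Range<usize>]) -> Vec<&'a str> {
    let mut links = Vec::new();
    let mut pos = 0;
    while let Some(rel) = content[pos..].find("[[") {
        let start = pos + rel;
        let inner_start = start + 2;
        match content[inner_start..].find("]]") {
            Some(len) => {
                if !is_excluded(start, excluded) {
                    links.push(&content[inner_start..inner_start + len]);
                }
                // Resume scanning after the closing `]]`.
                pos = inner_start + len + 2;
            }
            None => break, // unterminated link: nothing more to find
        }
    }
    links
}

fn main() {
    let content = "See [[Real]] but not `[[Fake]]` here.";
    // Pretend Phase 1 reported the inline code span as an excluded range.
    let code_start = content.find('`').unwrap();
    let code_end = content.rfind('`').unwrap() + 1;
    let excluded = vec![code_start..code_end];
    println!("{:?}", scan_wikilinks(content, &excluded)); // ["Real"]
}
```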
§Performance optimizations
- Builds a `LineIndex` once for O(log n) position lookups
- Uses fast pre-filters to skip regex when patterns aren’t present
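Here is a minimal sketch of the line-index idea. It assumes 1-based lines and byte-based columns, and the real `LineIndex` and `SourcePosition` may count differently. `LineStarts` is a hypothetical name. One linear pass records where every line begins, and after that each lookup is a binary search via `slice::partition_point`.

```rust
/// Hypothetical line index: byte offsets at which each line starts.
struct LineStarts {
    starts: Vec<usize>,
}

impl LineStarts {
    /// One O(n) pass over the text, recording the start of every line.
    fn new(text: &str) -> Self {
        let mut starts = vec![0];
        for (i, b) in text.bytes().enumerate() {
            if b == b'\n' {
                starts.push(i + 1);
            }
        }
        LineStarts { starts }
    }

    /// O(log n) lookup: 1-based (line, column) for a byte offset.
    /// Columns are counted in bytes in this sketch.
    fn line_col(&self, offset: usize) -> (usize, usize) {
        // Number of line starts at or before `offset` (always >= 1).
        let line = self.starts.partition_point(|&start| start <= offset);
        let col = offset - self.starts[line - 1] + 1;
        (line, col)
    }
}

fn main() {
    let text = "# Title\n\n[[Link]]";
    let index = LineStarts::new(text);
    let offset = text.find("[[").unwrap();
    println!("{:?}", index.line_col(offset)); // (3, 1)
}
```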
§Quick Start
§With Vault Context
```rust
use turbovault_parser::Parser;
use std::path::PathBuf;

let content = r#"---
title: My Note
tags: [important, review]
---
[[WikiLink]] and [[Other Note#Heading]].
- [x] Completed task
- [ ] Pending task
"#;

let vault_path = PathBuf::from("/vault");
let parser = Parser::new(vault_path);
let path = PathBuf::from("my-note.md");

if let Ok(result) = parser.parse_file(&path, content) {
    // Access parsed components
    if let Some(frontmatter) = &result.frontmatter {
        println!("Frontmatter data: {:?}", frontmatter.data);
    }
    println!("Links: {}", result.links.len());
    println!("Tasks: {}", result.tasks.len());
}
```

§Standalone Parsing (No Vault Required)
```rust
use turbovault_parser::{ParsedContent, ParseOptions};

let content = "# Title\n\n[[WikiLink]] and [markdown](url) with #tag";

// Parse everything
let parsed = ParsedContent::parse(content);
assert_eq!(parsed.wikilinks.len(), 1);
assert_eq!(parsed.markdown_links.len(), 1);
assert_eq!(parsed.tags.len(), 1);

// Or parse selectively for better performance
let parsed = ParsedContent::parse_with_options(content, ParseOptions::links_only());
```

§Individual Parsers (Granular Control)
```rust
use turbovault_parser::{parse_wikilinks, parse_tags, parse_callouts};

// Run individual parsers on the same input
let content = "[[Link]] with #tag and > [!NOTE] callout";
let wikilinks = parse_wikilinks(content);
let tags = parse_tags(content);
let callouts = parse_callouts(content);
```

§Supported OFM Features
§Links
- Wikilinks: `[[Note]]`
- Aliases: `[[Note|Alias]]`
- Block references: `[[Note#^blockid]]`
- Heading references: `[[Note#Heading]]`
- Embeds: `![[Note]]`
- Markdown links: `[text](url)`
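All of the wikilink forms above share one inner grammar. The following sketch splits the text between `[[` and `]]` into a note, a heading or block anchor, and an alias. An embed has the same inner text behind a leading `!`. `WikiTarget` and `split_wikilink` are hypothetical and only illustrate the syntax. They are not the crate's `LinkType` or its parser.

```rust
/// Hypothetical parsed form of a wikilink's inner text.
#[derive(Debug, PartialEq)]
struct WikiTarget<'a> {
    note: &'a str,
    heading: Option<&'a str>,
    block: Option<&'a str>,
    alias: Option<&'a str>,
}

/// Split e.g. `Note#Heading|Alias` into its parts.
fn split_wikilink(inner: &str) -> WikiTarget<'_> {
    // The alias follows the first `|`.
    let (target, alias) = match inner.split_once('|') {
        Some((t, a)) => (t, Some(a)),
        None => (inner, None),
    };
    // The anchor follows the first `#`; `#^id` marks a block reference.
    let (note, anchor) = match target.split_once('#') {
        Some((n, a)) => (n, Some(a)),
        None => (target, None),
    };
    let (heading, block) = match anchor {
        Some(a) => match a.strip_prefix('^') {
            Some(id) => (None, Some(id)),
            None => (Some(a), None),
        },
        None => (None, None),
    };
    WikiTarget { note, heading, block, alias }
}

fn main() {
    for inner in ["Note", "Note|Alias", "Note#^blockid", "Note#Heading"] {
        println!("{:?}", split_wikilink(inner));
    }
}
```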
§Frontmatter
YAML frontmatter between --- delimiters is extracted and parsed.
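The sketch below illustrates what "between `---` delimiters" means with a hypothetical delimiter split. The crate itself relies on pulldown-cmark metadata blocks for this step, and the YAML parsing that follows is not shown. `split_frontmatter` is an assumed name.

```rust
/// Hypothetical helper: split a leading `---` ... `---` block from the body.
/// Returns (Some(yaml), body) when frontmatter is present, else (None, content).
fn split_frontmatter(content: &str) -> (Option<&str>, &str) {
    // The opening delimiter must be the very first line (LF or CRLF).
    let Some(rest) = content
        .strip_prefix("---\n")
        .or_else(|| content.strip_prefix("---\r\n"))
    else {
        return (None, content);
    };
    // Walk line by line, tracking the byte offset, until the closing `---`.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            let yaml = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return (Some(yaml), body);
        }
        offset += line.len();
    }
    // No closing delimiter: treat everything as ordinary content.
    (None, content)
}

fn main() {
    let doc = "---\ntitle: My Note\ntags: [important, review]\n---\n[[WikiLink]]\n";
    let (yaml, body) = split_frontmatter(doc);
    println!("yaml = {:?}", yaml);
    println!("body = {:?}", body);
}
```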
§Elements
- Headings: H1-H6 with level tracking
- Tasks: Markdown checkboxes with completion status
- Tags: inline tags like `#important`
- Callouts: Obsidian callout syntax `> [!TYPE]` with multi-line content
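Tasks and callout headers are line-oriented, so simplified versions fit in a few lines. This sketch only accepts `- [ ]` / `- [x]` task markers and `> [!TYPE] Title` callout headers. It ignores other list markers, foldable callouts (`[!TYPE]-`), and multi-line callout bodies. Both functions are hypothetical and are not the crate's `parse_tasks` or `parse_callouts`.

```rust
/// Hypothetical: parse `- [ ] text` / `- [x] text` into (completed, text).
fn parse_task_line(line: &str) -> Option<(bool, &str)> {
    let rest = line.trim_start().strip_prefix("- [")?;
    let mut chars = rest.chars();
    let mark = chars.next()?;
    // After the mark we expect `] ` followed by the task text.
    let text = chars.as_str().strip_prefix("] ")?;
    match mark {
        ' ' => Some((false, text)),
        'x' | 'X' => Some((true, text)),
        _ => None,
    }
}

/// Hypothetical: parse a callout header `> [!TYPE] Optional title`
/// into (type, title).
fn parse_callout_header(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim_start().strip_prefix('>')?.trim_start();
    let rest = rest.strip_prefix("[!")?;
    let (kind, title) = rest.split_once(']')?;
    Some((kind, title.trim()))
}

fn main() {
    for line in ["- [x] Completed task", "- [ ] Pending task", "> [!NOTE] callout"] {
        println!("{:?} {:?}", parse_task_line(line), parse_callout_header(line));
    }
}
```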
§Performance
The parser uses:
- `pulldown-cmark` for CommonMark parsing and code block detection (O(n), linear time)
- `std::sync::LazyLock` for compiled regex patterns (Rust 1.80+)
- `LineIndex` for O(log n) position lookups via binary search
- Fast pre-filters to skip regex when patterns aren’t present
- Excluded range tracking to avoid parsing inside code blocks
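The pre-filter idea is to run a cheap byte search before the expensive pass. In this sketch, a whitespace tokenizer stands in for the regex, and a hypothetical `FULL_SCANS` counter shows that input without any `#` never reaches the full scan. Neither function is part of the crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter: how often the expensive pass actually runs.
static FULL_SCANS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a regex pass: keep words that start with `#`.
fn full_tag_scan(content: &str) -> Vec<&str> {
    FULL_SCANS.fetch_add(1, Ordering::Relaxed);
    content
        .split_whitespace()
        .filter_map(|word| word.strip_prefix('#'))
        .filter(|tag| !tag.is_empty())
        .collect()
}

/// Pre-filter: one byte search decides whether the full pass is needed.
fn tags_with_prefilter(content: &str) -> Vec<&str> {
    if !content.as_bytes().contains(&b'#') {
        return Vec::new();
    }
    full_tag_scan(content)
}

fn main() {
    println!("{:?}", tags_with_prefilter("plain text")); // []
    println!("{:?}", tags_with_prefilter("see #important and #review")); // ["important", "review"]
    println!("full scans: {}", FULL_SCANS.load(Ordering::Relaxed)); // 1
}
```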
§Re-exports
- `pub use parsers::Parser;`
- `pub use parsers::frontmatter_parser::extract_frontmatter;`
§Modules
- `parsers` - OFM parser implementation using the unified `ParseEngine`.
- `prelude` - Convenient prelude for common imports.
§Structs
- `LineIndex` - Pre-computed line starts for O(log n) line/column lookup.
- `ListItem` - A list item with optional checkbox and nested content.
- `ParseOptions` - Options for selective parsing.
- `ParsedContent` - Parsed markdown content without vault context.
- `SourcePosition` - Position in source text (line, column, byte offset).
§Enums
- `ContentBlock` - A parsed content block in a markdown document.
- `InlineElement` - An inline element within a block.
- `LinkType` - Type of link in Obsidian content.
- `TableAlignment` - Table column alignment.
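Table column alignment usually comes from the GFM delimiter row under a table header. The sketch below maps each cell of such a row to an alignment. `Align`, its variant names, and `parse_alignment_row` are hypothetical and may not match the crate's `TableAlignment`.

```rust
/// Hypothetical alignment enum (the crate's variants may differ).
#[derive(Debug, PartialEq)]
enum Align {
    Unspecified,
    Left,
    Center,
    Right,
}

/// Parse a GFM delimiter row such as `| :-- | :-: | --: |`.
/// Returns None if any cell is not a valid delimiter cell.
fn parse_alignment_row(row: &str) -> Option<Vec<Align>> {
    let row = row.trim();
    // Leading and trailing pipes are optional in GFM.
    let row = row.strip_prefix('|').unwrap_or(row);
    let row = row.strip_suffix('|').unwrap_or(row);
    row.split('|')
        .map(|cell| {
            let cell = cell.trim();
            let left = cell.starts_with(':');
            let right = cell.ends_with(':');
            // Between the optional colons there must be one or more dashes.
            let dashes = cell.trim_matches(':');
            if dashes.is_empty() || !dashes.chars().all(|c| c == '-') {
                return None;
            }
            Some(match (left, right) {
                (true, true) => Align::Center,
                (true, false) => Align::Left,
                (false, true) => Align::Right,
                (false, false) => Align::Unspecified,
            })
        })
        .collect() // Option<Vec<_>>: any None makes the whole row None
}

fn main() {
    println!("{:?}", parse_alignment_row("| :-- | :-: | --: | --- |"));
}
```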
§Functions
- `parse_blocks` - Parse markdown content into structured blocks.
- `parse_blocks_from_line` - Parse markdown content into structured blocks, starting from a specific line.
- `parse_callouts` - Parse callouts from content (header only, no multi-line content).
- `parse_callouts_full` - Parse callouts with full multi-line content extraction.
- `parse_embeds` - Parse embeds from content.
- `parse_headings` - Parse headings from content.
- `parse_markdown_links` - Parse markdown links from content.
- `parse_tags` - Parse tags from content.
- `parse_tasks` - Parse tasks from content.
- `parse_wikilinks` - Parse wikilinks from content.
- `slugify` - Generate a URL-friendly slug from heading text.
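The exact rules of `slugify` are not documented here. As an illustration only, the hypothetical `slugify_sketch` below lowercases alphanumerics, collapses runs of whitespace, `-` and `_` into a single `-`, and drops other punctuation. The crate may treat Unicode or punctuation differently.

```rust
/// Hypothetical slug rules, not necessarily the crate's `slugify`.
fn slugify_sketch(text: &str) -> String {
    let mut slug = String::with_capacity(text.len());
    // Set when a separator was seen; emitted lazily so there are no
    // leading, trailing, or doubled dashes.
    let mut pending_dash = false;
    for c in text.chars() {
        if c.is_alphanumeric() {
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.extend(c.to_lowercase());
        } else if c.is_whitespace() || c == '-' || c == '_' {
            pending_dash = true;
        }
        // Any other character (punctuation, symbols) is dropped.
    }
    slug
}

fn main() {
    println!("{}", slugify_sketch("Other Note: Heading!")); // other-note-heading
}
```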