turbovault-parser
Obsidian Flavored Markdown (OFM) parser for extracting structured data from Obsidian vault files.
This crate provides fast, production-ready parsing of Obsidian markdown files, extracting:
- YAML Frontmatter: Metadata in
---delimited blocks - Wikilinks:
[[Note]],[[folder/Note#Heading]],[[Note#^block]] - Embeds:
![[Image.png]],![[OtherNote]] - Tags:
#tag,#parent/child - Tasks:
- [ ] Todo,- [x] Done - Callouts:
> [!NOTE],> [!WARNING]+, etc. - Headings:
# H1through###### H6with anchor generation
Built on battle-tested libraries (pulldown-cmark, nom, regex) with comprehensive test coverage.
Architecture
The parser uses a multi-pass approach:
- Frontmatter Extraction: Regex-based YAML extraction from document start
- Link Parsing: Wikilinks and embeds via regex with position tracking
- Content Elements: Tags, tasks, callouts, headings extracted in parallel
All parsing is zero-allocation where possible, returning structured types from turbovault-core.
Parsers
Frontmatter Parser (parsers/frontmatter_parser.rs)
Extracts YAML metadata from the beginning of files:
use extract_frontmatter;
let content = r#"---
title: My Note
tags: [rust, parser]
---
Content here"#;
let = extract_frontmatter?;
assert_eq!;
assert_eq!;
Features:
- Handles multi-line YAML
- Strips frontmatter from content
- Returns
Noneif no frontmatter present - Resilient to malformed frontmatter (missing closing
---)
Wikilink Parser (parsers/wikilinks.rs)
Parses Obsidian's internal linking syntax:
use Parser;
use PathBuf;
let parser = new;
let content = "See [[Note#Heading]] and [[folder/SubNote]]";
// Links are extracted during parse_file()
// Supports:
// - [[Note]] - Basic link
// - [[folder/Note]] - Folder paths
// - [[Note#Heading]] - Section links
// - [[Note#^block-id]] - Block references
// - [[Note|Display Text]] - Custom display text (not yet implemented)
Features:
- Position tracking for each link
- Differentiates from embeds (excludes
![[...]]) - Captures full target path including heading/block anchors
Link Types
The parser classifies links into the following types:
use LinkType;
// WikiLink - basic wikilink: [[Note]]
// HeadingRef - cross-file heading: [[Note#Heading]] or file.md#section
// BlockRef - block reference: [[Note#^blockid]] or #^blockid
// Anchor - same-document anchor: [[#Heading]] or #section
// Embed - embedded content: ![[Note]]
// MarkdownLink - markdown link to file: [text](./file.md)
// ExternalLink - external URL: [text](https://...)
Detection examples:
[[Note]]→WikiLink[[Note#Heading]]→HeadingRef[[Note#^blockid]]→BlockRef[[#Heading]]→Anchor(same-document heading)[[#^blockid]]→BlockRef(same-document block)[text](#section)→Anchor[text](file.md#section)→HeadingRef[text](note.md#^block)→BlockRef
Embed Parser (parsers/embeds.rs)
Extracts embedded files and notes:
// Parses: ![[Image.png]], ![[attachments/diagram.svg]], ![[OtherNote]]
let content = "See diagram: ![[assets/architecture.png]]";
// Extracted as Link with type_ = LinkType::Embed
Features:
- Handles image embeds (
.png,.jpg,.svg, etc.) - Handles note embeds (transcluded content)
- Folder-aware paths
Tag Parser (parsers/tags.rs)
Extracts hashtags and nested tags:
let content = "This is #rust and #project/obsidian code";
// Extracts:
// - Tag { name: "rust", is_nested: false }
// - Tag { name: "project/obsidian", is_nested: true }
Features:
- Supports flat tags:
#rust - Supports nested tags:
#parent/child/grandchild - Alphanumeric, hyphens, underscores, and slashes
Task Parser (parsers/tasks.rs)
Parses task list items with completion status:
let content = r#"
- [ ] Write documentation
- [x] Implement parser
- [ ] Nested task
"#;
// Extracts TaskItem with:
// - content: "Write documentation"
// - is_completed: false
// - position: SourcePosition
// - due_date: None (TODO)
Features:
- Uncompleted tasks:
- [ ] - Completed tasks:
- [x] - Supports indentation (nested tasks)
- Line number tracking
Callout Parser (parsers/callouts.rs)
Parses Obsidian's callout/admonition syntax:
let content = r#"
> [!NOTE] Important info
> This is the callout content
> [!WARNING]- Expandable warning
> Hidden by default
"#;
// Supported types:
// NOTE, TIP, INFO, TODO, IMPORTANT, SUCCESS, QUESTION,
// WARNING, FAILURE, DANGER, BUG, EXAMPLE, QUOTE
Features:
- Type detection (maps to
CalloutTypeenum) - Optional title extraction
- Foldable callouts:
[!TYPE]+(expanded) or[!TYPE]-(collapsed) - Note: Multi-line content parsing is TODO
Heading Parser (parsers/headings.rs)
Extracts markdown headings with automatic anchor generation:
let content = "# Main Title\n## Sub Section";
// Extracts:
// - Heading { text: "Main Title", level: 1, anchor: Some("main-title") }
// - Heading { text: "Sub Section", level: 2, anchor: Some("sub-section") }
Features:
- Levels 1-6 (
#through######) - Automatic anchor generation (lowercase, spaces to hyphens)
- Position tracking for TOC generation
Usage Examples
Example 1: Basic File Parsing
use Parser;
use PathBuf;
let parser = new;
let content = r#"---
title: Project Plan
tags: [project, planning]
---
# Overview
This project uses #rust and #obsidian.
## Tasks
- [ ] Set up repository
- [x] Create parser
See [[Architecture]] for details.
![[diagram.png]]
"#;
let vault_file = parser.parse_file?;
// Access parsed elements:
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!;
assert_eq!; // 1 wikilink + 1 embed
Example 2: Link Extraction for Graph Building
use Parser;
use LinkType;
let parser = new;
let content = "See [[Note A]] and [[Note B#Section]]";
let file = parser.parse_file?;
for link in file.links
Example 3: Task Tracking
let parser = new;
let content = r#"
# Daily Tasks
- [x] Morning standup
- [ ] Code review
- [ ] Write tests
"#;
let file = parser.parse_file?;
let completed = file.tasks.iter.filter.count;
let total = file.tasks.len;
println!;
Obsidian-Specific Features
This parser handles Obsidian's unique markdown extensions:
- Wikilinks: Double-bracket syntax for internal linking
- Embeds:
!prefix for transclusion - Block References:
#^block-idsyntax for linking to specific blocks - Nested Tags:
/separated hierarchical tags - Callouts: Extended blockquote syntax with type and foldability
- Heading Anchors: Automatic URL-safe anchor generation
These features go beyond standard CommonMark and are essential for Obsidian vault operations.
Integration with turbovault-core
This crate depends on turbovault-core for:
- Data Models:
VaultFile,Link,Tag,TaskItem,Callout,Heading,Frontmatter - Error Types:
Result<T, Error>for consistent error handling - Configuration:
VaultConfigfor vault-specific parsing settings - Type Safety: Strong types prevent string-based API errors
Re-exported Types: For convenience, this crate re-exports commonly used types from turbovault-core:
// No need to depend on turbovault-core separately
use ;
use ;
The parser produces structured data that can be consumed by:
turbovault-vault: Vault management and indexingturbovault-server: MCP server tool implementations- Link graph builders and analyzers
Development
Running Tests
# All tests
# Specific parser tests
# With output
Adding a New Parser
- Create
src/parsers/my_parser.rs - Implement parsing function returning
Vec<MyType> - Add corresponding model to
turbovault-core/src/models.rs - Integrate into
Parser::parse_content()insrc/parsers.rs - Add comprehensive tests
Performance Characteristics
- Zero-copy where possible: Uses string slices, not copies
- Lazy evaluation: Only parses markdown files (
.mdextension) - Parallel-safe: All parsers are stateless and thread-safe
- Regex compilation: Uses
lazy_staticfor one-time regex compilation - Position tracking: Maintains byte offsets for fast lookups
Dependencies
pulldown-cmark: CommonMark parsing foundation (currently unused, reserved for future)frontmatter-gen: Frontmatter extraction (currently using custom regex)regex: Pattern matching for Obsidian syntaxnom: Parser combinators (available for complex parsing)serde,serde_yaml,serde_json: Frontmatter deserialization
Limitations and Future Work
Current limitations:
- Callout Content: Only parses first line, not continuation lines
- Display Text in Links:
[[Target|Display]]not yet parsed - Markdown Metadata: No extraction from CommonMark elements yet
- Dataview Queries: Not parsed (Obsidian plugin-specific)
- Mermaid/Code Blocks: Not extracted as structured data
Planned improvements:
- Multi-line callout content parsing
- Display text extraction for wikilinks
- Code block metadata (language, title)
- Better error recovery and partial parsing
- Performance benchmarking and optimization
License
See workspace license.
See Also
turbovault-core: Core data models and typesturbovault-vault: Vault management using parsed dataturbovault-server: MCP server tools built on parser output