Markdown Scanner
Overview
markdown-scanner is a Rust-based command-line tool designed to scan Markdown files within a specified directory (e.g., an Obsidian vault) and extract metadata such as tags and backlinks. It stores this information in a SQLite database for efficient querying and organization. The tool is invoked via a Bash script (markdown-processor-all-rust.bash) that processes all .md files in a directory, making it suitable for integration with text editors like Neovim or workflows involving Markdown-based note-taking systems like Obsidian.
Features
- Tag Extraction: Extracts both YAML frontmatter tags and inline #tags from Markdown files, ignoring tags within code blocks.
- Backlink Detection: Identifies [[backlink]] references in Markdown files and links them to corresponding files in the database.
- SQLite Database: Stores file metadata, folder structure, tags, and backlinks in a relational SQLite database for easy querying.
- File System Integration: Resolves file paths relative to a base directory and handles file system changes, ensuring accurate metadata.
- Error Handling: Robust error handling with custom error types and detailed logging for debugging.
- Editor Integration: Designed to be triggered on file save in editors like Neovim or used in batch processing for Markdown vaults.
Extra Features
- YouTube Title Extraction: Automatically fetches and stores the video title in the database (as JSON) when a YouTube link is detected.
- File Creation Time Tracking: Stores the file creation timestamp when available. Falls back to modification time on filesystems that don't provide reliable birth/creation time (Termux, etc.).
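The creation-time fallback can be illustrated with the standard library alone. This is a minimal sketch, not the tool's actual code: the function name created_or_modified and the demo file name are hypothetical, and it only covers the case where the platform reports no creation time at all (it does not try to detect an unreliable one).

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Returns the creation time when the platform reports one,
/// otherwise falls back to the last-modification time.
/// (Hypothetical helper; the name is an assumption.)
fn created_or_modified(path: &Path) -> io::Result<SystemTime> {
    let meta = fs::metadata(path)?;
    // `created()` returns Err on platforms or filesystems without birth time.
    meta.created().or_else(|_| meta.modified())
}

fn main() -> io::Result<()> {
    // Hypothetical demo file in the system temp directory.
    let path = std::env::temp_dir().join("markdown_scanner_ctime_demo.md");
    fs::write(&path, "# demo\n")?;
    let timestamp = created_or_modified(&path)?;
    println!("timestamp: {:?}", timestamp);
    fs::remove_file(&path)?;
    Ok(())
}
```

Chaining with or_else keeps the fallback in one expression, so callers never need to know which of the two timestamps was available.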
Usage
The tool is typically executed via the provided Bash script or directly as a command-line utility.
Projects that use markdown-scanner
Bash Script
The markdown-processor-all-rust.bash script processes all .md files in a specified directory (e.g., an Obsidian vault):
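#!/bin/bash
# Note: the loop is a reconstruction of the damaged original; the binary name
# and argument order are assumed from the Command-Line Usage section.
DB="markdown_data.db"
find "$Obsidian_valt_main_path" -type f -name '*.md' -print0 |
while IFS= read -r -d '' file; do
    markdown-scanner "$file" "$Obsidian_valt_main_path" -d "$DB"
done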
- Environment Variable: Set Obsidian_valt_main_path to the root directory of your Markdown files.
- Database: Specify the SQLite database file (defaults to markdown_data.db).
- Execution: Run the script to process all .md files in the specified directory.
Command-Line Usage
The Rust binary can be invoked directly:
- <file_path>: Path to the Markdown file to process.
- <base_dir>: Base directory for resolving relative paths.
- -d <database_path>: Path to the SQLite database (default: markdown_data.db).
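To show how <base_dir> might be used, here is a rough sketch that canonicalizes both arguments and derives the relative path that would be stored for a file. The helper name relative_to_base and the demo vault layout are assumptions for illustration; the real binary may resolve paths differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Canonicalizes both paths and returns `file` relative to `base_dir`.
/// (Hypothetical helper; the name is an assumption.)
fn relative_to_base(file: &Path, base_dir: &Path) -> io::Result<PathBuf> {
    let file = fs::canonicalize(file)?;
    let base = fs::canonicalize(base_dir)?;
    file.strip_prefix(&base)
        .map(Path::to_path_buf)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "file is outside base_dir"))
}

fn main() -> io::Result<()> {
    // Hypothetical vault layout created in the system temp directory.
    let base = std::env::temp_dir().join("markdown_scanner_demo_vault");
    fs::create_dir_all(base.join("notes"))?;
    let file = base.join("notes").join("todo.md");
    fs::write(&file, "# todo\n")?;

    let relative = relative_to_base(&file, &base)?;
    println!("{}", relative.display());

    fs::remove_dir_all(&base)?;
    Ok(())
}
```

Canonicalizing before stripping the prefix means symlinks and `..` segments in either argument do not produce a spurious "outside base_dir" error.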
Example:
Integration with Neovim
I have been using it in Neovim for a long time. I plan to turn it into a plugin once I extract the code from my enormous init.lua.
Database Schema
The SQLite database (markdown_data.db) contains the following tables:
- folders: Stores unique folder paths with their IDs.
  - id: Primary key.
  - path: Relative folder path (unique).
- files: Stores file metadata.
  - id: Primary key.
  - path: Relative file path (unique).
  - file_name: Name of the file.
  - folder_id: References folders(id).
  - metadata: YAML data.
- tags: Stores unique tags.
  - id: Primary key.
  - tag: Tag name (unique).
- file_tags: Maps files to tags.
  - file_id: References files(id).
  - tag_id: References tags(id).
  - Unique constraint on (file_id, tag_id).
- backlinks: Stores backlink relationships.
  - id: Primary key.
  - backlink: Backlink text (e.g., Note Title).
  - backlink_id: References files(id) (nullable).
  - file_id: References files(id).
  - Unique constraint on (backlink_id, file_id, backlink).
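The tables above could be created with DDL along the following lines, held in a Rust constant so that a SQLite binding can execute it once at startup. This is only a sketch: the column types, NOT NULL markers, and the IF NOT EXISTS clauses are assumptions, since the description gives only names, keys, and uniqueness constraints.

```rust
/// Hypothetical DDL matching the tables described above.
/// Column types and NOT NULL markers are assumptions.
const SCHEMA: &str = "
CREATE TABLE IF NOT EXISTS folders (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    id        INTEGER PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,
    file_name TEXT NOT NULL,
    folder_id INTEGER REFERENCES folders(id),
    metadata  TEXT
);
CREATE TABLE IF NOT EXISTS tags (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS file_tags (
    file_id INTEGER REFERENCES files(id),
    tag_id  INTEGER REFERENCES tags(id),
    UNIQUE (file_id, tag_id)
);
CREATE TABLE IF NOT EXISTS backlinks (
    id          INTEGER PRIMARY KEY,
    backlink    TEXT NOT NULL,
    backlink_id INTEGER REFERENCES files(id),
    file_id     INTEGER REFERENCES files(id),
    UNIQUE (backlink_id, file_id, backlink)
);
";

fn main() {
    // A SQLite binding would execute this string when the database is first opened.
    println!("{}", SCHEMA);
}
```

Leaving backlink_id nullable lets a link to a note that does not exist yet be recorded and resolved later.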
Installation
- Prerequisites:
  - Rust (stable) and Cargo for building the Rust binary.
  - SQLite library for database operations (optional but recommended).
  - Bash for running the script.
- Build:
  Or use [this if you are using Linux](cheat sheet.md)
- Set Up Script:
  - Copy markdown-processor-all-rust.bash to a vault.
  - Ensure it’s executable: chmod +x markdown-processor-all-rust.bash.
  - Set the Obsidian_valt_main_path environment variable or hardcode the path in the script.
How It Works
- Initialization:
  - The tool initializes a SQLite database with the required schema if it doesn’t exist.
  - It uses clap for command-line argument parsing and env_logger for detailed logging.
- File Processing:
  - Reads the specified Markdown file.
  - Extracts YAML frontmatter tags and inline #tags.
  - Identifies [[backlink]] references, resolving them to existing files in the database or filesystem.
  - Cleans content by removing code blocks, URLs, and other irrelevant text before processing tags and backlinks.
- Database Operations:
  - Inserts or updates folder and file metadata.
  - Stores tags and associates them with files.
  - Records backlinks, linking to other files when possible.
  - Handles duplicate files by preferring matches in the same folder or the shortest path.
- Filesystem Traversal:
  - Uses jwalk for efficient filesystem traversal when resolving backlinks.
  - Canonicalizes paths to ensure consistency across systems.
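The file-processing step can be sketched with the standard library alone. The version below is a simplified illustration, not the tool's parser: all function names are hypothetical, YAML frontmatter is not handled, the set of characters allowed in a tag is an assumption, and so is the choice to drop an alias (after |) or heading anchor (after #) from a backlink.

```rust
/// Three backticks, written with escapes so this example can sit inside a fenced block.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Drops fenced code blocks so their contents are not scanned.
fn strip_code_blocks(content: &str) -> String {
    let mut out = String::new();
    let mut in_fence = false;
    for line in content.lines() {
        if line.trim_start().starts_with(FENCE) {
            in_fence = !in_fence;
            continue;
        }
        if !in_fence {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}

/// Collects inline #tags. Only tokens that begin with '#' count,
/// so URL fragments such as page#top are ignored.
fn extract_inline_tags(content: &str) -> Vec<String> {
    let mut tags: Vec<String> = Vec::new();
    for token in content.split_whitespace() {
        if let Some(body) = token.strip_prefix('#') {
            // Trim trailing punctuation such as the full stop in "#todo.".
            let body = body
                .trim_end_matches(|c: char| matches!(c, '.' | ',' | ';' | ':' | '!' | '?' | ')'));
            // Assumed rule: letters, digits, '_', '-', '/', and at least one non-digit.
            let valid = !body.is_empty()
                && body.chars().all(|c| c.is_alphanumeric() || matches!(c, '_' | '-' | '/'))
                && body.chars().any(|c| !c.is_ascii_digit());
            if valid && !tags.iter().any(|t| t.as_str() == body) {
                tags.push(body.to_string());
            }
        }
    }
    tags
}

/// Collects [[backlink]] targets, dropping any alias or heading anchor.
fn extract_backlinks(content: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = content;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        let end = match after.find("]]") {
            Some(end) => end,
            None => break,
        };
        let target = after[..end]
            .split(|c: char| c == '|' || c == '#')
            .next()
            .unwrap_or("")
            .trim();
        if !target.is_empty() {
            links.push(target.to_string());
        }
        rest = &after[end + 2..];
    }
    links
}

fn main() {
    let note = format!(
        "# Title\nSee [[Project Plan|the plan]] and [[Inbox]]. #todo #work/urgent\n{f}\n#not_a_tag [[Ignored]]\n{f}\n",
        f = FENCE
    );
    let cleaned = strip_code_blocks(&note);
    println!("tags: {:?}", extract_inline_tags(&cleaned));
    println!("backlinks: {:?}", extract_backlinks(&cleaned));
}
```

Cleaning the content first and only then scanning for tags and backlinks mirrors the order described above, and it keeps each extractor free of code-block bookkeeping.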
Limitations
- Obsidian Vault: While designed for Obsidian, the tool assumes a flat or hierarchical Markdown file structure and may not handle all Obsidian-specific features.
- Backlink Resolution: Backlinks are resolved based on file names, which may lead to ambiguities if multiple files have the same name in different folders.
- No Real-Time Updates: The tool processes files on demand (e.g., on save or via the script) and does not monitor the filesystem for changes (though this should be easy to add).
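The ambiguity in backlink resolution is handled by preferring a match in the same folder, then the shortest path. A minimal sketch of that rule follows. The function name pick_candidate is hypothetical, and reading "shortest path" as "fewest path components" is an assumption; the actual tool may compare string lengths or break ties differently.

```rust
use std::path::{Path, PathBuf};

/// Picks one candidate for a backlink: a file in the linking note's folder wins;
/// otherwise the path with the fewest components is chosen (first one on ties).
/// (Hypothetical helper; "shortest" as fewest components is an assumption.)
fn pick_candidate<'a>(source: &Path, candidates: &'a [PathBuf]) -> Option<&'a PathBuf> {
    let source_dir = source.parent();
    candidates
        .iter()
        .find(|c| c.parent() == source_dir)
        .or_else(|| candidates.iter().min_by_key(|c| c.components().count()))
}

fn main() {
    // Three files that share the name "meeting.md" in different folders.
    let candidates = vec![
        PathBuf::from("archive/2023/meeting.md"),
        PathBuf::from("work/meeting.md"),
        PathBuf::from("personal/meeting.md"),
    ];
    // A note in personal/ links to [[meeting]]: the same-folder file is chosen.
    println!("{:?}", pick_candidate(Path::new("personal/journal.md"), &candidates));
    // A note in inbox/ has no same-folder match: the shallowest path is chosen.
    println!("{:?}", pick_candidate(Path::new("inbox/new.md"), &candidates));
}
```

Even with this rule, two equally shallow files outside the linking note's folder are distinguished only by their order in the candidate list, which is the ambiguity the limitation describes.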
Contributing
Contributions are welcome!
TODO / Future Improvements
- Extract the full YAML frontmatter as JSON, as in datopian/markdowndb.
- Add --watch to monitor files for changes and update the database accordingly.
Why I Built This
I started using Obsidian for note-taking, but I ran into a major issue that drove me up the wall: it took 20–30 seconds to start Obsidian on my Android phone, and its search functionality was painfully slow. Searching for a specific file required remembering the full path or relying on a content-based search that didn’t prioritize file names. Using a terminal with nano on my Android was significantly faster, which pushed me to find a better solution.
I explored alternatives like Logseq, but they felt restrictive, forcing me to organize notes according to their rigid rules. Then I discovered Neovim’s powerful plugin system, which works seamlessly in a TTY environment, allowing me to edit files directly on my system without the overhead of GUI-based tools. This was a game-changer for my workflow.
My first attempt was a quick Bash script paired with a basic Lua configuration for Neovim. It worked, but it was clunky. I then tried rewriting the tool entirely in Lua, thinking I could leverage Neovim’s init.lua to manage dependencies. Big mistake. Termux, my Android terminal environment, didn’t support Lua libraries well, and the setup broke completely when a package link for Lua libraries changed unexpectedly. The frustration of dealing with broken dependencies pushed me to my limit.
Eventually, I turned to Rust to create a static binary that wouldn’t rely on fickle dependencies or slow plugins. I briefly experimented with epwalsh/obsidian.nvim, which was promising but took an excruciating 14 seconds to follow a backlink on my low-powered device—slower than my rg (ripgrep) searches! While obsidian.nvim is a great tool for more powerful systems, it wasn’t suitable for my "potato calculator." So, I built markdown-scanner to create a lightweight, fast, and reliable solution that integrates with Neovim, processes Markdown files efficiently, and stores metadata in a SQLite database for quick access.
License
This project is licensed under the GNU General Public License v3.0.