markdown-scanner 1.1.0

markdown-scanner is a Rust-based command-line tool that scans Markdown files within a specified directory and stores their metadata in a SQLite database.
markdown-scanner-1.1.0 is not a library.

Markdown Scanner

Overview

markdown-scanner is a Rust-based command-line tool designed to scan Markdown files within a specified directory (e.g., an Obsidian vault) and extract metadata such as tags and backlinks. It stores this information in a SQLite database for efficient querying and organization. The tool is invoked via a Bash script (markdown-processor-all-rust.bash) that processes all .md files in a directory, making it suitable for integration with text editors like Neovim or workflows involving Markdown-based note-taking systems like Obsidian.

Features

  • Tag Extraction: Extracts both YAML frontmatter tags and inline #tags from Markdown files, ignoring tags within code blocks.
  • Backlink Detection: Identifies [[backlink]] references in Markdown files and links them to corresponding files in the database.
  • SQLite Database: Stores file metadata, folder structure, tags, and backlinks in a relational SQLite database for easy querying.
  • File System Integration: Resolves file paths relative to a base directory and handles file system changes, ensuring accurate metadata.
  • Error Handling: Robust error handling with custom error types and detailed logging for debugging.
  • Editor Integration: Designed to be triggered on file save in editors like Neovim or used in batch processing for Markdown vaults.
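
To make the tag extraction rule concrete, here is a minimal sketch of collecting inline #tags while skipping fenced code blocks. It is not the tool's actual implementation: the function name extract_inline_tags and the exact tag character set (letters, digits, `_`, `-`, `/`, with at least one letter) are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Collect inline `#tags`, skipping everything inside fenced code blocks.
/// Assumption: a tag is `#` at the start of a word, followed by letters,
/// digits, `_`, `-` or `/`, and it must contain at least one letter.
fn extract_inline_tags(markdown: &str) -> BTreeSet<String> {
    // Build the three-backtick fence marker without writing it literally.
    let fence = "`".repeat(3);
    let mut tags = BTreeSet::new();
    let mut in_fence = false;
    for line in markdown.lines() {
        // A line that starts with the fence marker toggles the code-block state.
        if line.trim_start().starts_with(fence.as_str()) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }
        for word in line.split_whitespace() {
            if let Some(rest) = word.strip_prefix('#') {
                let tag: String = rest
                    .chars()
                    .take_while(|c: &char| c.is_alphanumeric() || matches!(*c, '_' | '-' | '/'))
                    .collect();
                // Headings ("# Title") and pure numbers ("#42") are not tags.
                if tag.chars().any(|c| c.is_alphabetic()) {
                    tags.insert(tag);
                }
            }
        }
    }
    tags
}

fn main() {
    let fence = "`".repeat(3);
    let note = format!(
        "# Title\nSome #rust notes.\n{0}\n#hidden\n{0}\nMore #project/idea here.",
        fence
    );
    println!("{:?}", extract_inline_tags(&note));
}
```

The BTreeSet keeps tags unique and sorted, which mirrors the unique tag column in the database.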

Extra Features

  • YouTube Title Extraction
    Automatically fetches and stores the video title in the database (as JSON) when a YouTube link is detected.
  • File Creation Time Tracking
    Stores the file creation timestamp when available.
    Falls back to the modification time where a reliable birth/creation time is not available (e.g., on Termux).
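
The fallback described above can be sketched with the Rust standard library alone. This is an illustration under assumptions, not the tool's code: the helper name created_or_modified and the demo file name are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Return the file's creation time, or its modification time when the
/// platform or filesystem cannot report a birth time.
fn created_or_modified(path: &Path) -> io::Result<SystemTime> {
    let meta = fs::metadata(path)?;
    // `created()` returns an error where birth time is unavailable.
    meta.created().or_else(|_| meta.modified())
}

fn main() -> io::Result<()> {
    // Hypothetical demo file in the system temp directory.
    let path = std::env::temp_dir().join("markdown-scanner-ctime-demo.md");
    fs::write(&path, "# demo\n")?;
    let ts = created_or_modified(&path)?;
    let secs = ts
        .duration_since(SystemTime::UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    println!("timestamp (seconds since epoch): {}", secs);
    fs::remove_file(&path)?;
    Ok(())
}
```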

Usage

The tool is typically executed via the provided Bash script or directly as a command-line utility.

Projects that use markdown-scanner

Bash Script

The markdown-processor-all-rust.bash script processes all .md files in a specified directory (e.g., an Obsidian vault):

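#!/bin/bash
# Process every .md file under $Obsidian_valt_main_path with markdown-scanner.
DB="markdown_data.db"
# -print0 with read -d '' keeps file names containing spaces or newlines intact.
find "$Obsidian_valt_main_path" -name "*.md" -print0 | while IFS= read -r -d '' file; do
    echo "Processing file: $file"
    # Only report success when the scanner exits with status 0.
    if markdown-scanner "$file" "$Obsidian_valt_main_path" -d "$DB"; then
        echo "Data inserted for file: $file"
    else
        echo "Failed to process file: $file" >&2
    fi
done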
  • Environment Variable: Set Obsidian_valt_main_path to the root directory of your Markdown files.
  • Database: Specify the SQLite database file (defaults to markdown_data.db).
  • Execution: Run the script to process all .md files in the specified directory.

Command-Line Usage

The Rust binary can be invoked directly:

markdown-scanner <file_path> <base_dir> -d <database_path>
  • <file_path>: Path to the Markdown file to process.
  • <base_dir>: Base directory for resolving relative paths.
  • -d <database_path>: Path to the SQLite database (default: markdown_data.db).

Example:

markdown-scanner /path/to/note.md /path/to/vault/dir/ -d markdown_data.db
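
If you would rather drive the binary from another Rust program than from Bash, the same invocation can be built with std::process::Command. This is a sketch that assumes markdown-scanner is on your PATH. The helper name scanner_command is hypothetical, and only the arguments documented above are used.

```rust
use std::path::Path;
use std::process::Command;

/// Build the `markdown-scanner <file_path> <base_dir> -d <database_path>` call.
fn scanner_command(file: &Path, base_dir: &Path, db: &Path) -> Command {
    let mut cmd = Command::new("markdown-scanner");
    cmd.arg(file).arg(base_dir).arg("-d").arg(db);
    cmd
}

fn main() {
    let mut cmd = scanner_command(
        Path::new("/path/to/note.md"),
        Path::new("/path/to/vault/dir/"),
        Path::new("markdown_data.db"),
    );
    // Running fails if the binary is not on PATH, so report instead of panicking.
    match cmd.status() {
        Ok(status) => println!("markdown-scanner exited with {}", status),
        Err(err) => eprintln!("could not run markdown-scanner: {}", err),
    }
}
```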

Integration with Neovim

I have been using it in Neovim for a long time. I plan to turn it into a single plugin once I rip the code out of my enormous init.lua.

Database Schema

The SQLite database (markdown_data.db) contains the following tables:

  • folders: Stores unique folder paths with their IDs.
    • id: Primary key.
    • path: Relative folder path (unique).
  • files: Stores file metadata.
    • id: Primary key.
    • path: Relative file path (unique).
    • file_name: Name of the file.
    • folder_id: References folders(id).
    • metadata: YAML data.
  • tags: Stores unique tags.
    • id: Primary key.
    • tag: Tag name (unique).
  • file_tags: Maps files to tags.
    • file_id: References files(id).
    • tag_id: References tags(id).
    • Unique constraint on (file_id, tag_id).
  • backlinks: Stores backlink relationships.
    • id: Primary key.
    • backlink: Backlink text (e.g., Note Title).
    • backlink_id: References files(id) (nullable).
    • file_id: References files(id).
    • Unique constraint on (backlink_id, file_id, backlink).
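
For orientation, the table list above could correspond to DDL along the following lines. This is a reconstruction, not the schema the tool actually creates: column types, NOT NULL choices, and the constant names SCHEMA_SKETCH and FILES_BY_TAG are assumptions. The second constant shows the kind of query the layout makes possible.

```rust
/// Hypothetical DDL reconstructed from the table list above.
/// Column types and NOT NULL choices are assumptions, not the tool's real schema.
const SCHEMA_SKETCH: &str = "
CREATE TABLE IF NOT EXISTS folders (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    id        INTEGER PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,
    file_name TEXT NOT NULL,
    folder_id INTEGER REFERENCES folders(id),
    metadata  TEXT
);
CREATE TABLE IF NOT EXISTS tags (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS file_tags (
    file_id INTEGER REFERENCES files(id),
    tag_id  INTEGER REFERENCES tags(id),
    UNIQUE (file_id, tag_id)
);
CREATE TABLE IF NOT EXISTS backlinks (
    id          INTEGER PRIMARY KEY,
    backlink    TEXT NOT NULL,
    backlink_id INTEGER REFERENCES files(id),
    file_id     INTEGER REFERENCES files(id),
    UNIQUE (backlink_id, file_id, backlink)
);
";

/// Example query: paths of all files carrying a given tag (bound as ?1).
const FILES_BY_TAG: &str = "
SELECT f.path
FROM files f
JOIN file_tags ft ON ft.file_id = f.id
JOIN tags t ON t.id = ft.tag_id
WHERE t.tag = ?1;
";

fn main() {
    println!("{}", SCHEMA_SKETCH);
    println!("{}", FILES_BY_TAG);
}
```

Before relying on any column, check the real layout with the .schema command of the sqlite3 shell against your own markdown_data.db.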

Installation

  1. Prerequisites:

    • Rust (stable) and Cargo for building the Rust binary.
    • SQLite library for database operations (optional but recommended).
    • Bash for running the script.
  2. Build:

    cargo build --release
    

    Or, if you are on Linux, follow the steps in cheat sheet.md.

  3. Set Up Script:

    • Copy markdown-processor-all-rust.bash to your vault.
    • Ensure it’s executable: chmod +x markdown-processor-all-rust.bash.
    • Set the Obsidian_valt_main_path environment variable or hardcode the path in the script.

How It Works

  1. Initialization:
    • The tool initializes a SQLite database with the required schema if it doesn’t exist.
    • It uses clap for command-line argument parsing and env_logger for detailed logging.
  2. File Processing:
    • Reads the specified Markdown file.
    • Extracts YAML frontmatter tags and inline #tags.
    • Identifies [[backlink]] references, resolving them to existing files in the database or filesystem.
    • Cleans content by removing code blocks, URLs, and other irrelevant text before processing tags and backlinks.
  3. Database Operations:
    • Inserts or updates folder and file metadata.
    • Stores tags and associates them with files.
    • Records backlinks, linking to other files when possible.
    • Handles duplicate files by preferring matches in the same folder or the shortest path.
  4. Filesystem Traversal:
    • Uses jwalk for efficient filesystem traversal when resolving backlinks.
    • Canonicalizes paths to ensure consistency across systems.
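
The backlink step in the file-processing stage can be sketched as a small scanner for [[...]] pairs. This is an illustration, not the tool's parser: the function name extract_backlinks is hypothetical, and reducing [[Target|alias]] and [[Target#heading]] to Target is an assumption about how such links might be treated.

```rust
/// Extract the targets of `[[wikilinks]]` in order of appearance.
/// Assumption: `[[Target|alias]]` and `[[Target#heading]]` reduce to `Target`.
fn extract_backlinks(text: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find("[[") {
        let after = &rest[start + 2..];
        match after.find("]]") {
            Some(end) => {
                let inner = &after[..end];
                // Keep only the part before an alias or heading separator.
                let target = inner
                    .split(|c: char| c == '|' || c == '#')
                    .next()
                    .unwrap_or("")
                    .trim();
                // Ignore empty links and matches that span several lines.
                if !target.is_empty() && !target.contains('\n') {
                    links.push(target.to_string());
                }
                rest = &after[end + 2..];
            }
            // An unclosed "[[" ends the scan.
            None => break,
        }
    }
    links
}

fn main() {
    let note = "See [[Note Title]] and [[Other|alias]], also [[Deep#section]].";
    for link in extract_backlinks(note) {
        println!("{}", link);
    }
}
```

Each extracted target would then be looked up by file name, which is where the backlink_id column either gets a files(id) value or stays NULL.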

Limitations

  • Obsidian Vault: While designed for Obsidian, the tool assumes a flat or hierarchical Markdown file structure and may not handle all Obsidian-specific features.
  • Backlink Resolution: Backlinks are resolved based on file names, which may lead to ambiguities if multiple files have the same name in different folders.
  • No Real-Time Updates: The tool processes files on demand (e.g., on save or via the script) and does not monitor the filesystem for changes, though this should be easy to add.
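
To illustrate the ambiguity in backlink resolution, here is one way the "same folder first, then shortest path" preference could be expressed. It is a sketch under assumptions: the function name pick_candidate is hypothetical, "shortest" is taken to mean fewest path components, and remaining ties are broken lexicographically. The tool itself may measure this differently.

```rust
use std::path::{Path, PathBuf};

/// Pick one file among several candidates that share the linked name.
/// Assumed order: a candidate in the same folder as the linking note wins;
/// otherwise the path with the fewest components; ties break lexicographically.
fn pick_candidate<'a>(source: &Path, candidates: &'a [PathBuf]) -> Option<&'a PathBuf> {
    let source_dir = source.parent();
    candidates.iter().min_by(|a, b| {
        // `false` sorts before `true`, so same-folder candidates rank first.
        let rank = |p: &PathBuf| (p.parent() != source_dir, p.components().count());
        rank(*a).cmp(&rank(*b)).then_with(|| a.cmp(b))
    })
}

fn main() {
    let candidates = vec![
        PathBuf::from("a/b/c/note.md"),
        PathBuf::from("x/note.md"),
        PathBuf::from("daily/sub/note.md"),
    ];
    let from_same_folder = pick_candidate(Path::new("daily/sub/today.md"), &candidates);
    let from_elsewhere = pick_candidate(Path::new("other/today.md"), &candidates);
    println!("{:?}", from_same_folder);
    println!("{:?}", from_elsewhere);
}
```

Whatever the exact rule, two notes with the same file name in unrelated folders can still resolve to a file you did not intend, so unique note names remain the safest choice.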

Contributing

Contributions are welcome!

TODO / Future Improvements

  • Extract the full YAML frontmatter as JSON, as in datopian/markdowndb.
  • Add a --watch option to monitor files for changes and update the database accordingly.

Why I Built This

I started using Obsidian for note-taking, but I ran into a major issue that drove me up the wall: it took 20–30 seconds to start Obsidian on my Android phone, and its search functionality was painfully slow. Searching for a specific file required remembering the full path or relying on a content-based search that didn’t prioritize file names. Using a terminal with nano on my Android was significantly faster, which pushed me to find a better solution.

I explored alternatives like Logseq, but they felt restrictive, forcing me to organize notes according to their rigid rules. Then I discovered Neovim’s powerful plugin system, which works seamlessly in a TTY environment, allowing me to edit files directly on my system without the overhead of GUI-based tools. This was a game-changer for my workflow.

My first attempt was a quick Bash script paired with a basic Lua configuration for Neovim. It worked, but it was clunky. I then tried rewriting the tool entirely in Lua, thinking I could leverage Neovim’s init.lua to manage dependencies. Big mistake. Termux, my Android terminal environment, didn’t support Lua libraries well, and the setup broke completely when a package link for Lua libraries changed unexpectedly. The frustration of dealing with broken dependencies pushed me to my limit.

Eventually, I turned to Rust to create a static binary that wouldn’t rely on fickle dependencies or slow plugins. I briefly experimented with epwalsh/obsidian.nvim, which was promising but took an excruciating 14 seconds to follow a backlink on my low-powered device—slower than my rg (ripgrep) searches! While obsidian.nvim is a great tool for more powerful systems, it wasn’t suitable for my "potato calculator." So, I built markdown-scanner to create a lightweight, fast, and reliable solution that integrates with Neovim, processes Markdown files efficiently, and stores metadata in a SQLite database for quick access.

License

This project is licensed under the GNU General Public License v3.0.