markdown-scanner 1.1.0

`markdown-scanner` is a Rust-based command-line tool that scans Markdown files in a specified directory and stores their metadata in a SQLite database.
# Markdown Scanner

## Overview
`markdown-scanner` is a Rust-based command-line tool designed to scan Markdown files within a specified directory (e.g., an Obsidian vault) and extract metadata such as tags and backlinks. It stores this information in a SQLite database for efficient querying and organization. The tool is invoked via a Bash script (`markdown-processor-all-rust.bash`) that processes all `.md` files in a directory, making it suitable for integration with text editors like Neovim or workflows involving Markdown-based note-taking systems like Obsidian.

## Features
- **Tag Extraction**: Extracts both YAML frontmatter tags and inline `#tags` from Markdown files, ignoring tags within code blocks.
- **Backlink Detection**: Identifies `[[backlink]]` references in Markdown files and links them to corresponding files in the database.
- **SQLite Database**: Stores file metadata, folder structure, tags, and backlinks in a relational SQLite database for easy querying.
- **File System Integration**: Resolves file paths relative to a base directory and handles file system changes, ensuring accurate metadata.
- **Error Handling**: Robust error handling with custom error types and detailed logging for debugging.
- **Editor Integration**: Designed to be triggered on file save in editors like Neovim or used in batch processing for Markdown vaults.
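
The inline tag rule above can be sketched in plain Rust. This is a minimal illustration, not the tool's actual parser: the function name `extract_inline_tags`, the allowed tag characters (letters, digits, `_`, `-`, `/`), and the rule that a tag needs at least one letter are all assumptions. It skips fenced code blocks and inline code spans and does not cover YAML frontmatter.

```rust
use std::collections::BTreeSet;

/// Collects inline `#tags` from Markdown text, skipping fenced code blocks
/// and inline code spans. Hypothetical helper, not the tool's real parser.
fn extract_inline_tags(markdown: &str) -> BTreeSet<String> {
    // Three backticks: the marker that opens or closes a fenced code block.
    let fence = "`".repeat(3);
    let mut tags = BTreeSet::new();
    let mut in_fence = false;

    for line in markdown.lines() {
        if line.trim_start().starts_with(fence.as_str()) {
            in_fence = !in_fence;
            continue;
        }
        if in_fence {
            continue;
        }

        let mut in_inline_code = false;
        let mut prev: Option<char> = None;
        let mut chars = line.chars().peekable();
        while let Some(c) = chars.next() {
            if c == '`' {
                in_inline_code = !in_inline_code;
            } else if c == '#' && !in_inline_code && prev.map_or(true, char::is_whitespace) {
                // Assumed tag alphabet: letters, digits, '_', '-', '/'.
                let mut tag = String::new();
                while let Some(&n) = chars.peek() {
                    if n.is_alphanumeric() || matches!(n, '_' | '-' | '/') {
                        tag.push(n);
                        chars.next();
                    } else {
                        break;
                    }
                }
                // Assumption: a tag needs at least one letter, so headings
                // ("# Title") and issue numbers ("#42") are not tags.
                if tag.chars().any(char::is_alphabetic) {
                    tags.insert(tag);
                }
            }
            prev = Some(c);
        }
    }
    tags
}
```

Calling `extract_inline_tags` on a note's text returns the tag names in sorted order without the leading `#`. A `BTreeSet` is used here so that repeated tags collapse into one entry, which mirrors the unique `tag` column in the database.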

### Extra Features 
- **YouTube Title Extraction**  
  Automatically fetches and stores the video title in the database (as JSON) when a YouTube link is detected.
- **File Creation Time Tracking**  
  Stores the file creation timestamp when available.  
  Falls back to modification time on filesystems that don't provide a reliable birth/creation time (e.g., Termux).
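
The creation-time fallback can be expressed with the Rust standard library alone. The sketch below is an assumption about how such a fallback might look, not the tool's actual code: the helper names `created_or_modified` and `unix_seconds` are hypothetical, and storing the timestamp as Unix seconds is also an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns the file's creation time, or its modification time when the
/// platform or filesystem cannot report a birth time.
fn created_or_modified(path: &Path) -> io::Result<SystemTime> {
    let meta = fs::metadata(path)?;
    // `created()` returns Err where birth time is unavailable.
    meta.created().or_else(|_| meta.modified())
}

/// Converts a timestamp to whole seconds since the Unix epoch
/// (0 if the timestamp lies before the epoch).
fn unix_seconds(t: SystemTime) -> u64 {
    t.duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0)
}
```

For an existing file, `unix_seconds(created_or_modified(path)?)` yields a single integer that is easy to store and sort. A missing file surfaces as an `io::Error` rather than a default value.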

## Usage
The tool is typically executed via the provided Bash script or directly as a command-line utility.

### Projects that use markdown-scanner
- [nvim-minimal](https://github.com/andrenaP/nvim-minimal) 
- [midetor](https://github.com/andrenaP/midetor)

### Bash Script
The `markdown-processor-all-rust.bash` script processes all `.md` files in a specified directory (e.g., an Obsidian vault):

```bash
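#!/bin/bash
# Abort early if the vault path is not set.
: "${Obsidian_valt_main_path:?Set Obsidian_valt_main_path to your vault root}"
DB="markdown_data.db"
# -print0 with read -d '' keeps paths containing spaces or newlines intact.
find "$Obsidian_valt_main_path" -type f -name "*.md" -print0 | while IFS= read -r -d '' file; do
    echo "Processing file: $file"
    if markdown-scanner "$file" "$Obsidian_valt_main_path" -d "$DB"; then
        echo "Data inserted for file: $file"
    else
        echo "Failed to process file: $file" >&2
    fi
done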
```

- **Environment Variable**: Set `Obsidian_valt_main_path` to the root directory of your Markdown files.
- **Database**: Edit the `DB` variable in the script to change the SQLite database file (the script uses `markdown_data.db`).
- **Execution**: Run the script to process all `.md` files in the specified directory.

### Command-Line Usage
The Rust binary can be invoked directly:

```bash
markdown-scanner <file_path> <base_dir> -d <database_path>
```

- `<file_path>`: Path to the Markdown file to process.
- `<base_dir>`: Base directory for resolving relative paths.
- `-d <database_path>`: Path to the SQLite database (default: `markdown_data.db`).

Example:
```bash
markdown-scanner /path/to/note.md /path/to/vault/dir/ -d markdown_data.db
```

### Integration with Neovim
I have been using it in Neovim for a long time. I plan to turn the integration into a standalone plugin once I extract the code from my enormous `init.lua`.

## Database Schema
The SQLite database (`markdown_data.db`) contains the following tables:

- **folders**: Stores unique folder paths with their IDs.
  - `id`: Primary key.
  - `path`: Relative folder path (unique).
- **files**: Stores file metadata.
  - `id`: Primary key.
  - `path`: Relative file path (unique).
  - `file_name`: Name of the file.
  - `folder_id`: References `folders(id)`.
  - `metadata`: YAML frontmatter data.
- **tags**: Stores unique tags.
  - `id`: Primary key.
  - `tag`: Tag name (unique).
- **file_tags**: Maps files to tags.
  - `file_id`: References `files(id)`.
  - `tag_id`: References `tags(id)`.
  - Unique constraint on `(file_id, tag_id)`.
- **backlinks**: Stores backlink relationships.
  - `id`: Primary key.
  - `backlink`: Backlink text (e.g., `Note Title`).
  - `backlink_id`: References `files(id)` (nullable).
  - `file_id`: References `files(id)`.
  - Unique constraint on `(backlink_id, file_id, backlink)`.
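
For readers who want to query the database directly, the table list above maps roughly onto the DDL below, held here as Rust string constants. This is a reconstruction, not the tool's actual schema: the column types, `NOT NULL` constraints, `IF NOT EXISTS` clauses, and the constant names `SCHEMA_SQL` and `FILES_BY_TAG_SQL` are all assumptions. Inspect a real database with `.schema` in the `sqlite3` shell before relying on it.

```rust
/// Hypothetical DDL reconstructed from the table list above.
/// Column types and NOT NULL constraints are assumptions.
const SCHEMA_SQL: &str = r#"
CREATE TABLE IF NOT EXISTS folders (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS files (
    id        INTEGER PRIMARY KEY,
    path      TEXT NOT NULL UNIQUE,
    file_name TEXT NOT NULL,
    folder_id INTEGER REFERENCES folders(id),
    metadata  TEXT
);
CREATE TABLE IF NOT EXISTS tags (
    id  INTEGER PRIMARY KEY,
    tag TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS file_tags (
    file_id INTEGER REFERENCES files(id),
    tag_id  INTEGER REFERENCES tags(id),
    UNIQUE (file_id, tag_id)
);
CREATE TABLE IF NOT EXISTS backlinks (
    id          INTEGER PRIMARY KEY,
    backlink    TEXT NOT NULL,
    backlink_id INTEGER REFERENCES files(id),
    file_id     INTEGER REFERENCES files(id),
    UNIQUE (backlink_id, file_id, backlink)
);
"#;

/// Example query (also a sketch): paths of all files carrying a given tag.
/// `?1` is SQLite's numbered-parameter placeholder for the tag name.
const FILES_BY_TAG_SQL: &str = r#"
SELECT f.path
FROM files f
JOIN file_tags ft ON ft.file_id = f.id
JOIN tags t       ON t.id = ft.tag_id
WHERE t.tag = ?1;
"#;
```

The join in `FILES_BY_TAG_SQL` shows why `file_tags` exists as a separate table: each tag is stored once, and files are linked to it by ID.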

## Installation
1. **Prerequisites**:
   - Rust (stable) and Cargo for building the Rust binary.
   - SQLite library for database operations (optional but recommended).
   - Bash for running the script.
2. **Build**:
   ```bash
   cargo build --release
   ```
   Or, if you are on Linux, follow [this cheat sheet](cheat%20sheet.md).

3. **Set Up Script**:
   - Copy `markdown-processor-all-rust.bash` to your vault.
   - Ensure it’s executable: `chmod +x markdown-processor-all-rust.bash`.
   - Set the `Obsidian_valt_main_path` environment variable or hardcode the path in the script.

## How It Works
1. **Initialization**:
   - The tool initializes a SQLite database with the required schema if it doesn’t exist.
   - It uses `clap` for command-line argument parsing and `env_logger` for detailed logging.
2. **File Processing**:
   - Reads the specified Markdown file.
   - Extracts YAML frontmatter tags and inline `#tags`.
   - Identifies `[[backlink]]` references, resolving them to existing files in the database or filesystem.
   - Cleans content by removing code blocks, URLs, and other irrelevant text before processing tags and backlinks.
3. **Database Operations**:
   - Inserts or updates folder and file metadata.
   - Stores tags and associates them with files.
   - Records backlinks, linking to other files when possible.
   - Handles duplicate files by preferring matches in the same folder or the shortest path.
4. **Filesystem Traversal**:
   - Uses `jwalk` for efficient filesystem traversal when resolving backlinks.
   - Canonicalizes paths to ensure consistency across systems.
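
The backlink step in the file-processing stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function name `extract_backlinks` is hypothetical, the input is assumed to be already cleaned of code blocks, and stripping Obsidian-style `|alias` and `#heading` suffixes is an assumption about what a resolver would want.

```rust
/// Extracts the targets of `[[wikilinks]]` from Markdown text.
/// Hypothetical helper; assumes code blocks were removed beforehand.
fn extract_backlinks(markdown: &str) -> Vec<String> {
    let mut links = Vec::new();
    let mut rest = markdown;

    while let Some(start) = rest.find("[[") {
        let after_open = &rest[start + 2..];
        match after_open.find("]]") {
            Some(end) => {
                // If "[[" repeats before the close, keep the innermost link.
                let inner = after_open[..end].rsplit("[[").next().unwrap_or("");
                // Assumption: drop an Obsidian-style "|alias" or "#heading".
                let target = inner.split(&['|', '#'][..]).next().unwrap_or("").trim();
                if !target.is_empty() && !inner.contains('\n') {
                    links.push(target.to_string());
                }
                rest = &after_open[end + 2..];
            }
            // An unclosed "[[" ends the scan.
            None => break,
        }
    }
    links
}
```

Each returned string corresponds to the `backlink` text column. Resolving it to a `backlink_id` is a separate lookup against the files already known to the database or found on disk.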

## Limitations
- **Obsidian Vault**: While designed for Obsidian, the tool assumes a flat or hierarchical Markdown file structure and may not handle all Obsidian-specific features.
- **Backlink Resolution**: Backlinks are resolved based on file names, which may lead to ambiguities if multiple files have the same name in different folders.
- **No Real-Time Updates**: The tool processes files on demand (e.g., on save or via the script) and does not monitor the filesystem for changes, though this would be easy to add.
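
To make the backlink ambiguity concrete, here is one way the "same folder, else shortest path" preference described under How It Works could be written. It is a sketch, not the tool's code: the function name `pick_candidate` is hypothetical, and reading "shortest path" as "fewest path components" is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Picks one file among several candidates that share a backlink's name.
/// A candidate in the linking note's folder wins; otherwise the candidate
/// with the fewest path components does (ties keep the first one).
fn pick_candidate<'a>(source: &Path, candidates: &'a [PathBuf]) -> Option<&'a PathBuf> {
    let source_dir = source.parent();
    candidates
        .iter()
        .find(|c| c.parent() == source_dir)
        .or_else(|| candidates.iter().min_by_key(|c| c.components().count()))
}
```

With candidates `archive/2023/ideas.md`, `projects/ideas.md`, and `inbox/ideas.md`, a link from `inbox/today.md` resolves to `inbox/ideas.md`, while a link from `journal/today.md` resolves to `projects/ideas.md`. Two notes with the same name at the same depth remain ambiguous, which is the limitation noted above.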

## Contributing
Contributions are welcome!


## TODO / Future Improvements

* [x] Extract the full YAML frontmatter as JSON, as in `datopian/markdowndb`
* [ ] Add `--watch` to monitor files for changes and update the database accordingly


## Why I Built This
I started using Obsidian for note-taking, but I ran into a major issue that drove me up the wall: it took 20–30 seconds to start Obsidian on my Android phone, and its search functionality was painfully slow. Searching for a specific file required remembering the full path or relying on a content-based search that didn’t prioritize file names. Using a terminal with `nano` on my Android was significantly faster, which pushed me to find a better solution.

I explored alternatives like Logseq, but they felt restrictive, forcing me to organize notes according to their rigid rules. Then I discovered Neovim’s powerful plugin system, which works seamlessly in a TTY environment, allowing me to edit files directly on my system without the overhead of GUI-based tools. This was a game-changer for my workflow.

My first attempt was a quick Bash script paired with a basic Lua configuration for Neovim. It worked, but it was clunky. I then tried rewriting the tool entirely in Lua, thinking I could leverage Neovim’s `init.lua` to manage dependencies. Big mistake. Termux, my Android terminal environment, didn’t support Lua libraries well, and the setup broke completely when a package link for Lua libraries changed unexpectedly. The frustration of dealing with broken dependencies pushed me to my limit.

Eventually, I turned to Rust to create a static binary that wouldn’t rely on fickle dependencies or slow plugins. I briefly experimented with `epwalsh/obsidian.nvim`, which was promising but took an excruciating 14 seconds to follow a backlink on my low-powered device—slower than my `rg` (ripgrep) searches! While `obsidian.nvim` is a great tool for more powerful systems, it wasn’t suitable for my "potato calculator." So, I built `markdown-scanner` to create a lightweight, fast, and reliable solution that integrates with Neovim, processes Markdown files efficiently, and stores metadata in a SQLite database for quick access.

## License
This project is licensed under the GNU General Public License v3.0.