mdvs 0.1.1

A database of markdown documents — schema validation and semantic search
mdvs — Markdown Validation & Search

❌ A Document Database

✅ A Database for Documents

mdvs infers a schema from your frontmatter, validates it, and gives you semantic search with SQL filtering. Single binary, no cloud, no setup.

Install

Prebuilt binary (macOS / Linux)

curl --proto '=https' --tlsv1.2 -LsSf https://github.com/edochi/mdvs/releases/latest/download/mdvs-installer.sh | sh

From crates.io

cargo install mdvs

From source

git clone https://github.com/edochi/mdvs.git
cd mdvs
cargo install --path .

Quick Start

# Initialize: scans your files, infers a schema, builds a search index
mdvs init ~/notes

# Search with natural language
mdvs search "how to handle errors in rust"

# Filter results with SQL on frontmatter fields
mdvs search "async patterns" --where "draft = false" --limit 5

# Validate frontmatter against the inferred schema
mdvs check

That's it. No config files to write, no models to download manually, no services to start.

Features

Schema inference

mdvs scans your markdown files and infers a typed schema from frontmatter: field names, types (boolean, integer, float, string, arrays, nested objects), the directories where each field appears, and the directories where it is required. The schema is written to mdvs.toml and can be customized.

mdvs init ~/notes
# Discovered 10 fields across 496 files
#   tags       String[]  (required in ["**"])
#   draft      Boolean   (allowed in ["blog/**"])
#   year       Integer   (required in ["articles/**"])
#   ...
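
To make the idea concrete, here is a minimal Python sketch of frontmatter schema inference. It is not mdvs's implementation: the function names (`type_name`, `infer_schema`), the dict-based schema shape, and the rule "a field is required in a directory when every file there has it" are all assumptions made for illustration, and frontmatter is assumed to be parsed already.

```python
from collections import defaultdict

def type_name(value):
    """Map a parsed frontmatter value to a schema type label (hypothetical labels)."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        inner = {type_name(v) for v in value}
        # Homogeneous arrays keep their element type; empty or mixed ones fall back to String.
        return (inner.pop() if len(inner) == 1 else "String") + "[]"
    if isinstance(value, dict):
        return "Object"
    return "String"

def infer_schema(files):
    """files maps a relative path to its already-parsed frontmatter dict."""
    files_per_dir = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))  # field -> directory -> files that have it
    types = defaultdict(set)
    for path, frontmatter in files.items():
        directory = path.rsplit("/", 1)[0] if "/" in path else "."
        files_per_dir[directory] += 1
        for field, value in frontmatter.items():
            hits[field][directory] += 1
            types[field].add(type_name(value))
    schema = {}
    for field, per_dir in hits.items():
        schema[field] = {
            # Assumption: conflicting observed types widen to String.
            "type": next(iter(types[field])) if len(types[field]) == 1 else "String",
            "allowed": sorted(per_dir),
            # Assumption: required where every file in that directory has the field.
            "required": sorted(d for d, n in per_dir.items() if n == files_per_dir[d]),
        }
    return schema
```

Calling `infer_schema` on a handful of parsed files returns one entry per field with its type, the directories where it was seen, and the directories where it was present in every file.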

Frontmatter validation

Check your files against the schema — catch missing required fields, wrong types, and fields that appear where they shouldn't.

mdvs check
# blog/draft.md: missing required field 'tags'
# blog/old-post.md: field 'year' expected Integer, got String
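
The three kinds of problems listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not mdvs's code: the `SCHEMA` dict shape, the `label` and `check` helpers, and the wording of the "unexpected field" message are hypothetical. Only the first two message formats mirror the sample output.

```python
# Hypothetical, simplified schema: one entry per field.
SCHEMA = {
    "tags": {"type": "String[]", "required": True},
    "year": {"type": "Integer", "required": False},
}

# Order matters: bool is a subclass of int, so it must be matched first.
TYPE_LABELS = [(bool, "Boolean"), (int, "Integer"), (float, "Float"), (str, "String")]

def label(value):
    """Return a type label for a parsed frontmatter value."""
    if isinstance(value, list):
        return (label(value[0]) if value else "String") + "[]"
    for py_type, name in TYPE_LABELS:
        if isinstance(value, py_type):
            return name
    return "Object"

def check(path, frontmatter, schema):
    """Collect human-readable problems for one file."""
    problems = []
    for field, rule in schema.items():
        if field not in frontmatter:
            if rule["required"]:
                problems.append(f"{path}: missing required field '{field}'")
            continue
        got = label(frontmatter[field])
        if got != rule["type"]:
            problems.append(f"{path}: field '{field}' expected {rule['type']}, got {got}")
    for field in frontmatter:
        if field not in schema:
            # Message wording is an assumption; the real tool may phrase this differently.
            problems.append(f"{path}: unexpected field '{field}'")
    return problems
```

An empty list from `check` means the file conforms to the schema.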

Semantic search

Fast vector search using lightweight static embeddings (Model2Vec). The default model is 8 MB and needs no GPU, no API keys, and no network access at query time.

mdvs search "distributed consensus algorithms"
# 0.72  notes/raft.md
# 0.68  notes/paxos.md
# 0.61  blog/distributed-systems.md
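
Static embeddings are cheap because a text vector is built from precomputed per-token vectors rather than from a neural network forward pass. The Python sketch below shows that general idea with mean pooling and cosine similarity. The 3-dimensional `TOKEN_VECTORS` table is invented for illustration, whitespace tokenization is a simplification, and none of this is mdvs's or Model2Vec's actual code.

```python
import math

# Toy token vectors, made up for this example only.
TOKEN_VECTORS = {
    "raft": (0.9, 0.1, 0.0),
    "paxos": (0.8, 0.2, 0.0),
    "consensus": (0.85, 0.15, 0.0),
    "pasta": (0.0, 0.1, 0.9),
    "recipe": (0.0, 0.2, 0.8),
}

def embed(text):
    """Average the vectors of the known tokens in the text."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def search(query, docs):
    """Return (score, path) pairs, best match first. docs maps path -> text."""
    q = embed(query)
    scored = [(cosine(q, embed(body)), path) for path, body in docs.items()]
    return sorted(scored, reverse=True)
```

With these toy vectors, a query such as "consensus" ranks a note about Raft well ahead of a pasta recipe, which is the behavior the scores in the sample output reflect.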

All commands support --output json for scripting and pipelines:

mdvs search "distributed consensus" --output json
{
  "hits": [
    { "filename": "notes/raft.md", "score": 0.72 },
    { "filename": "notes/paxos.md", "score": 0.68 },
    { "filename": "blog/distributed-systems.md", "score": 0.61 }
  ]
}
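
As a small scripting illustration, the JSON shape shown above can be consumed with Python's standard `json` module. The `top_hits` helper and the `min_score` threshold are hypothetical names introduced here. The payload is copied from the sample output rather than produced by running mdvs.

```python
import json

# Sample payload copied from the documented output.
payload = """
{
  "hits": [
    { "filename": "notes/raft.md", "score": 0.72 },
    { "filename": "notes/paxos.md", "score": 0.68 },
    { "filename": "blog/distributed-systems.md", "score": 0.61 }
  ]
}
"""

def top_hits(raw, min_score):
    """Return filenames whose score is at least min_score, in the order given."""
    hits = json.loads(raw)["hits"]
    return [h["filename"] for h in hits if h["score"] >= min_score]
```

In a real script the string would presumably come from the standard output of `mdvs search ... --output json`, for example captured with `subprocess.run(..., capture_output=True, text=True)`.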

SQL filtering

Filter search results on any frontmatter field using SQL syntax, powered by DataFusion.

mdvs search "rust" --where "draft = false AND year >= 2024"
mdvs search "recipes" --where "tags IS NOT NULL" --limit 5
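
Conceptually, a `--where` clause narrows the scored hits by metadata before the best ones are returned. The Python sketch below imitates that with the standard library's SQLite module in place of DataFusion, so SQL dialect details may differ. The `hits` table layout and the `filter_hits` function are assumptions for illustration, and the sketch relies on SQLite 3.23 or newer for the `false` keyword.

```python
import sqlite3

def filter_hits(hits, where, limit):
    """Filter scored hits with a SQL predicate and return filenames, best first.

    hits: list of (filename, score, draft, year) tuples.
    where: a trusted SQL predicate typed by the local user, as on a CLI.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE hits (filename TEXT, score REAL, draft BOOLEAN, year INTEGER)"
    )
    con.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", hits)
    rows = con.execute(
        f"SELECT filename FROM hits WHERE {where} ORDER BY score DESC LIMIT ?",
        (limit,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]
```

Interpolating the predicate into the query is acceptable only because the user and the data owner are the same person here. A service exposed to untrusted input would need a different design.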

Incremental builds

Only changed files are re-embedded. Unchanged files keep their existing chunks and embeddings. If nothing changed, the model isn't even loaded.

mdvs build
# Built index: 3 new, 1 edited, 492 unchanged, 0 removed (4 files embedded)
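
One common way to get this behavior is to store a content hash per file and compare it on the next run. The Python sketch below classifies files into the four buckets from the sample output. Whether mdvs uses content hashes, timestamps, or something else is not stated here, so treat SHA-256 and the `plan_build` and `needs_model` names as assumptions.

```python
import hashlib

def digest(text):
    """Content hash used to detect edits (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_build(previous, current):
    """Classify files for an incremental build.

    previous: path -> digest stored by the last build.
    current:  path -> file text found on disk now.
    """
    plan = {"new": [], "edited": [], "unchanged": [], "removed": []}
    for path, text in sorted(current.items()):
        if path not in previous:
            plan["new"].append(path)
        elif previous[path] != digest(text):
            plan["edited"].append(path)
        else:
            plan["unchanged"].append(path)
    plan["removed"] = sorted(set(previous) - set(current))
    return plan

def needs_model(plan):
    """Only new or edited files need embedding, so only they require loading the model."""
    return bool(plan["new"] or plan["edited"])
```

When `needs_model` returns `False`, a build can skip loading the embedding model entirely and at most drop index entries for removed files.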

Commands

Command  Description
init     Scan files, infer schema, write mdvs.toml, optionally build index
check    Validate frontmatter against schema
update   Re-scan and update field definitions
build    Validate + embed + write search index
search   Semantic search with optional SQL filtering
info     Show config and index status
clean    Delete search index

How it works

mdvs treats your markdown directory like a database:

  • init scans your files and infers a schema from frontmatter — like CREATE TABLE
  • check validates every file against that schema — like constraint checking
  • update detects new fields as your files evolve — like ALTER TABLE
  • build chunks and embeds your content into a local Parquet index
  • search queries that index with SQL filtering on metadata — like SELECT ... WHERE ... ORDER BY similarity

This produces two artifacts: mdvs.toml (your schema, meant to be committed) and .mdvs/ (the search index, meant to be gitignored).

Documentation

Full documentation at edochi.github.io/mdvs.

License

MIT