MDQL
A database where every entry is a markdown file and every change is a readable diff.
MDQL turns folders of markdown files into a schema-validated, queryable database. Frontmatter fields are metadata columns. H2 sections are content columns. The files are the database — there is nothing else. Every file reads like a normal markdown document, but you get full SQL: SELECT, INSERT, UPDATE, DELETE, JOINs across multiple tables, ORDER BY, aggregation, computed expressions, and CASE WHEN.
Your database lives in git. Every insert, update, and migration is a readable diff. Branching, merging, and rollback come free.
Install
Quick start
# All 100 files valid in table 'strategies'
title composite
------------------------------------------------------------------- ---------
Bridge Inflow to Destination Chain → DEX Liquidity Pressure 500
DeFi Protocol TVL Step-Change → Governance Token Repricing Lag 500
Lending Protocol Daily Interest Accrual Liquidation Threshold Creep 500
USDC Circle Business-Day Redemption Queue — Weekend Premium Decay 490
Cascading Liquidation Chain — Second-Order Collateral Asset Short 480
Why MDQL
- Zero infrastructure. No server, no Docker, no connection strings.
git cloneand you have the database.rm -rfand it's gone. - Data review via pull requests. Data changes go through the same PR review process as code. A reviewer reads the diff of an INSERT the way they read a code change.
- Branch-level isolation. An agent works on a feature branch, inserts and updates entries freely, and the main database is untouched until merge. Multiple agents work in parallel without coordination.
- No serialization boundary. The storage format is the readable format. An LLM sees a well-structured markdown document, not a JSON blob or SQL dump.
- Graceful degradation. If you stop using MDQL tomorrow, you still have a folder of valid markdown files. No proprietary format to export from.
- Section-level content columns. Long-form structured prose — a hypothesis, a methodology, kill criteria — is a first-class queryable column.
SELECT Hypothesis FROM strategies WHERE status = 'LIVE'. - Every unix tool still works.
grep -r "funding" strategies/works.wc -l strategies/*.mdworks.diffworks. - Self-documenting schemas. The schema file is a markdown document. Its body explains the fields, conventions, and rationale. An LLM reading
_mdql.mdgets both the machine-readable schema and the human context for why fields exist. - Schema migrations are diffs.
ALTER TABLE RENAME FIELDrewrites every file. The migration shows up as a git diff. - Audit trail for free.
git blame strategies/bad-debt-socialization-event-token-short.mdtells you who changed what and when.
Directory structure
my-project/
_mdql.md # type: database — config + foreign keys
strategies/
_mdql.md # type: schema — table schema + docs
bad-debt-socialization-event-token-short.md
aave-utilization-kink-rate-spike-borrow-unwind-short.md
...
backtests/
_mdql.md # type: schema
bt-bad-debt-socialization-binance.md
...
src/ # no _mdql.md — invisible to MDQL
docs/ # no _mdql.md — invisible to MDQL
A _mdql.md file marks a directory as part of an MDQL database. The type field in frontmatter determines what it is — database at the root, schema in each table folder. Directories without _mdql.md are ignored, so MDQL coexists with any project structure.
How it works
One folder = one table. One markdown file = one row.
A row file looks like this:
title: "Bad Debt Socialization Event — Token Short"
status: HYPOTHESIS
mechanism: 7
categories:
- -
When an on-chain lending protocol accumulates bad debt that exceeds
its reserve buffer, the smart contract mints governance tokens...
The protocol's shortfall module triggers an auction...
- YAML frontmatter fields are metadata columns (
title,status,mechanism, ...) - H2 sections are content columns (
Hypothesis,Structural Mechanism, ...) - The
path(filename) is the implicit primary key createdandmodifiedare reserved timestamp fields, auto-managed bymdql stamp- All columns are queryable with SQL
_mdql.md files
Every MDQL-managed directory has a _mdql.md file. The type field in frontmatter says what kind.
Table schema (type: schema)
type: schema
table: strategies
primary_key: path
frontmatter:
title:
mechanism:
categories:
h1:
required: false
sections: {}
rules:
reject_unknown_frontmatter: true
reject_unknown_sections: false
reject_duplicate_sections: true
Documentation about this table goes here.
Supported types: string, int, float, bool, date, string[]
Database config (type: database)
type: database
name: zunid
foreign_keys:
-
Trading strategy research database.
The markdown body in both cases is documentation — ignored by the engine, useful for humans and LLMs.
Foreign key validation
Foreign keys defined in the database config are validated automatically. No setup required.
At load time: Every call to load_database() checks all FK constraints. If backtests.strategy references a file that does not exist in strategies.path, the error is returned alongside the data. CLI commands (query, validate, repl) print FK warnings to stderr.
In the REPL: A filesystem watcher runs in the background. If you rename or delete a file in another terminal, the REPL detects the change within 500ms and prints any new FK violations.
In the web UI: Same filesystem watcher runs as a background task. FK errors are available at GET /api/fk-errors.
With mdql validate: When pointed at a database directory (not just a single table), reports per-table schema validation summaries followed by FK violations:
Table 'strategies': 100 files valid
Table 'backtests': 18 files valid
Foreign key violations:
backtests/bt-broken.md: strategy = 'nonexistent.md' not found in strategies
NULL FK values are not violations — a backtest with no strategy set is valid.
Python API
Database and Table
=
=
SELECT with JOINs
Database.query() runs SQL across all tables in the database, including multi-table JOINs.
, =
# rows: list of dicts, one per result row
# columns: list of column names
Single-table queries
Table.query() runs a SELECT query on one table and returns structured results.
, =
# rows: list of dicts
# columns: list of column names
# Computed expressions and CASE WHEN
, =
# Conditional aggregation
, =
Load rows with filtering
Table.load() returns all rows, optionally filtered by a dict of field values.
# All rows
, =
# Filtered by dict — equality matching
, =
# Filtered by SQL WHERE string — full operator support
, =
, =
The where parameter accepts a dict (equality matching) or a SQL WHERE string (supports =, !=, <, >, <=, >=, LIKE, IN, IS NULL, AND, OR). errors contains any schema validation issues found during loading.
INSERT
# Create a new row — filename derived from title
# Returns: Path to created file (e.g. my-new-strategy.md)
# created/modified timestamps set automatically
# required sections scaffolded as empty ## headings
# validated against schema before writing
# With pre-formatted body (e.g. from Claude output)
# Overwrite existing file, preserve created timestamp
UPDATE
# Partial merge — only the fields you pass are changed
# Update body only
Bulk UPDATE
Table.update_many() updates the same fields across multiple files.
=
# Returns: list of paths that were updated
DELETE
Schema operations
=
# section or frontmatter
# section or frontmatter
# sections only
Validation
=
# Returns: list of validation errors (schema + FK)
All writes are validated against the schema and rolled back on failure. The created timestamp is always preserved on replace and update; modified is always set to today.
CLI commands
mdql query <folder> "<sql>"
Run SQL against a table or database. Supports SELECT, INSERT INTO, UPDATE SET, DELETE FROM, ALTER TABLE, and JOIN.
# Filter and sort
# Query section content
# Category search (LIKE works on arrays)
# Output as JSON
Supported WHERE operators: =, !=, <, >, <=, >=, LIKE, IN, IS NULL, IS NOT NULL, AND, OR
Column names with spaces use backticks: SELECT `Structural Mechanism` FROM strategies
Computed expressions
Arithmetic expressions (+, -, *, /, %) work in SELECT, WHERE, and ORDER BY. Supports parentheses, unary minus, and mixed int/float coercion.
# Computed columns with aliases
# Expressions in WHERE
# Parenthesized expressions
Integer division truncates (7 / 2 = 3). Division by zero returns NULL. NULL propagates through all arithmetic.
Column aliases
Columns can be aliased with AS or by placing the alias directly after the expression (implicit alias). ORDER BY can reference SELECT aliases.
# Explicit alias with AS
# Implicit alias (no AS keyword)
CASE WHEN
CASE WHEN expressions work anywhere a value is expected — in SELECT, WHERE, ORDER BY, and inside aggregate functions.
# Categorize rows
# Conditional aggregation
JOINs
Point at the database directory (parent of table folders) for cross-table queries. Supports two or more tables:
# Two-table JOIN
# Multi-table JOIN
SQL write operations
# INSERT
# UPDATE
# DELETE
For string[] columns, pass comma-separated values in a single string: 'funding-rates,defi'.
ALTER TABLE — field migrations
Rename, drop, or merge fields across all files in a table. Works for both frontmatter fields and sections. The schema _mdql.md is updated automatically.
# ALTER TABLE — renamed 'Summary' to 'Overview' in 42 files
Field names can be single-quoted ('Name'), backtick-quoted (`Name With Spaces`), or bare identifiers.
mdql rename <db-folder> <table> <old-name> <new-name>
Rename a file within a table. Automatically updates all foreign key references in other tables that point to the old filename.
# Renamed strategies/bad-debt-socialization-event-token-short.md → bad-debt-token-short.md
# Updated 3 references in backtests
mdql create <folder> --set key=value
Create a new row file. Field types are coerced from the schema (e.g. --set mechanism=5 becomes int).
For string[] fields, use comma-separated values: -s 'categories=funding-rates,defi'
mdql validate <folder>
Validate all markdown files against the schema. Works on a single table or a database directory.
# All 100 files valid in table 'strategies'
Invalid files get clear error messages:
missing-field.md: Missing required frontmatter field 'count'
wrong-type-date.md: Field 'created' expected date, got string 'yesterday'
duplicate-section.md: Duplicate section 'Body' (appears 2 times)
When pointed at a database directory, also reports foreign key violations (see Foreign key validation).
mdql inspect <folder>
Show normalized rows.
mdql stamp <folder>
Add or update created and modified timestamps in all data files.
# Stamped 100 files: 0 created set, 100 modified updated
createdis set to today's date if missing, never overwrittenmodifiedis always updated to today's date- Both are ISO date strings (
"YYYY-MM-DD") in frontmatter - These fields are reserved — schemas don't need to declare them, and they are never rejected as unknown fields
mdql schema <folder>
Print the effective schema. Works on a single table or the whole database.
mdql repl <folder>
Open an interactive REPL for running queries. Supports tab completion for table names, column names, and SQL keywords.
When pointed at a database directory, runs a background filesystem watcher that prints FK violations to stderr if files change on disk while the REPL is open.
mdql client <folder>
Open a browser-based UI for running queries. Starts a local web server with a query editor.
The web server exposes a REST API:
POST /api/query— execute SQLGET /api/fk-errors— current foreign key violations (updated by background watcher)
Multi-agent setup
MDQL is a single-writer, filesystem-based database. When multiple agents or processes need to read and write the same data, point them all at the same directory. MDQL's flock locking serializes writes automatically.
For multi-agent setups, keep the database in its own directory (and optionally its own git repo for audit trail), separate from application code:
~/repos/
my-project/ # application code — branched freely
my-project-db/ # MDQL database — shared by all agents
_mdql.md
strategies/
orders/
MDQL_DATABASE_PATH
Set the MDQL_DATABASE_PATH environment variable so agents and CLI commands find the database without hardcoding paths.
# CLI commands fall back to this when no folder is given
# Reads MDQL_DATABASE_PATH when no path is given
=
An explicit path always takes precedence: Database("/other/path") and mdql validate /other/path ignore the env var.
Pandas integration
One-liner
=
Two-step (when you already have rows)
, , =
=
Schema types map to pandas dtypes:
| MDQL type | pandas dtype |
|---|---|
string |
string |
int |
Int64 (nullable) |
float |
Float64 (nullable) |
bool |
boolean (nullable) |
date |
datetime64[ns] |
string[] |
Python lists |
Validation errors are handled via the errors parameter: "warn" (default), "raise", or "ignore".
ACID compliance
All write operations are process-safe. Three layers of protection:
Atomic writes. Every file write goes through a temp-file-then-rename path. If the process crashes mid-write, the original file is untouched.
Table locking. Write operations acquire an exclusive fcntl.flock per table. Two processes writing to the same table serialize rather than corrupt each other's files.
Write-ahead journal. Multi-file operations (ALTER TABLE, batch UPDATE/DELETE, stamp) write a journal before making changes. If the process crashes mid-operation, the next Table() construction detects the journal and rolls back all partial changes automatically.
# Safe even if the process is killed mid-way:
# touches 100 files + schema
# On crash: next Table("strategies/") auto-recovers from journal
Running tests
# Rust tests
# Python tests (requires maturin develop first)
Project structure
crates/
mdql-core/ # core library: parser, schema, validator, query engine,
# indexes, caching, full-text search, ACID transactions,
# FK validation, filesystem watcher
mdql/ # CLI binary: validate, query, create, inspect, schema,
# stamp, rename, repl (with autocomplete), client (web UI)
mdql-web/ # browser UI: axum REST server + embedded SPA
python/
src/lib.rs # PyO3 bindings (Rust → Python)
mdql/ # Python wrapper package (thin layer over Rust)
tests/ # Python test suite
examples/ # example data (strategies, backtests)
License
AGPL-3.0. Commercial licenses available — see LICENSE.md.