mdql-core 0.1.0

Core library for MDQL — a queryable database backed by markdown files
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
# MDQL

A database where every entry is a markdown file and every change is a readable diff.

MDQL turns folders of markdown files into a schema-validated, queryable database. Frontmatter fields are metadata columns. H2 sections are content columns. The files are the database — there is nothing else. Every file reads like a normal markdown document, but you get full SQL: SELECT, INSERT, UPDATE, DELETE, JOINs across multiple tables, ORDER BY, and aggregation.

Your database lives in git. Every insert, update, and migration is a readable diff. Branching, merging, and rollback come free.

## Quick start

```bash
cargo install mdql
mdql validate examples/strategies/
# All 100 files valid in table 'strategies'

mdql query examples/strategies/ \
  "SELECT title, composite FROM strategies ORDER BY composite DESC LIMIT 5"
```

```
title                                                                composite
-------------------------------------------------------------------  ---------
Bridge Inflow to Destination Chain → DEX Liquidity Pressure                500
DeFi Protocol TVL Step-Change → Governance Token Repricing Lag             500
Lending Protocol Daily Interest Accrual Liquidation Threshold Creep        500
USDC Circle Business-Day Redemption Queue — Weekend Premium Decay          490
Cascading Liquidation Chain — Second-Order Collateral Asset Short          480
```

## Why MDQL

- **Zero infrastructure.** No server, no Docker, no connection strings. `git clone` and you have the database. `rm -rf` and it's gone.
- **Data review via pull requests.** Data changes go through the same PR review process as code. A reviewer reads the diff of an INSERT the way they read a code change.
- **Branch-level isolation.** An agent works on a feature branch, inserts and updates entries freely, and the main database is untouched until merge. Multiple agents work in parallel without coordination.
- **No serialization boundary.** The storage format is the readable format. An LLM sees a well-structured markdown document, not a JSON blob or SQL dump.
- **Graceful degradation.** If you stop using MDQL tomorrow, you still have a folder of valid markdown files. No proprietary format to export from.
- **Section-level content columns.** Long-form structured prose — a hypothesis, a methodology, kill criteria — is a first-class queryable column. `SELECT Hypothesis FROM strategies WHERE status = 'LIVE'`.
- **Every unix tool still works.** `grep -r "funding" strategies/` works. `wc -l strategies/*.md` works. `diff` works.
- **Self-documenting schemas.** The schema file is a markdown document. Its body explains the fields, conventions, and rationale. An LLM reading `_mdql.md` gets both the machine-readable schema and the human context for why fields exist.
- **Schema migrations are diffs.** `ALTER TABLE RENAME FIELD` rewrites every file. The migration shows up as a git diff.
- **Audit trail for free.** `git blame strategies/bad-debt-socialization-event-token-short.md` tells you who changed what and when.

## Directory structure

```
my-project/
  _mdql.md                    # type: database — config + foreign keys
  strategies/
    _mdql.md                  # type: schema — table schema + docs
    bad-debt-socialization-event-token-short.md
    aave-utilization-kink-rate-spike-borrow-unwind-short.md
    ...
  backtests/
    _mdql.md                  # type: schema
    bt-bad-debt-socialization-binance.md
    ...
  src/                        # no _mdql.md — invisible to MDQL
  docs/                       # no _mdql.md — invisible to MDQL
```

A `_mdql.md` file marks a directory as part of an MDQL database. The `type` field in frontmatter determines what it is — `database` at the root, `schema` in each table folder. Directories without `_mdql.md` are ignored, so MDQL coexists with any project structure.

## How it works

One folder = one table. One markdown file = one row.

A row file looks like this:

```markdown
---
title: "Bad Debt Socialization Event — Token Short"
status: HYPOTHESIS
mechanism: 7
categories:
  - defi-protocol
  - lending
created: "2026-04-03"
modified: "2026-04-05"
---

## Hypothesis

When an on-chain lending protocol accumulates bad debt that exceeds
its reserve buffer, the smart contract mints governance tokens...

## Structural Mechanism

The protocol's shortfall module triggers an auction...
```

- YAML frontmatter fields are metadata columns (`title`, `status`, `mechanism`, ...)
- H2 sections are content columns (`Hypothesis`, `Structural Mechanism`, ...)
- The `path` (filename) is the implicit primary key
- `created` and `modified` are reserved timestamp fields, auto-managed by `mdql stamp`
- All columns are queryable with SQL

## `_mdql.md` files

Every MDQL-managed directory has a `_mdql.md` file. The `type` field in frontmatter says what kind.

### Table schema (`type: schema`)

```markdown
---
type: schema
table: strategies
primary_key: path

frontmatter:
  title:
    type: string
    required: true
  mechanism:
    type: int
    required: true
  categories:
    type: string[]
    required: true

h1:
  required: false

sections: {}

rules:
  reject_unknown_frontmatter: true
  reject_unknown_sections: false
  reject_duplicate_sections: true
---

# strategies

Documentation about this table goes here.
```

Supported types: `string`, `int`, `float`, `bool`, `date`, `string[]`

### Database config (`type: database`)

```markdown
---
type: database
name: zunid

foreign_keys:
  - from: backtests.strategy
    to: strategies.path
---

# zunid

Trading strategy research database.
```

The markdown body in both cases is documentation — ignored by the engine, useful for humans and LLMs.

## Foreign key validation

Foreign keys defined in the database config are validated automatically. No setup required.

**At load time:** Every call to `load_database()` checks all FK constraints. If `backtests.strategy` references a file that does not exist in `strategies.path`, the error is returned alongside the data. CLI commands (`query`, `validate`, `repl`) print FK warnings to stderr.

**In the REPL:** A filesystem watcher runs in the background. If you rename or delete a file in another terminal, the REPL detects the change within 500ms and prints any new FK violations.

**In the web UI:** Same filesystem watcher runs as a background task. FK errors are available at `GET /api/fk-errors`.

**With `mdql validate`:** When pointed at a database directory (not just a single table), reports per-table schema validation summaries followed by FK violations:

```bash
mdql validate examples/
```

```
Table 'strategies': 100 files valid
Table 'backtests': 18 files valid
Foreign key violations:
  backtests/bt-broken.md: strategy = 'nonexistent.md' not found in strategies
```

NULL FK values are not violations — a backtest with no strategy set is valid.

## Python API

```bash
pip install mdql
```

### Database and Table

```python
from mdql import Database, Table

db = Database("examples/")
strategies = db.table("strategies")
```

### SELECT with JOINs

`Database.query()` runs SQL across all tables in the database, including multi-table JOINs.

```python
rows, columns = db.query(
    "SELECT s.title, b.sharpe, b.status "
    "FROM strategies s "
    "JOIN backtests b ON b.strategy = s.path"
)
# rows: list of dicts, one per result row
# columns: list of column names
```

### Single-table queries

`Table.query()` runs a SELECT query on one table and returns structured results.

```python
rows, columns = strategies.query(
    "SELECT status, COUNT(*) AS cnt FROM strategies GROUP BY status"
)
# rows: list of dicts
# columns: list of column names
```

### Load rows with filtering

`Table.load()` returns all rows, optionally filtered by a dict of field values.

```python
# All rows
rows, errors = strategies.load()

# Filtered by dict — equality matching
rows, errors = strategies.load(where={"status": "LIVE"})

# Filtered by SQL WHERE string — full operator support
rows, errors = strategies.load(where="mechanism >= 7 AND status = 'HYPOTHESIS'")
rows, errors = strategies.load(where="categories LIKE '%defi%'")
```

The `where` parameter accepts a dict (equality matching) or a SQL WHERE string (supports `=`, `!=`, `<`, `>`, `<=`, `>=`, `LIKE`, `IN`, `IS NULL`, `AND`, `OR`). `errors` contains any schema validation issues found during loading.

### INSERT

```python
# Create a new row — filename derived from title
strategies.insert({
    "title": "My New Strategy",
    "status": "HYPOTHESIS",
    "mechanism": 5,
    "implementation": 4,
    "safety": 7,
    "frequency": 3,
    "composite": 420,
    "categories": ["exchange-structure"],
    "pipeline_stage": "Pre-backtest (step 2 of 9)",
})
# Returns: Path to created file (e.g. my-new-strategy.md)
# created/modified timestamps set automatically
# required sections scaffolded as empty ## headings
# validated against schema before writing

# With pre-formatted body (e.g. from Claude output)
strategies.insert(
    {"title": "Another Strategy", "status": "HYPOTHESIS", ...},
    body=raw_markdown,  # placed verbatim after frontmatter
)

# Overwrite existing file, preserve created timestamp
strategies.insert(
    {"title": "Revised Strategy", "status": "BACKTESTING", ...},
    filename="my-new-strategy",
    replace=True,
)
```

### UPDATE

```python
# Partial merge — only the fields you pass are changed
strategies.update("my-new-strategy.md", {"status": "KILLED", "kill_reason": "No edge"})

# Update body only
strategies.update("my-new-strategy.md", {}, body=new_markdown)
```

### Bulk UPDATE

`Table.update_many()` updates the same fields across multiple files.

```python
updated_paths = strategies.update_many(
    ["file-a.md", "file-b.md", "file-c.md"],
    {"status": "KILLED"},
)
# Returns: list of paths that were updated
```

### DELETE

```python
strategies.delete("my-new-strategy.md")
```

### Schema operations

```python
table = Table("examples/strategies/")

table.rename_field("Summary", "Overview")     # section or frontmatter
table.drop_field("Details")                   # section or frontmatter
table.merge_fields(["Entry Rules", "Exit Rules"], into="Trading Rules")  # sections only
```

### Validation

```python
errors = strategies.validate()
# Returns: list of validation errors (schema + FK)
```

All writes are validated against the schema and rolled back on failure. The `created` timestamp is always preserved on `replace` and `update`; `modified` is always set to today.

## CLI commands

### `mdql query <folder> "<sql>"`

Run SQL against a table or database. Supports `SELECT`, `INSERT INTO`, `UPDATE SET`, `DELETE FROM`, `ALTER TABLE`, and `JOIN`.

```bash
# Filter and sort
mdql query examples/strategies/ \
  "SELECT title FROM strategies WHERE mechanism > 5 ORDER BY composite DESC LIMIT 5"

# Query section content
mdql query examples/strategies/ \
  "SELECT path, Hypothesis FROM strategies WHERE Hypothesis IS NOT NULL LIMIT 3"

# Category search (LIKE works on arrays)
mdql query examples/strategies/ \
  "SELECT title FROM strategies WHERE categories LIKE '%defi%'"

# Output as JSON
mdql query examples/strategies/ \
  "SELECT title, composite FROM strategies LIMIT 3" --format json
```

Supported WHERE operators: `=`, `!=`, `<`, `>`, `<=`, `>=`, `LIKE`, `IN`, `IS NULL`, `IS NOT NULL`, `AND`, `OR`

Column names with spaces use backticks: `` SELECT `Structural Mechanism` FROM strategies ``

### JOINs

Point at the database directory (parent of table folders) for cross-table queries. Supports two or more tables:

```bash
# Two-table JOIN
mdql query examples/ \
  "SELECT s.title, b.sharpe, b.status
   FROM strategies s
   JOIN backtests b ON b.strategy = s.path"

# Multi-table JOIN
mdql query my-db/ \
  "SELECT s.title, b.result, c.verdict
   FROM strategies s
   JOIN backtests b ON b.strategy = s.path
   JOIN critiques c ON c.strategy = s.path"
```

### SQL write operations

```bash
# INSERT
mdql query examples/strategies/ \
  "INSERT INTO strategies (title, status, mechanism, implementation, safety, frequency, composite, categories, pipeline_stage)
   VALUES ('New Strategy', 'HYPOTHESIS', 5, 4, 7, 3, 420, 'exchange-structure', 'Pre-backtest')"

# UPDATE
mdql query examples/strategies/ \
  "UPDATE strategies SET status = 'KILLED', kill_reason = 'No edge' WHERE path = 'new-strategy.md'"

# DELETE
mdql query examples/strategies/ \
  "DELETE FROM strategies WHERE path = 'new-strategy.md'"
```

For `string[]` columns, pass comma-separated values in a single string: `'funding-rates,defi'`.

### ALTER TABLE — field migrations

Rename, drop, or merge fields across all files in a table. Works for both frontmatter fields and sections. The schema `_mdql.md` is updated automatically.

```bash
mdql query examples/strategies/ \
  "ALTER TABLE strategies RENAME FIELD 'Summary' TO 'Overview'"
# ALTER TABLE — renamed 'Summary' to 'Overview' in 42 files

mdql query examples/strategies/ \
  "ALTER TABLE strategies DROP FIELD 'Details'"

mdql query examples/strategies/ \
  "ALTER TABLE strategies MERGE FIELDS 'Entry Rules', 'Exit Rules' INTO 'Trading Rules'"
```

Field names can be single-quoted (`'Name'`), backtick-quoted (`` `Name With Spaces` ``), or bare identifiers.

### `mdql rename <db-folder> <table> <old-name> <new-name>`

Rename a file within a table. Automatically updates all foreign key references in other tables that point to the old filename.

```bash
mdql rename examples/ strategies bad-debt-socialization-event-token-short.md bad-debt-token-short.md
# Renamed strategies/bad-debt-socialization-event-token-short.md → bad-debt-token-short.md
# Updated 3 references in backtests
```

### `mdql create <folder> --set key=value`

Create a new row file. Field types are coerced from the schema (e.g. `--set mechanism=5` becomes int).

```bash
mdql create examples/strategies/ \
  -s 'title=My New Strategy' \
  -s 'status=HYPOTHESIS' \
  -s 'mechanism=5' \
  -s 'implementation=4' \
  -s 'safety=7' \
  -s 'frequency=3' \
  -s 'composite=420' \
  -s 'categories=exchange-structure' \
  -s 'pipeline_stage=Pre-backtest (step 2 of 9)'
```

For `string[]` fields, use comma-separated values: `-s 'categories=funding-rates,defi'`

### `mdql validate <folder>`

Validate all markdown files against the schema. Works on a single table or a database directory.

```bash
mdql validate examples/strategies/
# All 100 files valid in table 'strategies'
```

Invalid files get clear error messages:

```
missing-field.md: Missing required frontmatter field 'count'
wrong-type-date.md: Field 'created' expected date, got string 'yesterday'
duplicate-section.md: Duplicate section 'Body' (appears 2 times)
```

When pointed at a database directory, also reports foreign key violations (see [Foreign key validation](#foreign-key-validation)).

### `mdql inspect <folder>`

Show normalized rows.

```bash
mdql inspect examples/strategies/ -f bad-debt-socialization-event-token-short.md --format json
```

### `mdql stamp <folder>`

Add or update `created` and `modified` timestamps in all data files.

```bash
mdql stamp examples/strategies/
# Stamped 100 files: 0 created set, 100 modified updated
```

- `created` is set to today's date if missing, never overwritten
- `modified` is always updated to today's date
- Both are ISO date strings (`"YYYY-MM-DD"`) in frontmatter
- These fields are reserved — schemas don't need to declare them, and they are never rejected as unknown fields

### `mdql schema <folder>`

Print the effective schema. Works on a single table or the whole database.

```bash
mdql schema examples/
```

### `mdql repl <folder>`

Open an interactive REPL for running queries. Supports tab completion for table names, column names, and SQL keywords.

```bash
mdql repl examples/
```

When pointed at a database directory, runs a background filesystem watcher that prints FK violations to stderr if files change on disk while the REPL is open.

### `mdql client <folder>`

Open a browser-based UI for running queries. Starts a local web server with a query editor.

```bash
mdql client examples/
```

The web server exposes a REST API:
- `POST /api/query` — execute SQL
- `GET /api/fk-errors` — current foreign key violations (updated by background watcher)

## Pandas integration

```bash
pip install mdql[pandas]
```

### One-liner

```python
from mdql.pandas import load_dataframe

df = load_dataframe("examples/strategies/")
```

### Two-step (when you already have rows)

```python
from mdql.loader import load_table
from mdql.pandas import to_dataframe

schema, rows, errors = load_table("examples/strategies/")
df = to_dataframe(rows, schema)
```

Schema types map to pandas dtypes:

| MDQL type  | pandas dtype       |
|------------|--------------------|
| `string`   | `string`           |
| `int`      | `Int64` (nullable) |
| `float`    | `Float64` (nullable) |
| `bool`     | `boolean` (nullable) |
| `date`     | `datetime64[ns]`   |
| `string[]` | Python lists       |

Validation errors are handled via the `errors` parameter: `"warn"` (default), `"raise"`, or `"ignore"`.

## ACID compliance

All write operations are process-safe. Three layers of protection:

**Atomic writes.** Every file write goes through a temp-file-then-rename path. If the process crashes mid-write, the original file is untouched.

**Table locking.** Write operations acquire an exclusive `fcntl.flock` per table. Two processes writing to the same table serialize rather than corrupt each other's files.

**Write-ahead journal.** Multi-file operations (`ALTER TABLE`, batch `UPDATE`/`DELETE`, `stamp`) write a journal before making changes. If the process crashes mid-operation, the next `Table()` construction detects the journal and rolls back all partial changes automatically.

```python
# Safe even if the process is killed mid-way:
table.rename_field("Summary", "Overview")  # touches 100 files + schema
# On crash: next Table("strategies/") auto-recovers from journal
```

## Running tests

```bash
# Rust tests
cargo test

# Python tests (requires maturin develop first)
pytest
```

## Project structure

```
crates/
  mdql-core/        # core library: parser, schema, validator, query engine,
                     # indexes, caching, full-text search, ACID transactions,
                     # FK validation, filesystem watcher
  mdql/             # CLI binary: validate, query, create, inspect, schema,
                     # stamp, rename, repl (with autocomplete), client (web UI)
  mdql-web/         # browser UI: axum REST server + embedded SPA
python/
  src/lib.rs        # PyO3 bindings (Rust → Python)
  mdql/             # Python wrapper package (thin layer over Rust)
tests/              # Python test suite
examples/           # example data (strategies, backtests)
```

## License

AGPL-3.0. Commercial licenses available — see [LICENSE.md](LICENSE.md).