mdbook-validator 1.1.2

An mdBook preprocessor that validates code blocks using Docker containers
# mdbook-validator

[![Crates.io](https://img.shields.io/crates/v/mdbook-validator.svg)](https://crates.io/crates/mdbook-validator)
[![Documentation](https://docs.rs/mdbook-validator/badge.svg)](https://docs.rs/mdbook-validator)
[![CI](https://github.com/withzombies/mdbook-validator/actions/workflows/ci.yml/badge.svg)](https://github.com/withzombies/mdbook-validator/actions/workflows/ci.yml)
[![Coverage](https://img.shields.io/endpoint?url=https://gist.githubusercontent.com/withzombies/a0277ecc8a69526d47c694467b3bf9a4/raw/coverage.json)](https://github.com/withzombies/mdbook-validator/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](https://github.com/withzombies/mdbook-validator/blob/main/LICENSE)
[![Rust Version](https://img.shields.io/badge/rust-1.75%2B-blue.svg)](https://www.rust-lang.org)

An mdBook preprocessor that validates code examples against live Docker containers during documentation builds. Catch documentation drift before it reaches your users.

## The Problem

Documentation code examples rot:
- SQL queries reference tables that were renamed
- Config files have typos that were never tested
- Examples break when the tool updates
- Code runs but produces wrong output

You only find out when a user complains.

## The Solution

`mdbook-validator` validates your code examples against real tools during `mdbook build`. If an example doesn't work, your build fails—just like a broken test.

**Key insight**: Documentation examples often need setup code (CREATE TABLE, test data) or surrounding context (full config file) that readers don't need to see. This tool lets you include that context for validation while showing only the relevant portion to readers.

## Features

- **Container-based validation** - Run examples against real tools (osquery, SQLite, etc.)
- **Hidden setup blocks** - Include setup code that's validated but not shown to readers
- **Hidden context lines** - Show partial configs while validating complete ones (`@@` prefix)
- **Hidden code blocks** - Validate entire blocks without showing them to readers (`hidden` attribute)
- **Output assertions** - Verify row counts, check for specific content
- **Expected output matching** - Regression testing for deterministic queries
- **Clean output** - All validation markers stripped from rendered documentation

## Installation

```bash
# From crates.io
cargo install mdbook-validator

# From source
cargo install --git https://github.com/withzombies/mdbook-validator
```

**Requirements:**
- Docker running (containers provide validation environments)
- `jq` installed on host (used by validator scripts for JSON parsing)
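
Both requirements can be checked before running a build. The following is a minimal sketch, not part of `mdbook-validator` itself; the function names `require_cmd` and `preflight` are hypothetical helpers introduced here for illustration.

```bash
#!/bin/bash
# require_cmd NAME: succeed if NAME is on PATH, otherwise report to stderr and fail.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing required command: $1" >&2
        return 1
    }
}

# preflight: check the two requirements listed above (hypothetical helper).
preflight() {
    require_cmd jq || return 1
    require_cmd docker || return 1
    # `docker info` exits non-zero when the daemon cannot be reached.
    docker info >/dev/null 2>&1 || {
        echo "docker is installed but the daemon is not reachable" >&2
        return 1
    }
}
```

Calling `preflight && mdbook build` would then stop early with a readable message instead of failing partway through validation.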

## Quick Start

1. Add to your `book.toml`:

```toml
[preprocessor.validator]
command = "mdbook-validator"

[preprocessor.validator.validators.sqlite]
container = "keinos/sqlite3:3.47.2"
script = "validators/validate-sqlite.sh"
```

2. Write validated examples in your markdown:

````markdown
```sql validator=sqlite
<!--SETUP
sqlite3 /tmp/test.db "CREATE TABLE users (id INTEGER, name TEXT); INSERT INTO users VALUES (1, 'alice'), (2, 'bob');"
-->
SELECT name FROM users WHERE id = 1;
<!--ASSERT
rows = 1
contains "alice"
-->
```
````

3. Build your book:

```bash
mdbook build
```

**Reader sees:**
```sql
SELECT name FROM users WHERE id = 1;
```

**Validator tests:** Complete query with setup and assertions.

## Markers

### Block Markers

| Marker | Purpose | Runs? |
|--------|---------|-------|
| `<!--SETUP-->` | Shell commands to prepare state (create tables, trigger events, write files) | **Yes** - in container via `sh -c` |
| `<!--ASSERT-->` | Output validation rules (row counts, string matching) | No - passed to validator script |
| `<!--EXPECT-->` | Exact output matching for regression testing | No - passed to validator script |

### Line Prefix: `@@`

**Important:** `@@` does NOT execute anything. It only controls what readers see.

Lines starting with `@@` are:
- **Included** in content sent to container for validation
- **Hidden** from rendered documentation output

Use this to validate complete configs while showing only the relevant portion to readers.

````markdown
```toml validator=config-check
@@base_path = "/var/data"
@@log_level = "info"
@@
[feature]
enabled = true
max_items = 100
@@
@@[advanced]
@@timeout_secs = 30
```
````

**Reader sees:**
```toml
[feature]
enabled = true
max_items = 100
```

**Validator receives:** Complete, valid config.
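
The two views of an `@@` block can be approximated with `sed`. This is only a sketch of the filtering described above, assuming the prefix is always exactly `@@` at the start of a line; the function names are hypothetical and this is not the preprocessor's actual implementation.

```bash
#!/bin/bash
# validator_view: strip the leading "@@" so the complete content is validated.
validator_view() { sed 's/^@@//'; }

# reader_view: drop every line that starts with "@@".
reader_view() { sed '/^@@/d'; }
```

Piping the TOML example above through `validator_view` yields the full config, while `reader_view` yields only the `[feature]` table.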

## Examples

### SQLite with Setup

````markdown
```sql validator=sqlite
<!--SETUP
sqlite3 /tmp/test.db "CREATE TABLE orders (id INTEGER, total REAL, status TEXT); INSERT INTO orders VALUES (1, 99.99, 'shipped'), (2, 149.50, 'pending');"
-->
SELECT status, COUNT(*) as count FROM orders GROUP BY status;
<!--ASSERT
rows = 2
contains "shipped"
-->
```
````

### osquery (validates against real system)

````markdown
```sql validator=osquery
SELECT uid, username FROM users WHERE username = 'root'
<!--ASSERT
rows >= 1
contains "root"
-->
```
````

### osquery Config (JSON)

````markdown
```json validator=osquery-config
{
  "options": {
    "logger_path": "/var/log/osquery",
    "disable_events": false
  },
  "schedule": {
    "system_info": {
      "query": "SELECT * FROM system_info;",
      "interval": 3600
    }
  }
}
```
````

### Expected Output (Regression Testing)

````markdown
```sql validator=sqlite
<!--SETUP
sqlite3 /tmp/test.db "CREATE TABLE test (id INTEGER); INSERT INTO test VALUES (1), (2), (3);"
-->
SELECT COUNT(*) as total FROM test
<!--EXPECT
[{"total": 3}]
-->
```
````

### Bash Script Execution

Validate that bash scripts run correctly and produce the expected results:

````markdown
```bash validator=bash-exec
#!/bin/bash
echo "Hello from bash"
exit 0
```
````

Scripts must exit 0 by default. Use the `exit_code` assertion to expect a non-zero exit code:

````markdown
```bash validator=bash-exec
exit 42
<!--ASSERT
exit_code = 42
-->
```
````

Check file creation and content:

````markdown
```bash validator=bash-exec
mkdir -p /tmp/myapp
echo "config=value" > /tmp/myapp/settings.conf
<!--ASSERT
dir_exists /tmp/myapp
file_exists /tmp/myapp/settings.conf
file_contains /tmp/myapp/settings.conf "config=value"
stdout_contains ""
-->
```
````

### Custom Container with Plugin (Advanced)

For validating custom osquery plugins or extensions, use a custom Docker image with SETUP to trigger events:

**1. Create Dockerfile with your plugin:**
```dockerfile
FROM osquery/osquery:5.17.0-ubuntu22.04
COPY my-plugin.ext /usr/local/lib/osquery/
RUN echo "/usr/local/lib/osquery/my-plugin.ext" >> /etc/osquery/extensions.load
```

**2. Configure in book.toml:**
```toml
[preprocessor.validator.validators.my-plugin]
container = "my-osquery-plugin:latest"
script = "validators/validate-osquery.sh"
```

**3. Write validated examples with SETUP:**
````markdown
```sql validator=my-plugin
<!--SETUP
# Trigger event that populates your plugin's table
curl -X POST http://localhost:8080/trigger-event
sleep 1
-->
SELECT * FROM my_plugin_events WHERE event_type = 'login';
<!--ASSERT
rows >= 1
contains "login"
-->
```
````

**Execution flow:**
1. Container starts with your plugin loaded
2. SETUP runs `curl` and `sleep` (in container, via `sh -c`)
3. Query runs against your plugin's table (in container)
4. JSON output goes to validator script (on host)
5. Assertions checked, pass/fail returned

### Skip Validation

````markdown
```sql validator=sqlite skip
-- This intentionally broken example shows what NOT to do
SELECT * FROM nonexistent_table;
```
````

### Hidden Blocks

Use `hidden` to validate a code block without showing it to readers. The entire code fence is removed from output.

````markdown
```sql validator=sqlite hidden
<!--SETUP
sqlite3 /tmp/test.db 'CREATE TABLE users (id INTEGER, name TEXT);'
-->
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
```

```sql validator=sqlite
SELECT name FROM users WHERE id = 1;
<!--ASSERT
rows = 1
contains "alice"
-->
```
````

**Reader sees only:**
```sql
SELECT name FROM users WHERE id = 1;
```

The hidden block populates data that the visible query depends on. Both are validated, but only the second appears in documentation.

**Use cases:**
- Setup queries that create test data for subsequent examples
- Teardown or cleanup blocks
- Validation-only examples that shouldn't appear in docs
- Multi-step workflows where only the final step matters to readers

**Note:** `hidden` and `skip` are mutually exclusive. Using both produces error E011.
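
To make the attribute rules concrete, here is a rough sketch of how a fence info string such as `sql validator=sqlite hidden` could be split into its parts. The function `parse_info_string` is hypothetical and the wording of the error message is an assumption; only the error code E011 comes from the note above.

```bash
#!/bin/bash
# parse_info_string INFO
# Prints "language validator hidden skip"; fails when hidden and skip are combined.
parse_info_string() {
    local info=$1 lang validator="" hidden=no skip=no word
    local -a words
    read -r -a words <<< "$info"
    lang=${words[0]:-}
    for word in "${words[@]:1}"; do
        case $word in
            validator=*) validator=${word#validator=} ;;
            hidden)      hidden=yes ;;
            skip)        skip=yes ;;
        esac
    done
    if [ "$hidden" = yes ] && [ "$skip" = yes ]; then
        # Message text is illustrative; the real diagnostic may differ.
        echo "E011: hidden and skip are mutually exclusive" >&2
        return 1
    fi
    printf '%s %s %s %s\n' "$lang" "$validator" "$hidden" "$skip"
}
```

For example, `parse_info_string 'sql validator=sqlite hidden'` prints `sql sqlite yes no`.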

## Assertions

### SQL Validators (osquery, sqlite)

| Assertion | Example | Description |
|-----------|---------|-------------|
| `rows = N` | `rows = 5` | Exact row count |
| `rows >= N` | `rows >= 1` | Minimum row count |
| `contains "str"` | `contains "alice"` | Output contains string |
| `matches "regex"` | `matches "user.*"` | Regex pattern match |
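
As an illustration of how these four rules might be evaluated, the sketch below checks a single assertion line against captured output. It assumes the row count has already been computed (for example with `jq 'length'`) and is passed in as the third argument; `check_assertion` is a hypothetical name, and the bundled validator scripts may work differently.

```bash
#!/bin/bash
# check_assertion ASSERTION OUTPUT ROW_COUNT
# Returns 0 if the assertion holds, 1 if it fails, 2 if it is not recognised.
check_assertion() {
    local assertion=$1 output=$2 row_count=$3
    local re_rows='^rows (=|>=) ([0-9]+)$'
    local re_contains='^contains "(.*)"$'
    local re_matches='^matches "(.*)"$'

    if [[ $assertion =~ $re_rows ]]; then
        case ${BASH_REMATCH[1]} in
            "=")  [ "$row_count" -eq "${BASH_REMATCH[2]}" ] ;;
            ">=") [ "$row_count" -ge "${BASH_REMATCH[2]}" ] ;;
        esac
    elif [[ $assertion =~ $re_contains ]]; then
        # Plain substring test; the quoted needle is matched literally.
        [[ $output == *"${BASH_REMATCH[1]}"* ]]
    elif [[ $assertion =~ $re_matches ]]; then
        # Unquoted right-hand side so the pattern is treated as a regex.
        local pattern=${BASH_REMATCH[1]}
        [[ $output =~ $pattern ]]
    else
        echo "unknown assertion: $assertion" >&2
        return 2
    fi
}
```

Note that `matches` here uses bash's extended regular expressions, which is an assumption about the regex dialect.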

### Bash Execution (bash-exec)

| Assertion | Example | Description |
|-----------|---------|-------------|
| `exit_code = N` | `exit_code = 0` | Script must exit with code N (default: 0) |
| `stdout_contains "str"` | `stdout_contains "success"` | Stdout must contain string |
| `file_exists /path` | `file_exists /tmp/config` | File must exist after script |
| `dir_exists /path` | `dir_exists /tmp/mydir` | Directory must exist after script |
| `file_contains /path "str"` | `file_contains /tmp/cfg "key=val"` | File must contain string |

## Configuration

```toml
[book]
title = "My Documentation"

[preprocessor.validator]
command = "mdbook-validator"
fail-fast = true  # Stop on first failure (default: true)

# SQLite validator
[preprocessor.validator.validators.sqlite]
container = "keinos/sqlite3:3.47.2"
script = "validators/validate-sqlite.sh"

# osquery SQL validator
[preprocessor.validator.validators.osquery]
container = "osquery/osquery:5.17.0-ubuntu22.04"
script = "validators/validate-osquery.sh"

# osquery config validator (JSON, not TOML!)
[preprocessor.validator.validators.osquery-config]
container = "osquery/osquery:5.17.0-ubuntu22.04"
script = "validators/validate-osquery-config.sh"

# ShellCheck static analysis
[preprocessor.validator.validators.shellcheck]
container = "koalaman/shellcheck-alpine:stable"
script = "validators/validate-shellcheck.sh"

# Bash execution with assertions
[preprocessor.validator.validators.bash-exec]
container = "ubuntu:22.04"
script = "validators/validate-bash-exec.sh"

# Python syntax validation
[preprocessor.validator.validators.python]
container = "python:3.12-slim"
script = "validators/validate-python.sh"
```

## Custom Docker Images

You can use locally-built or private registry images without pushing to a public registry.

### Local Images

Build once, reference by name:

```bash
# Build your custom validator image
docker build -t my-validator:latest validators/myvalidator/
```

```toml
[preprocessor.validator.validators.custom]
container = "my-validator:latest"  # Local image, no registry needed
script = "validators/validate-custom.sh"
```

testcontainers-rs uses a local image if one exists, so nothing needs to be pulled.

### Private Registry

For team sharing:

```bash
docker push registry.mycompany.com/my-validator:latest
```

```toml
[preprocessor.validator.validators.custom]
container = "registry.mycompany.com/my-validator:latest"
script = "validators/validate-custom.sh"
```

Docker uses your logged-in credentials (`docker login`).

### Example: pyproject.toml Validator

`validators/pyproject/Dockerfile`:
```dockerfile
FROM python:3.12-slim-bookworm
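# The jq CLI comes from apt; the PyPI package named `jq` provides Python bindings only
RUN apt-get update && apt-get install -y --no-install-recommends jq \
    && rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir 'validate-pyproject[all]'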
COPY validate.sh /validate.sh
RUN chmod +x /validate.sh
```

`validators/pyproject/validate.sh`:
```bash
#!/bin/bash
set -e
INPUT=$(cat)
CONTENT=$(echo "$INPUT" | jq -r '.content')
TMPFILE=$(mktemp --suffix=.toml)
echo "$CONTENT" > "$TMPFILE"
validate-pyproject "$TMPFILE"
```

Build and use:
```bash
docker build -t pyproject-validator:latest validators/pyproject/
```

```toml
[preprocessor.validator.validators.pyproject]
container = "pyproject-validator:latest"
script = "validators/validate-custom.sh"
```

## Writing Custom Validators

Validators are shell scripts that run on the **host** (not in containers). They receive:

- **stdin**: JSON output from the container execution (e.g., `[{"id": 1, "name": "test"}]`)
- **VALIDATOR_ASSERTIONS** env var: Assertion rules, newline-separated
- **VALIDATOR_EXPECT** env var: Expected output for exact matching (optional)
- **VALIDATOR_CONTAINER_STDERR** env var: stderr from container execution (for warning detection)

The preprocessor handles SETUP and query execution in the container—validators only validate the output.

Exit 0 for success, non-zero for failure. Write errors to stderr.

Example validator:

```bash
#!/bin/bash
set -e

# Read JSON output from container (stdin)
JSON_OUTPUT=$(cat)

# Validate JSON is parseable
echo "$JSON_OUTPUT" | jq empty 2>/dev/null || {
    echo "Invalid JSON output" >&2
    exit 1
}

# Check assertions if provided
if [ -n "${VALIDATOR_ASSERTIONS:-}" ]; then
    ROW_COUNT=$(echo "$JSON_OUTPUT" | jq 'length')

    # Example: check "rows >= N"
    if [[ "$VALIDATOR_ASSERTIONS" == *"rows >= "* ]]; then
        # sed rather than `grep -oP`, which BSD/macOS grep does not support
        expected=$(echo "$VALIDATOR_ASSERTIONS" | sed -n 's/.*rows >= \([0-9][0-9]*\).*/\1/p' | head -n 1)
        if [ "$ROW_COUNT" -lt "$expected" ]; then
            echo "Assertion failed: rows >= $expected (got $ROW_COUNT)" >&2
            exit 1
        fi
    fi
fi

# Check expected output if provided
if [ -n "${VALIDATOR_EXPECT:-}" ]; then
    actual=$(echo "$JSON_OUTPUT" | jq -c '.')
    expected=$(echo "$VALIDATOR_EXPECT" | jq -c '.')
    if [ "$actual" != "$expected" ]; then
        echo "Output mismatch: expected $expected, got $actual" >&2
        exit 1
    fi
fi

exit 0
```

See `validators/validate-template.sh` for a comprehensive template with all assertion patterns.
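
Because `VALIDATOR_ASSERTIONS` is newline-separated, a validator usually needs to walk it one rule at a time. The sketch below shows one way to do that in bash; `for_each_assertion` and its callback convention are hypothetical, and trimming surrounding whitespace is an assumption about how lenient a validator should be.

```bash
#!/bin/bash
# for_each_assertion CALLBACK
# Calls CALLBACK once per non-empty line of $VALIDATOR_ASSERTIONS.
# Stops and returns 1 as soon as CALLBACK fails.
for_each_assertion() {
    local callback=$1 line
    while IFS= read -r line; do
        # Trim leading and trailing whitespace.
        line="${line#"${line%%[![:space:]]*}"}"
        line="${line%"${line##*[![:space:]]}"}"
        [ -z "$line" ] && continue
        "$callback" "$line" || return 1
    done <<< "${VALIDATOR_ASSERTIONS:-}"
}
```

A validator could define a function that checks one rule and pass its name as the callback, writing a message to stderr and returning non-zero on the first failed rule.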

## Known Limitations

1. **Container startup overhead** - First validation takes 10-20 seconds per validator type
2. **No container reuse between builds** - Each `mdbook build` starts fresh containers
3. **Marker collision** - If your code contains `-->`, it may break marker parsing
4. **No line numbers in errors** - Error messages show file but not exact line

## Execution Model

Understanding where things run is critical for writing effective validations:

```
┌─────────────────────────────────────────────────────────────────────┐
│                            HOST MACHINE                             │
│                                                                     │
│  ┌─────────────────────┐                   ┌─────────────────────┐  │
│  │  mdbook-validator   │                   │  Validator Script   │  │
│  │  (preprocessor)     │                   │  (e.g., validate-   │  │
│  │                     │                   │   osquery.sh)       │  │
│  │  1. Parse markdown  │                   │                     │  │
│  │  2. Extract blocks  │                   │  7. Receive JSON    │  │
│  │  3. Start container │                   │  8. Check assertions│  │
│  └────────┬────────────┘                   │  9. Exit 0 or fail  │  │
│           │                                └──────────▲──────────┘  │
│           │                                           │             │
│           ▼                                           │             │
│  ┌────────────────────────────────────────────────────┼──────────┐  │
│  │                    DOCKER CONTAINER                │          │  │
│  │                                                    │          │  │
│  │   4. Run SETUP via `sh -c "<setup content>"`       │          │  │
│  │      (CREATE TABLE, trigger events, etc.)          │          │  │
│  │                                                    │          │  │
│  │   5. Run main code via `exec_command`              │          │  │
│  │      (osqueryi --json, sqlite3 -json, etc.)  ──────┘          │  │
│  │                                             JSON stdout       │  │
│  │   6. Capture stdout → send to validator                       │  │
│  │                                                               │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### What Runs Where

| Component | Runs In | Purpose |
|-----------|---------|---------|
| `<!--SETUP-->` content | **Container** via `sh -c` | Prepare state (create tables, trigger events, write files) |
| Main code block | **Container** via `exec_command` | Execute the query/script being documented |
| Validator script | **Host** | Validate the JSON output from container |
| `jq` (for JSON parsing) | **Host** | Used by validator scripts |

### Execution Order

1. **SETUP** (if present) → Runs first, in container, via `sh -c "<setup content>"`
2. **Main code** → Runs second, in container, via configured `exec_command`
3. **Validator** → Runs last, on host, receives container's stdout
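
The three steps can be imitated by hand with `docker exec`, which may help when debugging a failing example. This is a loose sketch under several assumptions: the container name `docs-validator` is made up, the main code is fed to `exec_command` on stdin (the preprocessor may pass it differently), and `run_block` is a hypothetical helper rather than anything shipped with the tool.

```bash
#!/bin/bash
# Assumed container name and command prefix; override RUN_IN_CONTAINER to test locally.
CONTAINER=${CONTAINER:-docs-validator}
RUN_IN_CONTAINER=${RUN_IN_CONTAINER:-"docker exec -i $CONTAINER"}

# run_block SETUP CODE EXEC_COMMAND VALIDATOR
run_block() {
    local setup=$1 code=$2 exec_command=$3 validator=$4 output

    # 1. SETUP runs first, in the container, via sh -c.
    if [ -n "$setup" ]; then
        $RUN_IN_CONTAINER sh -c "$setup" || return 1
    fi

    # 2. Main code runs second, in the container (stdin delivery is an assumption).
    output=$(printf '%s\n' "$code" | $RUN_IN_CONTAINER sh -c "$exec_command") || return 1

    # 3. The validator runs last, on the host, and reads the container's stdout.
    printf '%s\n' "$output" | "$validator"
}
```

Setting `RUN_IN_CONTAINER=env` runs every step on the host, which is enough to exercise a validator script without Docker.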

### Common Confusion: `@@` vs `<!--SETUP-->`

These serve **completely different purposes**:

| Feature | `@@` prefix | `<!--SETUP-->` |
|---------|-------------|----------------|
| Purpose | **Hide lines** from rendered output | **Execute commands** before main code |
| Runs? | No - it's just content filtering | Yes - runs in container via `sh -c` |
| Use case | Show partial config, validate full config | Create tables, trigger events, prepare state |

**Example - `@@` hides context lines:**
````markdown
```json validator=osquery-config
@@{
@@  "options": { "disable_events": false },
@@  "schedule": {
    "my_query": {
      "query": "SELECT * FROM processes;",
      "interval": 60
    }
@@  }
@@}
```
````
Reader sees only `my_query` section. Validator receives complete JSON.

**Example - `<!--SETUP-->` prepares state:**
````markdown
```sql validator=osquery
<!--SETUP
touch /tmp/test-file.txt
-->
SELECT * FROM file WHERE path = '/tmp/test-file.txt';
<!--ASSERT
rows >= 1
-->
```
````
SETUP creates the file. Query runs after. Validator checks the result.

## How It Works

1. mdBook calls the preprocessor with chapter content
2. Preprocessor finds code blocks with `validator=` attribute
3. Extracts markers (`<!--SETUP-->`, `<!--ASSERT-->`, `<!--EXPECT-->`) and `@@` lines
4. Starts the specified container via testcontainers
5. Runs SETUP content in container via `sh -c` (if present)
6. Runs the visible content (plus `@@` lines) via `exec_command` in container
7. Captures container stdout (JSON) and stderr
8. Runs validator script **on host** with:
   - stdin: JSON output from container
   - `VALIDATOR_ASSERTIONS`: assertion rules
   - `VALIDATOR_EXPECT`: expected output
   - `VALIDATOR_CONTAINER_STDERR`: container stderr
9. On success: strips all markers and `@@` lines, returns clean content to mdBook
10. On failure: exits with error, build fails
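
Steps 3 and 9 both hinge on recognising the marker blocks. The sketch below shows the idea with `awk`, assuming each marker opens with a line that is exactly `<!--NAME` and closes with a line that is exactly `-->`, as in the examples in this README. The function names are hypothetical and the real parser may accept other layouts.

```bash
#!/bin/bash
# marker_body NAME: print the lines between "<!--NAME" and "-->" read from stdin.
marker_body() {
    awk -v name="$1" '
        $0 == ("<!--" name)      { inside = 1; next }
        inside && $0 == "-->"    { inside = 0; next }
        inside                   { print }
    '
}

# strip_markers: drop SETUP/ASSERT/EXPECT blocks and keep everything else.
strip_markers() {
    awk '
        /^<!--(SETUP|ASSERT|EXPECT)$/ { inside = 1; next }
        inside { if ($0 == "-->") inside = 0; next }
        { print }
    '
}
```

The sketch also shows why the marker-collision limitation exists: a line of code inside a marker that consists only of `-->` would end the block early.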

## License

Apache-2.0

## Contributing

Contributions welcome! Please open an issue to discuss before submitting large changes.