schema-risk 0.1.3

Stop dangerous database migrations before they reach production
Documentation
# SchemaRisk — Architecture


> **Phase 2 — PostgreSQL Version Intelligence + Safe Migration Generator**
> **Language**: Rust 2021 · **MSRV**: stable ≥ 1.75

---

## Module map


```
src/
├── main.rs          CLI entry point — Clap subcommands, exit codes
├── engine.rs        RiskEngine: analyze() + evaluate() + scoring table
│                    Phase 2: pg_version: u32 field, with_pg_version() builder
│                    Version-aware rules for ADD COLUMN / SET NOT NULL / ALTER TYPE
├── types.rs         Core types: RiskLevel, DetectedOperation, MigrationReport,
│                    GuardDecision, GuardAuditLog, ActorKind
│                    Phase 2: MigrationReport.pg_version: u32
├── parser.rs        SQL → ParsedStatement (sqlparser-rs, PostgreSQL dialect)
│                    B-01: belt-and-suspenders UNSAFE_KEYWORDS grep
├── config.rs        schema-risk.yml loader (serde_yaml), defaults
├── guard.rs         DangerGuard: actor detection, impact panel, confirmation,
│                    audit log, dry-run, agent JSON output
├── sarif.rs         SARIF 2.1.0 formatter (serde_json, rules SR001-SR999)
├── locks.rs         LockSimulator: AccessExclusive / RowExclusive timeline
├── db.rs            LiveSchema: async connector (feature = "db")
├── graph.rs         SchemaGraph: petgraph DiGraph tables/FKs/indexes
├── impact.rs        ImpactScanner: rayon parallel file-tree walk
│                    B-03: skips identifiers < 4 chars by default
├── drift.rs         DriftReport: live schema ↔ disk migration delta
├── recommendation.rs Rule engine R01–R09, apply_fixes(), suggest_fixes()
│                    B-04: transaction-aware (no CONCURRENTLY inside BEGIN)
│                    Phase 2: R09 -- ADD COLUMN WITH DEFAULT PG version guidance
├── ci.rs            GitHub/GitLab Markdown + JSON CI formatter
│                    Phase 2: PG version in title, per-file note, viral footer
├── output.rs        Terminal pretty-printer (comfy-table)
│                    B-05: Unicode detection → UTF8_FULL vs ASCII preset
├── error.rs         SchemaRiskError (thiserror)
└── loader.rs        load_file() / load_glob() helpers
```

---

## End-to-end data flow


### `analyze` (offline)


```
load_file(path)
  └─► parser::parse(sql)         → Vec<ParsedStatement>
        └─► RiskEngine::analyze()
              ├─► evaluate()     → Vec<DetectedOperation>  (score per stmt)
              ├─► build_recommendations()                  → Vec<String>
              ├─► fk_impact_scan()                         → Vec<FkImpact>
              └─► → MigrationReport
                    └─► output::render()                   → terminal / JSON / SARIF
```

### `guard`


```
load_file(path)
  └─► parser::parse()
        └─► RiskEngine::analyze() → MigrationReport
              └─► guard::run_guard()
                    ├─► detect_actor()           → Human | Ci | Agent
                    ├─► is_guarded_operation()   → score≥40 or destructive keyword
                    ├─► render_impact_panel()    → stderr (per op)
                    ├─► prompt_confirmation()    → stdin ("yes i am sure")
                    └─► write_audit_log()        → .schemarisk-audit.json
```

### `ci-report`


```
for each file:
  load_file → parser::parse → RiskEngine::analyze → recommendation::suggest_fixes
optional:
  ImpactScanner::scan(root_dir)   → parallel rayon walk
output:
  ci::render_ci_report()          → GitHub/GitLab Markdown or JSON
exit:
  max_risk.exit_code(fail_level)  → 0 / 1 / 2
```

---

## Scoring algorithm


Each `ParsedStatement` maps to a base score + `RiskLevel`:

| Operation | Score | Lock | RiskLevel | PG Version Note |
|-----------|------:|------|-----------|-----------------|
| `DropTable` | 100 | ACCESS EXCLUSIVE | Critical ||
| `TruncateTable` | 90 | ACCESS EXCLUSIVE | Critical ||
| `DropColumn` (≥100k rows) | 85 | ACCESS EXCLUSIVE | Critical ||
| `AlterColumnType` | 80 | ACCESS EXCLUSIVE | High | All versions — use shadow-column pattern |
| `SetNotNull` (PG < 12) | 55/25 | ACCESS EXCLUSIVE | High/Medium | Full scan |
| `SetNotNull` (PG ≥ 12) | 40/15 | ACCESS EXCLUSIVE | Medium/Low | Suggest CHECK NOT VALID |
| `AddColumnNotNull NoDefault` | 70 | ACCESS EXCLUSIVE | High | All versions — fails on non-empty table |
| `AddColumnWithDefault` (PG < 11) | 80/45 | ACCESS EXCLUSIVE | High | Full table rewrite |
| `AddColumnWithDefault` (PG ≥ 11) | 5 | None | Low | Metadata-only, no lock |
| `CreateIndex { concurrent: false }` | 60 | SHARE | High | Use CONCURRENTLY |
| `AddForeignKey` | 50 | SHARE ROW EXCLUSIVE | Medium ||
| `DropIndex` | 40 | ACCESS EXCLUSIVE | Medium ||
| `RenameColumn / RenameTable` | 30 | ACCESS EXCLUSIVE | Medium ||
| `CreateTable` | 5 | ACCESS EXCLUSIVE | Low ||

**Table-size multiplier** (with `LiveSchema` or `--table-rows`):

| Rows | Multiplier |
|------|-----------|
| < 10k | ×1.0 |
| 10k–100k | ×1.25 |
| 100k–1M | ×1.5 |
| > 1M | ×2.0 |

`overall_risk` = highest `RiskLevel` across all operations.
`RiskLevel::from_score`: 0–29→Low, 30–59→Medium, 60–89→High, 90+→Critical.

---

## Guard module


### Trigger conditions


Guard fires when `score ≥ 40` **OR** description matches:
`DROP TABLE`, `DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`, `DROP COLUMN`, `RENAME COLUMN/TABLE`

### Actor detection (priority order)


```rust
SCHEMARISK_ACTOR=agent              → Agent
ANTHROPIC_API_KEY or OPENAI_API_KEY → Agent
CI=true / GITHUB_ACTIONS / GITLAB_CI → CiPipeline
otherwise                            → Human
```

### Confirmation requirements


| Risk level | Required phrase |
|------------|----------------|
| Critical | `"yes i am sure"` (case-insensitive) |
| High | `"yes"` |

### Exit codes


| Code | Meaning |
|------|---------|
| 0 | Safe or all confirmed |
| 4 | Blocked (user declined or agent/CI blocked) |

---

## Bug fixes implemented


| ID | Module | Fix |
|----|--------|-----|
| B-01 | `parser.rs` | `UNSAFE_KEYWORDS` grep on `Other` stmts → score 30 in engine |
| B-02 | `engine.rs` | `--table-rows` flag wired to size multiplier in offline mode |
| B-03 | `impact.rs` | `ImpactScanner::new()` skips identifiers < 4 chars; `new_scan_short()` opts in |
| B-04 | `recommendation.rs` | `apply_fixes()` detects `BEGIN/COMMIT` and emits warning comment instead |
| B-05 | `output.rs` | `table_preset()` checks `$TERM`/platform → UTF8 or ASCII table border |

---

## Key type relationships


```
ParsedStatement  ──▶  DetectedOperation  (N per statement)
                     MigrationReport
                    ┌──────────┴──────────┐
              FixSuggestion          ImpactReport
           (recommendation)             (impact)
                                   GuardDecision[]
                                   GuardAuditLog
```

---

## SARIF rule mapping


| Rule ID | Trigger |
|---------|---------|
| SR001 | DROP TABLE |
| SR002 | ALTER COLUMN TYPE |
| SR003 | DROP COLUMN |
| SR004 | SET NOT NULL |
| SR005 | ADD COLUMN NOT NULL no default |
| SR006 | CREATE INDEX (no CONCURRENTLY) |
| SR007 | ADD FOREIGN KEY |
| SR008 | DROP INDEX |
| SR009 | RENAME COLUMN / TABLE |
| SR010 | TRUNCATE TABLE |
| SR999 | Unmodelled DDL (fallback) |

---

## Phase 2 additions


### `--pg-version` flag


Both `analyze` and `ci-report` now accept `--pg-version <UINT>` (default: 14).
The value is passed from CLI → `RiskEngine::with_pg_version()` → stored as `self.pg_version: u32`.
It is forwarded to `MigrationReport.pg_version` so the CI renderer can display it.

### Version-aware rule logic (`engine.rs`)


```
AlterTableAddColumn:
  if !nullable && !has_default  → HIGH risk (fails on non-empty tables)
  elif has_default && pg_version < 11 → HIGH risk (full table rewrite)
  elif has_default && pg_version >= 11 → LOW risk (metadata-only, PG11 fast path)
  else (nullable, no default) → LOW risk

AlterTableSetNotNull:
  if pg_version >= 12 → suggest CHECK CONSTRAINT NOT VALID / VALIDATE approach
                        score: 40 (table scan) or 15 (index-covered)
  else               → score: 55 (table scan) or 25 (index-covered)

AlterTableAlterColumnType:
  All versions → HIGH (full rewrite)
  Warning message now includes 4-step zero-downtime shadow-column plan
```

### R09 rule (`recommendation.rs`)


Rule R09 fires when an `ADD COLUMN` has `column.has_default = true`.

| Attribute | Value |
|---|---|
| Rule ID | `R09` |
| Severity | `Info` |
| Title | `ADD COLUMN '<name>' WITH DEFAULT: safe on PG11+, table-rewrite on PG10 and below` |
| Docs URL | PostgreSQL 11 release notes |
| Migration steps | PG10 3-step: add nullable, backfill in batches, set NOT NULL |

### CI report improvements (`ci.rs`)


- Title includes PG version: `SchemaRisk — Migration Safety Report *(PostgreSQL 14)*`
- Per-file sub-line: `Analyzed against PostgreSQL {n} rules. Use --pg-version to change target version.`
- Footer rewritten as viral attribution linking to the GitHub repository

### `--format markdown` fix (`main.rs`)


Previously `OutputFormat::Markdown | GithubComment | GitlabComment` fell through to the terminal renderer.
Now explicitly routed through `ci::render_ci_report()` for proper markdown output.

---

## Remaining work (backlog)

| ID | Module | Description | Effort |
|----|--------|-------------|--------|
| M-02 | `engine.rs` | Full size multiplier wired to `LiveSchema` row counts | ~2h |
| M-02 | `engine.rs` | Full size multiplier wired to `LiveSchema` row counts | ~2h |
| M-03 | `drift.rs` | Live-vs-disk reconciliation (currently stub) | ~6h |
| M-04 | `graph.rs` | Custom DOT export with table metadata | ~3h |
| I-04 | `db.rs` | Replace `tokio-postgres` with `sqlx` connection pool | ~4h |
| I-05 | `impact.rs` | AST-level query scanning (no regex false positives) | ~8h |
| I-07 | `ci.rs` | Parallel multi-file analysis with rayon | ~2h |
| M-05 | `main.rs` | `--watch` mode (notify crate) | ~4h |

---

## Security notes


- `--db-url` passwords are never logged — use `SCHEMA_RISK_DB_URL` env var in CI
- `ImpactScanner` uses `WalkDir::follow_links(false)` — no symlink traversal
- `LiveSchema` connector is read-only — no DDL/DML allowed
- Guard audit log is written with owner-only permissions on Unix
- CI Markdown output escapes triple-backtick sequences to prevent code block breakage