schema-risk 0.1.0

# SchemaRisk v2 — Architecture


> **Status**: Production-ready v2.0.0
> **Language**: Rust 2021 · **MSRV**: stable ≥ 1.75

---

## Module map


```
src/
├── main.rs          CLI entry point — Clap subcommands, exit codes
├── engine.rs        RiskEngine: analyze() + evaluate() + scoring table
├── types.rs         Core types: RiskLevel, DetectedOperation, MigrationReport,
│                    GuardDecision, GuardAuditLog, ActorKind
├── parser.rs        SQL → ParsedStatement (sqlparser-rs, PostgreSQL dialect)
│                    B-01: belt-and-suspenders UNSAFE_KEYWORDS grep
├── config.rs        schema-risk.yml loader (serde_yaml), defaults
├── guard.rs         DangerGuard: actor detection, impact panel, confirmation,
│                    audit log, dry-run, agent JSON output
├── sarif.rs         SARIF 2.1.0 formatter (serde_json, rules SR001-SR999)
├── locks.rs         LockSimulator: AccessExclusive / RowExclusive timeline
├── db.rs            LiveSchema: async connector (feature = "db")
├── graph.rs         SchemaGraph: petgraph DiGraph tables/FKs/indexes
├── impact.rs        ImpactScanner: rayon parallel file-tree walk
│                    B-03: skips identifiers < 4 chars by default
├── drift.rs         DriftReport: live schema ↔ disk migration delta
├── recommendation.rs Rule engine R01–R08, apply_fixes()
│                    B-04: transaction-aware (no CONCURRENTLY inside BEGIN)
├── ci.rs            GitHub/GitLab Markdown + JSON CI formatter
├── output.rs        Terminal pretty-printer (comfy-table)
│                    B-05: Unicode detection → UTF8_FULL vs ASCII preset
├── error.rs         SchemaRiskError (thiserror)
└── loader.rs        load_file() / load_glob() helpers
```

---

## End-to-end data flow


### `analyze` (offline)


```
load_file(path)
  └─► parser::parse(sql)         → Vec<ParsedStatement>
        └─► RiskEngine::analyze()
              ├─► evaluate()     → Vec<DetectedOperation>  (score per stmt)
              ├─► build_recommendations()                  → Vec<String>
              ├─► fk_impact_scan()                         → Vec<FkImpact>
              └─► → MigrationReport
                    └─► output::render()                   → terminal / JSON / SARIF
```

### `guard`


```
load_file(path)
  └─► parser::parse()
        └─► RiskEngine::analyze() → MigrationReport
              └─► guard::run_guard()
                    ├─► detect_actor()           → Human | Ci | Agent
                    ├─► is_guarded_operation()   → score≥40 or destructive keyword
                    ├─► render_impact_panel()    → stderr (per op)
                    ├─► prompt_confirmation()    → stdin ("yes i am sure")
                    └─► write_audit_log()        → .schemarisk-audit.json
```

### `ci-report`


```
for each file:
  load_file → parser::parse → RiskEngine::analyze → recommendation::suggest_fixes
optional:
  ImpactScanner::scan(root_dir)   → parallel rayon walk
output:
  ci::render_ci_report()          → GitHub/GitLab Markdown or JSON
exit:
  max_risk.exit_code(fail_level)  → 0 / 1 / 2
```

---

## Scoring algorithm


Each `ParsedStatement` maps to a base score + `RiskLevel`:

| Operation | Score | Lock | RiskLevel |
|-----------|-------|------|-----------|
| `DropTable` | 100 | ACCESS EXCLUSIVE | Critical |
| `TruncateTable` | 90 | ACCESS EXCLUSIVE | Critical |
| `DropColumn` (≥100k rows) | 85 | ACCESS EXCLUSIVE | Critical |
| `AlterColumnType` | 80 | ACCESS EXCLUSIVE | High |
| `SetNotNull` | 75 | ACCESS EXCLUSIVE | High |
| `AddColumnNoDefault` | 70 | ACCESS EXCLUSIVE | High |
| `CreateIndex { concurrent: false }` | 60 | SHARE | High |
| `AddForeignKey` | 50 | SHARE ROW EXCLUSIVE | Medium |
| `DropIndex` | 40 | ACCESS EXCLUSIVE | Medium |
| `RenameColumn / RenameTable` | 30 | ACCESS EXCLUSIVE | Medium |
| `CreateTable` | 5 | ACCESS EXCLUSIVE | Low |

**Table-size multiplier** (with `LiveSchema` or `--table-rows`):

| Rows | Multiplier |
|------|-----------|
| < 10k | ×1.0 |
| 10k–100k | ×1.25 |
| 100k–1M | ×1.5 |
| > 1M | ×2.0 |

`overall_risk` = highest `RiskLevel` across all operations.
`RiskLevel::from_score`: 0–29→Low, 30–59→Medium, 60–89→High, 90+→Critical.

---

## Guard module


### Trigger conditions


Guard fires when `score ≥ 40` **OR** description matches:
`DROP TABLE`, `DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`, `DROP COLUMN`, `RENAME COLUMN/TABLE`

### Actor detection (priority order)


```rust
SCHEMARISK_ACTOR=agent              → Agent
ANTHROPIC_API_KEY or OPENAI_API_KEY → Agent
CI=true / GITHUB_ACTIONS / GITLAB_CI → CiPipeline
otherwise                            → Human
```

### Confirmation requirements


| Risk level | Required phrase |
|------------|----------------|
| Critical | `"yes i am sure"` (case-insensitive) |
| High | `"yes"` |

### Exit codes


| Code | Meaning |
|------|---------|
| 0 | Safe or all confirmed |
| 4 | Blocked (user declined or agent/CI blocked) |

---

## Bug fixes implemented


| ID | Module | Fix |
|----|--------|-----|
| B-01 | `parser.rs` | `UNSAFE_KEYWORDS` grep on `Other` stmts → score 30 in engine |
| B-02 | `engine.rs` | `--table-rows` flag wired to size multiplier in offline mode |
| B-03 | `impact.rs` | `ImpactScanner::new()` skips identifiers < 4 chars; `new_scan_short()` opts in |
| B-04 | `recommendation.rs` | `apply_fixes()` detects `BEGIN/COMMIT` and emits warning comment instead |
| B-05 | `output.rs` | `table_preset()` checks `$TERM`/platform → UTF8 or ASCII table border |

---

## Key type relationships


```
ParsedStatement  ──▶  DetectedOperation  (N per statement)
                              │
                     MigrationReport
                    ┌──────────┴──────────┐
              FixSuggestion          ImpactReport
           (recommendation)             (impact)
                                         │
                                   GuardDecision[]
                                         │
                                   GuardAuditLog
```

---

## SARIF rule mapping


| Rule ID | Trigger |
|---------|---------|
| SR001 | DROP TABLE |
| SR002 | ALTER COLUMN TYPE |
| SR003 | DROP COLUMN |
| SR004 | SET NOT NULL |
| SR005 | ADD COLUMN NOT NULL no default |
| SR006 | CREATE INDEX (no CONCURRENTLY) |
| SR007 | ADD FOREIGN KEY |
| SR008 | DROP INDEX |
| SR009 | RENAME COLUMN / TABLE |
| SR010 | TRUNCATE TABLE |
| SR999 | Unmodelled DDL (fallback) |

---

## Remaining work (backlog)


| ID | Module | Description | Effort |
|----|--------|-------------|--------|
| M-02 | `engine.rs` | Full size multiplier wired to `LiveSchema` row counts | ~2h |
| M-03 | `drift.rs` | Live-vs-disk reconciliation (currently stub) | ~6h |
| M-04 | `graph.rs` | Custom DOT export with table metadata | ~3h |
| I-04 | `db.rs` | Replace `tokio-postgres` with `sqlx` connection pool | ~4h |
| I-05 | `impact.rs` | AST-level query scanning (no regex false positives) | ~8h |
| I-07 | `ci.rs` | Parallel multi-file analysis with rayon | ~2h |
| M-05 | `main.rs` | `--watch` mode (notify crate) | ~4h |

---

## Security notes


- `--db-url` passwords are never logged — use `SCHEMA_RISK_DB_URL` env var in CI
- `ImpactScanner` uses `WalkDir::follow_links(false)` — no symlink traversal
- `LiveSchema` connector is read-only — no DDL/DML allowed
- Guard audit log is written with owner-only permissions on Unix
- CI Markdown output escapes triple-backtick sequences to prevent code block breakage