# SchemaRisk — Architecture
> **Phase 2 — PostgreSQL Version Intelligence + Safe Migration Generator**
> **Language**: Rust 2021 · **MSRV**: stable ≥ 1.75
---
## Module map
```
src/
├── main.rs CLI entry point — Clap subcommands, exit codes
├── engine.rs RiskEngine: analyze() + evaluate() + scoring table
│ Phase 2: pg_version: u32 field, with_pg_version() builder
│ Version-aware rules for ADD COLUMN / SET NOT NULL / ALTER TYPE
├── types.rs Core types: RiskLevel, DetectedOperation, MigrationReport,
│ GuardDecision, GuardAuditLog, ActorKind
│ Phase 2: MigrationReport.pg_version: u32
├── parser.rs SQL → ParsedStatement (sqlparser-rs, PostgreSQL dialect)
│ B-01: belt-and-suspenders UNSAFE_KEYWORDS grep
├── config.rs schema-risk.yml loader (serde_yaml), defaults
├── guard.rs DangerGuard: actor detection, impact panel, confirmation,
│ audit log, dry-run, agent JSON output
├── sarif.rs SARIF 2.1.0 formatter (serde_json, rules SR001-SR999)
├── locks.rs LockSimulator: AccessExclusive / RowExclusive timeline
├── db.rs LiveSchema: async connector (feature = "db")
├── graph.rs SchemaGraph: petgraph DiGraph tables/FKs/indexes
├── impact.rs ImpactScanner: rayon parallel file-tree walk
│ B-03: skips identifiers < 4 chars by default
├── drift.rs DriftReport: live schema ↔ disk migration delta
├── recommendation.rs Rule engine R01–R09, apply_fixes(), suggest_fixes()
│ B-04: transaction-aware (no CONCURRENTLY inside BEGIN)
│ Phase 2: R09 -- ADD COLUMN WITH DEFAULT PG version guidance
├── ci.rs GitHub/GitLab Markdown + JSON CI formatter
│ Phase 2: PG version in title, per-file note, viral footer
├── output.rs Terminal pretty-printer (comfy-table)
│ B-05: Unicode detection → UTF8_FULL vs ASCII preset
├── error.rs SchemaRiskError (thiserror)
└── loader.rs load_file() / load_glob() helpers
```
---
## End-to-end data flow
### `analyze` (offline)
```
load_file(path)
└─► parser::parse(sql) → Vec<ParsedStatement>
└─► RiskEngine::analyze()
├─► evaluate() → Vec<DetectedOperation> (score per stmt)
├─► build_recommendations() → Vec<String>
├─► fk_impact_scan() → Vec<FkImpact>
└─► → MigrationReport
└─► output::render() → terminal / JSON / SARIF
```
### `guard`
```
load_file(path)
└─► parser::parse()
└─► RiskEngine::analyze() → MigrationReport
└─► guard::run_guard()
├─► detect_actor() → Human | Ci | Agent
├─► is_guarded_operation() → score≥40 or destructive keyword
├─► render_impact_panel() → stderr (per op)
├─► prompt_confirmation() → stdin ("yes i am sure")
└─► write_audit_log() → .schemarisk-audit.json
```
### `ci-report`
```
for each file:
load_file → parser::parse → RiskEngine::analyze → recommendation::suggest_fixes
optional:
ImpactScanner::scan(root_dir) → parallel rayon walk
output:
ci::render_ci_report() → GitHub/GitLab Markdown or JSON
exit:
max_risk.exit_code(fail_level) → 0 / 1 / 2
```
---
## Scoring algorithm
Each `ParsedStatement` maps to a base score + `RiskLevel`:
| `DropTable` | 100 | ACCESS EXCLUSIVE | Critical | — |
| `TruncateTable` | 90 | ACCESS EXCLUSIVE | Critical | — |
| `DropColumn` (≥100k rows) | 85 | ACCESS EXCLUSIVE | Critical | — |
| `AlterColumnType` | 80 | ACCESS EXCLUSIVE | High | All versions — use shadow-column pattern |
| `SetNotNull` (PG < 12) | 55/25 | ACCESS EXCLUSIVE | High/Medium | Full scan |
| `SetNotNull` (PG ≥ 12) | 40/15 | ACCESS EXCLUSIVE | Medium/Low | Suggest CHECK NOT VALID |
| `AddColumnNotNull NoDefault` | 70 | ACCESS EXCLUSIVE | High | All versions — fails on non-empty table |
| `AddColumnWithDefault` (PG < 11) | 80/45 | ACCESS EXCLUSIVE | High | Full table rewrite |
| `AddColumnWithDefault` (PG ≥ 11) | 5 | None | Low | Metadata-only, no lock |
| `CreateIndex { concurrent: false }` | 60 | SHARE | High | Use CONCURRENTLY |
| `AddForeignKey` | 50 | SHARE ROW EXCLUSIVE | Medium | — |
| `DropIndex` | 40 | ACCESS EXCLUSIVE | Medium | — |
| `RenameColumn / RenameTable` | 30 | ACCESS EXCLUSIVE | Medium | — |
| `CreateTable` | 5 | ACCESS EXCLUSIVE | Low | — |
**Table-size multiplier** (with `LiveSchema` or `--table-rows`):
| < 10k | ×1.0 |
| 10k–100k | ×1.25 |
| 100k–1M | ×1.5 |
| > 1M | ×2.0 |
`overall_risk` = highest `RiskLevel` across all operations.
`RiskLevel::from_score`: 0–29→Low, 30–59→Medium, 60–89→High, 90+→Critical.
---
## Guard module
### Trigger conditions
Guard fires when `score ≥ 40` **OR** description matches:
`DROP TABLE`, `DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`, `DROP COLUMN`, `RENAME COLUMN/TABLE`
### Actor detection (priority order)
```rust
SCHEMARISK_ACTOR=agent → Agent
ANTHROPIC_API_KEY or OPENAI_API_KEY → Agent
CI=true / GITHUB_ACTIONS / GITLAB_CI → CiPipeline
otherwise → Human
```
### Confirmation requirements
| Critical | `"yes i am sure"` (case-insensitive) |
| High | `"yes"` |
### Exit codes
| 0 | Safe or all confirmed |
| 4 | Blocked (user declined or agent/CI blocked) |
---
## Bug fixes implemented
| B-01 | `parser.rs` | `UNSAFE_KEYWORDS` grep on `Other` stmts → score 30 in engine |
| B-02 | `engine.rs` | `--table-rows` flag wired to size multiplier in offline mode |
| B-03 | `impact.rs` | `ImpactScanner::new()` skips identifiers < 4 chars; `new_scan_short()` opts in |
| B-04 | `recommendation.rs` | `apply_fixes()` detects `BEGIN/COMMIT` and emits warning comment instead |
| B-05 | `output.rs` | `table_preset()` checks `$TERM`/platform → UTF8 or ASCII table border |
---
## Key type relationships
```
ParsedStatement ──▶ DetectedOperation (N per statement)
│
MigrationReport
┌──────────┴──────────┐
FixSuggestion ImpactReport
(recommendation) (impact)
│
GuardDecision[]
│
GuardAuditLog
```
---
## SARIF rule mapping
| SR001 | DROP TABLE |
| SR002 | ALTER COLUMN TYPE |
| SR003 | DROP COLUMN |
| SR004 | SET NOT NULL |
| SR005 | ADD COLUMN NOT NULL no default |
| SR006 | CREATE INDEX (no CONCURRENTLY) |
| SR007 | ADD FOREIGN KEY |
| SR008 | DROP INDEX |
| SR009 | RENAME COLUMN / TABLE |
| SR010 | TRUNCATE TABLE |
| SR999 | Unmodelled DDL (fallback) |
---
## Phase 2 additions
### `--pg-version` flag
Both `analyze` and `ci-report` now accept `--pg-version <UINT>` (default: 14).
The value is passed from CLI → `RiskEngine::with_pg_version()` → stored as `self.pg_version: u32`.
It is forwarded to `MigrationReport.pg_version` so the CI renderer can display it.
### Version-aware rule logic (`engine.rs`)
```
AlterTableAddColumn:
if !nullable && !has_default → HIGH risk (fails on non-empty tables)
elif has_default && pg_version < 11 → HIGH risk (full table rewrite)
elif has_default && pg_version >= 11 → LOW risk (metadata-only, PG11 fast path)
else (nullable, no default) → LOW risk
AlterTableSetNotNull:
if pg_version >= 12 → suggest CHECK CONSTRAINT NOT VALID / VALIDATE approach
score: 40 (table scan) or 15 (index-covered)
else → score: 55 (table scan) or 25 (index-covered)
AlterTableAlterColumnType:
All versions → HIGH (full rewrite)
Warning message now includes 4-step zero-downtime shadow-column plan
```
### R09 rule (`recommendation.rs`)
Rule R09 fires when an `ADD COLUMN` has `column.has_default = true`.
| Rule ID | `R09` |
| Severity | `Info` |
| Title | `ADD COLUMN '<name>' WITH DEFAULT: safe on PG11+, table-rewrite on PG10 and below` |
| Docs URL | PostgreSQL 11 release notes |
| Migration steps | PG10 3-step: add nullable, backfill in batches, set NOT NULL |
### CI report improvements (`ci.rs`)
- Title includes PG version: `SchemaRisk — Migration Safety Report *(PostgreSQL 14)*`
- Per-file sub-line: `Analyzed against PostgreSQL {n} rules. Use --pg-version to change target version.`
- Footer rewritten as viral attribution linking to the GitHub repository
### `--format markdown` fix (`main.rs`)
Previously `OutputFormat::Markdown | GithubComment | GitlabComment` fell through to the terminal renderer.
Now explicitly routed through `ci::render_ci_report()` for proper markdown output.
---
## Remaining work (backlog)
| M-02 | `engine.rs` | Full size multiplier wired to `LiveSchema` row counts | ~2h |
| M-02 | `engine.rs` | Full size multiplier wired to `LiveSchema` row counts | ~2h |
| M-03 | `drift.rs` | Live-vs-disk reconciliation (currently stub) | ~6h |
| M-04 | `graph.rs` | Custom DOT export with table metadata | ~3h |
| I-04 | `db.rs` | Replace `tokio-postgres` with `sqlx` connection pool | ~4h |
| I-05 | `impact.rs` | AST-level query scanning (no regex false positives) | ~8h |
| I-07 | `ci.rs` | Parallel multi-file analysis with rayon | ~2h |
| M-05 | `main.rs` | `--watch` mode (notify crate) | ~4h |
---
## Security notes
- `--db-url` passwords are never logged — use `SCHEMA_RISK_DB_URL` env var in CI
- `ImpactScanner` uses `WalkDir::follow_links(false)` — no symlink traversal
- `LiveSchema` connector is read-only — no DDL/DML allowed
- Guard audit log is written with owner-only permissions on Unix
- CI Markdown output escapes triple-backtick sequences to prevent code block breakage