# SchemaRisk v2 — Architecture
> **Status**: Production-ready v2.0.0
> **Language**: Rust 2021 · **MSRV**: stable ≥ 1.75
---
## Module map
```
src/
├── main.rs CLI entry point — Clap subcommands, exit codes
├── engine.rs RiskEngine: analyze() + evaluate() + scoring table
├── types.rs Core types: RiskLevel, DetectedOperation, MigrationReport,
│ GuardDecision, GuardAuditLog, ActorKind
├── parser.rs SQL → ParsedStatement (sqlparser-rs, PostgreSQL dialect)
│ B-01: belt-and-suspenders UNSAFE_KEYWORDS grep
├── config.rs schema-risk.yml loader (serde_yaml), defaults
├── guard.rs DangerGuard: actor detection, impact panel, confirmation,
│ audit log, dry-run, agent JSON output
├── sarif.rs SARIF 2.1.0 formatter (serde_json, rules SR001-SR999)
├── locks.rs LockSimulator: AccessExclusive / RowExclusive timeline
├── db.rs LiveSchema: async connector (feature = "db")
├── graph.rs SchemaGraph: petgraph DiGraph tables/FKs/indexes
├── impact.rs ImpactScanner: rayon parallel file-tree walk
│ B-03: skips identifiers < 4 chars by default
├── drift.rs DriftReport: live schema ↔ disk migration delta
├── recommendation.rs Rule engine R01–R08, apply_fixes()
│ B-04: transaction-aware (no CONCURRENTLY inside BEGIN)
├── ci.rs GitHub/GitLab Markdown + JSON CI formatter
├── output.rs Terminal pretty-printer (comfy-table)
│ B-05: Unicode detection → UTF8_FULL vs ASCII preset
├── error.rs SchemaRiskError (thiserror)
└── loader.rs load_file() / load_glob() helpers
```
---
## End-to-end data flow
### `analyze` (offline)
```
load_file(path)
└─► parser::parse(sql) → Vec<ParsedStatement>
└─► RiskEngine::analyze()
├─► evaluate() → Vec<DetectedOperation> (score per stmt)
├─► build_recommendations() → Vec<String>
├─► fk_impact_scan() → Vec<FkImpact>
└─► → MigrationReport
└─► output::render() → terminal / JSON / SARIF
```
### `guard`
```
load_file(path)
└─► parser::parse()
└─► RiskEngine::analyze() → MigrationReport
└─► guard::run_guard()
├─► detect_actor() → Human | Ci | Agent
├─► is_guarded_operation() → score≥40 or destructive keyword
├─► render_impact_panel() → stderr (per op)
├─► prompt_confirmation() → stdin ("yes i am sure")
└─► write_audit_log() → .schemarisk-audit.json
```
### `ci-report`
```
for each file:
load_file → parser::parse → RiskEngine::analyze → recommendation::suggest_fixes
optional:
ImpactScanner::scan(root_dir) → parallel rayon walk
output:
ci::render_ci_report() → GitHub/GitLab Markdown or JSON
exit:
max_risk.exit_code(fail_level) → 0 / 1 / 2
```
---
## Scoring algorithm
Each `ParsedStatement` maps to a base score + `RiskLevel`:
| `DropTable` | 100 | ACCESS EXCLUSIVE | Critical |
| `TruncateTable` | 90 | ACCESS EXCLUSIVE | Critical |
| `DropColumn` (≥100k rows) | 85 | ACCESS EXCLUSIVE | Critical |
| `AlterColumnType` | 80 | ACCESS EXCLUSIVE | High |
| `SetNotNull` | 75 | ACCESS EXCLUSIVE | High |
| `AddColumnNoDefault` | 70 | ACCESS EXCLUSIVE | High |
| `CreateIndex { concurrent: false }` | 60 | SHARE | High |
| `AddForeignKey` | 50 | SHARE ROW EXCLUSIVE | Medium |
| `DropIndex` | 40 | ACCESS EXCLUSIVE | Medium |
| `RenameColumn / RenameTable` | 30 | ACCESS EXCLUSIVE | Medium |
| `CreateTable` | 5 | ACCESS EXCLUSIVE | Low |
**Table-size multiplier** (with `LiveSchema` or `--table-rows`):
| < 10k | ×1.0 |
| 10k–100k | ×1.25 |
| 100k–1M | ×1.5 |
| > 1M | ×2.0 |
`overall_risk` = highest `RiskLevel` across all operations.
`RiskLevel::from_score`: 0–29→Low, 30–59→Medium, 60–89→High, 90+→Critical.
---
## Guard module
### Trigger conditions
Guard fires when `score ≥ 40` **OR** description matches:
`DROP TABLE`, `DROP DATABASE`, `DROP SCHEMA`, `TRUNCATE`, `DROP COLUMN`, `RENAME COLUMN/TABLE`
### Actor detection (priority order)
```rust
SCHEMARISK_ACTOR=agent → Agent
ANTHROPIC_API_KEY or OPENAI_API_KEY → Agent
CI=true / GITHUB_ACTIONS / GITLAB_CI → CiPipeline
otherwise → Human
```
### Confirmation requirements
| Critical | `"yes i am sure"` (case-insensitive) |
| High | `"yes"` |
### Exit codes
| 0 | Safe or all confirmed |
| 4 | Blocked (user declined or agent/CI blocked) |
---
## Bug fixes implemented
| B-01 | `parser.rs` | `UNSAFE_KEYWORDS` grep on `Other` stmts → score 30 in engine |
| B-02 | `engine.rs` | `--table-rows` flag wired to size multiplier in offline mode |
| B-03 | `impact.rs` | `ImpactScanner::new()` skips identifiers < 4 chars; `new_scan_short()` opts in |
| B-04 | `recommendation.rs` | `apply_fixes()` detects `BEGIN/COMMIT` and emits warning comment instead |
| B-05 | `output.rs` | `table_preset()` checks `$TERM`/platform → UTF8 or ASCII table border |
---
## Key type relationships
```
ParsedStatement ──▶ DetectedOperation (N per statement)
│
MigrationReport
┌──────────┴──────────┐
FixSuggestion ImpactReport
(recommendation) (impact)
│
GuardDecision[]
│
GuardAuditLog
```
---
## SARIF rule mapping
| SR001 | DROP TABLE |
| SR002 | ALTER COLUMN TYPE |
| SR003 | DROP COLUMN |
| SR004 | SET NOT NULL |
| SR005 | ADD COLUMN NOT NULL no default |
| SR006 | CREATE INDEX (no CONCURRENTLY) |
| SR007 | ADD FOREIGN KEY |
| SR008 | DROP INDEX |
| SR009 | RENAME COLUMN / TABLE |
| SR010 | TRUNCATE TABLE |
| SR999 | Unmodelled DDL (fallback) |
---
## Remaining work (backlog)
| M-02 | `engine.rs` | Full size multiplier wired to `LiveSchema` row counts | ~2h |
| M-03 | `drift.rs` | Live-vs-disk reconciliation (currently stub) | ~6h |
| M-04 | `graph.rs` | Custom DOT export with table metadata | ~3h |
| I-04 | `db.rs` | Replace `tokio-postgres` with `sqlx` connection pool | ~4h |
| I-05 | `impact.rs` | AST-level query scanning (no regex false positives) | ~8h |
| I-07 | `ci.rs` | Parallel multi-file analysis with rayon | ~2h |
| M-05 | `main.rs` | `--watch` mode (notify crate) | ~4h |
---
## Security notes
- `--db-url` passwords are never logged — use `SCHEMA_RISK_DB_URL` env var in CI
- `ImpactScanner` uses `WalkDir::follow_links(false)` — no symlink traversal
- `LiveSchema` connector is read-only — no DDL/DML allowed
- Guard audit log is written with owner-only permissions on Unix
- CI Markdown output escapes triple-backtick sequences to prevent code block breakage