# pgmold Architecture
## Overview
pgmold is a PostgreSQL schema-as-code tool written in Rust. It follows a pipeline architecture: schemas are parsed from SQL files or introspected from a database into a canonical model, then flow through filtering, diffing, planning, linting, SQL generation, and execution.
## Core Principles
1. **Canonical Model is Truth**: All operations use the normalized `model::Schema` IR. No module compares SQL to DB directly.
2. **Deterministic Output**: Collections are `BTreeMap`s, so iteration is sorted and diffs are predictable.
3. **Strict Module Boundaries**: No SQL outside `pg/sqlgen.rs`. No DB access outside `pg/`.
4. **Fail Fast**: Errors surface immediately as `anyhow::Result` values with clear messages. The code does not panic.
## Module Structure
```
pgmold/
├── src/
│   ├── cli/                 # CLI argument parsing, command routing
│   ├── parser/              # PostgreSQL DDL parser → canonical model
│   │   ├── mod.rs           # SQL parsing with sqlparser
│   │   └── loader.rs        # Multi-file schema loading
│   ├── model/               # Canonical schema IR (the core)
│   ├── pg/
│   │   ├── connection.rs    # Database connection pool
│   │   ├── introspect.rs    # DB → canonical model
│   │   └── sqlgen.rs        # Migration ops → SQL
│   ├── diff/
│   │   ├── mod.rs           # Schema comparison
│   │   └── planner.rs       # Operation ordering
│   ├── filter/              # Object filtering by name patterns and types
│   ├── lint/                # Safety rules
│   │   ├── mod.rs           # Lint rules and severity
│   │   └── locks.rs         # Lock hazard detection
│   ├── drift/               # Drift detection via fingerprinting
│   ├── baseline/            # Schema export with round-trip validation
│   ├── dump.rs              # Schema → SQL DDL generation
│   ├── migrate.rs           # Migration file numbering utilities
│   ├── apply/               # Transactional execution
│   ├── util/                # Shared types, errors
│   └── main.rs
└── tests/
    ├── integration.rs             # testcontainers tests
    ├── baseline.rs                # Baseline command tests
    └── semantic_equivalence.rs    # Normalization tests
```
## Data Flow
```
┌─────────────┐     ┌──────────────┐
│  SQL File   │     │  PostgreSQL  │
└──────┬──────┘     └──────┬───────┘
       │                   │
       ▼                   ▼
┌─────────────┐     ┌──────────────┐
│parser::parse│     │pg::introspect│
└──────┬──────┘     └──────┬───────┘
       │                   │
       └─────────┬─────────┘
                 │
                 ▼
         ┌───────────────┐
         │ model::Schema │  ← Canonical IR
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │filter::filter │  ← Apply include/exclude patterns
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │ diff::compute │
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │  MigrationOp  │  ← Operations list
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │ diff::planner │  ← Order operations
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │  lint::check  │  ← Safety validation
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │  pg::sqlgen   │  ← Generate SQL
         └───────┬───────┘
                 │
                 ▼
         ┌───────────────┐
         │  apply::exec  │  ← Execute in transaction
         └───────────────┘
```
## Canonical Model (`model/`)
The canonical IR represents all schema objects in a normalized form:
```rust
use std::collections::BTreeMap;

// Top-level container: one sorted map per object kind.
pub struct Schema {
    pub tables: BTreeMap<String, Table>,
    pub enums: BTreeMap<String, EnumType>,
    pub domains: BTreeMap<String, Domain>,
    pub extensions: BTreeMap<String, Extension>,
    pub functions: BTreeMap<String, Function>,
    pub views: BTreeMap<String, View>,
    pub triggers: BTreeMap<String, Trigger>,
    pub sequences: BTreeMap<String, Sequence>,
    pub partitions: BTreeMap<String, Partition>,
}

// A table owns its columns, indexes, constraints, and RLS state.
pub struct Table {
    pub name: String,
    pub schema: String,
    pub columns: BTreeMap<String, Column>,
    pub indexes: BTreeMap<String, Index>,
    pub primary_key: Option<PrimaryKey>,
    pub foreign_keys: BTreeMap<String, ForeignKey>,
    pub check_constraints: BTreeMap<String, CheckConstraint>,
    pub policies: BTreeMap<String, Policy>,
    pub rls_enabled: bool,
    pub rls_force: bool,
    pub partition_key: Option<PartitionKey>,
}

pub struct Column {
    pub name: String,
    pub data_type: PgType,
    pub nullable: bool,
    pub default: Option<String>,
    pub identity: Option<String>,
}
```
**Key Design Decisions:**
- `BTreeMap` for deterministic iteration order
- Map keys use qualified names: `schema.name`
- All objects have a `schema` field (default: `"public"`)
- Fingerprinting via SHA-256 of the JSON serialization
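
The first two decisions can be illustrated with the standard library alone. The sketch below is not pgmold's code: `qualified_name` and `iteration_order` are hypothetical helpers, and the map values are reduced to `()` so that only key ordering is visible.

```rust
use std::collections::BTreeMap;

/// Builds a `schema.name` map key.
/// Hypothetical helper: the function name is an assumption, not pgmold's API.
fn qualified_name(schema: &str, name: &str) -> String {
    format!("{}.{}", schema, name)
}

/// Inserts the given (schema, name) pairs in the order supplied and returns
/// the keys in the order a `BTreeMap` iterates them, which is sorted by key.
fn iteration_order(objects: &[(&str, &str)]) -> Vec<String> {
    let mut map: BTreeMap<String, ()> = BTreeMap::new();
    for (schema, name) in objects {
        map.insert(qualified_name(schema, name), ());
    }
    map.keys().cloned().collect()
}
```

Two calls that supply the same objects in different insertion orders return the same key sequence, which is the property that keeps diffs stable between runs.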
## Migration Operations
Operations represent atomic schema changes:
```rust
pub enum MigrationOp {
    CreateExtension(Extension),
    DropExtension(String),
    CreateEnum(EnumType),
    DropEnum(String, String),
    AddEnumValue { ... },
    CreateDomain(Domain),
    DropDomain(String, String),
    AlterDomain { ... },
    CreateTable(Table),
    DropTable(String, String),
    CreatePartition(Partition),
    DropPartition(String, String),
    AddColumn { ... },
    DropColumn { ... },
    AlterColumn { ... },
    AddPrimaryKey { ... },
    DropPrimaryKey { ... },
    AddIndex { ... },
    DropIndex { ... },
    AddForeignKey { ... },
    DropForeignKey { ... },
    AddCheckConstraint { ... },
    DropCheckConstraint { ... },
    EnableRls { ... },
    DisableRls { ... },
    ForceRls { ... },
    NoForceRls { ... },
    CreatePolicy(Policy),
    AlterPolicy { ... },
    DropPolicy { ... },
    CreateFunction(Function),
    DropFunction { ... },
    ReplaceFunction(Function),
    CreateView(View),
    DropView { ... },
    ReplaceView(View),
    CreateTrigger(Trigger),
    DropTrigger { ... },
    AlterTriggerEnabled { ... },
    CreateSequence(Sequence),
    DropSequence { ... },
    AlterSequence { ... },
}
```
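
As a rough illustration of how a diff over two canonical models can yield such operations, here is a minimal sketch restricted to tables. It is not the actual `diff` module: the `Table` and `MigrationOp` types are cut-down stand-ins, `entry` and `diff_tables` are hypothetical names, and the reading of `DropTable(String, String)` as `(schema, name)` is an assumption.

```rust
use std::collections::BTreeMap;

// Simplified stand-ins: the real `Table` and `MigrationOp` carry far more.
#[derive(Debug, Clone, PartialEq)]
struct Table {
    schema: String,
    name: String,
}

#[derive(Debug, PartialEq)]
enum MigrationOp {
    CreateTable(Table),
    DropTable(String, String), // assumed here to mean (schema, name)
}

/// Builds a `(qualified key, Table)` pair for a schema map (hypothetical helper).
fn entry(schema: &str, name: &str) -> (String, Table) {
    let table = Table { schema: schema.to_string(), name: name.to_string() };
    (format!("{}.{}", schema, name), table)
}

/// Emits `CreateTable` for tables present only in `desired` and `DropTable`
/// for tables present only in `current`. Both maps are keyed by `schema.name`.
fn diff_tables(
    current: &BTreeMap<String, Table>,
    desired: &BTreeMap<String, Table>,
) -> Vec<MigrationOp> {
    let mut ops = Vec::new();
    for (key, table) in desired {
        if !current.contains_key(key) {
            ops.push(MigrationOp::CreateTable(table.clone()));
        }
    }
    for (key, table) in current {
        if !desired.contains_key(key) {
            ops.push(MigrationOp::DropTable(table.schema.clone(), table.name.clone()));
        }
    }
    ops
}
```

Because both inputs are `BTreeMap`s, the emitted operations come out in the same order on every run. Tables present on both sides would need a field-by-field comparison to produce the `Add*`, `Alter*`, and `Drop*` variants, which this sketch leaves out.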
## Operation Ordering
The planner orders operations to satisfy dependencies:
1. **Create phase** (safe to add):
   - CreateExtension
   - CreateEnum, AddEnumValue
   - CreateDomain
   - CreateSequence
   - CreateTable (topologically sorted by FK dependencies)
   - CreatePartition
   - AddColumn, AlterColumn
   - AddPrimaryKey
   - AddIndex
   - AddForeignKey
   - AddCheckConstraint
   - EnableRls, ForceRls
   - CreatePolicy, AlterPolicy
   - CreateFunction, ReplaceFunction
   - CreateView, ReplaceView
   - CreateTrigger
2. **Drop phase** (reverse order):
   - DropTrigger
   - DropView
   - DropFunction
   - DropPolicy
   - DisableRls, NoForceRls
   - DropCheckConstraint
   - DropForeignKey
   - DropIndex
   - DropPrimaryKey
   - DropColumn
   - DropPartition
   - DropTable
   - DropSequence
   - DropDomain
   - DropEnum
   - DropExtension
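
One way to realize this ordering is a phase rank per operation plus a topological sort for `CreateTable`. The sketch below is an assumption-laden illustration, not the real `diff::planner`: the `Op` enum is reduced to four variants that carry only a table name, `rank`, `topo_order`, and `plan` are hypothetical names, and it assumes the create phase runs before the drop phase.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Reduced op set for illustration; each variant carries only a table name.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Op {
    CreateTable(String),
    AddIndex(String),
    DropIndex(String),
    DropTable(String),
}

/// Phase rank: create-phase ops first, then drop-phase ops in reverse order.
fn rank(op: &Op) -> u8 {
    match op {
        Op::CreateTable(_) => 0,
        Op::AddIndex(_) => 1,
        Op::DropIndex(_) => 2,
        Op::DropTable(_) => 3,
    }
}

/// Orders tables so each one follows the tables its foreign keys reference.
/// `fk_deps` maps a table to the tables it references; returns `None` on a cycle.
fn topo_order(fk_deps: &BTreeMap<String, BTreeSet<String>>) -> Option<Vec<String>> {
    let mut remaining = fk_deps.clone();
    let mut ordered = Vec::new();
    while !remaining.is_empty() {
        // Ready: every referenced table is already placed, is the table itself
        // (self-referencing FK), or is not part of this migration.
        let ready: Vec<String> = remaining
            .iter()
            .filter(|(name, refs)| {
                refs.iter().all(|r| r == *name || !remaining.contains_key(r))
            })
            .map(|(name, _)| name.clone())
            .collect();
        if ready.is_empty() {
            return None; // mutual FK references: no valid creation order
        }
        for name in ready {
            remaining.remove(&name);
            ordered.push(name);
        }
    }
    Some(ordered)
}

/// Sorts ops by phase rank; `CreateTable` ops are further ordered by FK
/// dependencies. Tables absent from `fk_deps` are treated as having none.
fn plan(mut ops: Vec<Op>, fk_deps: &BTreeMap<String, BTreeSet<String>>) -> Option<Vec<Op>> {
    let order = topo_order(fk_deps)?;
    let position = |op: &Op| match op {
        Op::CreateTable(t) => order.iter().position(|n| n == t).unwrap_or(0),
        _ => 0,
    };
    ops.sort_by_key(|op| (rank(op), position(op))); // stable sort
    Some(ops)
}
```

The sort is stable, so operations with the same rank keep the order in which the diff emitted them. Returning `None` on mutually referencing tables is one possible policy; a planner could instead create such tables first and add their foreign keys afterwards, which the placement of `AddForeignKey` after `CreateTable` in the list would allow.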
## Object Filtering
The `filter` module supports filtering by:
- Name patterns (glob syntax: `*`, `?`)
- Object types (tables, indexes, policies, etc.)
Filters apply to both source and target schemas before diffing.
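
A glob matcher for `*` and `?` fits in a few lines of standard-library Rust. The following is a sketch under assumptions rather than the `filter` module itself: `glob_match` and `filter_names` are hypothetical names, and matching against the qualified `schema.name` key is assumed.

```rust
/// Returns true if `text` matches `pattern`, where `*` matches any run of
/// characters (including none) and `?` matches exactly one character.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Pattern index of the last `*` seen and the text index it currently covers up to.
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Tentatively let `*` match nothing and remember where to retry.
            backtrack = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((star_pi, star_ti)) = backtrack {
            // Mismatch: let the last `*` absorb one more character and retry.
            pi = star_pi + 1;
            ti = star_ti + 1;
            backtrack = Some((star_pi, star_ti + 1));
        } else {
            return false;
        }
    }
    // The text is consumed; only trailing `*`s may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Keeps the names that match at least one include pattern and no exclude pattern.
fn filter_names<'a>(names: &[&'a str], include: &[&str], exclude: &[&str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| include.iter().any(|p| glob_match(p, n)))
        .filter(|n| !exclude.iter().any(|p| glob_match(p, n)))
        .collect()
}
```

Applying the same include and exclude lists to both schemas before diffing keeps an excluded object from showing up as a spurious create or drop.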
## Lint Rules
| Rule | Severity | Condition or rationale |
|------|----------|------------------------|
| `deny_drop_column` | Error | Without `--allow-destructive` |
| `deny_drop_table` | Error | Without `--allow-destructive` |
| `deny_drop_enum` | Error | Without `--allow-destructive` |
| `deny_drop_table_in_prod` | Error | When `PGMOLD_PROD=1` |
| `warn_type_narrowing` | Warning | Type change may lose data |
| `warn_set_not_null` | Warning | May fail on existing NULLs |

Lock hazard detection warns about operations that acquire exclusive locks.
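
A lint pass of this kind can be sketched as a function from operations and options to findings. Everything below is illustrative: `Op`, `Finding`, `LintOptions`, `lint`, and `has_errors` are hypothetical names, only four of the rules are covered, and the assumption that `deny_drop_table_in_prod` fires even when destructive changes are allowed is mine, not the source's.

```rust
#[derive(Debug, PartialEq)]
enum Severity {
    Warning,
    Error,
}

#[derive(Debug, PartialEq)]
struct Finding {
    rule: &'static str,
    severity: Severity,
}

/// Reduced op set for illustration; payloads are omitted.
enum Op {
    DropTable,
    DropColumn,
    SetNotNull,
    AddColumn,
}

/// Options a caller might derive from `--allow-destructive` and `PGMOLD_PROD=1`.
struct LintOptions {
    allow_destructive: bool,
    is_prod: bool,
}

/// Checks each operation against the rules and collects the findings in op order.
fn lint(ops: &[Op], opts: &LintOptions) -> Vec<Finding> {
    let mut findings = Vec::new();
    for op in ops {
        match op {
            Op::DropTable => {
                if !opts.allow_destructive {
                    findings.push(Finding { rule: "deny_drop_table", severity: Severity::Error });
                }
                // Assumption: the production rule applies regardless of --allow-destructive.
                if opts.is_prod {
                    findings.push(Finding { rule: "deny_drop_table_in_prod", severity: Severity::Error });
                }
            }
            Op::DropColumn => {
                if !opts.allow_destructive {
                    findings.push(Finding { rule: "deny_drop_column", severity: Severity::Error });
                }
            }
            Op::SetNotNull => {
                findings.push(Finding { rule: "warn_set_not_null", severity: Severity::Warning });
            }
            Op::AddColumn => {} // additive change: nothing to report
        }
    }
    findings
}

/// True if any finding should block the migration.
fn has_errors(findings: &[Finding]) -> bool {
    findings.iter().any(|f| f.severity == Severity::Error)
}
```

A caller would typically print warnings and continue, but stop before SQL generation when `has_errors` returns true.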
## Module Dependencies
```
cli → parser, pg, diff, filter, lint, drift, baseline, dump, migrate, apply
parser → model
pg/introspect → model
pg/sqlgen → model, diff
diff → model
filter → model
lint → diff
drift → model
baseline → parser, pg, diff, dump
dump → model, pg/sqlgen
apply → pg
```
No circular dependencies. `model` is the leaf dependency.
## Testing Strategy
- **Unit tests**: Each module has inline `#[cfg(test)]` modules
- **Integration tests**: Full pipeline against a PostgreSQL instance started with testcontainers
- **Semantic equivalence tests**: Verify that normalization maps equivalent inputs to identical results
## Supported PostgreSQL Features
- Tables, columns, partitioned tables
- Primary keys, foreign keys, check constraints
- Indexes (btree, hash, gin, gist, brin)
- Enums, domains
- Functions (with volatility, security, SET parameters)
- Views
- Triggers (with WHEN clauses, transition tables)
- Sequences (with SERIAL/BIGSERIAL support)
- Row-Level Security (RLS) policies
- Extensions
- Multi-schema support