schema-sync 1.0.0

Production-grade schema synchronization for multi-tenant databases
**Developer:** s4gor  
**Github:** https://github.com/s4gor

---

# Schema Sync - Architecture Summary

## Project Structure

```
schema-sync/
├── Cargo.toml              # Project configuration
├── README.md               # User-facing documentation
├── DESIGN.md               # Detailed design rationale
├── ARCHITECTURE.md         # This file - architecture overview
├── .gitignore              # Git ignore rules
├── src/
│   ├── lib.rs              # Main library entry point with architecture diagram
│   ├── adapters.rs         # Database adapter traits (DatabaseAdapter, SchemaInspector, MigrationRunner)
│   ├── diff.rs             # Schema diff calculation and representation
│   ├── engine.rs           # Main engine orchestration
│   ├── errors.rs           # Error types
│   ├── planner.rs          # Migration planning
│   ├── executor.rs         # Migration execution
│   ├── snapshot.rs         # Schema snapshot system
│   └── cli.rs              # CLI types and context
└── examples/
    ├── basic_usage.rs       # Basic sync example
    ├── dry_run_mode.rs      # Dry-run mode example
    └── ci_validation.rs     # CI validation example
```

## Core Abstractions

### 1. DatabaseAdapter Trait

**Purpose**: Main entry point for database operations.

**Key Methods**:
- `inspector()` → `Box<dyn SchemaInspector>`
- `migration_runner()` → `Box<dyn MigrationRunner>`
- `database_type()` → `&str`
- `test_connection()` → `Result<()>`

**Why it exists**: Factory pattern for creating inspectors and runners. Enables multi-database support.
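
The factory role can be illustrated with a minimal sketch. The method names follow the list above, but the simplified `SchemaInspector` and `MigrationRunner` bodies, the `SyncResult` alias, and the `InMemory*`/`NoopRunner` types are assumptions made for illustration, not the crate's actual definitions.

```rust
// Hypothetical, simplified shapes; the crate's real signatures may differ.
type SyncResult<T> = std::result::Result<T, String>;

trait SchemaInspector {
    fn list_tables(&self) -> Vec<String>;
}

trait MigrationRunner {
    fn run(&self, sql: &str) -> SyncResult<()>;
}

/// The adapter is a factory: it hands out the read side and the write side.
trait DatabaseAdapter {
    fn inspector(&self) -> Box<dyn SchemaInspector>;
    fn migration_runner(&self) -> Box<dyn MigrationRunner>;
    fn database_type(&self) -> &str;
    fn test_connection(&self) -> SyncResult<()>;
}

struct InMemoryInspector {
    tables: Vec<String>,
}

impl SchemaInspector for InMemoryInspector {
    fn list_tables(&self) -> Vec<String> {
        self.tables.clone()
    }
}

struct NoopRunner;

impl MigrationRunner for NoopRunner {
    // Accepts any non-empty statement without touching a database.
    fn run(&self, sql: &str) -> SyncResult<()> {
        if sql.trim().is_empty() {
            Err("empty statement".to_string())
        } else {
            Ok(())
        }
    }
}

struct InMemoryAdapter {
    tables: Vec<String>,
}

impl DatabaseAdapter for InMemoryAdapter {
    fn inspector(&self) -> Box<dyn SchemaInspector> {
        Box::new(InMemoryInspector { tables: self.tables.clone() })
    }
    fn migration_runner(&self) -> Box<dyn MigrationRunner> {
        Box::new(NoopRunner)
    }
    fn database_type(&self) -> &str {
        "in-memory"
    }
    fn test_connection(&self) -> SyncResult<()> {
        Ok(())
    }
}
```

Code that only holds a `&dyn DatabaseAdapter` never needs to know which database sits behind it, which is what makes multi-database support possible.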

### 2. SchemaInspector Trait

**Purpose**: Read-only schema introspection.

**Key Methods**:
- `inspect_schema(tenant)` → `Result<SchemaSnapshot>`
- `schema_exists(tenant)` → `Result<bool>`
- `list_tenants()` → `Result<Vec<TenantContext>>`

**Why it exists**: 
- Enables audit mode without write permissions
- Allows dry-run mode to calculate diffs without locks
- Supports testing with mock inspectors
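
A mock inspector along these lines shows how the read-only trait can be exercised without a database. This is a sketch under assumptions: the reduced `SchemaSnapshot` (a list of table names), the `String` error type, and the `MockInspector` and `table_counts` names are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct TenantContext {
    tenant_id: String,
}

// Reduced snapshot for illustration: just table names.
#[derive(Debug, Clone, PartialEq, Default)]
struct SchemaSnapshot {
    tables: Vec<String>,
}

trait SchemaInspector {
    fn inspect_schema(&self, tenant: &TenantContext) -> Result<SchemaSnapshot, String>;
    fn schema_exists(&self, tenant: &TenantContext) -> Result<bool, String>;
    fn list_tenants(&self) -> Result<Vec<TenantContext>, String>;
}

/// Serves canned snapshots from memory; it has no way to write anything.
struct MockInspector {
    schemas: BTreeMap<TenantContext, SchemaSnapshot>,
}

impl SchemaInspector for MockInspector {
    fn inspect_schema(&self, tenant: &TenantContext) -> Result<SchemaSnapshot, String> {
        self.schemas
            .get(tenant)
            .cloned()
            .ok_or_else(|| format!("no schema for tenant {}", tenant.tenant_id))
    }
    fn schema_exists(&self, tenant: &TenantContext) -> Result<bool, String> {
        Ok(self.schemas.contains_key(tenant))
    }
    fn list_tenants(&self) -> Result<Vec<TenantContext>, String> {
        Ok(self.schemas.keys().cloned().collect())
    }
}

/// Audit-style helper: depends only on the read-only trait, never on a runner.
fn table_counts(inspector: &dyn SchemaInspector) -> Result<Vec<(String, usize)>, String> {
    let mut counts = Vec::new();
    for tenant in inspector.list_tenants()? {
        let snapshot = inspector.inspect_schema(&tenant)?;
        counts.push((tenant.tenant_id, snapshot.tables.len()));
    }
    Ok(counts)
}
```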

### 3. MigrationRunner Trait

**Purpose**: Execute schema changes.

**Key Methods**:
- `execute_migration(tenant, plan)` → `Result<MigrationResult>`
- `acquire_lock(tenant, timeout)` → `Result<Box<dyn LockGuard>>`
- `validate_migration(tenant, plan)` → `Result<()>`

**Why it exists**:
- Pluggable migration engines (SQL files, Rust code, external tools)
- Different strategies per database type
- Testing with mock runners
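
The `acquire_lock` method returns a boxed guard, which suggests an RAII pattern where dropping the guard releases the lock. The sketch below illustrates that idea with an in-process lock table. `TenantLocks` and `InProcessGuard` are hypothetical names, the `timeout` parameter is omitted for brevity, and a real runner would more likely rely on a database-level lock.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Marker trait: holding the guard means holding the lock.
trait LockGuard {}

/// Hypothetical in-process lock table keyed by tenant id.
struct TenantLocks {
    held: Arc<Mutex<HashSet<String>>>,
}

struct InProcessGuard {
    tenant_id: String,
    held: Arc<Mutex<HashSet<String>>>,
}

impl LockGuard for InProcessGuard {}

impl Drop for InProcessGuard {
    // Releasing happens automatically when the guard goes out of scope.
    fn drop(&mut self) {
        if let Ok(mut held) = self.held.lock() {
            held.remove(&self.tenant_id);
        }
    }
}

impl TenantLocks {
    fn new() -> Self {
        Self { held: Arc::new(Mutex::new(HashSet::new())) }
    }

    /// Fails immediately if the tenant is already locked (no waiting in this sketch).
    fn acquire_lock(&self, tenant_id: &str) -> Result<Box<dyn LockGuard>, String> {
        let mut held = self.held.lock().map_err(|e| e.to_string())?;
        if !held.insert(tenant_id.to_string()) {
            return Err(format!("tenant {} is already locked", tenant_id));
        }
        Ok(Box::new(InProcessGuard {
            tenant_id: tenant_id.to_string(),
            held: Arc::clone(&self.held),
        }))
    }
}
```

Because release is tied to `Drop`, an early return or a failed migration step cannot leave a tenant locked.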

### 4. Planner Trait

**Purpose**: Create executable migration plans from schema diffs.

**Key Methods**:
- `create_plan(current, target, diff)` → `Result<MigrationPlan>`
- `validate_plan(plan)` → `Result<()>`

**Why it exists**:
- Dry-run mode can show what would happen
- Validation of plans before execution
- Different planning strategies (safe ordering, dependency resolution)
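
One possible planning strategy can be sketched as follows: additive steps are ordered before destructive ones, and validation rejects drops unless explicitly allowed. The reduced `create_plan(diff)` signature, the `Step` enum, and `AdditiveFirstPlanner` are assumptions for illustration and do not reflect the crate's actual planner.

```rust
#[derive(Debug, Default)]
struct SchemaDiff {
    tables_added: Vec<String>,
    tables_removed: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
enum Step {
    CreateTable(String),
    DropTable(String),
}

#[derive(Debug, Default)]
struct MigrationPlan {
    steps: Vec<Step>,
    warnings: Vec<String>,
}

trait Planner {
    fn create_plan(&self, diff: &SchemaDiff) -> Result<MigrationPlan, String>;
    fn validate_plan(&self, plan: &MigrationPlan) -> Result<(), String>;
}

/// Hypothetical strategy: creates first, drops last, drops gated by a flag.
struct AdditiveFirstPlanner {
    allow_drops: bool,
}

impl Planner for AdditiveFirstPlanner {
    fn create_plan(&self, diff: &SchemaDiff) -> Result<MigrationPlan, String> {
        let mut plan = MigrationPlan::default();
        for table in &diff.tables_added {
            plan.steps.push(Step::CreateTable(table.clone()));
        }
        for table in &diff.tables_removed {
            plan.steps.push(Step::DropTable(table.clone()));
            plan.warnings.push(format!("dropping table {} destroys data", table));
        }
        Ok(plan)
    }

    fn validate_plan(&self, plan: &MigrationPlan) -> Result<(), String> {
        let has_drop = plan.steps.iter().any(|s| matches!(s, Step::DropTable(_)));
        if has_drop && !self.allow_drops {
            Err("plan contains destructive steps but drops are not allowed".to_string())
        } else {
            Ok(())
        }
    }
}
```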

### 5. Executor Trait

**Purpose**: Orchestrate the execution of migration plans.

**Key Methods**:
- `execute(tenant, plan, runner)` → `Result<ExecutionResult>`
- `dry_run(tenant, plan, runner)` → `Result<ExecutionResult>`

**Why it exists**:
- Different execution strategies (transactional, non-transactional)
- Progress reporting
- Retry logic
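
The difference between `execute` and `dry_run` can be sketched with a runner that records what it was asked to apply. The plan is reduced to a list of statements, the tenant argument is omitted, and `RecordingRunner`, `SequentialExecutor`, and the `ExecutionResult` fields are hypothetical.

```rust
use std::cell::RefCell;

trait MigrationRunner {
    fn apply(&self, step: &str) -> Result<(), String>;
}

/// Test double that remembers every step it was asked to apply.
struct RecordingRunner {
    applied: RefCell<Vec<String>>,
}

impl MigrationRunner for RecordingRunner {
    fn apply(&self, step: &str) -> Result<(), String> {
        self.applied.borrow_mut().push(step.to_string());
        Ok(())
    }
}

#[derive(Debug, PartialEq)]
struct ExecutionResult {
    steps_planned: usize,
    steps_applied: usize,
}

/// Hypothetical non-transactional strategy: apply steps in order, stop at the first error.
struct SequentialExecutor;

impl SequentialExecutor {
    fn execute(
        &self,
        plan: &[String],
        runner: &dyn MigrationRunner,
    ) -> Result<ExecutionResult, String> {
        let mut steps_applied = 0;
        for step in plan {
            runner.apply(step)?;
            steps_applied += 1;
        }
        Ok(ExecutionResult { steps_planned: plan.len(), steps_applied })
    }

    /// Reports what would run without ever calling the runner.
    fn dry_run(
        &self,
        plan: &[String],
        _runner: &dyn MigrationRunner,
    ) -> Result<ExecutionResult, String> {
        Ok(ExecutionResult { steps_planned: plan.len(), steps_applied: 0 })
    }
}
```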

### 6. DiffCalculator Trait

**Purpose**: Calculate differences between schema snapshots.

**Key Methods**:
- `calculate_diff(from, to)` → `SchemaDiff`

**Why it exists**:
- Different diff algorithms
- Three-way merge support
- Conflict detection
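
A straightforward two-way diff might look like the sketch below, which compares tables by name and then by column list. The `Tables` alias and the flat `added`/`removed`/`modified` fields are simplifying assumptions. The crate's `SchemaDiff` is described as hierarchical and also covers views, functions, and types.

```rust
use std::collections::BTreeMap;

/// Simplified snapshot: table name -> column names.
type Tables = BTreeMap<String, Vec<String>>;

#[derive(Debug, Default, PartialEq)]
struct SchemaDiff {
    added: Vec<String>,
    removed: Vec<String>,
    modified: Vec<String>,
}

/// Two-way diff: what must change to get from `from` to `to`.
fn calculate_diff(from: &Tables, to: &Tables) -> SchemaDiff {
    let mut diff = SchemaDiff::default();
    for (name, columns) in to {
        match from.get(name) {
            None => diff.added.push(name.clone()),
            Some(old_columns) if old_columns != columns => diff.modified.push(name.clone()),
            Some(_) => {} // identical table, nothing to do
        }
    }
    for name in from.keys() {
        if !to.contains_key(name) {
            diff.removed.push(name.clone());
        }
    }
    diff
}
```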

### 7. SnapshotStore Trait

**Purpose**: Store and retrieve schema snapshots.

**Key Methods**:
- `store(tenant, snapshot)` → `Result<()>`
- `get_latest(tenant)` → `Result<Option<SchemaSnapshot>>`
- `get_by_hash(tenant, hash)` → `Result<Option<SchemaSnapshot>>`
- `list(tenant)` → `Result<Vec<SchemaSnapshot>>`
- `compare(tenant, hash_a, hash_b)` → `Result<SchemaDiff>`

**Why it exists**:
- Multiple storage backends (filesystem, database, version control)
- Version history
- Deterministic versioning
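
As a rough illustration of a storage backend, the sketch below keeps snapshots in memory, ordered by insertion. It assumes a reduced `SchemaSnapshot` that carries its own `hash` field, takes a plain tenant id string, uses `&mut self` for writes, and leaves out `compare`. All of these are simplifications rather than the crate's actual trait.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct SchemaSnapshot {
    hash: String,
    tables: Vec<String>,
}

trait SnapshotStore {
    fn store(&mut self, tenant_id: &str, snapshot: SchemaSnapshot) -> Result<(), String>;
    fn get_latest(&self, tenant_id: &str) -> Result<Option<SchemaSnapshot>, String>;
    fn get_by_hash(&self, tenant_id: &str, hash: &str) -> Result<Option<SchemaSnapshot>, String>;
    fn list(&self, tenant_id: &str) -> Result<Vec<SchemaSnapshot>, String>;
}

/// Hypothetical backend: per-tenant history, oldest first.
#[derive(Default)]
struct InMemorySnapshotStore {
    by_tenant: HashMap<String, Vec<SchemaSnapshot>>,
}

impl SnapshotStore for InMemorySnapshotStore {
    fn store(&mut self, tenant_id: &str, snapshot: SchemaSnapshot) -> Result<(), String> {
        self.by_tenant.entry(tenant_id.to_string()).or_default().push(snapshot);
        Ok(())
    }

    fn get_latest(&self, tenant_id: &str) -> Result<Option<SchemaSnapshot>, String> {
        Ok(self.by_tenant.get(tenant_id).and_then(|history| history.last().cloned()))
    }

    fn get_by_hash(&self, tenant_id: &str, hash: &str) -> Result<Option<SchemaSnapshot>, String> {
        Ok(self
            .by_tenant
            .get(tenant_id)
            .and_then(|history| history.iter().find(|s| s.hash == hash).cloned()))
    }

    fn list(&self, tenant_id: &str) -> Result<Vec<SchemaSnapshot>, String> {
        Ok(self.by_tenant.get(tenant_id).cloned().unwrap_or_default())
    }
}
```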

## Data Structures

### SchemaSnapshot

Normalized, database-agnostic representation of a schema.

**Properties**:
- Deterministic: Same schema always produces same snapshot
- Order-independent (uses HashMaps)
- Database-agnostic

**Contains**:
- Tables (with columns, constraints, indexes)
- Views
- Functions
- Types
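
The determinism property can be made concrete with a content fingerprint that does not depend on insertion order. The document says the snapshot uses HashMaps. This sketch swaps in `BTreeMap` because its sorted iteration makes a stable fingerprint simple to compute. The `fingerprint` method and the hand-written FNV-1a hash are illustrative choices, not the crate's versioning scheme.

```rust
use std::collections::BTreeMap;

/// Simplified snapshot: table -> (column -> type).
#[derive(Debug, Default, Clone, PartialEq)]
struct SchemaSnapshot {
    tables: BTreeMap<String, BTreeMap<String, String>>,
}

/// Folds `text` plus a NUL separator into a running 64-bit FNV-1a hash.
fn fnv1a_feed(hash: u64, text: &str) -> u64 {
    let mut h = hash;
    for byte in text.bytes().chain(std::iter::once(0u8)) {
        h ^= u64::from(byte);
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    h
}

impl SchemaSnapshot {
    fn add_column(&mut self, table: &str, column: &str, ty: &str) {
        self.tables
            .entry(table.to_string())
            .or_default()
            .insert(column.to_string(), ty.to_string());
    }

    /// Same content yields the same value, whatever order it was inserted in.
    fn fingerprint(&self) -> u64 {
        let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
        for (table, columns) in &self.tables {
            h = fnv1a_feed(h, table);
            for (column, ty) in columns {
                h = fnv1a_feed(h, column);
                h = fnv1a_feed(h, ty);
            }
        }
        h
    }
}
```

A production implementation would probably prefer a cryptographic hash over FNV-1a. The point here is only that sorted traversal makes the result reproducible.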

### SchemaDiff

Represents differences between two snapshots.

**Structure**: Hierarchical (schema → table → column → constraint)

**Contains**:
- Tables: added, removed, modified
- Views: added, removed, modified
- Functions: added, removed, modified
- Types: added, removed, modified

### MigrationPlan

Executable sequence of operations to transform schema.

**Structure**: Ordered steps with dependencies

**Contains**:
- Steps (ordered operations)
- Estimated duration
- Downtime requirements
- Warnings
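
One way to read "ordered steps with dependencies" is that each step may only depend on steps that run before it. The sketch below checks that invariant and derives the downtime flag from the steps. The `Step` fields, `check_ordering`, and `requires_downtime` are hypothetical names chosen for illustration.

```rust
#[derive(Debug)]
struct Step {
    description: String,
    /// Indices of steps that must have run before this one.
    depends_on: Vec<usize>,
    requires_downtime: bool,
}

#[derive(Debug, Default)]
struct MigrationPlan {
    steps: Vec<Step>,
    warnings: Vec<String>,
}

impl MigrationPlan {
    /// The plan needs downtime if any single step does.
    fn requires_downtime(&self) -> bool {
        self.steps.iter().any(|step| step.requires_downtime)
    }

    /// Every dependency must point at an earlier position in the plan.
    fn check_ordering(&self) -> Result<(), String> {
        for (index, step) in self.steps.iter().enumerate() {
            if let Some(bad) = step.depends_on.iter().find(|&&dep| dep >= index) {
                return Err(format!(
                    "step {} ('{}') depends on step {}, which does not run before it",
                    index, step.description, bad
                ));
            }
        }
        Ok(())
    }
}
```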

### TenantContext

Explicit tenant scoping for all operations.

**Properties**:
- Single field: `tenant_id: String`
- Required for all operations
- Prevents cross-tenant leakage
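
A minimal sketch of the type follows. Only the `tenant_id: String` field comes from the description above. The validating constructor and the `schema_name` helper are assumptions added to show why a dedicated type is safer than passing raw strings around.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TenantContext {
    tenant_id: String,
}

impl TenantContext {
    /// Hypothetical constructor: rejects blank ids so they fail early.
    fn new(tenant_id: impl Into<String>) -> Result<Self, String> {
        let tenant_id = tenant_id.into();
        if tenant_id.trim().is_empty() {
            return Err("tenant_id must not be empty".to_string());
        }
        Ok(Self { tenant_id })
    }
}

/// Hypothetical helper: functions take `&TenantContext`, so an arbitrary
/// string (a table name, a user id) cannot be passed where a tenant is expected.
fn schema_name(tenant: &TenantContext) -> String {
    format!("tenant_{}", tenant.tenant_id)
}
```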

## Extension Points

### Adding a New Database Type

1. Implement `DatabaseAdapter`
2. Implement `SchemaInspector` (convert DB schema → `SchemaSnapshot`)
3. Implement `MigrationRunner` (convert `MigrationPlan` → DB SQL)
4. Use with engine

**No changes needed to**: Engine, planner, executor, diff calculator.

### Adding a New Migration Strategy

1. Implement `MigrationRunner`
2. Convert `MigrationPlan` to your format
3. Execute using your tool

**Example**: SQL file migrations, diesel migrations, sqlx migrations.

### Adding a New Snapshot Storage Backend

1. Implement `SnapshotStore`
2. Store/retrieve `SchemaSnapshot` in your backend
3. Use with engine

**Example**: Filesystem, database, S3, version control.

### Adding a New Planning Strategy

1. Implement `Planner`
2. Create `MigrationPlan` from `SchemaDiff`
3. Use with engine

**Example**: Safe planner for zero-downtime migrations.

### Adding a New Diff Algorithm

1. Implement `DiffCalculator`
2. Calculate `SchemaDiff` from two `SchemaSnapshot`s
3. Use with engine

**Example**: Three-way merge calculator.

## Operation Modes

### Sync Mode

**Implementation**: `Engine::sync_tenant(tenant, target, execute=true)`

**Behavior**: Calculate diff, create plan, execute plan.

### Dry-Run Mode

**Implementation**: `Engine::sync_tenant(tenant, target, execute=false)`

**Behavior**: Calculate diff, create plan, validate plan, return diff without executing.
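
The effect of the `execute` flag can be sketched with a toy engine in which a schema is just a set of table names. The free function `sync_tenant`, the `SyncOutcome` struct, and the SQL-like step strings are illustrative assumptions. Only the `execute` flag and the `already_in_sync` result mirror names used in this document.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
struct SyncOutcome {
    already_in_sync: bool,
    planned_steps: Vec<String>,
    executed: bool,
}

/// Toy engine: diff, plan, and (only if `execute` is true) apply.
fn sync_tenant(
    current: &mut BTreeSet<String>,
    target: &BTreeSet<String>,
    execute: bool,
) -> SyncOutcome {
    // Plan: create what is missing, then drop what is no longer wanted.
    let mut planned_steps: Vec<String> = target
        .difference(&*current)
        .map(|table| format!("CREATE TABLE {}", table))
        .collect();
    planned_steps.extend(
        current
            .difference(target)
            .map(|table| format!("DROP TABLE {}", table)),
    );

    let already_in_sync = planned_steps.is_empty();
    let executed = execute && !already_in_sync;
    if executed {
        // Stand-in for running the plan against a real database.
        *current = target.clone();
    }
    SyncOutcome { already_in_sync, planned_steps, executed }
}
```

With `execute` set to `false`, the caller gets the same plan back while the current schema is left untouched.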

### Validation Mode (CI)

**Implementation**: 
- `Engine::sync_tenant(tenant, target, execute=false)` for all tenants
- Check `already_in_sync` flag
- Exit non-zero if any tenant has `already_in_sync=false`

**Behavior**: Verify all tenants match expected schema.
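
The CI check described above reduces to collecting the tenants whose dry-run reported drift and mapping that to an exit code. In this sketch, `SyncReport`, `drifted_tenants`, and `validation_exit_code` are hypothetical names. Only the `already_in_sync` flag comes from the description.

```rust
/// Hypothetical per-tenant result of a dry-run sync.
struct SyncReport {
    tenant_id: String,
    already_in_sync: bool,
}

/// Tenants whose schema differs from the expected one.
fn drifted_tenants(reports: &[SyncReport]) -> Vec<&str> {
    reports
        .iter()
        .filter(|report| !report.already_in_sync)
        .map(|report| report.tenant_id.as_str())
        .collect()
}

/// 0 when every tenant matches, 1 otherwise (suitable for a CI job).
fn validation_exit_code(reports: &[SyncReport]) -> i32 {
    let drifted = drifted_tenants(reports);
    for tenant_id in &drifted {
        eprintln!("tenant {} is out of sync", tenant_id);
    }
    if drifted.is_empty() {
        0
    } else {
        1
    }
}
```

A binary could finish with `std::process::exit(validation_exit_code(&reports))` so that the pipeline fails whenever any tenant has drifted.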

### Audit Mode

**Implementation**: Use `SchemaInspector` directly, no `MigrationRunner`.

**Behavior**: Read-only inspection, no changes allowed.

## Design Decisions

### Why Traits Over Enums?

Traits allow:
- Multiple implementations to coexist
- Pluggable components
- Testing with mocks
- Extension without modification

### Why Separate Inspector and Runner?

- Inspector can be used without runner (audit mode)
- Different migration strategies can be implemented
- Testing is easier with mock implementations

### Why Separate Planner and Executor?

- Dry-run mode can plan without executing
- Validation of plans before execution
- Different execution strategies

### Why Snapshot System?

- Enables diffing schema version A vs B
- Supports version control integration
- Allows deterministic versioning

### Why TenantContext Everywhere?

- Type safety: Can't accidentally operate on wrong tenant
- Makes tenant isolation explicit
- Supports batch operations
- Enables per-tenant locking

## Future Implementation Tasks

### Phase 1: Core Implementations

- [ ] Default `Planner` implementation
- [ ] Default `Executor` implementation
- [ ] Default `DiffCalculator` implementation
- [ ] File-based `SnapshotStore` implementation

### Phase 2: PostgreSQL Support

- [ ] `PostgresAdapter` implementation
- [ ] `PostgresInspector` implementation
- [ ] `PostgresMigrationRunner` implementation

### Phase 3: CLI

- [ ] CLI argument parsing
- [ ] Mode handling (sync, diff, validate, audit)
- [ ] Output formatting (text, JSON)
- [ ] Exit codes for CI

### Phase 4: Additional Databases

- [ ] MySQL support
- [ ] SQLite support

### Phase 5: Advanced Features

- [ ] Three-way merge support
- [ ] Conflict detection
- [ ] Zero-downtime migration strategies
- [ ] Progress reporting
- [ ] Audit trail

## Testing Strategy

### Unit Tests

- Mock implementations of all traits
- Test each component in isolation

### Integration Tests

- Test database (testcontainers or in-memory)
- Test full sync flow
- Test error handling and rollback

### Property Tests

- Deterministic snapshots
- Plan correctness: executing the generated plan against the current schema yields the target schema

## Conclusion

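This architecture gives schema synchronization an extensible foundation: new databases, migration strategies, storage backends, planners, and diff algorithms plug in through traits without changes to the engine. Separating inspection from execution, and planning from execution, keeps each component small enough to test and maintain in isolation.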

The key insight: **design for extension first**. Every abstraction exists to enable a future feature, not just to solve the current problem.