schema-sync 1.0.0

Production-grade schema synchronization for multi-tenant databases
Documentation
**Developer:** s4gor  
**Github:** https://github.com/s4gor

---

# Schema Sync - Design Document

## Overview

This document explains the architectural decisions and design rationale for the schema-sync crate. It focuses on the **extension layer first** approach, explaining why each abstraction exists and how it enables future features.

## Core Design Principles

### 1. Trait-Based Extensibility

**Principle**: All database operations go through traits, never direct implementations.

**Rationale**: 
- Enables multi-database support without changing core logic
- Allows testing with mock implementations
- Supports different migration strategies per database
- Makes the codebase maintainable as it grows

**Example**: `DatabaseAdapter` trait allows PostgreSQL, MySQL, and SQLite implementations to coexist.

### 2. Separation of Concerns

**Principle**: Clear boundaries between reading, planning, and executing.

**Components**:
- **SchemaInspector**: Read-only schema introspection
- **MigrationRunner**: Write operations for applying migrations
- **Planner**: Determines what changes need to be made
- **Executor**: Orchestrates the execution of planned changes

**Rationale**:
- Audit mode can use inspector without runner (no write permissions needed)
- Dry-run mode can plan without executing
- Different strategies can be plugged in for each concern
- Easier to test each component independently

### 3. Tenant Isolation

**Principle**: Every operation is explicitly scoped to a tenant context.

**Implementation**: `TenantContext` is required for all operations.

**Rationale**:
- Prevents accidental cross-tenant operations
- Makes tenant scoping explicit in the type system
- Enables per-tenant locking strategies
- Supports batch operations across tenants

### 4. Mode-Agnostic Core

**Principle**: The engine doesn't know about CLI modes (dry-run, validation, etc.).

**Rationale**:
- Same engine can be used for sync, dry-run, validation, and audit
- Mode-specific behavior is handled at the CLI layer
- Keeps core logic simple and focused
- Enables programmatic usage without CLI

**Example**: `Engine::sync_tenant()` takes an `execute` parameter. The CLI layer sets this based on the mode.

## Architecture Layers

### Layer 1: CLI Layer

**Responsibility**: User interface, mode handling, output formatting.

**Not in core crate**: The CLI will be implemented separately (either in this crate's `bin/` or a separate crate).

**Modes**:
- `sync`: Apply changes (`execute=true`)
- `diff`: Preview changes (`execute=false`)
- `validate`: Check if schemas match (`execute=false`, exit non-zero if mismatch)
- `audit`: Read-only inspection (uses inspector only)

### Layer 2: Engine Layer

**Responsibility**: Orchestrates all components to provide high-level API.

**Components**:
- `Engine`: Main entry point
- Coordinates inspector, planner, executor, diff calculator
- Handles snapshot management
- Provides unified interface

**Why it exists**: Hides complexity of coordinating multiple components. Provides a simple API for common operations.

### Layer 3: Adapter Layer

**Responsibility**: Database-specific operations through traits.

**Traits**:
- `DatabaseAdapter`: Connection management, factory methods
- `SchemaInspector`: Read schema from database
- `MigrationRunner`: Execute schema changes

**Why separate traits**: 
- Inspector can be used without runner (audit mode)
- Different migration strategies can be implemented
- Testing is easier with mock implementations

### Layer 4: Database Implementations

**Responsibility**: Concrete implementations for each database type.

**Future implementations**:
- `PostgresAdapter`, `PostgresInspector`, `PostgresMigrationRunner`
- `MysqlAdapter`, `MysqlInspector`, `MysqlMigrationRunner`
- `SqliteAdapter`, `SqliteInspector`, `SqliteMigrationRunner`

**Why separate**: Each database has different SQL dialects, connection methods, and capabilities.

## Data Structures

### SchemaSnapshot

**Purpose**: Normalized, database-agnostic representation of a schema.

**Key properties**:
- Deterministic: Same schema always produces same snapshot
- Order-independent where possible (uses HashMaps)
- Database-agnostic: Not tied to PostgreSQL, MySQL, etc.

**Why it exists**: 
- Enables diffing between different database types (theoretical)
- Supports snapshot storage and versioning
- Allows planning without database connection

### SchemaDiff

**Purpose**: Represents differences between two snapshots.

**Structure**: Hierarchical (schema → table → column → constraint).

**Why it exists**:
- Enables human-readable diff output
- Supports machine-readable JSON output
- Allows planning to understand what needs to change

### MigrationPlan

**Purpose**: Executable sequence of operations to transform schema.

**Structure**: Ordered steps with dependencies.

**Why it exists**:
- Separates "what needs to change" (diff) from "how to change it" (plan)
- Enables validation before execution
- Supports dry-run mode
- Allows different planning strategies

### TenantContext

**Purpose**: Explicit tenant scoping for all operations.

**Why it exists**:
- Type safety: Can't accidentally operate on wrong tenant
- Makes tenant isolation explicit
- Supports batch operations
- Enables per-tenant locking

## Extension Points

### 1. Adding a New Database Type

**Steps**:
1. Implement `DatabaseAdapter` trait
2. Implement `SchemaInspector` trait (convert database schema to `SchemaSnapshot`)
3. Implement `MigrationRunner` trait (convert `MigrationPlan` to database SQL)
4. Register in factory or use directly

**No changes needed to**: Engine, planner, executor, diff calculator.

### 2. Adding a New Migration Strategy

**Steps**:
1. Implement `MigrationRunner` trait
2. Convert `MigrationPlan` to your migration format
3. Execute migrations using your tool (diesel, sqlx, etc.)

**Example**: SQL file-based migrations would read SQL files and execute them in order.

### 3. Adding a New Snapshot Storage Backend

**Steps**:
1. Implement `SnapshotStore` trait
2. Store/retrieve `SchemaSnapshot` in your backend (database, S3, etc.)
3. Use with engine

**No changes needed to**: Engine, adapters, planner, executor.

### 4. Adding a New Planning Strategy

**Steps**:
1. Implement `Planner` trait
2. Create `MigrationPlan` from `SchemaDiff` using your strategy
3. Use with engine

**Example**: A "safe" planner might add extra steps to ensure zero-downtime migrations.

### 5. Adding a New Diff Algorithm

**Steps**:
1. Implement `DiffCalculator` trait
2. Calculate `SchemaDiff` from two `SchemaSnapshot`s
3. Use with engine

**Example**: A three-way merge calculator for handling conflicts.

## Future Extensions (Detailed)

### 1. Dry-Run / Diff Mode

**Implementation**: Already supported via `execute=false` parameter.

**Output formats**:
- Text: Format `SchemaDiff` as human-readable text
- JSON: Serialize `SchemaDiff` to JSON

**CLI command**: `schema-sync diff --tenant tenant_123 --format json`

### 2. CI / Validation Mode

**Implementation**: 
- Use `Engine::sync_tenant()` with `execute=false` for all tenants
- Check `already_in_sync` flag
- Exit non-zero if any tenant has `already_in_sync=false`

**CLI command**: `schema-sync validate --all-tenants`

### 3. Multi-Database Support

**Implementation**: Implement adapter traits for each database type.

**Challenges**:
- Different SQL dialects
- Different schema introspection queries
- Different migration execution methods
- Different locking mechanisms

**Solution**: Traits abstract these differences. Each implementation handles database-specific details.

### 4. Read-Only Audit Mode

**Implementation**: Use `SchemaInspector` directly, no `MigrationRunner` needed.

**CLI command**: `schema-sync audit --tenant tenant_123`

**Safety**: No write operations, no locks, safe for production.

### 5. Pluggable Migration Engine

**Implementation**: Different `MigrationRunner` implementations.

**Examples**:
- `SqlFileMigrationRunner`: Executes SQL files in order
- `DieselMigrationRunner`: Uses diesel migration system
- `SqlxMigrationRunner`: Uses sqlx migration system
- `RustCodeMigrationRunner`: Executes Rust code migrations

### 6. Schema Snapshot System

**Implementation**: `SnapshotStore` trait implementations.

**Storage backends**:
- `FileSnapshotStore`: Store in filesystem
- `DatabaseSnapshotStore`: Store in database
- `GitSnapshotStore`: Store in version control
- `S3SnapshotStore`: Store in object storage

**Versioning**: Deterministic hash of snapshot content.

### 7. Tenant Isolation Guarantees

**Implementation**:
- All operations require `TenantContext`
- `MigrationRunner::acquire_lock()` ensures per-tenant locking
- Database adapters enforce tenant scoping in SQL

**Safety**: Type system prevents cross-tenant operations.

## Testing Strategy

### Unit Tests

**Mock implementations**: Create mock implementations of all traits for testing.

**Example**: `MockSchemaInspector` that returns predefined snapshots.

### Integration Tests

**Test database**: Use testcontainers or in-memory databases.

**Test scenarios**:
- Sync empty schema to target
- Sync existing schema to target
- Handle conflicts
- Rollback on failure

### Property Tests

**Deterministic snapshots**: Same schema always produces same snapshot.

**Reversibility**: Plan + execution should result in target schema.

## Performance Considerations

### Snapshot Calculation

**Optimization**: Cache snapshots, only recalculate when schema changes.

**Implementation**: Store snapshot hash, compare before full calculation.

### Batch Operations

**Multiple tenants**: Process tenants in parallel where possible.

**Implementation**: Use `Engine::list_tenants()` and process concurrently.

### Locking

**Per-tenant locks**: Don't block other tenants.

**Implementation**: `MigrationRunner::acquire_lock()` is per-tenant.

## Security Considerations

### Tenant Isolation

**Enforcement**: Database adapters must enforce tenant scoping in SQL.

**Example**: PostgreSQL adapter uses schema-qualified table names.

### Connection Security

**Credentials**: Adapters handle connection security (TLS, authentication).

**Not in core**: Core crate doesn't handle credentials, adapters do.

### Audit Trail

**Future**: Log all schema changes with tenant, timestamp, and user.

**Implementation**: `MigrationRunner` implementations can add logging.

## Error Handling

### Error Types

**Structured errors**: `Error` enum with context.

**Why**: Provides actionable error messages with tenant context.

### Recovery

**Rollback**: `MigrationRunner` implementations handle rollback on failure.

**Partial failures**: `ExecutionResult` indicates how many steps succeeded.

## Conclusion

This architecture provides a solid foundation for schema synchronization that can grow over years. The trait-based design enables extensibility without breaking changes, and the separation of concerns makes the codebase maintainable.

The key insight is designing for extension first: every abstraction exists to enable a future feature, not just to solve the current problem.