sql-splitter 1.8.0

# Merge Feature Design

**Status**: ✅ Implemented (v1.4.0)  
**Author**: Amp  
**Date**: 2025-12-20

## Overview

The `merge` command is the inverse of `split` — it combines multiple split SQL table files back into a single SQL dump file. This is useful for:

- Reconstructing a full database dump after editing individual tables
- Creating partial dumps by merging a subset of tables
- Reordering tables for dependency-aware imports

## Command Interface

```bash
# Basic usage - merge all .sql files in directory
sql-splitter merge tables/ -o restored.sql

# Merge specific tables only
sql-splitter merge tables/ -o partial.sql --tables users,posts,comments

# Exclude certain tables
sql-splitter merge tables/ -o restored.sql --exclude cache,sessions

# Specify table order (for foreign key dependencies)
sql-splitter merge tables/ -o restored.sql --order users,posts,comments

# Order by dependency analysis (auto-detect foreign keys)
sql-splitter merge tables/ -o restored.sql --auto-order

# Add transaction wrapper
sql-splitter merge tables/ -o restored.sql --transaction

# Dialect-specific output
sql-splitter merge tables/ -o restored.sql --dialect postgres
```

## CLI Options

| Flag | Description | Default |
|------|-------------|---------|
| `-o, --output` | Output SQL file path | `stdout` |
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-t, --tables` | Only merge these tables (comma-separated) | all |
| `--exclude` | Exclude these tables (comma-separated) | none |
| `--order` | Explicit table order (comma-separated) | alphabetical |
| `--auto-order` | Analyze foreign keys and order dependencies | false |
| `--transaction` | Wrap output in BEGIN/COMMIT transaction | false |
| `--no-header` | Skip generating header comments | false |
| `--progress` | Show progress bar | false |
| `--dry-run` | List tables that would be merged | false |

## Output Format

### Header (Optional)

```sql
-- SQL Merge Output
-- Generated by sql-splitter v1.4.0
-- Date: 2025-12-20 12:00:00
-- Tables: 10
-- Dialect: mysql

```

### Transaction Wrapper (Optional)

```sql
SET FOREIGN_KEY_CHECKS = 0;  -- MySQL
BEGIN;

-- ... table contents ...

COMMIT;
SET FOREIGN_KEY_CHECKS = 1;  -- MySQL
```

### Table Separator

```sql
-- ============================================================
-- Table: users
-- ============================================================

CREATE TABLE `users` ...
INSERT INTO `users` ...
```

## Implementation Plan

### Phase 1: Basic Merge (MVP)

1. **File Discovery**
   - Scan input directory for `.sql` files
   - Extract table name from filename (e.g., `users.sql` → `users`)
   - Support glob patterns for flexibility

2. **Simple Concatenation**
   - Read files in alphabetical order (default)
   - Write to output with table separators
   - Stream large files (don't load all into memory)

3. **Basic Filtering**
   - `--tables` whitelist
   - `--exclude` blacklist

### Phase 2: Ordering & Dependencies

1. **Explicit Ordering**
   - `--order users,posts,comments` for manual control
   - Validate all specified tables exist

2. **Auto-Ordering (Optional)**
   - Parse CREATE TABLE statements for FOREIGN KEY references
   - Build dependency graph
   - Topological sort for correct order
   - Detect circular dependencies and warn

### Phase 3: Dialect-Aware Output

1. **Header Generation**
   - MySQL: `SET NAMES utf8mb4; SET FOREIGN_KEY_CHECKS=0;`
   - PostgreSQL: `SET client_encoding = 'UTF8';`
   - SQLite: `PRAGMA foreign_keys = OFF;`

2. **Footer Generation**
   - MySQL: `SET FOREIGN_KEY_CHECKS=1;`
   - PostgreSQL: (none needed)
   - SQLite: `PRAGMA foreign_keys = ON;`

3. **Transaction Wrapping**
   - Wrap entire output in `BEGIN;` / `COMMIT;`
   - Optional per-table transactions

### Phase 4: Advanced Features

1. **Compression Output**
   - Support `.gz`, `.zst` output based on extension
   - Mirror input compression support

2. **Schema/Data Separation**
   - `--schema-first` - output all CREATE TABLEs, then all INSERTs
   - Useful for some import scenarios

3. **Validation**
   - Verify SQL syntax before writing
   - Check for duplicate table definitions

## Architecture

```
src/
├── cmd/
│   └── merge.rs          # CLI handler
├── merger/
│   ├── mod.rs            # Public API
│   ├── discovery.rs      # File scanning
│   ├── ordering.rs       # Dependency analysis
│   └── writer.rs         # Output generation
```

### Key Types

```rust
pub struct MergeConfig {
    pub input_dir: PathBuf,
    pub output: Option<PathBuf>,  // None = stdout
    pub dialect: SqlDialect,
    pub tables: Option<Vec<String>>,
    pub exclude: Vec<String>,
    pub order: TableOrder,
    pub transaction: bool,
    pub header: bool,
    pub progress: bool,
}

pub enum TableOrder {
    Alphabetical,
    Explicit(Vec<String>),
    AutoDependency,
}

pub struct MergeStats {
    pub tables_merged: usize,
    pub bytes_written: u64,
    pub table_names: Vec<String>,
}
```

### Streaming Architecture

```
Directory Scan → File Iterator → BufReader → Output BufWriter
                      ↓
              Progress Callback
```

- Never load entire files into memory
- Stream from input files directly to output
- Use 256KB buffers (matching split command)

## Edge Cases

1. **Empty directory** → Error with helpful message
2. **No matching tables** → Warning, empty output
3. **Circular foreign keys** → Warning, use alphabetical fallback
4. **Duplicate table files** → Error (e.g., `users.sql` and `Users.sql` on case-insensitive FS)
5. **Non-SQL files in directory** → Skip with warning if `--verbose`
6. **Mixed dialects** → Warn if detected, proceed with specified dialect

## Testing Strategy

1. **Unit Tests**
   - File discovery
   - Table name extraction
   - Dependency graph building
   - Order validation

2. **Integration Tests**
   - Split → Merge roundtrip
   - Verify output matches original (modulo formatting)
   - Test with all three dialects

3. **Edge Case Tests**
   - Empty tables
   - Large files (>1GB)
   - Unicode table names
   - Special characters in data

## Performance Targets

- **Throughput**: Match split command (~400+ MB/s)
- **Memory**: Constant regardless of file size
- **Startup**: <10ms to begin writing

## Open Questions

1. Should `--auto-order` be the default behavior?
2. Should we support merging from multiple directories?
3. Should we add a `--verify` flag to validate SQL syntax?
4. Should the output include `DROP TABLE IF EXISTS` statements?
5. Should we support reading from stdin (pipe multiple files)?

## Timeline Estimate

| Phase | Effort | Priority |
|-------|--------|----------|
| Phase 1: Basic Merge | 2-3 hours | High |
| Phase 2: Ordering | 2 hours | Medium |
| Phase 3: Dialect-Aware | 1-2 hours | Medium |
| Phase 4: Advanced | 3-4 hours | Low |

**MVP (Phase 1)**: Ready in ~3 hours

## Related

- [Split Command](../../src/cmd/split.rs)
- [Splitter Module](../../src/splitter/mod.rs)