# Merge Feature Design
**Status**: ✅ Implemented (v1.4.0)
**Author**: Amp
**Date**: 2025-12-20
## Overview
The `merge` command is the inverse of `split` — it combines multiple split SQL table files back into a single SQL dump file. This is useful for:
- Reconstructing a full database dump after editing individual tables
- Creating partial dumps by merging a subset of tables
- Reordering tables for dependency-aware imports
## Command Interface
```bash
# Basic usage - merge all .sql files in directory
sql-splitter merge tables/ -o restored.sql
# Merge specific tables only
sql-splitter merge tables/ -o partial.sql --tables users,posts,comments
# Exclude certain tables
sql-splitter merge tables/ -o restored.sql --exclude cache,sessions
# Specify table order (for foreign key dependencies)
sql-splitter merge tables/ -o restored.sql --order users,posts,comments
# Order by dependency analysis (auto-detect foreign keys)
sql-splitter merge tables/ -o restored.sql --auto-order
# Add transaction wrapper
sql-splitter merge tables/ -o restored.sql --transaction
# Dialect-specific output
sql-splitter merge tables/ -o restored.sql --dialect postgres
```
## CLI Options
| `-o, --output` | Output SQL file path | `stdout` |
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-t, --tables` | Only merge these tables (comma-separated) | all |
| `--exclude` | Exclude these tables (comma-separated) | none |
| `--order` | Explicit table order (comma-separated) | alphabetical |
| `--auto-order` | Analyze foreign keys and order dependencies | false |
| `--transaction` | Wrap output in BEGIN/COMMIT transaction | false |
| `--no-header` | Skip generating header comments | false |
| `--progress` | Show progress bar | false |
| `--dry-run` | List tables that would be merged | false |
## Output Format
### Header (Optional)
```sql
-- SQL Merge Output
-- Generated by sql-splitter v1.4.0
-- Date: 2025-12-20 12:00:00
-- Tables: 10
-- Dialect: mysql
```
### Transaction Wrapper (Optional)
```sql
SET FOREIGN_KEY_CHECKS = 0; -- MySQL
BEGIN;
-- ... table contents ...
COMMIT;
SET FOREIGN_KEY_CHECKS = 1; -- MySQL
```
### Table Separator
```sql
-- ============================================================
-- Table: users
-- ============================================================
CREATE TABLE `users` ...
INSERT INTO `users` ...
```
## Implementation Plan
### Phase 1: Basic Merge (MVP)
1. **File Discovery**
- Scan input directory for `.sql` files
- Extract table name from filename (e.g., `users.sql` → `users`)
- Support glob patterns for flexibility
2. **Simple Concatenation**
- Read files in alphabetical order (default)
- Write to output with table separators
- Stream large files (don't load all into memory)
3. **Basic Filtering**
- `--tables` whitelist
- `--exclude` blacklist
### Phase 2: Ordering & Dependencies
1. **Explicit Ordering**
- `--order users,posts,comments` for manual control
- Validate all specified tables exist
2. **Auto-Ordering (Optional)**
- Parse CREATE TABLE statements for FOREIGN KEY references
- Build dependency graph
- Topological sort for correct order
- Detect circular dependencies and warn
### Phase 3: Dialect-Aware Output
1. **Header Generation**
- MySQL: `SET NAMES utf8mb4; SET FOREIGN_KEY_CHECKS=0;`
- PostgreSQL: `SET client_encoding = 'UTF8';`
- SQLite: `PRAGMA foreign_keys = OFF;`
2. **Footer Generation**
- MySQL: `SET FOREIGN_KEY_CHECKS=1;`
- PostgreSQL: (none needed)
- SQLite: `PRAGMA foreign_keys = ON;`
3. **Transaction Wrapping**
- Wrap entire output in `BEGIN;` / `COMMIT;`
- Optional per-table transactions
### Phase 4: Advanced Features
1. **Compression Output**
- Support `.gz`, `.zst` output based on extension
- Mirror input compression support
2. **Schema/Data Separation**
- `--schema-first` - output all CREATE TABLEs, then all INSERTs
- Useful for some import scenarios
3. **Validation**
- Verify SQL syntax before writing
- Check for duplicate table definitions
## Architecture
```
src/
├── cmd/
│ └── merge.rs # CLI handler
├── merger/
│ ├── mod.rs # Public API
│ ├── discovery.rs # File scanning
│ ├── ordering.rs # Dependency analysis
│ └── writer.rs # Output generation
```
### Key Types
```rust
pub struct MergeConfig {
pub input_dir: PathBuf,
pub output: Option<PathBuf>, // None = stdout
pub dialect: SqlDialect,
pub tables: Option<Vec<String>>,
pub exclude: Vec<String>,
pub order: TableOrder,
pub transaction: bool,
pub header: bool,
pub progress: bool,
}
pub enum TableOrder {
Alphabetical,
Explicit(Vec<String>),
AutoDependency,
}
pub struct MergeStats {
pub tables_merged: usize,
pub bytes_written: u64,
pub table_names: Vec<String>,
}
```
### Streaming Architecture
```
Directory Scan → File Iterator → BufReader → Output BufWriter
↓
Progress Callback
```
- Never load entire files into memory
- Stream from input files directly to output
- Use 256KB buffers (matching split command)
## Edge Cases
1. **Empty directory** → Error with helpful message
2. **No matching tables** → Warning, empty output
3. **Circular foreign keys** → Warning, use alphabetical fallback
4. **Duplicate table files** → Error (e.g., `users.sql` and `Users.sql` on case-insensitive FS)
5. **Non-SQL files in directory** → Skip with warning if `--verbose`
6. **Mixed dialects** → Warn if detected, proceed with specified dialect
## Testing Strategy
1. **Unit Tests**
- File discovery
- Table name extraction
- Dependency graph building
- Order validation
2. **Integration Tests**
- Split → Merge roundtrip
- Verify output matches original (modulo formatting)
- Test with all three dialects
3. **Edge Case Tests**
- Empty tables
- Large files (>1GB)
- Unicode table names
- Special characters in data
## Performance Targets
- **Throughput**: Match split command (~400+ MB/s)
- **Memory**: Constant regardless of file size
- **Startup**: <10ms to begin writing
## Open Questions
1. Should `--auto-order` be the default behavior?
2. Should we support merging from multiple directories?
3. Should we add a `--verify` flag to validate SQL syntax?
4. Should the output include `DROP TABLE IF EXISTS` statements?
5. Should we support reading from stdin (pipe multiple files)?
## Timeline Estimate
| Phase 1: Basic Merge | 2-3 hours | High |
| Phase 2: Ordering | 2 hours | Medium |
| Phase 3: Dialect-Aware | 1-2 hours | Medium |
| Phase 4: Advanced | 3-4 hours | Low |
**MVP (Phase 1)**: Ready in ~3 hours
## Related
- [Split Command](../../src/cmd/split.rs)
- [Splitter Module](../../src/splitter/mod.rs)