# sql-splitter
[crates.io](https://crates.io/crates/sql-splitter) · [CI](https://github.com/helgesverre/sql-splitter/actions) · [License](LICENSE.md) · [Amp](https://ampcode.com/@helgesverre)
Split large SQL dump files into individual table files. Fast, memory-efficient, multi-dialect.
- **600+ MB/s** throughput on modern hardware
- **MySQL, PostgreSQL, SQLite** support (including `COPY FROM stdin`)
- **Compressed files** — gzip, bzip2, xz, zstd auto-detected
- **Streaming architecture** — handles files larger than RAM
- **5x faster** than shell-based alternatives
## Installation
### From crates.io
```bash
cargo install sql-splitter
```
### From source
```bash
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install # Installs binary + shell completions + man pages
```
Or download pre-built binaries from [GitHub Releases](https://github.com/helgesverre/sql-splitter/releases).
### Man pages (optional)
After installation, view documentation with `man sql-splitter` or `man sql-splitter-diff`.
For `cargo install` users, install man pages manually:
```bash
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install-man
```
## Usage
```bash
# MySQL/MariaDB dump (default)
sql-splitter split dump.sql -o tables/

# PostgreSQL pg_dump
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres

# SQLite dump
sql-splitter split sqlite.sql -o tables/ --dialect=sqlite

# Compressed files (auto-detected)
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/

# Split specific tables only
sql-splitter split dump.sql --tables users,posts,orders

# Schema only (CREATE TABLE, indexes, etc.)
sql-splitter split dump.sql -o schema/ --schema-only

# Data only (INSERT/COPY statements)
sql-splitter split dump.sql -o data/ --data-only

# Merge split files back into a single dump
sql-splitter merge tables/ -o restored.sql

# Merge specific tables only
sql-splitter merge tables/ -o partial.sql --tables users,orders

# Merge with transaction wrapper
sql-splitter merge tables/ -o restored.sql --transaction

# Analyze without splitting
sql-splitter analyze dump.sql

# Convert between SQL dialects
sql-splitter convert mysql_dump.sql -o postgres_dump.sql --to postgres
sql-splitter convert pg_dump.sql -o mysql_dump.sql --to mysql
sql-splitter convert dump.sql -o sqlite_dump.sql --to sqlite

# Convert with explicit source dialect
sql-splitter convert dump.sql --from postgres --to mysql -o output.sql

# Validate SQL dump integrity
sql-splitter validate dump.sql

# Validate with strict mode (warnings = errors)
sql-splitter validate dump.sql --strict

# Validate with JSON output for CI
sql-splitter validate dump.sql --json

# Batch operations with glob patterns
sql-splitter validate "dumps/*.sql" --fail-fast
sql-splitter analyze "**/*.sql"
sql-splitter split "*.sql" -o output/
sql-splitter convert "*.sql" --to postgres -o converted/

# Compare two SQL dumps for changes
sql-splitter diff old.sql new.sql

# Diff with schema-only or data-only
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only

# Diff with JSON or SQL migration output
sql-splitter diff old.sql new.sql --format json -o diff.json
sql-splitter diff old.sql new.sql --format sql -o migration.sql

# Diff with verbose PK samples and ignored timestamp columns
sql-splitter diff old.sql new.sql --verbose --ignore-columns "*.updated_at,*.created_at"

# Override the primary key for tables without one
sql-splitter diff old.sql new.sql --primary-key logs:timestamp+message

# Generate shell completions (auto-installed with make install)
sql-splitter completions bash >> ~/.bashrc
sql-splitter completions zsh >> ~/.zshrc
sql-splitter completions fish >> ~/.config/fish/completions/sql-splitter.fish
```
### Shell Completions
Shell completions are automatically installed when using `make install`. For manual installation:
```bash
# Install for current shell only
make install-completions
# Install for all shells (bash, zsh, fish)
make install-completions-all
```
## Why sql-splitter?
sql-splitter is a **dump-first, CLI-first** tool designed for automation and CI/CD pipelines.
**What it's optimized for**
| Strength | Description |
|--------------------------------|------------------------------------------------------------------------------------|
| **One tool for the workflow**  | Split → sample → shard → convert → merge in a single binary (see the example below) |
| **Works on dump files** | No running database or JDBC connection needed (unlike mydumper, Jailer, Condenser) |
| **Streaming architecture** | 10GB+ dumps with constant memory, 600+ MB/s throughput |
| **Multi-dialect + conversion** | MySQL, PostgreSQL, SQLite including `COPY FROM stdin` → INSERT |
| **FK-aware operations** | Sampling and tenant sharding preserve referential integrity |
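For instance, the single-binary workflow noted in the table above can be chained end to end (file names and the tenant value are illustrative):

```bash
# Split a dump, sample it with FK integrity preserved, extract one tenant,
# convert to PostgreSQL, and merge the split files back together.
sql-splitter split dump.sql -o tables/
sql-splitter sample dump.sql -o sample.sql --percent 10 --preserve-relations
sql-splitter shard dump.sql -o tenant.sql --tenant-column tenant_id --tenant-value 42
sql-splitter convert dump.sql -o pg_dump.sql --to postgres
sql-splitter merge tables/ -o restored.sql
```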
**When another tool might be better**
- **[mydumper](https://github.com/mydumper/mydumper)** — Parallel snapshots from live MySQL/MariaDB databases
- **[Jailer](https://github.com/Wisser/Jailer)** — Rich GUI-based FK subsetting with JDBC across 12+ databases
- **[sqlglot](https://github.com/tobymao/sqlglot)** — Query-level transpilation and AST manipulation (31 dialects)
- **[DuckDB](https://github.com/duckdb/duckdb)** — Complex analytical queries over SQL/CSV/JSON/Parquet
See [docs/COMPETITIVE_ANALYSIS.md](docs/COMPETITIVE_ANALYSIS.md) for detailed comparisons.
## Options
### Split Options
| Flag | Description | Default |
|------------------|--------------------------------------------|-------------|
| `-o, --output` | Output directory | `output` |
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-t, --tables` | Only split these tables (comma-separated) | — |
| `-p, --progress` | Show progress bar | — |
| `--dry-run` | Preview without writing files | — |
| `--schema-only` | Only DDL statements (CREATE, ALTER, DROP) | — |
| `--data-only` | Only DML statements (INSERT, COPY) | — |
| `--fail-fast` | Stop on first error (for glob patterns) | — |
| `--json` | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
### Merge Options
| Flag | Description | Default |
|------------------|-------------------------------------------|---------|
| `-o, --output` | Output SQL file | stdout |
| `-d, --dialect` | SQL dialect for headers/footers | `mysql` |
| `-t, --tables` | Only merge these tables (comma-separated) | all |
| `-e, --exclude` | Exclude these tables (comma-separated) | — |
| `--transaction` | Wrap in BEGIN/COMMIT transaction | — |
| `--no-header` | Skip header comments | — |
| `-p, --progress` | Show progress bar | — |
| `--dry-run` | Preview without writing files | — |
| `--json` | Output results as JSON | — |
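For example, to rebuild a dump while dropping noisy tables and skipping the header comments (table names are illustrative):

```bash
sql-splitter merge tables/ -o restored.sql --exclude logs,sessions --no-header
```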
### Analyze Options
| Flag | Description | Default |
|------------------|--------------------------------------------|-------------|
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-p, --progress` | Show progress bar | — |
| `--fail-fast` | Stop on first error (for glob patterns) | — |
| `--json` | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
### Convert Options
| Flag | Description | Default |
|------------------|--------------------------------------------------|-------------|
| `-o, --output` | Output SQL file or directory (required for glob) | stdout |
| `--from` | Source dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `--to` | Target dialect: `mysql`, `postgres`, `sqlite` | required |
| `--strict` | Fail on any unsupported feature | — |
| `-p, --progress` | Show progress bar | — |
| `--dry-run` | Preview without writing files | — |
| `--fail-fast` | Stop on first error (for glob patterns) | — |
| `--json` | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
**Supported conversions:**
- MySQL ↔ PostgreSQL (including COPY → INSERT)
- MySQL ↔ SQLite
- PostgreSQL ↔ SQLite
**Features:**
- 30+ data type mappings
- AUTO_INCREMENT ↔ SERIAL ↔ INTEGER PRIMARY KEY
- PostgreSQL COPY → INSERT with NULL and escape handling
- Session command stripping
- Warnings for unsupported features (ENUM, arrays, triggers)
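When a conversion must be lossless, `--strict` turns any unsupported-feature warning into a failure; for example:

```bash
# Fails instead of warning if the dump uses features SQLite can't express
sql-splitter convert dump.sql --to sqlite --strict -o sqlite_dump.sql
```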
### Validate Options
| Flag | Description | Default |
|------------------------|----------------------------------------------------|-------------|
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `--strict` | Treat warnings as errors (exit 1) | — |
| `--json` | Output results as JSON | — |
| `--max-rows-per-table` | Max rows per table for PK/FK checks (0 = no limit) | 1,000,000 |
| `--no-limit` | Disable row limit for PK/FK checks | — |
| `--no-fk-checks` | Skip PK/FK data integrity checks | — |
| `-p, --progress` | Show progress bar | — |
| `--fail-fast` | Stop on first error (for glob patterns) | — |
Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
**Validation checks:**
- SQL syntax validation (parser errors)
- DDL/DML consistency (INSERTs reference existing tables)
- Encoding validation (UTF-8)
- Duplicate primary key detection (all dialects)
- FK referential integrity (all dialects)
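Because `--strict` turns warnings into a failing exit code, validation slots into CI as a plain shell guard (a minimal sketch; the dump path is illustrative):

```bash
# Abort the pipeline if the dump has any error or, under --strict, any warning
if ! sql-splitter validate backup.sql --strict; then
  echo "SQL dump failed validation" >&2
  exit 1
fi
```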
### Sample Options
| Flag | Description | Default |
|-----------------------|----------------------------------------------------|-------------|
| `-o, --output` | Output SQL file | stdout |
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `--percent` | Sample percentage (1-100) | — |
| `--rows` | Sample fixed number of rows per table | — |
| `--preserve-relations`| Preserve FK relationships | — |
| `-t, --tables` | Only sample these tables (comma-separated) | all |
| `-e, --exclude` | Exclude these tables (comma-separated) | — |
| `--root-tables` | Explicit root tables for sampling | — |
| `--include-global` | Global table handling: `none`, `lookups`, `all` | `lookups` |
| `--seed` | Random seed for reproducibility | random |
| `-c, --config` | YAML config file for per-table settings | — |
| `--max-total-rows` | Maximum total rows to sample (0 = no limit) | — |
| `--no-limit` | Disable row limit | — |
| `--strict-fk` | Fail if any FK integrity issues detected | — |
| `--no-schema` | Exclude CREATE TABLE statements from output | — |
| `-p, --progress` | Show progress bar | — |
| `--dry-run` | Preview without writing files | — |
| `--json` | Output results as JSON | — |
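The `sample` command does not appear in the Usage section above; a typical invocation shrinks a production dump into a reproducible, FK-consistent subset (file names are illustrative):

```bash
# Keep 5% of each table's rows, follow FK relationships, fix the seed
sql-splitter sample prod.sql -o dev_seed.sql --percent 5 --preserve-relations --seed 42
```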
### Shard Options
| Flag | Description | Default |
|-----------------------|----------------------------------------------------|-------------|
| `-o, --output` | Output SQL file or directory | stdout |
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `--tenant-column` | Column name for tenant identification | auto-detect |
| `--tenant-value` | Single tenant value to extract | — |
| `--tenant-values` | Multiple tenant values (comma-separated) | — |
| `--root-tables` | Explicit root tables with tenant column | — |
| `--include-global` | Global table handling: `none`, `lookups`, `all` | `lookups` |
| `-c, --config` | YAML config file for table classification | — |
| `--max-selected-rows` | Maximum rows to select (0 = no limit) | — |
| `--no-limit` | Disable row limit | — |
| `--strict-fk` | Fail if any FK integrity issues detected | — |
| `--no-schema` | Exclude CREATE TABLE statements from output | — |
| `-p, --progress` | Show progress bar | — |
| `--dry-run` | Preview without writing files | — |
| `--json` | Output results as JSON | — |
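Likewise, a typical `shard` invocation extracts a single tenant's rows (the column name and tenant value are illustrative):

```bash
# Pull tenant 42's rows plus lookup tables (the default --include-global mode)
sql-splitter shard prod.sql -o tenant_42.sql --tenant-column tenant_id --tenant-value 42
```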
### Diff Options
| Flag | Description | Default |
|---------------------|------------------------------------------------------------|-------------|
| `-o, --output`      | Output file                                                  | stdout      |
| `-d, --dialect` | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `--schema-only` | Compare schema only, skip data | — |
| `--data-only` | Compare data only, skip schema | — |
| `--format` | Output format: `text`, `json`, `sql` | `text` |
| `-t, --tables` | Only compare these tables (comma-separated) | all |
| `-e, --exclude` | Exclude these tables (comma-separated) | — |
| `--max-pk-entries` | Max PK entries to track (0 = no limit) | 10,000,000 |
| `-v, --verbose` | Show sample PK values for added/removed/modified rows | — |
| `--primary-key` | Override PK for tables (format: `table:col1+col2`) | auto-detect |
| `--ignore-order` | Ignore column order differences in schema comparison | — |
| `--ignore-columns` | Ignore columns matching glob patterns (e.g., `*.updated_at`) | — |
| `--allow-no-pk` | Compare tables without PK using all columns as key | — |
| `-p, --progress` | Show progress bar | — |
**What diff detects:**
- Tables added/removed/modified (columns, types, nullability)
- Primary key changes
- Foreign key changes
- Index changes (CREATE INDEX, inline INDEX/KEY)
- Rows added/removed/modified (via PK-based comparison)
**Output formats:**
- `text`: Human-readable summary with optional PK samples
- `json`: Structured data for automation (includes warnings)
- `sql`: Migration script with ALTER/CREATE INDEX/DROP INDEX statements
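Since the `sql` format emits a migration script, a review-then-apply flow is straightforward (a sketch assuming a MySQL target; the database name is illustrative):

```bash
sql-splitter diff old.sql new.sql --format sql -o migration.sql
# after reviewing migration.sql:
mysql mydb < migration.sql
```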
## Performance
See [BENCHMARKS.md](BENCHMARKS.md) for detailed comparisons.
## Testing
```bash
# Unit tests
cargo test
# Verify against real-world SQL dumps (MySQL, PostgreSQL, WordPress, etc.)
make verify-realworld
```
## AI Agent Integration
sql-splitter includes documentation optimized for AI agents:
- **[llms.txt](website/llms.txt)** - LLM-friendly documentation following the [llmstxt.org](https://llmstxt.org) specification
- **[Agent Skill](skills/sql-splitter/SKILL.md)** - Claude Code / Amp skill for automatic tool discovery
Install the skill in Claude Code / Amp:
```bash
amp skill add helgesverre/sql-splitter
```
## License
MIT — see [LICENSE.md](LICENSE.md)