# sql-splitter

[![Crates.io](https://img.shields.io/crates/v/sql-splitter.svg)](https://crates.io/crates/sql-splitter)
[![Downloads](https://img.shields.io/crates/d/sql-splitter.svg)](https://crates.io/crates/sql-splitter)
[![Build](https://github.com/helgesverre/sql-splitter/actions/workflows/test.yml/badge.svg)](https://github.com/helgesverre/sql-splitter/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE.md)
[![Amp](https://img.shields.io/badge/Amp%20Code-191C19.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMjEiIGhlaWdodD0iMjEiIHZpZXdCb3g9IjAgMCAyMSAyMSIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPHBhdGggZD0iTTMuNzY4NzkgMTguMzAxNUw4LjQ5ODM5IDEzLjUwNUwxMC4yMTk2IDIwLjAzOTlMMTIuNzIgMTkuMzU2MUwxMC4yMjg4IDkuODY3NDlMMC44OTA4NzYgNy4zMzg0NEwwLjIyNTk0IDkuODkzMzFMNi42NTEzNCAxMS42Mzg4TDEuOTQxMzggMTYuNDI4MkwzLjc2ODc5IDE4LjMwMTVaIiBmaWxsPSIjRjM0RTNGIi8+CjxwYXRoIGQ9Ik0xNy40MDc0IDEyLjc0MTRMMTkuOTA3OCAxMi4wNTc1TDE3LjQxNjcgMi41Njg5N0w4LjA3ODczIDAuMDM5OTI0Nkw3LjQxMzggMi41OTQ4TDE1LjI5OTIgNC43MzY4NUwxNy40MDc0IDEyLjc0MTRaIiBmaWxsPSIjRjM0RTNGIi8+CjxwYXRoIGQ9Ik0xMy44MTg0IDE2LjM4ODNMMTYuMzE4OCAxNS43MDQ0TDEzLjgyNzYgNi4yMTU4OEw0LjQ4OTcxIDMuNjg2ODNMMy44MjQ3NyA2LjI0MTcxTDExLjcxMDEgOC4zODM3NkwxMy44MTg0IDE2LjM4ODNaIiBmaWxsPSIjRjM0RTNGIi8+Cjwvc3ZnPg==&style=flat)](https://ampcode.com/@helgesverre)

Split large SQL dump files into individual table files. Fast, memory-efficient, multi-dialect.

- **600+ MB/s** throughput on modern hardware
- **MySQL, PostgreSQL, SQLite** support (including `COPY FROM stdin`)
- **Compressed files** — gzip, bzip2, xz, zstd auto-detected
- **Streaming architecture** — handles files larger than RAM
- **5x faster** than shell-based alternatives

## Installation

### From crates.io

```bash
cargo install sql-splitter
```

### From source

```bash
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install  # Installs binary + shell completions + man pages
```

Or download pre-built binaries from [GitHub Releases](https://github.com/helgesverre/sql-splitter/releases).

### Man pages (optional)

After installation, view documentation with `man sql-splitter` or `man sql-splitter-diff`.

For `cargo install` users, install man pages manually:
```bash
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install-man
```

## Usage

```bash
# MySQL/MariaDB dump (default)
sql-splitter split dump.sql -o tables/

# PostgreSQL pg_dump
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres

# SQLite dump
sql-splitter split sqlite.sql -o tables/ --dialect=sqlite

# Compressed files (auto-detected)
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/

# Split specific tables only
sql-splitter split dump.sql --tables users,posts,orders

# Schema only (CREATE TABLE, indexes, etc.)
sql-splitter split dump.sql -o schema/ --schema-only

# Data only (INSERT/COPY statements)
sql-splitter split dump.sql -o data/ --data-only

# Merge split files back into single dump
sql-splitter merge tables/ -o restored.sql

# Merge specific tables only
sql-splitter merge tables/ -o partial.sql --tables users,orders

# Merge with transaction wrapper
sql-splitter merge tables/ -o restored.sql --transaction

# Analyze without splitting
sql-splitter analyze dump.sql

# Convert between SQL dialects
sql-splitter convert mysql_dump.sql -o postgres_dump.sql --to postgres
sql-splitter convert pg_dump.sql -o mysql_dump.sql --to mysql
sql-splitter convert dump.sql -o sqlite_dump.sql --to sqlite

# Convert with explicit source dialect
sql-splitter convert dump.sql --from postgres --to mysql -o output.sql

# Validate SQL dump integrity
sql-splitter validate dump.sql

# Validate with strict mode (warnings = errors)
sql-splitter validate dump.sql --strict

# Validate with JSON output for CI
sql-splitter validate dump.sql --json

# Batch operations with glob patterns
sql-splitter validate "dumps/*.sql" --fail-fast
sql-splitter analyze "**/*.sql"
sql-splitter split "*.sql" -o output/
sql-splitter convert "*.sql" --to postgres -o converted/

# Compare two SQL dumps for changes
sql-splitter diff old.sql new.sql

# Diff with schema-only or data-only
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only

# Diff with JSON or SQL migration output
sql-splitter diff old.sql new.sql --format json -o diff.json
sql-splitter diff old.sql new.sql --format sql -o migration.sql

# Diff with verbose PK samples and ignore timestamp columns
sql-splitter diff old.sql new.sql --verbose --ignore-columns "*.updated_at,*.created_at"

# Override primary key for tables without PK
sql-splitter diff old.sql new.sql --primary-key logs:timestamp+message

# Redact sensitive data using inline patterns
sql-splitter redact dump.sql -o safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"

# Redact using YAML config file
sql-splitter redact dump.sql -o safe.sql --config redact.yaml

# Generate redaction config by analyzing input file
sql-splitter redact dump.sql --generate-config -o redact.yaml

# Reproducible redaction with seed
sql-splitter redact dump.sql -o safe.sql --null "*.password" --seed 42

# Generate shell completions (auto-installed with make install)
sql-splitter completions bash >> ~/.bashrc
sql-splitter completions zsh >> ~/.zshrc
sql-splitter completions fish >> ~/.config/fish/completions/sql-splitter.fish
```

### Shell Completions

Shell completions are automatically installed when using `make install`. For manual installation:

```bash
# Install for current shell only
make install-completions

# Install for all shells (bash, zsh, fish)
make install-completions-all
```

## Why sql-splitter?

sql-splitter is a **dump-first, CLI-first** tool designed for automation and CI/CD pipelines.

**What it's optimized for**

| Strength                       | Description                                                                        |
|--------------------------------|------------------------------------------------------------------------------------|
| **One tool for the workflow**  | Split → sample → shard → convert → merge in a single binary                        |
| **Works on dump files**        | No running database or JDBC connection needed (unlike mydumper, Jailer, Condenser) |
| **Streaming architecture**     | 10GB+ dumps with constant memory, 600+ MB/s throughput                             |
| **Multi-dialect + conversion** | MySQL, PostgreSQL, SQLite, including `COPY FROM stdin` → INSERT                   |
| **FK-aware operations**        | Sampling and tenant sharding preserve referential integrity                        |

**When another tool might be better**

- **[mydumper](https://github.com/mydumper/mydumper)** — Parallel snapshots from live MySQL/MariaDB databases
- **[Jailer](https://github.com/Wisser/Jailer)** — Rich GUI-based FK subsetting with JDBC across 12+ databases
- **[sqlglot](https://github.com/tobymao/sqlglot)** — Query-level transpilation and AST manipulation (31 dialects)
- **[DuckDB](https://github.com/duckdb/duckdb)** — Complex analytical queries over SQL/CSV/JSON/Parquet

See [docs/COMPETITIVE_ANALYSIS.md](docs/COMPETITIVE_ANALYSIS.md) for detailed comparisons.

## Options

### Split Options

| Flag             | Description                                | Default     |
|------------------|--------------------------------------------|-------------|
| `-o, --output`   | Output directory                           | `output`    |
| `-d, --dialect`  | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-t, --tables`   | Only split these tables (comma-separated)  | —           |
| `-p, --progress` | Show progress bar                          | —           |
| `--dry-run`      | Preview without writing files              | —           |
| `--schema-only`  | Only DDL statements (CREATE, ALTER, DROP)  | —           |
| `--data-only`    | Only DML statements (INSERT, COPY)         | —           |
| `--fail-fast`    | Stop on first error (for glob patterns)    | —           |
| `--json`         | Output results as JSON                     | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
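
For example, a combined invocation that previews a schema-only split of selected tables without writing anything (all flags are from the table above):

```bash
# Dry run: show which per-table files a schema-only split would produce.
sql-splitter split dump.sql -o schema/ \
  --tables users,orders \
  --schema-only \
  --dry-run --progress
```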

### Merge Options

| Flag             | Description                               | Default |
|------------------|-------------------------------------------|---------|
| `-o, --output`   | Output SQL file                           | stdout  |
| `-d, --dialect`  | SQL dialect for headers/footers           | `mysql` |
| `-t, --tables`   | Only merge these tables (comma-separated) | all     |
| `-e, --exclude`  | Exclude these tables (comma-separated)    | —       |
| `--transaction`  | Wrap in BEGIN/COMMIT transaction          | —       |
| `--no-header`    | Skip header comments                      | —       |
| `-p, --progress` | Show progress bar                         | —       |
| `--dry-run`      | Preview without writing files             | —       |
| `--json`         | Output results as JSON                    | —       |
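
For instance, rebuilding a single dump while excluding one table and wrapping the result for atomic restore (the table name is illustrative):

```bash
# Merge everything except temp_import, wrapped in BEGIN/COMMIT,
# with header comments suppressed.
sql-splitter merge tables/ -o restored.sql \
  --exclude temp_import \
  --transaction --no-header
```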

### Analyze Options

| Flag             | Description                                | Default     |
|------------------|--------------------------------------------|-------------|
| `-d, --dialect`  | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-p, --progress` | Show progress bar                          | —           |
| `--fail-fast`    | Stop on first error (for glob patterns)    | —           |
| `--json`         | Output results as JSON                     | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
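
A typical batch invocation, emitting machine-readable results for a CI artifact (the `--json` output shape is not documented here, so verify field names before scripting against them):

```bash
# Analyze every dump under dumps/, stopping at the first error.
sql-splitter analyze "dumps/**/*.sql" --json --fail-fast > analysis.json
```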

### Convert Options

| Flag             | Description                                      | Default     |
|------------------|--------------------------------------------------|-------------|
| `-o, --output`   | Output SQL file or directory (required for glob) | stdout      |
| `--from`         | Source dialect: `mysql`, `postgres`, `sqlite`    | auto-detect |
| `--to`           | Target dialect: `mysql`, `postgres`, `sqlite`    | required    |
| `--strict`       | Fail on any unsupported feature                  | —           |
| `-p, --progress` | Show progress bar                                | —           |
| `--dry-run`      | Preview without writing files                    | —           |
| `--fail-fast`    | Stop on first error (for glob patterns)          | —           |
| `--json`         | Output results as JSON                           | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).

**Supported conversions:**

- MySQL ↔ PostgreSQL (including COPY → INSERT)
- MySQL ↔ SQLite
- PostgreSQL ↔ SQLite

**Features:**

- 30+ data type mappings
- AUTO_INCREMENT ↔ SERIAL ↔ INTEGER PRIMARY KEY
- PostgreSQL COPY → INSERT with NULL and escape handling
- Session command stripping
- Warnings for unsupported features (ENUM, arrays, triggers)
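
As a sketch of the type mapping in practice (the `AUTO_INCREMENT` → `SERIAL` mapping is documented above; the converted SQL shown in the comment is approximate, not verbatim tool output):

```bash
# Minimal MySQL -> PostgreSQL round trip.
cat > users_mysql.sql <<'SQL'
CREATE TABLE `users` (
  `id` INT AUTO_INCREMENT PRIMARY KEY,
  `email` VARCHAR(255) NOT NULL
);
SQL

sql-splitter convert users_mysql.sql --from mysql --to postgres -o users_pg.sql
# users_pg.sql should now contain something like:
#   CREATE TABLE users (
#     id SERIAL PRIMARY KEY,
#     email VARCHAR(255) NOT NULL
#   );
```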

### Validate Options

| Flag                   | Description                                        | Default     |
|------------------------|----------------------------------------------------|-------------|
| `-d, --dialect`        | SQL dialect: `mysql`, `postgres`, `sqlite`         | auto-detect |
| `--strict`             | Treat warnings as errors (exit 1)                  | —           |
| `--json`               | Output results as JSON                             | —           |
| `--max-rows-per-table` | Max rows per table for PK/FK checks (0 = no limit) | 1,000,000   |
| `--no-limit`           | Disable row limit for PK/FK checks                 | —           |
| `--no-fk-checks`       | Skip PK/FK data integrity checks                   | —           |
| `-p, --progress`       | Show progress bar                                  | —           |
| `--fail-fast`          | Stop on first error (for glob patterns)            | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).

**Validation checks:**

- SQL syntax validation (parser errors)
- DDL/DML consistency (INSERTs reference existing tables)
- Encoding validation (UTF-8)
- Duplicate primary key detection (all dialects)
- FK referential integrity (all dialects)
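
A minimal CI gate built on the documented exit-code behavior (`--strict` exits 1 on warnings):

```bash
# Fail the pipeline if any dump has errors, or warnings under --strict.
if ! sql-splitter validate "dumps/*.sql" --strict --fail-fast; then
  echo "SQL dump validation failed" >&2
  exit 1
fi
```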

### Sample Options

| Flag                  | Description                                        | Default     |
|-----------------------|----------------------------------------------------|-------------|
| `-o, --output`        | Output SQL file                                    | stdout      |
| `-d, --dialect`       | SQL dialect: `mysql`, `postgres`, `sqlite`         | auto-detect |
| `--percent`           | Sample percentage (1-100)                          | —           |
| `--rows`              | Sample fixed number of rows per table              | —           |
| `--preserve-relations`| Preserve FK relationships                          | —           |
| `-t, --tables`        | Only sample these tables (comma-separated)         | all         |
| `-e, --exclude`       | Exclude these tables (comma-separated)             | —           |
| `--root-tables`       | Explicit root tables for sampling                  | —           |
| `--include-global`    | Global table handling: `none`, `lookups`, `all`    | `lookups`   |
| `--seed`              | Random seed for reproducibility                    | random      |
| `-c, --config`        | YAML config file for per-table settings            | —           |
| `--max-total-rows`    | Maximum total rows to sample (0 = no limit)        | —           |
| `--no-limit`          | Disable row limit                                  | —           |
| `--strict-fk`         | Fail if any FK integrity issues detected           | —           |
| `--no-schema`         | Exclude CREATE TABLE statements from output        | —           |
| `-p, --progress`      | Show progress bar                                  | —           |
| `--dry-run`           | Preview without writing files                      | —           |
| `--json`              | Output results as JSON                             | —           |
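
For example, a reproducible 10% sample that keeps foreign keys intact (the `sample` subcommand name follows the workflow naming above; file names are illustrative):

```bash
# Seeded sampling: reruns with the same seed select the same subset.
sql-splitter sample dump.sql -o sample.sql \
  --percent 10 \
  --preserve-relations \
  --seed 42
```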

### Shard Options

| Flag                  | Description                                        | Default     |
|-----------------------|----------------------------------------------------|-------------|
| `-o, --output`        | Output SQL file or directory                       | stdout      |
| `-d, --dialect`       | SQL dialect: `mysql`, `postgres`, `sqlite`         | auto-detect |
| `--tenant-column`     | Column name for tenant identification              | auto-detect |
| `--tenant-value`      | Single tenant value to extract                     | —           |
| `--tenant-values`     | Multiple tenant values (comma-separated)           | —           |
| `--root-tables`       | Explicit root tables with tenant column            | —           |
| `--include-global`    | Global table handling: `none`, `lookups`, `all`    | `lookups`   |
| `-c, --config`        | YAML config file for table classification          | —           |
| `--max-selected-rows` | Maximum rows to select (0 = no limit)              | —           |
| `--no-limit`          | Disable row limit                                  | —           |
| `--strict-fk`         | Fail if any FK integrity issues detected           | —           |
| `--no-schema`         | Exclude CREATE TABLE statements from output        | —           |
| `-p, --progress`      | Show progress bar                                  | —           |
| `--dry-run`           | Preview without writing files                      | —           |
| `--json`              | Output results as JSON                             | —           |
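
For example, extracting a single tenant into a standalone dump (the column name and tenant value here are illustrative; the tenant column is auto-detected by default):

```bash
# Pull tenant 42's rows (plus FK-related rows and lookup tables)
# into their own dump.
sql-splitter shard dump.sql -o tenant_42.sql \
  --tenant-column tenant_id \
  --tenant-value 42
```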

### Diff Options

| Flag                | Description                                                | Default     |
|---------------------|------------------------------------------------------------|-------------|
| `-o, --output`      | Output file                                                | stdout      |
| `-d, --dialect`     | SQL dialect: `mysql`, `postgres`, `sqlite`                 | auto-detect |
| `--schema-only`     | Compare schema only, skip data                             | —           |
| `--data-only`       | Compare data only, skip schema                             | —           |
| `--format`          | Output format: `text`, `json`, `sql`                       | `text`      |
| `-t, --tables`      | Only compare these tables (comma-separated)                | all         |
| `-e, --exclude`     | Exclude these tables (comma-separated)                     | —           |
| `--max-pk-entries`  | Max PK entries to track (0 = no limit)                     | 10,000,000  |
| `-v, --verbose`     | Show sample PK values for added/removed/modified rows      | —           |
| `--primary-key`     | Override PK for tables (format: `table:col1+col2`)         | auto-detect |
| `--ignore-order`    | Ignore column order differences in schema comparison       | —           |
| `--ignore-columns`  | Ignore columns matching glob patterns (e.g., `*.updated_at`) | —         |
| `--allow-no-pk`     | Compare tables without PK using all columns as key         | —           |
| `-p, --progress`    | Show progress bar                                          | —           |

**What diff detects:**

- Tables added/removed/modified (columns, types, nullability)
- Primary key changes
- Foreign key changes
- Index changes (CREATE INDEX, inline INDEX/KEY)
- Rows added/removed/modified (via PK-based comparison)

**Output formats:**

- `text`: Human-readable summary with optional PK samples
- `json`: Structured data for automation (includes warnings)
- `sql`: Migration script with ALTER/CREATE INDEX/DROP INDEX statements
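
A focused comparison suitable for automation, for example (file and table names are illustrative):

```bash
# Compare two environments' dumps on selected tables, ignoring a volatile
# timestamp column, and emit JSON for downstream tooling.
sql-splitter diff prod.sql staging.sql \
  --tables users,orders \
  --ignore-columns "*.updated_at" \
  --format json -o diff.json
```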

### Redact Options

| Flag                | Description                                                | Default     |
|---------------------|------------------------------------------------------------|-------------|
| `-o, --output`      | Output SQL file                                            | stdout      |
| `-d, --dialect`     | SQL dialect: `mysql`, `postgres`, `sqlite`                 | auto-detect |
| `-c, --config`      | YAML config file for redaction rules                       | —           |
| `--generate-config` | Analyze input and generate annotated YAML config           | —           |
| `--null`            | Columns to set to NULL (glob patterns, comma-separated)    | —           |
| `--hash`            | Columns to hash with SHA256 (glob patterns)                | —           |
| `--fake`            | Columns to replace with fake data (glob patterns)          | —           |
| `--mask`            | Columns to partially mask (format: `pattern=column`)       | —           |
| `--constant`        | Column=value pairs for constant replacement                | —           |
| `--seed`            | Random seed for reproducible redaction                     | random      |
| `--locale`          | Locale for fake data (en, de_de, fr_fr, etc.)              | `en`        |
| `-t, --tables`      | Only redact specific tables (comma-separated)              | all         |
| `-e, --exclude`     | Exclude specific tables (comma-separated)                  | —           |
| `--strict`          | Fail on warnings (e.g., unsupported locale)                | —           |
| `-p, --progress`    | Show progress bar                                          | —           |
| `--dry-run`         | Preview without writing files                              | —           |
| `--json`            | Output results as JSON                                     | —           |
| `--validate`        | Validate config only, don't process                        | —           |

**Redaction strategies:**

- `null`: Replace value with NULL
- `constant`: Replace with fixed value
- `hash`: SHA256 hash (deterministic, preserves FK relationships)
- `mask`: Partial masking with pattern (`*`=asterisk, `X`=keep, `#`=random digit)
- `fake`: Generate realistic fake data (25+ generators)
- `shuffle`: Redistribute values within column (preserves distribution)
- `skip`: No redaction (passthrough)
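
For example, combining several strategies in one pass (the mask pattern follows the legend above, keeping the first three characters; exact length-matching semantics are an assumption to verify against your data):

```bash
# NULL passwords, hash emails deterministically (joins on email still
# line up), and partially mask phone numbers, with a fixed seed.
sql-splitter redact dump.sql -o safe.sql \
  --null "*.password" \
  --hash "*.email" \
  --mask "XXX*******=*.phone" \
  --seed 42
```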

**Fake data generators:**

`email`, `name`, `first_name`, `last_name`, `phone`, `address`, `city`, `state`, `zip`, `country`, `company`, `job_title`, `username`, `url`, `ip`, `ipv6`, `uuid`, `date`, `datetime`, `credit_card`, `iban`, `ssn`, `lorem`, `paragraph`, `sentence`

## Performance

See [BENCHMARKS.md](BENCHMARKS.md) for detailed comparisons.

## Testing

```bash
# Unit tests
cargo test

# Verify against real-world SQL dumps (MySQL, PostgreSQL, WordPress, etc.)
make verify-realworld
```

## AI Agent Integration

sql-splitter includes documentation optimized for AI agents:

- **[llms.txt](website/llms.txt)** - LLM-friendly documentation following the [llmstxt.org](https://llmstxt.org) specification
- **[Agent Skill](skills/sql-splitter/SKILL.md)** - Claude Code / Amp skill for automatic tool discovery

Install the skill in Claude Code / Amp:
```bash
amp skill add helgesverre/sql-splitter
```

## License

MIT — see [LICENSE.md](LICENSE.md)