# sql-splitter

[![Crates.io](https://img.shields.io/crates/v/sql-splitter.svg)](https://crates.io/crates/sql-splitter)
[![Downloads](https://img.shields.io/crates/d/sql-splitter.svg)](https://crates.io/crates/sql-splitter)
[![Build](https://github.com/helgesverre/sql-splitter/actions/workflows/test.yml/badge.svg)](https://github.com/helgesverre/sql-splitter/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE.md)
[![Amp](https://img.shields.io/badge/Amp%20Code-191C19.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMjEiIGhlaWdodD0iMjEiIHZpZXdCb3g9IjAgMCAyMSAyMSIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPHBhdGggZD0iTTMuNzY4NzkgMTguMzAxNUw4LjQ5ODM5IDEzLjUwNUwxMC4yMTk2IDIwLjAzOTlMMTIuNzIgMTkuMzU2MUwxMC4yMjg4IDkuODY3NDlMMC44OTA4NzYgNy4zMzg0NEwwLjIyNTk0IDkuODkzMzFMNi42NTEzNCAxMS42Mzg4TDEuOTQxMzggMTYuNDI4MkwzLjc2ODc5IDE4LjMwMTVaIiBmaWxsPSIjRjM0RTNGIi8+CjxwYXRoIGQ9Ik0xNy40MDc0IDEyLjc0MTRMMTkuOTA3OCAxMi4wNTc1TDE3LjQxNjcgMi41Njg5N0w4LjA3ODczIDAuMDM5OTI0Nkw3LjQxMzggMi41OTQ4TDE1LjI5OTIgNC43MzY4NUwxNy40MDc0IDEyLjc0MTRaIiBmaWxsPSIjRjM0RTNGIi8+CjxwYXRoIGQ9Ik0xMy44MTg0IDE2LjM4ODNMMTYuMzE4OCAxNS43MDQ0TDEzLjgyNzYgNi4yMTU4OEw0LjQ4OTcxIDMuNjg2ODNMMy44MjQ3NyA2LjI0MTcxTDExLjcxMDEgOC4zODM3NkwxMy44MTg0IDE2LjM4ODNaIiBmaWxsPSIjRjM0RTNGIi8+Cjwvc3ZnPg==&style=flat)](https://ampcode.com/@helgesverre)

Split large SQL dump files into individual table files. Fast, memory-efficient, multi-dialect.

- **600+ MB/s** throughput on modern hardware
- **MySQL, PostgreSQL, SQLite** support (including `COPY FROM stdin`)
- **Compressed files** — gzip, bzip2, xz, zstd auto-detected
- **Streaming architecture** — handles files larger than RAM
- **5x faster** than shell-based alternatives

## Installation

### From crates.io

```bash
cargo install sql-splitter
```

### From source

```bash
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install  # Installs binary + shell completions + man pages
```

Or download pre-built binaries from [GitHub Releases](https://github.com/helgesverre/sql-splitter/releases).

### Man pages (optional)

After installation, view documentation with `man sql-splitter` or `man sql-splitter-diff`.

For `cargo install` users, install man pages manually:
```bash
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install-man
```

## Usage

```bash
# MySQL/MariaDB dump (default)
sql-splitter split dump.sql -o tables/

# PostgreSQL pg_dump
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres

# SQLite dump
sql-splitter split sqlite.sql -o tables/ --dialect=sqlite

# Compressed files (auto-detected)
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/

# Split specific tables only
sql-splitter split dump.sql --tables users,posts,orders

# Schema only (CREATE TABLE, indexes, etc.)
sql-splitter split dump.sql -o schema/ --schema-only

# Data only (INSERT/COPY statements)
sql-splitter split dump.sql -o data/ --data-only

# Merge split files back into single dump
sql-splitter merge tables/ -o restored.sql

# Merge specific tables only
sql-splitter merge tables/ -o partial.sql --tables users,orders

# Merge with transaction wrapper
sql-splitter merge tables/ -o restored.sql --transaction

# Analyze without splitting
sql-splitter analyze dump.sql

# Convert between SQL dialects
sql-splitter convert mysql_dump.sql -o postgres_dump.sql --to postgres
sql-splitter convert pg_dump.sql -o mysql_dump.sql --to mysql
sql-splitter convert dump.sql -o sqlite_dump.sql --to sqlite

# Convert with explicit source dialect
sql-splitter convert dump.sql --from postgres --to mysql -o output.sql

# Validate SQL dump integrity
sql-splitter validate dump.sql

# Validate with strict mode (warnings = errors)
sql-splitter validate dump.sql --strict

# Validate with JSON output for CI
sql-splitter validate dump.sql --json

# Batch operations with glob patterns
sql-splitter validate "dumps/*.sql" --fail-fast
sql-splitter analyze "**/*.sql"
sql-splitter split "*.sql" -o output/
sql-splitter convert "*.sql" --to postgres -o converted/

# Compare two SQL dumps for changes
sql-splitter diff old.sql new.sql

# Diff with schema-only or data-only
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only

# Diff with JSON or SQL migration output
sql-splitter diff old.sql new.sql --format json -o diff.json
sql-splitter diff old.sql new.sql --format sql -o migration.sql

# Diff with verbose PK samples and ignore timestamp columns
sql-splitter diff old.sql new.sql --verbose --ignore-columns "*.updated_at,*.created_at"

# Override primary key for tables without PK
sql-splitter diff old.sql new.sql --primary-key logs:timestamp+message

# Redact sensitive data using inline patterns
sql-splitter redact dump.sql -o safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"

# Redact using YAML config file
sql-splitter redact dump.sql -o safe.sql --config redact.yaml

# Generate redaction config by analyzing input file
sql-splitter redact dump.sql --generate-config -o redact.yaml

# Reproducible redaction with seed
sql-splitter redact dump.sql -o safe.sql --null "*.password" --seed 42

# Generate shell completions (auto-installed with make install)
sql-splitter completions bash >> ~/.bashrc
sql-splitter completions zsh >> ~/.zshrc
sql-splitter completions fish >> ~/.config/fish/completions/sql-splitter.fish
```

### Shell Completions

Shell completions are automatically installed when using `make install`. For manual installation:

```bash
# Install for current shell only
make install-completions

# Install for all shells (bash, zsh, fish)
make install-completions-all
```

## Why sql-splitter?

sql-splitter is a **dump-first, CLI-first** tool designed for automation and CI/CD pipelines.

**What it's optimized for**

| Strength                       | Description                                                                        |
|--------------------------------|------------------------------------------------------------------------------------|
| **One tool for the workflow**  | Split → sample → shard → convert → merge in a single binary                        |
| **Works on dump files**        | No running database or JDBC connection needed (unlike mydumper, Jailer, Condenser) |
| **Streaming architecture**     | 10GB+ dumps with constant memory, 600+ MB/s throughput                             |
| **Multi-dialect + conversion** | MySQL, PostgreSQL, SQLite, including `COPY FROM stdin` → INSERT                   |
| **FK-aware operations**        | Sampling and tenant sharding preserve referential integrity                        |

**When another tool might be better**

- **[mydumper](https://github.com/mydumper/mydumper)** — Parallel snapshots from live MySQL/MariaDB databases
- **[Jailer](https://github.com/Wisser/Jailer)** — Rich GUI-based FK subsetting with JDBC across 12+ databases
- **[sqlglot](https://github.com/tobymao/sqlglot)** — Query-level transpilation and AST manipulation (31 dialects)
- **[DuckDB](https://github.com/duckdb/duckdb)** — Complex analytical queries over SQL/CSV/JSON/Parquet

See [docs/COMPETITIVE_ANALYSIS.md](docs/COMPETITIVE_ANALYSIS.md) for detailed comparisons.

## Options

### Split Options

| Flag             | Description                                | Default     |
|------------------|--------------------------------------------|-------------|
| `-o, --output`   | Output directory                           | `output`    |
| `-d, --dialect`  | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-t, --tables`   | Only split these tables (comma-separated)  | —           |
| `-p, --progress` | Show progress bar                          | —           |
| `--dry-run`      | Preview without writing files              | —           |
| `--schema-only`  | Only DDL statements (CREATE, ALTER, DROP)  | —           |
| `--data-only`    | Only DML statements (INSERT, COPY)         | —           |
| `--fail-fast`    | Stop on first error (for glob patterns)    | —           |
| `--json`         | Output results as JSON                     | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
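
For example, a combined invocation that previews a schema-only split of selected tables without writing anything (all flags are from the table above):

```bash
# Dry run: show which per-table files a schema-only split would produce.
sql-splitter split dump.sql -o schema/ \
  --tables users,orders \
  --schema-only \
  --dry-run --progress
```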

### Merge Options

| Flag             | Description                               | Default |
|------------------|-------------------------------------------|---------|
| `-o, --output`   | Output SQL file                           | stdout  |
| `-d, --dialect`  | SQL dialect for headers/footers           | `mysql` |
| `-t, --tables`   | Only merge these tables (comma-separated) | all     |
| `-e, --exclude`  | Exclude these tables (comma-separated)    | —       |
| `--transaction`  | Wrap in BEGIN/COMMIT transaction          | —       |
| `--no-header`    | Skip header comments                      | —       |
| `-p, --progress` | Show progress bar                         | —       |
| `--dry-run`      | Preview without writing files             | —       |
| `--json`         | Output results as JSON                    | —       |
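
For instance, rebuilding a single dump while excluding one table and wrapping the result for atomic restore (the table name is illustrative):

```bash
# Merge everything except temp_import, wrapped in BEGIN/COMMIT,
# with header comments suppressed.
sql-splitter merge tables/ -o restored.sql \
  --exclude temp_import \
  --transaction --no-header
```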

### Analyze Options

| Flag             | Description                                | Default     |
|------------------|--------------------------------------------|-------------|
| `-d, --dialect`  | SQL dialect: `mysql`, `postgres`, `sqlite` | auto-detect |
| `-p, --progress` | Show progress bar                          | —           |
| `--fail-fast`    | Stop on first error (for glob patterns)    | —           |
| `--json`         | Output results as JSON                     | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).
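
A typical batch invocation, emitting machine-readable results for a CI artifact (the `--json` output shape is not documented here, so verify field names before scripting against them):

```bash
# Analyze every dump under dumps/, stopping at the first error.
sql-splitter analyze "dumps/**/*.sql" --json --fail-fast > analysis.json
```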

### Convert Options

| Flag             | Description                                      | Default     |
|------------------|--------------------------------------------------|-------------|
| `-o, --output`   | Output SQL file or directory (required for glob) | stdout      |
| `--from`         | Source dialect: `mysql`, `postgres`, `sqlite`    | auto-detect |
| `--to`           | Target dialect: `mysql`, `postgres`, `sqlite`    | required    |
| `--strict`       | Fail on any unsupported feature                  | —           |
| `-p, --progress` | Show progress bar                                | —           |
| `--dry-run`      | Preview without writing files                    | —           |
| `--fail-fast`    | Stop on first error (for glob patterns)          | —           |
| `--json`         | Output results as JSON                           | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).

**Supported conversions:**

- MySQL ↔ PostgreSQL (including COPY → INSERT)
- MySQL ↔ SQLite
- PostgreSQL ↔ SQLite

**Features:**

- 30+ data type mappings
- AUTO_INCREMENT ↔ SERIAL ↔ INTEGER PRIMARY KEY
- PostgreSQL COPY → INSERT with NULL and escape handling
- Session command stripping
- Warnings for unsupported features (ENUM, arrays, triggers)
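
As a sketch of the type mapping in practice (the `AUTO_INCREMENT` → `SERIAL` mapping is documented above; the converted SQL shown in the comment is approximate, not verbatim tool output):

```bash
# Minimal MySQL -> PostgreSQL round trip.
cat > users_mysql.sql <<'SQL'
CREATE TABLE `users` (
  `id` INT AUTO_INCREMENT PRIMARY KEY,
  `email` VARCHAR(255) NOT NULL
);
SQL

sql-splitter convert users_mysql.sql --from mysql --to postgres -o users_pg.sql
# users_pg.sql should now contain something like:
#   CREATE TABLE users (
#     id SERIAL PRIMARY KEY,
#     email VARCHAR(255) NOT NULL
#   );
```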

### Validate Options

| Flag                   | Description                                        | Default     |
|------------------------|----------------------------------------------------|-------------|
| `-d, --dialect`        | SQL dialect: `mysql`, `postgres`, `sqlite`         | auto-detect |
| `--strict`             | Treat warnings as errors (exit 1)                  | —           |
| `--json`               | Output results as JSON                             | —           |
| `--max-rows-per-table` | Max rows per table for PK/FK checks (0 = no limit) | 1,000,000   |
| `--no-limit`           | Disable row limit for PK/FK checks                 | —           |
| `--no-fk-checks`       | Skip PK/FK data integrity checks                   | —           |
| `-p, --progress`       | Show progress bar                                  | —           |
| `--fail-fast`          | Stop on first error (for glob patterns)            | —           |

Input can be a file path or glob pattern (e.g., `*.sql`, `dumps/**/*.sql`).

**Validation checks:**

- SQL syntax validation (parser errors)
- DDL/DML consistency (INSERTs reference existing tables)
- Encoding validation (UTF-8)
- Duplicate primary key detection (all dialects)
- FK referential integrity (all dialects)
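
A minimal CI gate built on the documented exit-code behavior (`--strict` exits 1 on warnings):

```bash
# Fail the pipeline if any dump has errors, or warnings under --strict.
if ! sql-splitter validate "dumps/*.sql" --strict --fail-fast; then
  echo "SQL dump validation failed" >&2
  exit 1
fi
```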

### Sample Options

| Flag                  | Description                                        | Default     |
|-----------------------|----------------------------------------------------|-------------|
| `-o, --output`        | Output SQL file                                    | stdout      |
| `-d, --dialect`       | SQL dialect: `mysql`, `postgres`, `sqlite`         | auto-detect |
| `--percent`           | Sample percentage (1-100)                          | —           |
| `--rows`              | Sample fixed number of rows per table              | —           |
| `--preserve-relations`| Preserve FK relationships                          | —           |
| `-t, --tables`        | Only sample these tables (comma-separated)         | all         |
| `-e, --exclude`       | Exclude these tables (comma-separated)             | —           |
| `--root-tables`       | Explicit root tables for sampling                  | —           |
| `--include-global`    | Global table handling: `none`, `lookups`, `all`    | `lookups`   |
| `--seed`              | Random seed for reproducibility                    | random      |
| `-c, --config`        | YAML config file for per-table settings            | —           |
| `--max-total-rows`    | Maximum total rows to sample (0 = no limit)        | —           |
| `--no-limit`          | Disable row limit                                  | —           |
| `--strict-fk`         | Fail if any FK integrity issues detected           | —           |
| `--no-schema`         | Exclude CREATE TABLE statements from output        | —           |
| `-p, --progress`      | Show progress bar                                  | —           |
| `--dry-run`           | Preview without writing files                      | —           |
| `--json`              | Output results as JSON                             | —           |
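
For example, a reproducible 10% sample that keeps foreign keys intact (the `sample` subcommand name follows the workflow naming above; file names are illustrative):

```bash
# Seeded sampling: reruns with the same seed select the same subset.
sql-splitter sample dump.sql -o sample.sql \
  --percent 10 \
  --preserve-relations \
  --seed 42
```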

### Shard Options

| Flag                  | Description                                        | Default     |
|-----------------------|----------------------------------------------------|-------------|
| `-o, --output`        | Output SQL file or directory                       | stdout      |
| `-d, --dialect`       | SQL dialect: `mysql`, `postgres`, `sqlite`         | auto-detect |
| `--tenant-column`     | Column name for tenant identification              | auto-detect |
| `--tenant-value`      | Single tenant value to extract                     | —           |
| `--tenant-values`     | Multiple tenant values (comma-separated)           | —           |
| `--root-tables`       | Explicit root tables with tenant column            | —           |
| `--include-global`    | Global table handling: `none`, `lookups`, `all`    | `lookups`   |
| `-c, --config`        | YAML config file for table classification          | —           |
| `--max-selected-rows` | Maximum rows to select (0 = no limit)              | —           |
| `--no-limit`          | Disable row limit                                  | —           |
| `--strict-fk`         | Fail if any FK integrity issues detected           | —           |
| `--no-schema`         | Exclude CREATE TABLE statements from output        | —           |
| `-p, --progress`      | Show progress bar                                  | —           |
| `--dry-run`           | Preview without writing files                      | —           |
| `--json`              | Output results as JSON                             | —           |
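
For example, extracting a single tenant into a standalone dump (the column name and tenant value here are illustrative; the tenant column is auto-detected by default):

```bash
# Pull tenant 42's rows (plus FK-related rows and lookup tables)
# into their own dump.
sql-splitter shard dump.sql -o tenant_42.sql \
  --tenant-column tenant_id \
  --tenant-value 42
```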

### Diff Options

| Flag                | Description                                                | Default     |
|---------------------|------------------------------------------------------------|-------------|
| `-o, --output`      | Output file                                                | stdout      |
| `-d, --dialect`     | SQL dialect: `mysql`, `postgres`, `sqlite`                 | auto-detect |
| `--schema-only`     | Compare schema only, skip data                             | —           |
| `--data-only`       | Compare data only, skip schema                             | —           |
| `--format`          | Output format: `text`, `json`, `sql`                       | `text`      |
| `-t, --tables`      | Only compare these tables (comma-separated)                | all         |
| `-e, --exclude`     | Exclude these tables (comma-separated)                     | —           |
| `--max-pk-entries`  | Max PK entries to track (0 = no limit)                     | 10,000,000  |
| `-v, --verbose`     | Show sample PK values for added/removed/modified rows      | —           |
| `--primary-key`     | Override PK for tables (format: `table:col1+col2`)         | auto-detect |
| `--ignore-order`    | Ignore column order differences in schema comparison       | —           |
| `--ignore-columns`  | Ignore columns matching glob patterns (e.g., `*.updated_at`) | —         |
| `--allow-no-pk`     | Compare tables without PK using all columns as key         | —           |
| `-p, --progress`    | Show progress bar                                          | —           |

**What diff detects:**

- Tables added/removed/modified (columns, types, nullability)
- Primary key changes
- Foreign key changes
- Index changes (CREATE INDEX, inline INDEX/KEY)
- Rows added/removed/modified (via PK-based comparison)

**Output formats:**

- `text`: Human-readable summary with optional PK samples
- `json`: Structured data for automation (includes warnings)
- `sql`: Migration script with ALTER/CREATE INDEX/DROP INDEX statements
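
A focused comparison suitable for automation, for example (file and table names are illustrative):

```bash
# Compare two environments' dumps on selected tables, ignoring a volatile
# timestamp column, and emit JSON for downstream tooling.
sql-splitter diff prod.sql staging.sql \
  --tables users,orders \
  --ignore-columns "*.updated_at" \
  --format json -o diff.json
```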

### Redact Options

| Flag                | Description                                                | Default     |
|---------------------|------------------------------------------------------------|-------------|
| `-o, --output`      | Output SQL file                                            | stdout      |
| `-d, --dialect`     | SQL dialect: `mysql`, `postgres`, `sqlite`                 | auto-detect |
| `-c, --config`      | YAML config file for redaction rules                       | —           |
| `--generate-config` | Analyze input and generate annotated YAML config           | —           |
| `--null`            | Columns to set to NULL (glob patterns, comma-separated)    | —           |
| `--hash`            | Columns to hash with SHA256 (glob patterns)                | —           |
| `--fake`            | Columns to replace with fake data (glob patterns)          | —           |
| `--mask`            | Columns to partially mask (format: `pattern=column`)       | —           |
| `--constant`        | Column=value pairs for constant replacement                | —           |
| `--seed`            | Random seed for reproducible redaction                     | random      |
| `--locale`          | Locale for fake data (en, de_de, fr_fr, etc.)              | `en`        |
| `-t, --tables`      | Only redact specific tables (comma-separated)              | all         |
| `-e, --exclude`     | Exclude specific tables (comma-separated)                  | —           |
| `--strict`          | Fail on warnings (e.g., unsupported locale)                | —           |
| `-p, --progress`    | Show progress bar                                          | —           |
| `--dry-run`         | Preview without writing files                              | —           |
| `--json`            | Output results as JSON                                     | —           |
| `--validate`        | Validate config only, don't process                        | —           |

**Redaction strategies:**

- `null`: Replace value with NULL
- `constant`: Replace with fixed value
- `hash`: SHA256 hash (deterministic, preserves FK relationships)
- `mask`: Partial masking with pattern (`*`=asterisk, `X`=keep, `#`=random digit)
- `fake`: Generate realistic fake data (25+ generators)
- `shuffle`: Redistribute values within column (preserves distribution)
- `skip`: No redaction (passthrough)
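
For example, combining several strategies in one pass (the mask pattern follows the legend above, keeping the first three characters; exact length-matching semantics are an assumption to verify against your data):

```bash
# NULL passwords, hash emails deterministically (joins on email still
# line up), and partially mask phone numbers, with a fixed seed.
sql-splitter redact dump.sql -o safe.sql \
  --null "*.password" \
  --hash "*.email" \
  --mask "XXX*******=*.phone" \
  --seed 42
```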

**Fake data generators:**

`email`, `name`, `first_name`, `last_name`, `phone`, `address`, `city`, `state`, `zip`, `country`, `company`, `job_title`, `username`, `url`, `ip`, `ipv6`, `uuid`, `date`, `datetime`, `credit_card`, `iban`, `ssn`, `lorem`, `paragraph`, `sentence`

## Performance

See [BENCHMARKS.md](BENCHMARKS.md) for detailed comparisons.

## Testing

```bash
# Unit tests
cargo test

# Verify against real-world SQL dumps (MySQL, PostgreSQL, WordPress, etc.)
make verify-realworld
```

## AI Agent Integration

sql-splitter includes documentation optimized for AI agents:

- **[llms.txt](website/llms.txt)** - LLM-friendly documentation following the [llmstxt.org](https://llmstxt.org) specification
- **[Agent Skill](skills/sql-splitter/SKILL.md)** - Claude Code / Amp skill for automatic tool discovery

Install the skill in Claude Code / Amp:
```bash
amp skill add helgesverre/sql-splitter
```

## License

MIT — see [LICENSE.md](LICENSE.md)