sql-splitter
Split large SQL dump files into individual table files. Fast, memory-efficient, multi-dialect.
- 600+ MB/s throughput on modern hardware
- MySQL, PostgreSQL, SQLite support (including COPY FROM stdin)
- Compressed files — gzip, bzip2, xz, zstd auto-detected
- Streaming architecture — handles files larger than RAM
- 5x faster than shell-based alternatives
Installation
From crates.io
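Assuming the crate is published under the same name as the binary:

```bash
cargo install sql-splitter
```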
From source
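A sketch of the standard Cargo workflow from a checkout of the repository:

```bash
cargo build --release    # binary lands in target/release/sql-splitter
# make install is also available; it additionally sets up shell completions (see below)
make install
```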
Or download pre-built binaries from GitHub Releases.
Usage
The invocations below are sketches: subcommand names mirror the option sections later in this README (split, merge, analyze, convert, validate, completions), the dump file is assumed to be the positional argument, and only flags documented in those sections are used.

# MySQL/MariaDB dump (default)
sql-splitter split dump.sql -o tables/
# PostgreSQL pg_dump
sql-splitter split dump.sql -d postgres -o tables/
# SQLite dump
sql-splitter split dump.sql -d sqlite -o tables/
# Compressed files (auto-detected)
sql-splitter split dump.sql.gz -o tables/
# Split specific tables only
sql-splitter split dump.sql -t users,orders -o tables/
# Schema only (CREATE TABLE, indexes, etc.)
sql-splitter split dump.sql --schema-only -o tables/
# Data only (INSERT/COPY statements)
sql-splitter split dump.sql --data-only -o tables/
# Merge split files back into single dump
sql-splitter merge tables/ -o merged.sql
# Merge specific tables only
sql-splitter merge tables/ -t users,orders -o merged.sql
# Merge with transaction wrapper
sql-splitter merge tables/ --transaction -o merged.sql
# Analyze without splitting
sql-splitter analyze dump.sql
# Convert between SQL dialects
sql-splitter convert dump.sql --to postgres -o converted.sql
# Convert with explicit source dialect
sql-splitter convert dump.sql --from mysql --to postgres -o converted.sql
# Validate SQL dump integrity
sql-splitter validate dump.sql
# Validate with strict mode (warnings = errors)
sql-splitter validate dump.sql --strict
# Validate with JSON output for CI
sql-splitter validate dump.sql --json
# Batch operations with glob patterns
sql-splitter split "dumps/**/*.sql" -o tables/
# Generate shell completions (auto-installed with make install)
sql-splitter completions bash
Shell Completions
Shell completions are installed automatically by make install. They can also be installed manually, either for the current shell only or for all supported shells (bash, zsh, fish).
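A manual-install sketch for bash, assuming the completions subcommand from the Usage section and the standard bash-completion user directory:

```bash
sql-splitter completions bash > ~/.local/share/bash-completion/completions/sql-splitter
```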
Why sql-splitter?
sql-splitter is a dump-first, CLI-first tool designed for automation and CI/CD pipelines.
What it's optimized for
| Strength | Description |
|---|---|
| One tool for the workflow | Split → sample → shard → convert → merge in a single binary |
| Works on dump files | No running database or JDBC connection needed (unlike mydumper, Jailer, Condenser) |
| Streaming architecture | 10GB+ dumps with constant memory, 600+ MB/s throughput |
| Multi-dialect + conversion | MySQL, PostgreSQL, SQLite including COPY FROM stdin → INSERT |
| FK-aware operations | Sampling and tenant sharding preserve referential integrity |
When another tool might be better
- mydumper — Parallel snapshots from live MySQL/MariaDB databases
- Jailer — Rich GUI-based FK subsetting with JDBC across 12+ databases
- sqlglot — Query-level transpilation and AST manipulation (31 dialects)
- DuckDB — Complex analytical queries over SQL/CSV/JSON/Parquet
See docs/COMPETITIVE_ANALYSIS.md for detailed comparisons.
Options
Split Options
| Flag | Description | Default |
|---|---|---|
| -o, --output | Output directory | output |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite | auto-detect |
| -t, --tables | Only split these tables (comma-separated) | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --schema-only | Only DDL statements (CREATE, ALTER, DROP) | — |
| --data-only | Only DML statements (INSERT, COPY) | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
| --json | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Merge Options
| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file | stdout |
| -d, --dialect | SQL dialect for headers/footers | mysql |
| -t, --tables | Only merge these tables (comma-separated) | all |
| -e, --exclude | Exclude these tables (comma-separated) | — |
| --transaction | Wrap in BEGIN/COMMIT transaction | — |
| --no-header | Skip header comments | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
Analyze Options
| Flag | Description | Default |
|---|---|---|
| -d, --dialect | SQL dialect: mysql, postgres, sqlite | auto-detect |
| -p, --progress | Show progress bar | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
| --json | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Convert Options
| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file or directory (required for glob) | stdout |
| --from | Source dialect: mysql, postgres, sqlite | auto-detect |
| --to | Target dialect: mysql, postgres, sqlite | required |
| --strict | Fail on any unsupported feature | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
| --json | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Supported conversions:
- MySQL ↔ PostgreSQL (including COPY → INSERT)
- MySQL ↔ SQLite
- PostgreSQL ↔ SQLite
Features:
- 30+ data type mappings
- AUTO_INCREMENT ↔ SERIAL ↔ INTEGER PRIMARY KEY
- PostgreSQL COPY → INSERT with NULL and escape handling
- Session command stripping
- Warnings for unsupported features (ENUM, arrays, triggers)
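As an illustration of the AUTO_INCREMENT ↔ SERIAL mapping, using the convert invocation sketched in the Usage section (the file names and column definitions are hypothetical):

```bash
# hypothetical input users.sql (MySQL):
#   CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100));
sql-splitter convert users.sql --from mysql --to postgres -o users_pg.sql
# expected output (PostgreSQL):
#   CREATE TABLE users (id SERIAL PRIMARY KEY, name VARCHAR(100));
```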
Validate Options
| Flag | Description | Default |
|---|---|---|
| -d, --dialect | SQL dialect: mysql, postgres, sqlite | auto-detect |
| --strict | Treat warnings as errors (exit 1) | — |
| --json | Output results as JSON | — |
| --max-rows-per-table | Max rows per table for PK/FK checks (0 = no limit) | 1,000,000 |
| --no-limit | Disable row limit for PK/FK checks | — |
| --no-fk-checks | Skip PK/FK data integrity checks | — |
| -p, --progress | Show progress bar | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Validation checks:
- SQL syntax validation (parser errors)
- DDL/DML consistency (INSERTs reference existing tables)
- Encoding validation (UTF-8)
- Duplicate primary key detection (all dialects)
- FK referential integrity (all dialects)
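Because --strict promotes warnings to a non-zero exit code, validation can gate a CI step directly; a minimal sketch using the flags above:

```bash
# fails the CI step if the dump has errors (or warnings, with --strict)
sql-splitter validate dump.sql --strict --json > validation-report.json
```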
Sample Options
| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite | auto-detect |
| --percent | Sample percentage (1-100) | — |
| --rows | Sample fixed number of rows per table | — |
| --preserve-relations | Preserve FK relationships | — |
| -t, --tables | Only sample these tables (comma-separated) | all |
| -e, --exclude | Exclude these tables (comma-separated) | — |
| --root-tables | Explicit root tables for sampling | — |
| --include-global | Global table handling: none, lookups, all | lookups |
| --seed | Random seed for reproducibility | random |
| -c, --config | YAML config file for per-table settings | — |
| --max-total-rows | Maximum total rows to sample (0 = no limit) | — |
| --no-limit | Disable row limit | — |
| --strict-fk | Fail if any FK integrity issues detected | — |
| --no-schema | Exclude CREATE TABLE statements from output | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
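A sketch of a typical invocation, assuming a sample subcommand matching this section and using only the flags above:

```bash
# 10% sample that keeps FK relationships intact, reproducible via a fixed seed
sql-splitter sample dump.sql --percent 10 --preserve-relations --seed 42 -o sample.sql
```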
Shard Options
| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file or directory | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite | auto-detect |
| --tenant-column | Column name for tenant identification | auto-detect |
| --tenant-value | Single tenant value to extract | — |
| --tenant-values | Multiple tenant values (comma-separated) | — |
| --root-tables | Explicit root tables with tenant column | — |
| --include-global | Global table handling: none, lookups, all | lookups |
| -c, --config | YAML config file for table classification | — |
| --max-selected-rows | Maximum rows to select (0 = no limit) | — |
| --no-limit | Disable row limit | — |
| --strict-fk | Fail if any FK integrity issues detected | — |
| --no-schema | Exclude CREATE TABLE statements from output | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
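A sketch of a typical invocation, assuming a shard subcommand matching this section (the column name and tenant value are illustrative):

```bash
# extract one tenant's rows, plus lookup tables per the default --include-global lookups
sql-splitter shard dump.sql --tenant-column tenant_id --tenant-value 42 -o tenant_42.sql
```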
Performance
See BENCHMARKS.md for detailed comparisons.
Testing
The test suite includes unit tests plus verification against real-world SQL dumps (MySQL, PostgreSQL, WordPress, etc.).
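Assuming the standard Cargo workflow for running the unit tests:

```bash
cargo test
```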
AI Agent Integration
sql-splitter includes documentation optimized for AI agents:
- llms.txt - LLM-friendly documentation following the llmstxt.org specification
- Agent Skill - Claude Code / Amp skill for automatic tool discovery
Install the skill in Claude Code / Amp to enable automatic tool discovery.
License
MIT — see LICENSE.md