sql-splitter 1.8.0

# sql-splitter

> High-performance CLI tool for splitting large SQL dump files into individual table files. Written in Rust with 600+ MB/s throughput, streaming architecture that handles files larger than RAM, and multi-dialect support for MySQL, PostgreSQL, and SQLite.

sql-splitter reads SQL dump files and routes statements to separate output files based on table name. It handles compressed files (gzip, bzip2, xz, zstd) with automatic detection.

## Installation

```bash
# From crates.io (recommended)
cargo install sql-splitter

# From source
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
cargo build --release

# Download pre-built binaries
# https://github.com/helgesverre/sql-splitter/releases
```

## Quick Start

```bash
# Split a MySQL/MariaDB dump (default dialect)
sql-splitter split dump.sql -o tables/

# Split a PostgreSQL pg_dump file
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres

# Split a SQLite dump
sql-splitter split sqlite.sql -o tables/ --dialect=sqlite

# Compressed files are auto-detected
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/
```

## Commands

### split

Split a SQL dump file into individual table files.

```bash
sql-splitter split database.sql --output=tables
sql-splitter split database.sql --dialect=postgres    # PostgreSQL mode
sql-splitter split database.sql --tables=users,posts  # specific tables only
sql-splitter split database.sql --schema-only         # DDL only
sql-splitter split database.sql --data-only           # INSERT/COPY only
sql-splitter split database.sql --dry-run             # preview without writing
sql-splitter split database.sql --progress            # show progress bar
```

Options:
- `-o, --output <DIR>`: Output directory (default: "output")
- `-d, --dialect <DIALECT>`: SQL dialect: mysql, postgres, sqlite (auto-detected)
- `-t, --tables <TABLES>`: Filter to specific tables (comma-separated)
- `--schema-only`: Only DDL statements (CREATE, ALTER, DROP)
- `--data-only`: Only DML statements (INSERT, COPY)
- `--dry-run`: Preview what would be split without writing files
- `-p, --progress`: Show progress bar during processing

### merge

Merge split SQL files back into a single dump file.

```bash
sql-splitter merge tables/ -o restored.sql           # merge all tables
sql-splitter merge tables/ -o partial.sql --tables users,orders  # specific tables
sql-splitter merge tables/ -o restored.sql --exclude cache,sessions
sql-splitter merge tables/ -o restored.sql --transaction  # wrap in transaction
sql-splitter merge tables/ -o restored.sql --dialect postgres
```

Options:
- `-o, --output <FILE>`: Output SQL file (default: stdout)
- `-d, --dialect <DIALECT>`: SQL dialect for headers/footers (default: mysql)
- `-t, --tables <TABLES>`: Only merge these tables (comma-separated)
- `-e, --exclude <TABLES>`: Exclude these tables (comma-separated)
- `--transaction`: Wrap output in BEGIN/COMMIT transaction
- `--no-header`: Skip header comments
- `-p, --progress`: Show progress bar during merging
- `--dry-run`: Preview what would be merged without writing

### analyze

Analyze a SQL file and display table statistics without splitting.

```bash
sql-splitter analyze database.sql
sql-splitter analyze database.sql --progress
```

### convert

Convert SQL dumps between dialects (MySQL, PostgreSQL, SQLite).

```bash
sql-splitter convert mysql.sql -o postgres.sql --to postgres
sql-splitter convert pg_dump.sql -o mysql.sql --to mysql
sql-splitter convert dump.sql -o sqlite.sql --to sqlite
sql-splitter convert dump.sql --from postgres --to mysql -o output.sql
```

Options:
- `-o, --output <FILE>`: Output SQL file (default: stdout)
- `--from <DIALECT>`: Source dialect (auto-detected if omitted)
- `--to <DIALECT>`: Target dialect (required)
- `--strict`: Fail on any unsupported feature
- `-p, --progress`: Show progress bar
- `--dry-run`: Preview without writing files

Supported conversions (all 6 pairs):
- MySQL ↔ PostgreSQL (including COPY → INSERT conversion)
- MySQL ↔ SQLite
- PostgreSQL ↔ SQLite

Features:
- 30+ data type mappings (AUTO_INCREMENT ↔ SERIAL, DATETIME ↔ TIMESTAMP, etc.)
- PostgreSQL COPY → INSERT with NULL handling and escape sequences
- Schema prefix stripping (public.table → table)
- Type cast removal (::regclass, ::text)
- nextval() and DEFAULT now() conversion
- Session command stripping for all dialects
- Warnings for unsupported features (ENUM, arrays, triggers)

### validate

Validate SQL dump integrity with comprehensive checks.

```bash
sql-splitter validate dump.sql                    # validate MySQL dump
sql-splitter validate dump.sql --dialect=postgres # specify dialect
sql-splitter validate dump.sql --strict           # treat warnings as errors
sql-splitter validate dump.sql --json             # JSON output for CI
sql-splitter validate dump.sql --no-fk-checks     # skip heavy data checks
```

Options:
- `-d, --dialect <DIALECT>`: SQL dialect (auto-detected if omitted)
- `--strict`: Treat warnings as errors (exit 1 on any warning)
- `--json`: Output results as JSON for programmatic parsing
- `--max-rows-per-table <N>`: Max rows per table for PK/FK checks (default: 1,000,000)
- `--no-fk-checks`: Skip PK/FK data integrity checks
- `-p, --progress`: Show progress bar

Validation checks:
- SQL syntax validation (parser error detection)
- DDL/DML consistency (INSERTs reference existing tables)
- Encoding validation (UTF-8)
- Duplicate primary key detection (MySQL only)
- FK referential integrity (MySQL only)

Note: PK/FK data checks are MySQL-only; PostgreSQL and SQLite emit an info message.

### completions

Generate shell completions for bash, zsh, or fish.

```bash
sql-splitter completions bash >> ~/.bashrc
sql-splitter completions zsh >> ~/.zshrc
sql-splitter completions fish >> ~/.config/fish/completions/sql-splitter.fish
```

## Supported SQL Dialects

| Dialect | Flag | Dump Tool | Features |
|---------|------|-----------|----------|
| MySQL/MariaDB | `--dialect=mysql` (default) | mysqldump | Backtick quoting, backslash escapes |
| PostgreSQL | `--dialect=postgres` | pg_dump | Double-quote identifiers, COPY FROM stdin, dollar-quoting |
| SQLite | `--dialect=sqlite` | sqlite3 .dump | Double-quote identifiers |

## Supported Statement Types

- CREATE TABLE / DROP TABLE
- INSERT INTO
- CREATE INDEX
- ALTER TABLE
- COPY (PostgreSQL)

Other statements (SELECT, UPDATE, DELETE) are skipped.

## Supported Compression Formats

Files are auto-detected by extension:
- `.gz` - gzip
- `.bz2` - bzip2
- `.xz` - xz/lzma
- `.zst` - zstd

## Performance

- **600+ MB/s** throughput on modern hardware
- **Constant ~50MB memory** regardless of file size
- **5x faster** than shell-based alternatives
- Streaming architecture handles files larger than RAM

## Documentation

- [README](https://github.com/helgesverre/sql-splitter/blob/main/README.md): Full documentation
- [BENCHMARKS.md](https://github.com/helgesverre/sql-splitter/blob/main/BENCHMARKS.md): Performance benchmarks
- [CHANGELOG.md](https://github.com/helgesverre/sql-splitter/blob/main/CHANGELOG.md): Version history

## Source Code

- [src/cmd/](https://github.com/helgesverre/sql-splitter/tree/main/src/cmd): CLI commands
- [src/parser/](https://github.com/helgesverre/sql-splitter/tree/main/src/parser): Streaming SQL parser
- [src/writer/](https://github.com/helgesverre/sql-splitter/tree/main/src/writer): Buffered file writers
- [src/splitter/](https://github.com/helgesverre/sql-splitter/tree/main/src/splitter): Split orchestration
- [src/merger/](https://github.com/helgesverre/sql-splitter/tree/main/src/merger): Merge orchestration
- [src/analyzer/](https://github.com/helgesverre/sql-splitter/tree/main/src/analyzer): Statistical analysis
- [src/convert/](https://github.com/helgesverre/sql-splitter/tree/main/src/convert): Dialect conversion
- [src/validate/](https://github.com/helgesverre/sql-splitter/tree/main/src/validate): Dump validation
- [crates/test_data_gen/](https://github.com/helgesverre/sql-splitter/tree/main/crates/test_data_gen): Test fixture generator

## Optional

- [AGENTS.md](https://github.com/helgesverre/sql-splitter/blob/main/AGENTS.md): AI assistant guidance
- [LICENSE.md](https://github.com/helgesverre/sql-splitter/blob/main/LICENSE.md): MIT License