sql-splitter 1.8.0

High-performance CLI tool for splitting large SQL dump files into individual table files
# Competitive Analysis

**Date**: 2025-12-21  
**Purpose**: Reference for understanding the SQL dump processing ecosystem

## Executive Summary

sql-splitter occupies a **unique position** in the market by combining capabilities that currently require separate tools. As of v1.7.0, we offer **split + merge + sample with FK preservation + tenant sharding + dialect conversion**. Planned features include redaction, query, diff, and MSSQL support.

No other tool surveyed here offers this combination in a single streaming, CLI-first, multi-dialect binary.

---

## Current sql-splitter Feature Status (v1.7.0)

| Feature | Status | Version Added |
|---------|--------|---------------|
| Split per-table | ✅ Implemented | v1.0.0 |
| Analyze dumps | ✅ Implemented | v1.0.0 |
| Multi-dialect (MySQL, PostgreSQL, SQLite) | ✅ Implemented | v1.1.0 |
| Auto-detect dialect | ✅ Implemented | v1.2.0 |
| Compressed files (gzip, bz2, xz, zstd) | ✅ Implemented | v1.3.0 |
| Schema-only / Data-only filtering | ✅ Implemented | v1.3.0 |
| Shell completions | ✅ Implemented | v1.3.0 |
| Merge files | ✅ Implemented | v1.4.0 |
| FK-aware sampling | ✅ Implemented | v1.5.0 |
| Tenant sharding | ✅ Implemented | v1.6.0 |
| Dialect conversion | ✅ Implemented | v1.7.0 |
| Redaction/anonymization | 🟡 Planned ||
| Query/Filter (WHERE-style) | 🟡 Planned ||
| Diff dumps | 🟡 Planned ||
| MSSQL support | 🟡 Planned ||

---

## Key Competitors by Feature

### Split/Merge

| Tool | Language | Stars | Split | Merge | Streaming | Multi-dialect | Notes |
|------|----------|-------|-------|-------|-----------|---------------|-------|
| **sql-splitter** | Rust |||||| High-performance, 3 dialects |
| **mydumper** | C | 3k ||||| MySQL only, parallel dump/restore |
| **mysqldumpsplitter** | Shell | 500+ ||||| Basic regex extraction |
| **pgloader** | Common Lisp | 5k+ ||||| Loader only, not splitter |
| **Dumpling** | Go | 282 ||||| Archived, MySQL/TiDB only |
| **SQLSplit** | C++ | 4 ||||| Simple regex-based |

**[mydumper](https://github.com/mydumper/mydumper)** is notable:
- ✅ Multi-threaded parallel operations
- ✅ Consistent snapshots
- ✅ Active development (3k stars)
- ✅ Basic masquerading (anonymization)
- ❌ MySQL/MariaDB only
- ❌ Requires database connection for dump

**Gap**: None of the tools above combines split and merge with streaming and multi-dialect support; sql-splitter is the only one that does.
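
To make "streaming" concrete, the sketch below routes dump lines to a per-table sink while reading one line at a time, so memory use does not depend on dump size. It is a minimal Python illustration, not sql-splitter's implementation (which is written in Rust). The `split_stream` function, its `write` callback, and the regex are hypothetical, and the pattern only recognizes simple `CREATE TABLE` and `INSERT INTO` statements.

```python
import io
import re
from collections import defaultdict

# Hypothetical pattern: captures the table name of simple CREATE TABLE / INSERT INTO lines.
TABLE_RE = re.compile(
    r'^\s*(?:CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|INSERT\s+INTO)\s+[`"]?(\w+)[`"]?',
    re.IGNORECASE,
)

def split_stream(lines, write):
    """Send each line to write(table, line); only one line is held at a time."""
    current = None
    for line in lines:
        match = TABLE_RE.match(line)
        if match:
            current = match.group(1)  # a new statement switches the target table
        if current is not None:
            write(current, line)

# Demo: collect into lists; a real tool would append to per-table files instead.
dump = io.StringIO(
    "CREATE TABLE `users` (id INT);\n"
    "INSERT INTO `users` VALUES (1);\n"
    "CREATE TABLE orders (id INT, user_id INT);\n"
    "INSERT INTO orders VALUES (10, 1);\n"
)
collected = defaultdict(list)
split_stream(dump, lambda table, line: collected[table].append(line))
```

The regex approach is what makes tools like this fragile on multi-line statements or quoted identifiers containing spaces, which is one reason a proper statement parser is preferable.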

---

### Sample with FK Preservation

| Tool | Language | Stars | FK-Aware | Streaming | CLI-First | Notes |
|------|----------|-------|----------|-----------|-----------|-------|
| **sql-splitter** | Rust ||||| v1.5.0 |
| **Jailer** | Java | 3.1k |||| GUI-heavy, JDBC-based |
| **Condenser** | Python | 327 |||| Config-driven, FK cycle breaking |
| **subsetter** | Python | ~10 |||| Simple, pip installable |
| **DBSubsetter** | Scala | ~50 |||| Less maintained |

**[Jailer](https://github.com/Wisser/Jailer)** is the most comprehensive:
- ✅ Excellent FK-preserving subsetting
- ✅ Topological sort output
- ✅ 12+ database support (via JDBC)
- ✅ Multiple export formats (SQL, JSON, XML, DbUnit)
- ❌ Requires database connection (JDBC)
- ❌ GUI-focused, not CLI-first
- ❌ No streaming for large dumps
- ❌ No anonymization

**[Condenser](https://github.com/TonicAI/condenser)** (by Tonic.ai):
- ✅ Simple YAML config
- ✅ FK cycle detection and breaking
- ✅ Passthrough tables support
- ✅ Implicit FK support
- ❌ PostgreSQL/MySQL only
- ❌ Limited to ~10GB databases
- ❌ Requires database connection
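
FK cycle detection, as mentioned above, can be sketched with a depth-first search over the table graph. This is an illustrative Python sketch and not Condenser's code; the `find_fk_cycle` helper and the sample schemas are hypothetical.

```python
def find_fk_cycle(fk_graph):
    """Return one FK cycle as a list of tables, or None if the graph is acyclic.

    fk_graph maps each table to the tables it references via foreign keys.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {table: WHITE for table in fk_graph}
    path = []

    def visit(table):
        color[table] = GRAY
        path.append(table)
        for parent in fk_graph.get(table, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # parent is already on the current path, so the loop is closed
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[table] = BLACK
        return None

    for table in list(fk_graph):
        if color[table] == WHITE:
            cycle = visit(table)
            if cycle:
                return cycle
    return None

acyclic = {"orders": ["users"], "users": []}
cyclic = {"employees": ["departments"], "departments": ["employees"]}
```

Once a cycle is found, a subsetting tool has to break it, for example by treating one FK in the loop as nullable and filling it in after the rows are loaded.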

**Gap**: sql-splitter is the only streaming, CLI-first, FK-aware sampler that works on dump files directly.
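
The core idea behind FK-aware sampling can be sketched as follows: process tables in topological order, sample the root tables, then keep only child rows whose foreign keys point at rows already kept. This is a minimal in-memory Python sketch (Python 3.9+ for `graphlib`), not sql-splitter's algorithm. The `sample_with_fks` function, the `id` primary-key convention, and the data are assumptions for illustration.

```python
from graphlib import TopologicalSorter

def sample_with_fks(tables, fks, keep_root):
    """Sample rows so that every kept child row references a kept parent row.

    tables:    {table: [row dicts, each with an "id" key]}  (assumed layout)
    fks:       {child_table: {fk_column: parent_table}}
    keep_root: predicate deciding which rows of parent-less tables to keep
    """
    # Parents must be processed before the tables that reference them.
    deps = {t: set(fks.get(t, {}).values()) for t in tables}
    kept = {}
    for table in TopologicalSorter(deps).static_order():
        refs = fks.get(table, {})
        if not refs:
            kept[table] = [row for row in tables[table] if keep_root(row)]
            continue
        parent_ids = {col: {r["id"] for r in kept[parent]} for col, parent in refs.items()}
        kept[table] = [
            row for row in tables[table]
            # NULL foreign keys reference nothing, so they never break integrity
            if all(row[col] is None or row[col] in ids for col, ids in parent_ids.items())
        ]
    return kept

tables = {
    "users": [{"id": 1}, {"id": 2}, {"id": 3}],
    "orders": [
        {"id": 10, "user_id": 1},
        {"id": 11, "user_id": 2},
        {"id": 12, "user_id": 3},
    ],
}
fks = {"orders": {"user_id": "users"}}
sample = sample_with_fks(tables, fks, keep_root=lambda row: row["id"] % 2 == 1)
```

A streaming implementation cannot hold full tables like this; it would typically keep only the sets of retained primary keys in memory while rows pass through.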

---

### Tenant/Shard Extraction

| Tool | Notes |
|------|-------|
| **sql-splitter** | ✅ v1.6.0: FK chain resolution, auto tenant column detection |
| Jailer | Limited: can filter by starting entity |
| Condenser | Limited: via starting point constraints |
| DuckDB | Via manual SQL queries only |

**Gap**: sql-splitter is unique in offering dedicated multi-tenant extraction with automatic FK chain following directly on dump files.
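
"FK chain resolution" can be pictured as finding, for each table, the sequence of foreign-key hops that leads to a table carrying the tenant column. The Python sketch below does this with a breadth-first search. It is an illustration under assumed data structures, not sql-splitter's implementation; `tenant_paths`, the `tenant_id` column name, and the schema are hypothetical.

```python
from collections import deque

def tenant_paths(columns, fks, tenant_column="tenant_id"):
    """For each table, return the FK chain leading to a table that has tenant_column.

    columns: {table: set of column names}
    fks:     {child_table: {fk_column: parent_table}}
    Returns {table: [(fk_column, parent_table), ...]}. An empty list means the
    table carries the tenant column itself; tables with no route are omitted.
    """
    paths = {t: [] for t, cols in columns.items() if tenant_column in cols}
    queue = deque(paths)
    while queue:
        parent = queue.popleft()
        for child, refs in fks.items():
            if child in paths:
                continue  # already resolved through a shorter chain
            for fk_column, target in refs.items():
                if target == parent:
                    paths[child] = [(fk_column, parent)] + paths[parent]
                    queue.append(child)
                    break
    return paths

columns = {
    "accounts": {"id", "tenant_id"},
    "orders": {"id", "account_id"},
    "order_items": {"id", "order_id"},
    "countries": {"id", "name"},
}
fks = {
    "orders": {"account_id": "accounts"},
    "order_items": {"order_id": "orders"},
}
paths = tenant_paths(columns, fks)
```

Tables that end up without a path, such as `countries` here, are shared lookup data, and a sharding tool has to decide whether to copy them in full or skip them.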

---

### Redaction/Anonymization

| Tool | Language | Stars | MySQL | PostgreSQL | SQLite | Streaming | Notes |
|------|----------|-------|-------|------------|--------|-----------|-------|
| **sql-splitter** | Rust || 🟡 | 🟡 | 🟡 || Planned |
| **nxs-data-anonymizer** | Go | 271 ||||| Go templates + Sprig |
| **pynonymizer** | Python | 109 ||||| Faker integration, GDPR focus |
| **myanon** | C | ~30 ||||| stdin/stdout streaming |
| **pganonymize** | Python |||||| YAML config |
| **pg-anonymizer** | TypeScript | 236 ||||| |
| **go-anonymize-mysqldump** | Go | 60 ||||| |
| **dumpctl** | Go | ~5 ||||| Early stage |

**[pynonymizer](https://github.com/rwnx/pynonymizer)** is notable:
- ✅ Faker integration for realistic data
- ✅ GDPR compliance focus
- ✅ Compressed I/O
- ✅ MSSQL support
- ❌ Requires temp database (not pure streaming)
- ❌ No SQLite

**[myanon](https://github.com/ppomes/myanon)** is notable:
- ✅ True stdin/stdout streaming
- ✅ HMAC-SHA256 for consistent hashing
- ✅ Python/Faker rules
- ❌ MySQL-only
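
Consistent hashing with HMAC-SHA256 means the same input always maps to the same pseudonym, so joins across tables still line up after anonymization. The Python sketch below shows the general technique using the standard library; it is not myanon's code, and the `pseudonymize` helper and the demo key are hypothetical.

```python
import hashlib
import hmac

def pseudonymize(value, secret, length=16):
    """Map a value to a stable pseudonym using a keyed HMAC-SHA256 digest."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]

SECRET = b"example-key-not-for-production"  # hypothetical key for the demo

a = pseudonymize("alice@example.com", SECRET)
b = pseudonymize("alice@example.com", SECRET)  # same input, same key
c = pseudonymize("bob@example.com", SECRET)    # different input
```

Keying the hash matters: with a plain SHA-256, anyone holding a list of candidate emails could recompute the digests and reverse the mapping.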

**Gap**: None of the tools surveyed anonymizes SQLite dumps, and none offers a combined sample + anonymize workflow.

---

### Dialect Conversion

| Tool | Language | Stars | Dialects | COPY↔INSERT | Streaming |
|------|----------|-------|----------|-------------|-----------|
| **sql-splitter** | Rust || 3 (✅) |||
| **sqlglot** | Python | 7k+ | 31 |||
| **pgloader** | Common Lisp | 5k+ | → PG only |||
| **mysql2postgres** | Ruby | 300 | MySQL→PG | Partial ||
| **node-sql-parser** | JavaScript | 800 | 12 |||
| **jOOQ Translator** | Web || 25+ |||

**[sqlglot](https://github.com/tobymao/sqlglot)** is excellent for query transpilation:
- ✅ 31 dialect support
- ✅ AST manipulation and optimization
- ✅ Active development (7k+ stars)
- ❌ Not designed for full dump conversion
- ❌ Doesn't handle COPY blocks or session commands

**sql-splitter's convert advantages**:
- ✅ PostgreSQL COPY → INSERT with NULL/escape handling
- ✅ Session command stripping (SET, PRAGMA, etc.)
- ✅ 30+ data type mappings (AUTO_INCREMENT ↔ SERIAL, etc.)
- ✅ Streaming architecture
- ✅ Compressed input support

**Gap**: sql-splitter handles full-dump conversion, including COPY↔INSERT translation, which none of the tools above provides.
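
To show what "COPY → INSERT with NULL/escape handling" involves, the sketch below converts one data line of PostgreSQL's COPY text format into an `INSERT` statement. It is a simplified Python illustration, not sql-splitter's converter: the function names are hypothetical, octal and hex escapes are not handled, and every non-NULL value is emitted as a quoted string literal.

```python
# Single-character backslash escapes of the COPY text format.
COPY_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v", "\\": "\\"}

def decode_copy_field(field):
    """Decode one COPY text-format field; returns None for the NULL marker."""
    if field == r"\N":
        return None
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # Unknown escaped characters stand for themselves.
            out.append(COPY_ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)

def sql_literal(value):
    """Render a decoded value as a standard SQL literal, doubling single quotes."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"

def copy_row_to_insert(table, columns, line):
    """Turn one tab-separated COPY data line into an INSERT statement."""
    fields = line.rstrip("\n").split("\t")
    values = ", ".join(sql_literal(decode_copy_field(f)) for f in fields)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"

stmt = copy_row_to_insert("users", ["id", "name", "bio"], "1\tO'Brien\t\\N\n")
# stmt == "INSERT INTO users (id, name, bio) VALUES ('1', 'O''Brien', NULL);"
```

A production converter would also need the column types from the schema to decide which values can be emitted unquoted, and dialect-specific quoting rules for the target database.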

---

### Query/Filter Dumps

| Tool | Language | Stars | Notes |
|------|----------|-------|-------|
| **sql-splitter** | Rust || 🟡 Planned: WHERE-style filtering |
| **DuckDB** | C++ | 34.8k | Query SQL/CSV/JSON/Parquet directly |
| **sqlglot** | Python | 7k+ | Parse/transpile, not filter |

**[DuckDB](https://github.com/duckdb/duckdb)** could cover the query use case:
- ✅ Query SQL/CSV/JSON/Parquet directly
- ✅ Extremely powerful analytical engine
- ❌ Overkill for simple dump filtering
- ❌ No FK-aware subsetting
- ❌ Loads data into memory

---

### MSSQL Support

| Tool | MSSQL |
|------|-------|
| **sql-splitter** | 🟡 Planned |
| Jailer | ✅ (via JDBC) |
| pynonymizer | ✅ |
| sqlglot | ✅ (parsing only) |
| pgloader ||
| nxs-data-anonymizer ||

**Gap**: Few CLI tools process MSSQL dumps, which leaves a major gap in the ecosystem.

---

## Comparison Matrix

| Feature | sql-splitter | mydumper | pgloader | Jailer | Condenser | nxs-anonymizer | sqlglot | DuckDB |
|---------|-------------|----------|----------|--------|-----------|----------------|---------|--------|
| Split per-table |||||||||
| Merge files |||||||||
| Sample + FK |||||||||
| Tenant sharding |||| Limited | Limited ||| Via SQL |
| Redaction | 🟡 | Basic |||||||
| Query/Filter | 🟡 ||| Limited |||||
| Diff | 🟡 ||| Limited |||| Via SQL |
| Convert dialects ||| → PG | Limited |||||
| MySQL |||||||||
| PostgreSQL |||||||||
| SQLite |||||||||
| MSSQL | 🟡 ||||||||
| Streaming |||||||||
| CLI-first |||||||||
| Works on dumps |||||||||
| Compression |||||||||

---

## sql-splitter's Unique Value Proposition

1. **Unified tool** — Split + merge + sample + shard + convert in one binary
2. **Works on dump files** — No database connection required (unlike Jailer, Condenser, mydumper)
3. **Streaming architecture** — Handle 10GB+ dumps without memory issues
4. **CLI-first** — DevOps/automation friendly, pipe-compatible
5. **Multi-dialect** — MySQL, PostgreSQL, SQLite in one tool
6. **FK-aware operations** — Sample and shard preserve referential integrity
7. **Rust performance** — 600+ MB/s, faster than Python/Java alternatives
8. **Compression support** — gzip, bz2, xz, zstd auto-detected
9. **Composable** — Split → Sample → Convert → Merge pipeline
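
Auto-detecting compression is commonly done by inspecting the first few bytes of a file rather than trusting its extension. The Python sketch below checks the published magic numbers of the four formats listed above. It illustrates the general technique only; how sql-splitter itself detects compression is not described here, and `detect_compression` is a hypothetical helper.

```python
import bz2
import gzip
import lzma

# Leading magic bytes of each container format.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
]

def detect_compression(header):
    """Return the compression format for the given leading bytes, or None."""
    for magic, name in MAGIC:
        if header.startswith(magic):
            return name
    return None

payload = b"CREATE TABLE users (id INT);\n"
detected = [
    detect_compression(gzip.compress(payload)),
    detect_compression(bz2.compress(payload)),
    detect_compression(lzma.compress(payload)),  # lzma defaults to the xz container
    detect_compression(payload),                 # plain SQL text
]
```

Reading six bytes is enough to tell these formats apart, so the check can run before choosing a decoder without buffering the input.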

---

## Potential Integrations

Consider these as complementary tools or inspiration:

| Tool | Use Case |
|------|----------|
| **sqlglot** | Reference for dialect conversion grammar |
| **DuckDB** | Alternative for complex ad-hoc queries |
| **Jailer** | Reference for FK subsetting algorithms |
| **Condenser** | Reference for cycle detection in FK graphs |
| **nxs-data-anonymizer** | Reference for Go template-based redaction |
| **pynonymizer** | Reference for Faker-based anonymization |
| **pgloader** | Reference for high-performance data loading |
| **mydumper** | Reference for parallel dump operations |

---

## Recommendations

1. **Prioritize redaction** — Next major differentiator; combine with sample for powerful dev data workflow
2. **Don't over-invest in query** — DuckDB exists for complex needs; focus on simple WHERE filtering
3. **Market the combination** — "One tool for split + sample + anonymize + convert"
4. **Target DevOps** — CLI + streaming + pipes is the right approach
5. **Consider MSSQL** — Major gap in ecosystem for dump processing
6. **Highlight "works on dumps"** — Key differentiator vs Jailer/Condenser which require DB connections

---

## Related

- [Roadmap](ROADMAP.md)
- [Changelog](../CHANGELOG.md)

### Competitor Links

**Split/Merge:**
- [mydumper](https://github.com/mydumper/mydumper)
- [mysqldumpsplitter](https://github.com/kedarvj/mysqldumpsplitter)
- [Dumpling](https://github.com/pingcap/dumpling) (archived)

**FK-Aware Sampling:**
- [Jailer](https://github.com/Wisser/Jailer)
- [Condenser](https://github.com/TonicAI/condenser)
- [subsetter](https://github.com/msg555/subsetter)
- [DBSubsetter](https://github.com/bluerogue251/DBSubsetter)

**Anonymization:**
- [nxs-data-anonymizer](https://github.com/nixys/nxs-data-anonymizer)
- [pynonymizer](https://github.com/rwnx/pynonymizer)
- [myanon](https://github.com/ppomes/myanon)
- [pganonymize](https://pypi.org/project/pganonymize/)

**Dialect Conversion:**
- [sqlglot](https://github.com/tobymao/sqlglot)
- [pgloader](https://github.com/dimitri/pgloader)
- [mysql2postgres](https://github.com/mysql2postgres/mysql2postgres)
- [node-sql-parser](https://www.npmjs.com/package/node-sql-parser)

**General:**
- [DuckDB](https://github.com/duckdb/duckdb)