# sql-to-csv

Fast, parallel SQL dump to CSV/TSV converter.
Extracts data from large SQL dump files and converts them to clean CSV or TSV files. One file per table, streamed with bounded memory, parallelized across all your CPU cores. Optionally generates PostgreSQL-compatible DDL for direct COPY import.
## Why
SQL dumps are everywhere — MySQL exports, database backups, Wikimedia data releases — but they're a terrible format for actually working with data. Loading a multi-GB dump by replaying INSERT statements takes hours. Most tools that claim to help are single-threaded, eat all your RAM, or require a running database server as an intermediary.
This tool was born out of trying to import Wikimedia's 6GB MySQL database dumps. There was no fast, straightforward way to get the data into CSV so it could be loaded anywhere — PostgreSQL, a data pipeline, pandas, DuckDB, or anything else that reads CSV.
sql-to-csv takes a different approach: it streams the dump, uses all your CPU cores via parallel workers, and outputs clean CSV/TSV files. What used to take hours now takes seconds. The CSV files work with anything. If you're targeting PostgreSQL specifically, the tool also generates schema.sql and a load.sql script for one-command COPY import.
## Supported dialects
| Dialect | Identifiers | Data format | Auto-detected by |
|---|---|---|---|
| MySQL / MariaDB | `` `backticks` `` | Batched INSERT VALUES | `/*!`, `ENGINE=`, backtick identifiers |
| PostgreSQL | `"double quotes"` | COPY FROM stdin | `SET client_encoding`, `COPY ... FROM stdin` |
| SQL Server | `[square brackets]` | INSERT with `N'...'` strings | `[dbo]`, `SET NOCOUNT`, `NVARCHAR` |
| Oracle | `"DOUBLE QUOTES"` | INSERT VALUES | `VARCHAR2`, `NUMBER(`, `SPOOL` |
| SQLite | bare or `"quoted"` | INSERT VALUES | `BEGIN TRANSACTION`, `AUTOINCREMENT` |
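Auto-detection can be approximated by scanning a prefix of the dump for these markers. A minimal Python sketch, where the marker lists mirror the table above but the function name and the simple substring scoring are assumptions, not the tool's actual heuristics:

```python
# Illustrative marker-based dialect detection. The markers come from the
# "Auto-detected by" column above; the scoring scheme is an assumption.
MARKERS = {
    "mysql":  ["/*!", "ENGINE=", "`"],
    "pg":     ["SET client_encoding", "FROM stdin"],
    "mssql":  ["[dbo]", "SET NOCOUNT", "NVARCHAR"],
    "oracle": ["VARCHAR2", "NUMBER(", "SPOOL"],
    "sqlite": ["BEGIN TRANSACTION", "AUTOINCREMENT"],
}

def detect_dialect(sample: str) -> str:
    """Score each dialect by how many of its markers occur in a dump prefix."""
    scores = {d: sum(m in sample for m in ms) for d, ms in MARKERS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "mysql"  # assumed fallback default
```

In practice only the first few kilobytes of the dump need to be scanned.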
## Installation
### Homebrew (macOS)
### Pre-built binaries

Download from GitHub Releases:

```sh
# macOS (Apple Silicon)
```
### Cargo
### Build from source

```sh
cargo build --release
# Binary at target/release/sql-to-csv
```
## Quick start
```sh
# Convert a MySQL dump to CSV (dialect auto-detected)
sql-to-csv dump.sql output/

# Convert a gzipped dump
sql-to-csv dump.sql.gz output/

# SQL Server dump with explicit dialect
sql-to-csv --dialect mssql dump.sql output/

# Output TSV instead of CSV
sql-to-csv --tsv dump.sql output/

# Extract only specific tables
sql-to-csv --tables users,posts dump.sql output/

# Just the data, no DDL
sql-to-csv --data-only dump.sql output/

# Dry run (parse and count, no output)
sql-to-csv --dry-run dump.sql output/
```
## Output
Given an input dump, the tool produces:
```
output/
  data/
    users.csv       # One CSV per table (or .tsv with --tsv)
    posts.csv
    comments.csv
  schema.sql        # PostgreSQL-compatible CREATE TABLE statements
  load.sql          # PostgreSQL helper: \i schema.sql + \COPY commands
  manifest.json     # Table names, row counts, file paths
  errors.log        # Warnings and errors with byte offsets
```
The CSV/TSV files work with any tool that reads tabular data — PostgreSQL, DuckDB, pandas, Excel, R, data pipelines, etc.
The tool refuses to write to a non-empty output directory. Use `--force` (`-f`) to overwrite.
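`manifest.json` can be consumed from any language to drive downstream loading. A Python sketch; note that the field names (`"tables"`, `"name"`, `"rows"`, `"path"`) are a hypothetical layout for illustration, since the README states what the manifest contains but not its exact schema:

```python
import json

# Assumed manifest shape (field names are guesses, not a documented schema):
sample = """{
  "tables": [
    {"name": "users", "rows": 1200, "path": "data/users.csv"},
    {"name": "posts", "rows": 5400, "path": "data/posts.csv"}
  ]
}"""

manifest = json.loads(sample)
total_rows = sum(t["rows"] for t in manifest["tables"])
print(total_rows)  # 6600
```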
## Loading into PostgreSQL

Run the generated `load.sql` from the output directory (it references `schema.sql` and the data files by relative path):

```sh
# One command — applies schema and loads all tables
cd output/
psql -d mydb -f load.sql
```
## Using with other tools

Illustrative examples; paths assume the output layout shown above, and `na_values` matches the default `\N` NULL marker:

```sh
# DuckDB
duckdb -c "SELECT count(*) FROM 'output/data/users.csv'"
```

```python
# pandas
import pandas as pd
df = pd.read_csv("output/data/users.csv", na_values=["\\N"])
```
## CLI reference
```
sql-to-csv [OPTIONS] <INPUT> <OUT_DIR>

Arguments:
  <INPUT>    Input SQL dump file (.sql or .sql.gz)
  <OUT_DIR>  Output directory

Options:
      --dialect <DIALECT>   Source SQL dialect: mysql, pg, mssql, oracle, sqlite
                            (auto-detected if omitted)
  -f, --force               Overwrite output directory if it already exists
      --schema-only         Only emit DDL (schema.sql), no data files
      --data-only           Only emit data files, skip DDL conversion
      --tables <a,b,c>      Comma-separated table allowlist
      --workers <N>         Parallel worker count [default: num_cpus]
      --shard-rows <N>      Rotate data file every N rows per table
      --no-postgres-ddl     Disable DDL conversion (skip schema.sql generation)
      --dry-run             Parse and count rows only
      --tsv                 Output TSV (tab-separated) instead of CSV
      --null-marker <STR>   NULL marker string [default: \N]
      --delimiter <CHAR>    CSV delimiter, ignored if --tsv [default: ,]
  -h, --help                Print help
  -V, --version             Print version
```
## DDL conversion (PostgreSQL)
When DDL conversion is enabled (the default), the tool converts source DDL to PostgreSQL. Key type mappings:
| MySQL | SQL Server | Oracle | SQLite | PostgreSQL |
|---|---|---|---|---|
| INT AUTO_INCREMENT | INT IDENTITY | NUMBER(9) | INTEGER AUTOINCREMENT | INTEGER GENERATED ALWAYS AS IDENTITY |
| TINYINT(1) | BIT | - | - | BOOLEAN |
| VARCHAR(255) | NVARCHAR(255) | VARCHAR2(255) | TEXT | VARCHAR(255) / TEXT |
| TEXT / LONGTEXT | NVARCHAR(MAX) | CLOB | TEXT | TEXT |
| BLOB | VARBINARY(MAX) | RAW | BLOB | BYTEA |
| DATETIME | DATETIME2 | DATE | - | TIMESTAMP |
| JSON | XML | XMLTYPE | - | JSONB / XML |
| - | UNIQUEIDENTIFIER | - | - | UUID |
| - | MONEY | - | - | NUMERIC(19,4) |
| ENUM(...) | - | - | - | TEXT (with warning) |
| FLOAT / DOUBLE | FLOAT | NUMBER | REAL | REAL / DOUBLE PRECISION |
Unsupported features (spatial types, triggers, etc.) emit warnings in `errors.log`. Use `--data-only` or `--no-postgres-ddl` to skip DDL conversion entirely if you only need the CSV files.
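The MySQL column of this mapping can be sketched in code. This is an illustrative Python dictionary based on the table above, not the tool's internals, and a real converter needs a proper DDL parser rather than string lookups:

```python
# Illustrative MySQL -> PostgreSQL type mapping, mirroring the table above.
MYSQL_TO_PG = {
    "TINYINT(1)": "BOOLEAN",
    "DATETIME": "TIMESTAMP",
    "LONGTEXT": "TEXT",
    "BLOB": "BYTEA",
    "JSON": "JSONB",
    "DOUBLE": "DOUBLE PRECISION",
}

def map_type(mysql_type: str) -> str:
    t = mysql_type.upper()
    if t.startswith("ENUM("):
        return "TEXT"  # lossy conversion: this is the "with warning" case
    # Types with no entry (e.g. VARCHAR(255)) pass through unchanged.
    return MYSQL_TO_PG.get(t, mysql_type)
```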
## Architecture
- Phase 1 (sequential): A lexer-based scan builds an index of statement boundaries, respecting SQL string-quoting rules. No naive regex splitting.
- Phase 2 (parallel): A Rayon worker pool processes disjoint index ranges via mmap, with per-table writers behind `Arc<Mutex<...>>`. Bounded memory — no loading entire files into RAM.
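The Phase 1 idea, splitting on `;` only outside string literals, can be sketched as follows. This is a simplified Python illustration that handles only single-quoted strings with `''` and `\'` escapes; the actual Rust lexer covers more quoting forms:

```python
# Quote-aware statement-boundary scan (simplified sketch of the Phase 1 idea).
def statement_boundaries(sql: str) -> list[int]:
    bounds, in_str, i = [], False, 0
    while i < len(sql):
        c = sql[i]
        if in_str:
            if c == "\\":                  # backslash escape: skip next char
                i += 1
            elif c == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    i += 1                 # '' is an escaped quote; stay in string
                else:
                    in_str = False
        elif c == "'":
            in_str = True
        elif c == ";":
            bounds.append(i)               # offset of each statement terminator
        i += 1
    return bounds
```

A production version would also need to skip `--` and `/* */` comments, and dollar-quoted strings for PostgreSQL dumps.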
## Man page and shell completions
```sh
# Generate
# Install man page (macOS)
# Install fish completions
```
## License

MIT