wikiparse-rs
wikiparse-rs is a blazingly fast CLI and library, written in Rust, for stream-parsing uncompressed MediaWiki/Wikipedia SQL dumps.
It reads INSERT rows from supported Wikipedia tables and exports them as CSV or JSON.
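To make the INSERT-to-CSV idea concrete, here is a minimal, standard-library-only sketch. It is not wikiparse-rs's actual implementation or API: the function names `parse_insert_rows` and `to_csv_line` are hypothetical, and the sample statement is a simplified, made-up four-column row rather than the real `page` schema. It assumes mysqldump-style input where strings are single-quoted and quotes inside them are backslash-escaped.

```rust
/// Split the tuples of one `INSERT INTO ... VALUES (...),(...);` statement
/// into rows of raw field strings. Illustrative only: escapes such as `\n`
/// are not translated (the escaped character is kept as-is), and doubled
/// quotes ('') are not supported.
fn parse_insert_rows(stmt: &str) -> Vec<Vec<String>> {
    let mut rows = Vec::new();
    // Everything before the VALUES keyword is the table header; skip it.
    let body = match stmt.find("VALUES") {
        Some(pos) => &stmt[pos + "VALUES".len()..],
        None => return rows,
    };

    let mut row: Vec<String> = Vec::new();
    let mut field = String::new();
    let mut in_row = false;
    let mut in_string = false;
    let mut chars = body.chars();

    while let Some(c) = chars.next() {
        if in_string {
            match c {
                // Backslash escape: keep the following character literally.
                '\\' => {
                    if let Some(next) = chars.next() {
                        field.push(next);
                    }
                }
                '\'' => in_string = false,
                _ => field.push(c),
            }
        } else if in_row {
            match c {
                '\'' => in_string = true,
                ',' => row.push(std::mem::take(&mut field)),
                ')' => {
                    // End of tuple: flush the last field and the row.
                    row.push(std::mem::take(&mut field));
                    rows.push(std::mem::take(&mut row));
                    in_row = false;
                }
                _ => field.push(c),
            }
        } else if c == '(' {
            in_row = true;
        }
    }
    rows
}

/// Join one row into a CSV line, quoting fields that contain a comma,
/// a double quote, or a line break.
fn to_csv_line(row: &[String]) -> String {
    row.iter()
        .map(|f| {
            if f.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
                format!("\"{}\"", f.replace('"', "\"\""))
            } else {
                f.clone()
            }
        })
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    // Hypothetical, simplified statement; real dumps have more columns.
    let stmt = "INSERT INTO `page` VALUES (1,0,'Main_Page',0),(2,0,'O\\'Brien, Pat',1);";
    for row in parse_insert_rows(stmt) {
        println!("{}", to_csv_line(&row));
    }
}
```

The sketch handles one statement held in memory. A streaming parser would run the same kind of state machine over buffered input so that a multi-gigabyte dump never has to be loaded at once.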
Install
Install as a CLI tool from crates.io:
Install as a CLI tool from GitHub:
Install as a CLI tool from a local checkout (from this repository root):
Use as a library in another Rust project:
Import it in Rust as wikiparse_rs.
Quick usage
Export a table dump to CSV:
Export from stdin (default when --input is omitted):
Export a table dump to JSON:
Example Usage
Iterate over typed page rows and destructure fields inline:
use File;
use ;
use ;
CLI command
The wikiparse-rs binary is designed for scriptable dump export.
--table: which supported MediaWiki table to parse (for example page, pagelinks, linktarget)
--format: output format, csv or json
--input: path to the SQL dump file, or - for stdin (defaults to stdin when omitted)
--limit: optional row limit for quick sampling
This makes the command useful as a standalone binary to transform large SQL dumps into CSV/JSON for downstream tools.
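Because the binary is meant to be scriptable, it can also be driven from another program. The sketch below builds an argument list from the four options listed above and runs the binary with `std::process::Command`. It rests on several assumptions: `build_args` is a hypothetical helper, the file name `enwiki-latest-page.sql` is only an example, each option is assumed to take its value as a separate argument (`--table page`), and `wikiparse-rs` is assumed to be on `PATH`.

```rust
use std::process::{Command, Stdio};

/// Build the command-line arguments for a wikiparse-rs export.
/// Assumption: each option takes its value as the following argument.
fn build_args(table: &str, format: &str, input: Option<&str>, limit: Option<u64>) -> Vec<String> {
    let mut args = vec![
        "--table".to_string(),
        table.to_string(),
        "--format".to_string(),
        format.to_string(),
    ];
    // Leaving out --input lets the tool fall back to stdin, as documented.
    if let Some(path) = input {
        args.push("--input".to_string());
        args.push(path.to_string());
    }
    // --limit is optional and only useful for quick sampling.
    if let Some(n) = limit {
        args.push("--limit".to_string());
        args.push(n.to_string());
    }
    args
}

fn main() {
    // Example file name only; substitute the dump you actually downloaded.
    let args = build_args("page", "csv", Some("enwiki-latest-page.sql"), Some(100));

    // Forward the exported rows to this process's stdout.
    match Command::new("wikiparse-rs")
        .args(&args)
        .stdout(Stdio::inherit())
        .status()
    {
        Ok(status) => eprintln!("wikiparse-rs exited with {}", status),
        Err(err) => eprintln!("could not start wikiparse-rs: {}", err),
    }
}
```

Keeping argument construction in its own function makes it checkable without the binary installed. The `main` function only reports an error if the command cannot be started.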
Compressed dumps can be streamed without extracting first:
Show progress while streaming a compressed dump with pv:
Column selection with xsv
After exporting CSV, you can select only the columns you need with xsv: