remnant/
lib.rs

//! Random sub-sampling of CSV files and PostgreSQL databases.
//!
//! `remnant` reads data from a CSV file or a PostgreSQL database, samples a
//! percentage of rows, and writes the result as CSV or Parquet. When sampling
//! a PostgreSQL database it also generates a `rebuild.sql` script that
//! recreates the schema and loads the sampled data.
//!
//! # Library usage
//!
//! ## CSV sampling
//!
//! ```no_run
//! remnant::csv::run("input.csv", "output.csv", 10.0, Some(10_000), Some(42)).unwrap();
//! ```
//!
//! ## PostgreSQL sampling
//!
//! ```no_run
//! # tokio::runtime::Runtime::new().unwrap().block_on(async {
//! use remnant::pg::OutputFormat;
//! remnant::pg::run(
//!     "postgres://user:pass@localhost/mydb",
//!     "./output",
//!     10.0,
//!     Some(42),
//!     OutputFormat::Csv,
//!     None, // sample all non-system schemas
//! ).await.unwrap();
//! # });
//! ```
//!
//! # Modules
//!
//! - [`csv`]: Sample rows from a CSV file and write the result as CSV.
//! - [`pg`]: Sample every table in a PostgreSQL database, write CSV or Parquet
//!   files, and generate a SQL rebuild script.
//! - [`sampling`]: Core sampling logic operating on Polars DataFrames.
//!
//! # CLI
//!
//! `remnant` is also a command-line tool with `csv` and `pg` subcommands.
//! See the [README](https://github.com/ecolumix/csv-sampler) for CLI usage.

pub mod csv;
pub mod pg;
pub mod sampling;