
Crate remnant


Random sub-sampling of CSV files and PostgreSQL databases.

remnant reads data from a CSV file or a PostgreSQL database, samples a percentage of rows, and writes the result as CSV or Parquet. When sampling a PostgreSQL database it also generates a rebuild.sql script that recreates the schema and loads the sampled data.

§Library usage

§CSV sampling

remnant::csv::run("input.csv", "output.csv", 10.0, Some(10_000), Some(42)).unwrap();

§PostgreSQL sampling

use remnant::pg::OutputFormat;
remnant::pg::run(
    "postgres://user:pass@localhost/mydb",
    "./output",
    10.0,
    Some(42),
    OutputFormat::Csv,
    None, // sample all non-system schemas
).await.unwrap();

§Modules

  • csv — Sample rows from a CSV file and write a CSV output.
  • pg — Sample every table in a PostgreSQL database, write CSV or Parquet files, and generate a SQL rebuild script.
  • sampling — Core sampling logic operating on Polars DataFrames.
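As a rough illustration of what "sampling a percentage of rows" with a fixed seed can mean, here is a minimal sketch that uses only the standard library. It is not the crate's implementation: the sampling module is described as operating on Polars DataFrames, whereas this sketch works on plain row indices. The names SplitMix64, sample_size, and sample_indices are hypothetical, and the rounding rule and the hand-rolled generator are assumptions made to keep the example self-contained.

```rust
/// Small SplitMix64 generator, used only so the sketch needs no external crates.
/// (Assumption: the real crate most likely relies on Polars or a proper RNG.)
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Returns an index in 0..bound (bound must be > 0).
    /// Modulo bias is ignored for brevity.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Hypothetical helper: how many rows a given percentage corresponds to.
/// The percentage is clamped to 0..=100 and the result is rounded.
fn sample_size(total_rows: usize, percent: f64) -> usize {
    let fraction = (percent / 100.0).clamp(0.0, 1.0);
    ((total_rows as f64) * fraction).round() as usize
}

/// Hypothetical helper: choose row indices without replacement.
/// The same seed always yields the same indices, returned in ascending order.
fn sample_indices(total_rows: usize, percent: f64, seed: u64) -> Vec<usize> {
    let k = sample_size(total_rows, percent);
    let mut indices: Vec<usize> = (0..total_rows).collect();
    let mut rng = SplitMix64(seed);

    // Partial Fisher-Yates shuffle: only the first k slots need randomizing.
    for i in 0..k {
        let j = i + rng.below(total_rows - i);
        indices.swap(i, j);
    }

    indices.truncate(k);
    indices.sort_unstable(); // keep the sampled rows in their original order
    indices
}
```

Calling sample_indices(1000, 10.0, 42) twice returns the same 100 indices both times, which is the property a seed argument such as Some(42) is normally meant to provide. Sorting the indices preserves the original row order in the output; whether remnant does the same is not stated in this documentation.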

§CLI

remnant is also a command-line tool with csv and pg subcommands. See the README for CLI usage.

Modules§

csv
CSV file sampling.
pg
PostgreSQL database sampling and schema introspection.
sampling
Core sampling logic for DataFrames.