dply is a command-line tool for viewing, querying, and writing CSV and Parquet files, inspired by dplyr and powered by polars.
Usage overview
A dply pipeline is a sequence of functions, separated by `|`, that read data, transform it, or write it to disk. Each function operates on the output of the previous one.
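As a rough mental model, a pipeline behaves like left-to-right function composition: each stage takes a table and returns a new one. The Python sketch below only illustrates that idea and is not how dply is implemented. The rows-as-dicts table representation and the stage functions `keep_positive` and `sort_by_amount` are hypothetical stand-ins for real dply functions.

```python
from functools import reduce
from typing import Any, Callable

# Hypothetical table representation: a list of rows, each row a dict.
Table = list[dict[str, Any]]
Stage = Callable[[Table], Table]


def pipeline(*stages: Stage) -> Stage:
    """Chain stages left to right, the way `a | b | c` reads in dply."""
    return lambda table: reduce(lambda acc, stage: stage(acc), stages, table)


# Hypothetical stages standing in for dply functions (not dply's API).
def keep_positive(table: Table) -> Table:
    return [row for row in table if row["total_amount"] > 0]


def sort_by_amount(table: Table) -> Table:
    return sorted(table, key=lambda row: row["total_amount"])


trips = [
    {"payment_type": "Cash", "total_amount": 12.5},
    {"payment_type": "Dispute", "total_amount": -3.0},
    {"payment_type": "Credit card", "total_amount": 7.25},
]

result = pipeline(keep_positive, sort_by_amount)(trips)
print([row["payment_type"] for row in result])  # ['Credit card', 'Cash']
```

Because every stage has the same table-in, table-out shape, stages can be reordered or inserted without changing their neighbors.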
The following pipeline reads a Parquet file[^1], computes the minimum, mean, and
maximum fare for each payment type, sorts the result by payment type, saves it to
the CSV file fares.csv, and shows it:
```
$ dply -c 'parquet("nyctaxi.parquet") |
    group_by(payment_type) |
    summarize(
        min_price = min(total_amount),
        mean_price = mean(total_amount),
        max_price = max(total_amount)
    ) |
    arrange(payment_type) |
    csv("fares.csv") |
    show()'
shape: (5, 4)
┌──────────────┬───────────┬────────────┬───────────┐
│ payment_type ┆ min_price ┆ mean_price ┆ max_price │
│ ---          ┆ ---       ┆ ---        ┆ ---       │
│ str          ┆ f64       ┆ f64        ┆ f64       │
╞══════════════╪═══════════╪════════════╪═══════════╡
│ Cash         ┆ -61.85    ┆ 18.07      ┆ 86.55     │
│ Credit card  ┆ 4.56      ┆ 22.969491  ┆ 324.72    │
│ Dispute      ┆ -55.6     ┆ -0.145161  ┆ 54.05     │
│ No charge    ┆ -16.3     ┆ 0.086667   ┆ 19.8      │
│ Unknown      ┆ 9.96      ┆ 28.893333  ┆ 85.02     │
└──────────────┴───────────┴────────────┴───────────┘
```
[^1]: The file nyctaxi.parquet in the tests/data folder is a 250-row Parquet file
sampled from the NYC trip record data.
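To make the semantics of the example concrete, here is a minimal Python sketch of what `group_by`, `summarize`, `arrange`, and `csv` compute, using only the standard library. It runs on a few hypothetical rows rather than the real nyctaxi.parquet data, and it writes the CSV to an in-memory buffer instead of fares.csv. It illustrates grouped aggregation in general and is not dply's implementation, which relies on polars.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real example reads tests/data/nyctaxi.parquet.
trips = [
    {"payment_type": "Cash", "total_amount": 10.0},
    {"payment_type": "Credit card", "total_amount": 25.5},
    {"payment_type": "Cash", "total_amount": 4.0},
    {"payment_type": "Credit card", "total_amount": 14.5},
]

# group_by(payment_type): collect total_amount values per payment type.
groups: dict[str, list[float]] = defaultdict(list)
for trip in trips:
    groups[trip["payment_type"]].append(trip["total_amount"])

# summarize(...) then arrange(payment_type): one row per group, sorted by key.
summary = [
    {
        "payment_type": key,
        "min_price": min(values),
        "mean_price": mean(values),
        "max_price": max(values),
    }
    for key, values in sorted(groups.items())
]

# csv("fares.csv"): written to an in-memory buffer here instead of a file.
fields = ["payment_type", "min_price", "mean_price", "max_price"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(summary)
print(buffer.getvalue())
```

Each group collapses to exactly one output row, which is why the dply result above has one row per payment type.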
Supported functions
dply supports the following functions:
- `arrange`: Sorts rows by column values
- `count`: Counts the unique values of columns
- `csv`: Reads or writes a dataframe in CSV format
- `distinct`: Retains unique rows
- `filter`: Filters rows that satisfy given predicates
- `glimpse`: Shows a dataframe overview
- `group_by` and `summarize`: Perform grouped aggregations
- `head`: Shows the first few dataframe rows in table format
- joins: Perform left, inner, outer, and cross joins
- `mutate`: Creates or mutates columns
- `parquet`: Reads or writes a dataframe in Parquet format
- `relocate`: Moves columns to new positions
- `rename`: Renames columns
- `select`: Selects columns
- `show`: Shows all dataframe rows
- `unnest`: Expands list columns into rows
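For intuition, the following Python sketch gives rough row-level equivalents of a few of these functions on a list of dicts. The helper names, the `n` column produced by `count`, and the way `unnest` drops rows with empty lists are assumptions of this sketch, not dply's documented behavior.

```python
from collections import Counter
from typing import Any, Callable

Row = dict[str, Any]


def filter_rows(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Like filter: keep only rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def distinct(rows: list[Row]) -> list[Row]:
    """Like distinct: drop duplicate rows, keeping first occurrences."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # assumes hashable cell values
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def arrange(rows: list[Row], column: str, descending: bool = False) -> list[Row]:
    """Like arrange: sort rows by one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def head(rows: list[Row], n: int) -> list[Row]:
    """Like head: take the first n rows."""
    return rows[:n]


def count(rows: list[Row], column: str) -> list[Row]:
    """Like count: rows per unique value (the 'n' name is an assumption)."""
    counts = Counter(row[column] for row in rows)
    return [{column: value, "n": n} for value, n in sorted(counts.items())]


def unnest(rows: list[Row], column: str) -> list[Row]:
    """Like unnest: one output row per list element; empty lists yield no rows here."""
    return [{**row, column: item} for row in rows for item in row[column]]


# Hypothetical data with a list column.
trips = [
    {"id": 1, "payment_type": "Cash", "tags": ["airport", "night"]},
    {"id": 2, "payment_type": "Credit card", "tags": ["night"]},
    {"id": 3, "payment_type": "Cash", "tags": []},
]

flat = unnest(trips, "tags")
print(count(flat, "tags"))  # [{'tags': 'airport', 'n': 1}, {'tags': 'night', 'n': 2}]
night = filter_rows(flat, lambda row: row["tags"] == "night")
print(head(arrange(night, "id", descending=True), 1))
```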
More examples can be found in the tests folder.
Installation
Prebuilt binaries for Linux, macOS (x86), and Windows, generated by the release GitHub Action, are available on the releases page.
You can also install dply with Cargo (`cargo install dply`), or build it from a clone
of this repository by running `cargo install --path .` in its root directory.