

## dfkit
dfkit is an extensive suite of command-line functions for viewing, querying, and manipulating CSV, Parquet, JSON, and Avro files. It is written in Rust and powered by [Apache Arrow](https://github.com/apache/arrow) and [Apache DataFusion](https://github.com/apache/datafusion). The project is currently a work in progress.
## Highlights
Here's a high-level overview of some of the features in dfkit:
- Supports viewing and manipulating both local files and files from remote URLs
- Works with CSV, JSON, Parquet, and Avro files
- Ultra-fast performance powered by Apache Arrow and DataFusion
- Transform data with SQL or with several other built-in functions
- Written entirely in Rust!
## Commands
```
dfkit 0.2.0

USAGE:
    dfkit <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    cat         Concatenate multiple files or all files in a directory
    convert     Convert file format (CSV, Parquet, JSON)
    count       Count the number of rows in a file
    dedup       Remove duplicate rows
    describe    Show summary statistics for a file
    help        Prints this message or the help of the given subcommand(s)
    query       Run a SQL query on a file
    reverse     Reverse the order of rows
    schema      Show schema of a file
    sort        Sort rows by one or more columns
    split       Split a file into N chunks
    view        View the contents of a file
```
## Installation
dfkit can be installed via cargo (requires Rust):
```
cargo install dfkit
```
## Examples
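The examples in this section operate on a file named `sample.csv`, which the README does not show. Judging by the `schema` and `view` output, it presumably looks like the file created below; the header names `name` and `age` are taken from the schema output, and the exact original file is an assumption.

```shell
# Recreate the sample file the examples appear to use.
# Column names come from the `dfkit schema` output, rows from `dfkit view`.
cat > sample.csv <<'EOF'
name,age
Joe,34
Matt,24
Emily,65
EOF
```

Run this in the directory where you try the commands so that `sample.csv` resolves.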
`view` takes a filename and an optional limit argument.
```
dfkit view sample.csv
+-------+-----+
| Joe   | 34  |
| Matt  | 24  |
| Emily | 65  |
+-------+-----+
```
`query` lets you query the data with SQL; as in the example below, the file is referenced as table `t`. An optional output argument can also be supplied to save the results.
```
dfkit query sample.csv --sql "SELECT * FROM t WHERE age < 50"
+------+-----+
| Joe  | 34  |
| Matt | 24  |
+------+-----+
```
Show the file schema.
```
dfkit schema sample.csv
+-------------+-----------+-------------+
| name        | Utf8      | YES         |
| age         | Int64     | YES         |
+-------------+-----------+-------------+
```
Show summary statistics of a file with `describe`.
```
dfkit describe sample.csv
+------------+-------+-------------------+
| count      | 3     | 3.0               |
| null_count | 0     | 0.0               |
| mean       | null  | 41.0              |
| std        | null  | 21.37755832643195 |
| min        | Emily | 24.0              |
| max        | Matt  | 65.0              |
| median     | null  | 34.0              |
+------------+-------+-------------------+
```
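The `mean` and `std` values for the numeric column are consistent with the arithmetic mean and the sample standard deviation (n - 1 denominator) of the three ages. As a quick sanity check that does not rely on dfkit, the sketch below recomputes both with awk. The ages are copied from the `view` output, and the n - 1 formula is an inference from the numbers shown, not something dfkit documents here.

```shell
# Ages copied from the `dfkit view` output above.
stats=$(printf '34\n24\n65\n' | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    mean = sum / n
    # Sample standard deviation: divide by n - 1 (assumed; it matches the output above).
    std = sqrt((sumsq - n * mean * mean) / (n - 1))
    printf "%.1f %.4f\n", mean, std
  }')
echo "$stats"   # prints: 41.0 21.3776
```

Rounded to four decimals, the result agrees with the `21.37755832643195` reported by `describe`.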
Reverse the order of rows. Save the output with `--output`.
```
dfkit reverse sample.csv
+-------+-----+
| Emily | 65  |
| Matt  | 24  |
| Joe   | 34  |
+-------+-----+
```
Sort rows by one or more columns, optionally saving the output with `--output`. Specify multiple columns as a comma-separated string.
```
dfkit sort sample.csv --columns "age"
+-------+-----+
| Matt  | 24  |
| Joe   | 34  |
| Emily | 65  |
+-------+-----+
```
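A multi-column sort might look like the sketch below, which passes two keys in a single comma-separated string. Only the `--columns` flag and the comma-separated form come from the description above; the sample file contents are an assumption reconstructed from the earlier output, and the result is not shown because the README does not include it.

```shell
# Recreate the (assumed) sample file so the snippet is self-contained.
printf 'name,age\nJoe,34\nMatt,24\nEmily,65\n' > sample.csv

# Two sort keys as one comma-separated string.
columns="age,name"

# Run only when dfkit is on PATH; otherwise skip quietly.
if command -v dfkit >/dev/null 2>&1; then
  dfkit sort sample.csv --columns "$columns"
fi
```

With this sample data every age is distinct, so the second key only matters once the file contains rows with equal ages.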