qsv 0.18.1

A high performance CSV command line toolkit.

qsv: Ultra-fast, data-wrangling CLI toolkit for CSVs

qsv is a command line program for indexing, slicing, analyzing, splitting, enriching, validating & joining CSV files. Commands are simple, fast and composable:

  1. Simple tasks are easy.
  2. Performance trade-offs are exposed in the CLI.
  3. Composition does not come at the expense of performance.

NOTE: qsv is a fork of the popular xsv utility. It merges several PRs that have been pending since xsv 0.13.0's release and adds features & commands for data-wrangling. See the FAQ for more details. NEW and EXTENDED commands are marked accordingly.

Available commands

Command Description
apply Apply a series of string, date, currency & geocoding transformations to a CSV column. (NEW)
behead Drop headers from a CSV. (NEW)
cat Concatenate CSV files by row or by column.
count[^1] Count the rows in a CSV file. (Instantaneous with an index.)
dedup[^2] Remove redundant rows. (NEW)
enum Add a new column enumerating rows with incremental or UUID identifiers. Can also be used to copy a column or fill a new column with a constant value. (NEW)
exclude[^1] Remove a set of CSV data from another set based on the specified columns. (NEW)
explode Explode each row into multiple rows by splitting a column value on the given separator. (NEW)
fill Fill empty values. (NEW)
fixlengths Force a CSV to have same-length records by either padding or truncating them.
flatten A flattened view of CSV records. Useful for viewing one record at a time, e.g. qsv slice -i 5 data.csv | qsv flatten.
fmt Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.) (EXTENDED)
foreach Loop over a CSV to execute bash commands. (*nix only) (NEW)
frequency[^1][^3] Build frequency tables of each column. (Uses parallelism to go faster if an index is present.)
headers Show the headers of a CSV. Or show the intersection of all headers between many CSV files.
index Create an index for a CSV. This is very quick & provides constant-time indexing into the CSV file.
input Read a CSV with exotic quoting/escaping rules.
join[^1] Inner, outer, cross, anti & semi joins. Uses a simple hash index to make it fast. (EXTENDED)
jsonl Convert newline-delimited JSON to CSV. (NEW)
lua Execute a Lua script over CSV lines to transform, aggregate or filter them. (NEW)
partition Partition a CSV based on a column value.
pseudo Pseudonymise the values of the given column by replacing them with incremental identifiers. (NEW)
rename Rename the columns of a CSV efficiently. (NEW)
replace Replace CSV data using a regex. (NEW)
reverse[^2] Reverse order of rows in a CSV. (NEW)
sample[^1] Randomly draw rows from a CSV using reservoir sampling (i.e., use memory proportional to the size of the sample). (EXTENDED)
search Run a regex over a CSV. Applies the regex to each field individually & shows only matching rows. (EXTENDED)
searchset Run multiple regexes over a CSV in a single pass. Applies the regexes to each field individually & shows only matching rows. (NEW)
select[^1] Select or re-order columns. (EXTENDED)
slice[^1][^2] Slice rows from any part of a CSV. When an index is present, this only has to parse the rows in the slice (instead of all rows leading up to the start of the slice).
sort Sort CSV data. (EXTENDED)
split[^1][^3] Split one CSV file into many CSV files, each holding a chunk of N rows.
stats[^1][^2][^3] Show basic types & statistics of each column in a CSV. (i.e., sum, min/max, min/max length, mean, stddev, variance, quartiles, IQR, lower/upper fences, skew, median, mode, cardinality & nullcount) (EXTENDED)
table[^2] Show aligned output of a CSV using elastic tabstops. (EXTENDED)
transpose[^2] Transpose rows/columns of a CSV. (NEW)

[^1]: uses an index when available. join always uses indices.
[^2]: loads the entire CSV into memory. Note that stats & transpose have modes that do not load the entire CSV into memory.
[^3]: runs parallel jobs by default (use the --jobs option to adjust).
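To make the "constant time indexing" claim for index more concrete, here is a toy index: a list of the byte offsets at which each data row starts. With such a list, counting rows is just its length, and jumping to row N is one lookup plus one seek, which is why count and slice get faster when an index is present. This is a minimal sketch, not qsv's actual index format; build_index and read_row are hypothetical names, and the sketch assumes no quoted field contains an embedded newline.

```rust
use std::io::{BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: records the byte offset at which each data row
/// (everything after the header line) starts.
/// Assumes no quoted field contains an embedded newline.
fn build_index<R: Read>(data: R) -> std::io::Result<Vec<u64>> {
    let mut reader = BufReader::new(data);
    let mut offsets = Vec::new();
    let mut pos: u64 = 0;
    let mut line = String::new();
    let mut is_header = true;
    loop {
        line.clear();
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break; // end of input
        }
        if is_header {
            is_header = false; // skip the header row
        } else {
            offsets.push(pos); // this data row starts at `pos`
        }
        pos += n as u64;
    }
    Ok(offsets)
}

/// Hypothetical helper: seeks straight to data row `i` (0-based) using the
/// index, without parsing any of the rows before it.
fn read_row<R: Read + Seek>(data: &mut R, index: &[u64], i: usize) -> std::io::Result<Option<String>> {
    let Some(&offset) = index.get(i) else {
        return Ok(None); // row out of range
    };
    data.seek(SeekFrom::Start(offset))?;
    let mut line = String::new();
    BufReader::new(data).read_line(&mut line)?;
    Ok(Some(line.trim_end_matches(|c: char| c == '\r' || c == '\n').to_string()))
}

fn main() -> std::io::Result<()> {
    let csv = "city,pop\nBoston,650000\nDenver,715000\nAustin,960000\n";
    let index = build_index(Cursor::new(csv))?;
    // Counting rows is now just the length of the index.
    println!("rows: {}", index.len());
    // Random access: jump directly to the third data row.
    let mut data = Cursor::new(csv);
    println!("{:?}", read_row(&mut data, &index, 2)?);
    Ok(())
}
```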
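join is described as using "a simple hash index". The classic technique behind that phrase is a hash join: load one side into a hash map keyed on the join column, then stream the other side and probe the map once per row. The sketch below shows an inner join over in-memory rows; it is not qsv's code, and hash_inner_join is a hypothetical helper that ignores headers, case-insensitive matching and the other join types.

```rust
use std::collections::HashMap;

/// Hypothetical helper: inner hash join of two in-memory tables.
/// Build phase indexes `right` by its key column; probe phase streams `left`
/// and does one hash lookup per row.
fn hash_inner_join<'a>(
    left: &[Vec<&'a str>],
    left_key: usize,
    right: &[Vec<&'a str>],
    right_key: usize,
) -> Vec<Vec<String>> {
    // Build phase: key value -> positions of right-hand rows with that key.
    let mut index: HashMap<&'a str, Vec<usize>> = HashMap::new();
    for (pos, row) in right.iter().enumerate() {
        index.entry(row[right_key]).or_default().push(pos);
    }

    // Probe phase: emit left row + right row for every key match.
    let mut out = Vec::new();
    for lrow in left {
        if let Some(matches) = index.get(lrow[left_key]) {
            for &pos in matches {
                let joined: Vec<String> = lrow
                    .iter()
                    .chain(right[pos].iter())
                    .map(|field| field.to_string())
                    .collect();
                out.push(joined);
            }
        }
    }
    out
}

fn main() {
    let people = vec![vec!["1", "Ana"], vec!["2", "Ben"], vec!["3", "Cy"]];
    let cities = vec![vec!["1", "Oslo"], vec!["3", "Lima"], vec!["3", "Rome"]];
    for row in hash_inner_join(&people, 0, &cities, 0) {
        println!("{}", row.join(","));
    }
}
```

Because this is an inner join, the row for "Ben" (key 2) has no match and is dropped, while key 3 matches twice and produces two output rows.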
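sample uses reservoir sampling, which keeps memory proportional to the sample size rather than the file size. The sketch below implements the textbook variant (Algorithm R): keep the first k rows, then let row i (0-based) replace a random slot with probability k/(i+1). It is an illustration only; the tiny xorshift generator is an assumption made to avoid external crates and says nothing about which RNG qsv uses.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// Fine for illustration; not a statistically strong or secure RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..n (the small modulo bias is ignored for this sketch).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Algorithm R: a uniform random sample of `k` items from a stream of
/// unknown length, holding at most `k` items in memory at any time.
fn reservoir_sample<T, I>(items: I, k: usize, seed: u64) -> Vec<T>
where
    I: IntoIterator<Item = T>,
{
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut reservoir = Vec::with_capacity(k);
    for (i, item) in items.into_iter().enumerate() {
        if i < k {
            reservoir.push(item); // fill the reservoir with the first k items
        } else {
            // Item i is kept with probability k / (i + 1), replacing a random slot.
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}

fn main() {
    // Stand-in for the rows of a large CSV streamed one at a time.
    let rows = (1..=1_000_000).map(|n| format!("row{n}"));
    println!("{:?}", reservoir_sample(rows, 5, 42));
}
```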
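stats reports quartiles, IQR and lower/upper fences. Quartile definitions vary between tools, so the sketch below picks one common convention (Q1 and Q3 are the medians of the lower and upper halves, excluding the median when the count is odd) and uses Tukey's 1.5 * IQR rule for the fences. Treat both choices as assumptions; qsv's exact method may differ, and summarize is a hypothetical name.

```rust
/// Quartile-based summary of a numeric column.
#[derive(Debug, PartialEq)]
struct Summary {
    q1: f64,
    median: f64,
    q3: f64,
    iqr: f64,
    lower_fence: f64,
    upper_fence: f64,
}

/// Median of an already-sorted, non-empty slice.
fn median(sorted: &[f64]) -> f64 {
    let n = sorted.len();
    if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    }
}

/// Hypothetical helper. Q1/Q3 = medians of the lower/upper halves (median
/// excluded when the count is odd); fences use Tukey's 1.5 * IQR rule.
/// Returns None for fewer than 2 values.
fn summarize(values: &[f64]) -> Option<Summary> {
    if values.len() < 2 {
        return None;
    }
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).expect("NaN is not supported"));
    let n = v.len();
    let q1 = median(&v[..n / 2]); // lower half
    let q3 = median(&v[(n + 1) / 2..]); // upper half
    let iqr = q3 - q1;
    Some(Summary {
        q1,
        median: median(&v),
        q3,
        iqr,
        lower_fence: q1 - 1.5 * iqr,
        upper_fence: q3 + 1.5 * iqr,
    })
}

fn main() {
    let col = [7.0, 1.0, 100.0, 3.0, 5.0, 2.0, 8.0, 4.0, 6.0];
    let s = summarize(&col).unwrap();
    println!("{s:?}");
    // Values outside the fences are candidate outliers.
    let outliers: Vec<f64> = col
        .iter()
        .copied()
        .filter(|&x| x < s.lower_fence || x > s.upper_fence)
        .collect();
    println!("outliers: {outliers:?}");
}
```

For the sample column, Q1 = 2.5, Q3 = 7.5 and IQR = 5, so the fences are -5 and 15 and the value 100 falls outside them.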

Installation

Binaries for Windows, Linux and macOS are available from GitHub.

Alternatively, you can compile from source by installing Cargo (Rust's package manager) and then using it to install qsv:

cargo install qsv

Compiling from this repository also works similarly:

git clone https://github.com/jqnatividad/qsv
cd qsv
cargo build --release

The compiled binary will end up in ./target/release/qsv.

Performance

If you want more performance, set this environment variable BEFORE installing/compiling:

On Linux and macOS:

export CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'

On Windows Powershell:

$env:CARGO_BUILD_RUSTFLAGS='-C target-cpu=native'

Note, however, that the resulting binary is tuned to the CPU of the machine you installed/compiled it on, and may not run on machines whose CPUs lack the same instruction set extensions, even if they share the same architecture.
To identify your CPU and list the other valid values for target-cpu:

rustc --print target-cpus
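As a rough illustration of why a target-cpu=native binary is tied to the build machine, the stand-alone Rust program below prints a few SIMD features that were enabled at compile time (via cfg!(target_feature = ...)) and, on x86/x86_64, whether the CPU actually running it supports AVX2. Building it once normally and once with -C target-cpu=native should change the compile-time list on most modern x86 machines. It is not part of qsv, and the three feature names checked are an arbitrary selection.

```rust
/// Which of a few SIMD features were enabled when this binary was compiled.
/// `-C target-cpu=native` turns on every feature the build machine supports.
fn compiled_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    if cfg!(target_feature = "neon") {
        enabled.push("neon");
    }
    enabled
}

/// Runtime check on x86/x86_64: does the CPU running this binary support AVX2?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn runtime_avx2() -> Option<bool> {
    Some(is_x86_feature_detected!("avx2"))
}

/// Other architectures: this sketch does not check.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn runtime_avx2() -> Option<bool> {
    None
}

fn main() {
    println!("enabled at compile time: {:?}", compiled_features());
    println!("AVX2 on this CPU at runtime: {:?}", runtime_avx2());
    // A binary compiled with avx2 enabled but run on a CPU without it
    // can crash with an illegal-instruction error.
}
```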

You can also get more performance by using the performance-oriented mimalloc memory allocator. To do so, install/compile qsv with the mimalloc feature.

cargo install qsv --features=mimalloc

or

cargo build --release --features=mimalloc
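A Cargo feature like mimalloc typically works by registering a different global allocator at compile time. The sketch below shows the usual pattern from the mimalloc crate's documentation, gated behind a feature flag; how qsv wires its own feature is an assumption here, not taken from its source. Without the feature, the cfg-gated static is compiled out, so the program builds with no extra dependencies and uses the system allocator.

```rust
// Only compiled when the crate is built with its `mimalloc` Cargo feature,
// which (by assumption) pulls in the `mimalloc` crate as an optional dependency.
#[cfg(feature = "mimalloc")]
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

/// Reports which allocator this build registered.
fn allocator_name() -> &'static str {
    if cfg!(feature = "mimalloc") {
        "mimalloc"
    } else {
        "system default"
    }
}

fn main() {
    // Every heap allocation below goes through the registered global allocator.
    let rows: Vec<String> = (1..=3).map(|i| format!("row{i}")).collect();
    println!("allocator: {}, rows: {:?}", allocator_name(), rows);
}
```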

Benchmarks

Some very rough benchmarks of various qsv commands.

License

Dual-licensed under MIT or the UNLICENSE.

Sponsor

qsv was made possible by datHere - Data Infrastructure Engineering.
Standards-based, best-of-breed, open source solutions to make your Data Useful, Usable & Used.

Naming Collision

This project is unrelated to Intel's Quick Sync Video.