# tva: Tab-separated Values Assistant
Fast, reliable TSV processing toolkit in Rust.
[CI](https://github.com/wang-q/tva/actions) | [Coverage](https://app.codecov.io/gh/wang-q/tva/tree/master) | [crates.io](https://crates.io/crates/tva) | [GitHub](https://github.com/wang-q/tva) | [Docs](https://wang-q.github.io/tva/)
## Overview
`tva` (pronounced "Tee-Va") is a high-performance command-line toolkit written in **Rust** for
processing tabular data. It brings the safety and speed of modern systems programming to the classic
Unix philosophy.
**Inspiration**
* [eBay's tsv-utils](https://github.com/eBay/tsv-utils) (discontinued): The primary reference for
functionality and performance.
* [GNU Datamash](https://www.gnu.org/software/datamash/): Statistical operations.
* [R's tidyverse](https://tidyr.tidyverse.org/): Reshaping concepts and string manipulation.
* [xan](https://github.com/medialab/xan): DSL and terminal-based plotting.
**Use Cases**
* **"Middle Data"**: Files too large for Excel/Pandas but too small for distributed systems (Spark/Hadoop).
* **Data Pipelines**: Robust CLI-based ETL steps compatible with `awk`, `sort`, etc.
* **Exploration**: Fast summary statistics, sampling, and filtering on raw data.
**Design Principles**
* **Single Binary**: A standalone executable with no runtime dependencies, easy to deploy.
* **Header Aware**: Manipulate columns by name or index.
* **Fail-fast**: Strict validation ensures data integrity (no silent truncation).
* **Streaming**: Stateless processing designed for infinite streams and large files.
* **TSV-first**: Prioritizes the reliability and simplicity of tab-separated values.
* **Performance**: Single-pass execution with minimal memory overhead.
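The **Header Aware** and **Fail-fast** principles can be illustrated with a minimal Rust sketch that resolves a field spec, either a 1-based index (as in `tva select -f 1`) or a column name, against a header row, and then refuses to pad short rows. The helpers `resolve_field` and `select_fields` are hypothetical, and treating a purely numeric spec as an index first is an assumption, not necessarily how tva disambiguates a column literally named `2`.

```rust
/// Resolve a field spec (1-based index or column name) to a 0-based index.
/// Hypothetical helper for illustration; not tva's actual API.
fn resolve_field(header: &str, spec: &str) -> Result<usize, String> {
    let names: Vec<&str> = header.split('\t').collect();
    if let Ok(n) = spec.parse::<usize>() {
        // Assumption: numeric specs are 1-based indices and win over names.
        if n == 0 || n > names.len() {
            return Err(format!("field index {} out of range (1..={})", n, names.len()));
        }
        return Ok(n - 1);
    }
    // Otherwise look the spec up as an exact column name.
    names
        .iter()
        .position(|&name| name == spec)
        .ok_or_else(|| format!("unknown field name: {}", spec))
}

/// Pick fields from one row, failing instead of silently padding short rows.
fn select_fields<'a>(row: &'a str, indices: &[usize]) -> Result<Vec<&'a str>, String> {
    let cells: Vec<&str> = row.split('\t').collect();
    indices
        .iter()
        .map(|&i| {
            cells
                .get(i)
                .copied()
                .ok_or_else(|| format!("row has {} fields, need field {}", cells.len(), i + 1))
        })
        .collect()
}
```

Resolving names once against the header and then working with plain indices keeps the per-row work to a split and a few lookups.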
**[Read the documentation online](https://wang-q.github.io/tva/)**
## Installation
Current release: 0.3.2
```bash
# Clone the repository and install via cargo
git clone https://github.com/wang-q/tva.git
cd tva
cargo install --force --path .
```
Or install the pre-compiled binary via the cross-platform package
manager [cbp](https://github.com/wang-q/cbp) (supports older Linux systems with glibc 2.17+):
```bash
cbp install tva
```
You can also download the pre-compiled binaries from
the [Releases](https://github.com/wang-q/tva/releases) page.
## Running Examples
The examples in the documentation use sample data located in the `docs/data/` directory. To run
these examples yourself, we recommend cloning the repository:
```bash
git clone https://github.com/wang-q/tva.git
cd tva
```
Then you can run the commands exactly as shown in the docs (e.g.,
`tva select -f 1 docs/data/input.csv`).
Alternatively, you can download individual files from
the [docs/data](https://github.com/wang-q/tva/tree/master/docs/data) directory on GitHub.
## Commands
### [Subset Selection](docs/selection.md)
Select specific rows or columns from your data.
- **`select`**: Select and reorder columns.
- **`filter`**: Filter rows based on numeric, string, or regex conditions.
- **`slice`**: Slice rows by index (keep or drop). Supports multiple ranges and header preservation.
- **`sample`**: Randomly sample rows (Bernoulli, reservoir, weighted).
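Reservoir sampling, one of the methods listed for `sample`, keeps a fixed-size uniform sample from a stream of unknown length in a single pass. Below is a minimal sketch of the classic Algorithm R, not tva's implementation; the `XorShift64` generator is a toy stand-in so the example needs no external crates.

```rust
/// Toy xorshift64 generator so the sketch needs no external crates.
/// A real tool would use a proper RNG (e.g. the `rand` crate).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Roughly uniform integer in 0..n (modulo bias ignored for brevity).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Algorithm R: keep `k` rows from a stream in one pass with O(k) memory.
fn reservoir_sample<I: IntoIterator<Item = String>>(rows: I, k: usize, seed: u64) -> Vec<String> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut reservoir: Vec<String> = Vec::with_capacity(k);
    for (i, row) in rows.into_iter().enumerate() {
        if reservoir.len() < k {
            // Fill the reservoir with the first k rows.
            reservoir.push(row);
        } else {
            // Row i (0-based) replaces a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = row;
            }
        }
    }
    reservoir
}
```

Bernoulli sampling is simpler still: each row is kept independently with probability `p`, so nothing needs to be buffered, but the output size is only approximately `p` times the input.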
### [Data Transformation](docs/transformation.md)
Transform the structure or values of your data.
- **`longer`**: Reshape wide to long (unpivot). Requires a header row.
- **`wider`**: Reshape long to wide (pivot). Supports aggregation via `--op` (sum, count, etc.).
- **`fill`**: Fill missing values in selected columns (down/LOCF, const).
- **`blank`**: Replace consecutive identical values in selected columns with empty strings (sparsify).
- **`transpose`**: Swap rows and columns (matrix transposition).
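`fill` (down/LOCF) and `blank` are near-inverses: `blank` empties a cell that repeats the value above it, and filling down restores it. A minimal sketch of both on a single column of already-split body rows (header excluded); `fill_down` and `blank_repeats` are hypothetical names, not tva's code.

```rust
/// Fill empty cells in column `col` with the last non-empty value above
/// (last observation carried forward). Cells before the first value stay empty,
/// and rows too short to have `col` are skipped.
fn fill_down(rows: &mut [Vec<String>], col: usize) {
    let mut last: Option<String> = None;
    for row in rows.iter_mut() {
        if let Some(cell) = row.get_mut(col) {
            if cell.is_empty() {
                if let Some(v) = &last {
                    *cell = v.clone();
                }
            } else {
                last = Some(cell.clone());
            }
        }
    }
}

/// Empty a cell in column `col` when it repeats the value directly above it.
fn blank_repeats(rows: &mut [Vec<String>], col: usize) {
    let mut prev: Option<String> = None;
    for row in rows.iter_mut() {
        if let Some(cell) = row.get_mut(col) {
            if prev.as_deref() == Some(cell.as_str()) {
                cell.clear();
            } else {
                prev = Some(cell.clone());
            }
        }
    }
}
```

The round trip is exact only when the original column had no empty cells of its own; otherwise filling down would also overwrite those genuine blanks.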
### [Expr Language](docs/expr.md)
Expression-based transformations for complex data manipulation.
- **`expr`**: Evaluate expressions and output results.
- **`extend`**: Add new columns to each row (alias for `expr -m extend`).
- **`mutate`**: Modify existing column values (alias for `expr -m mutate`).
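The difference between the `extend` and `mutate` modes can be sketched as where a computed value lands: appended as a new column, or written over an existing one. In this sketch a plain Rust closure stands in for tva's expression language, whose syntax is not shown here; `Mode` and `apply` are hypothetical names.

```rust
/// The two output modes, modelled as an enum (hypothetical; not tva's internals).
enum Mode {
    /// Append the computed value as a new trailing column.
    Extend,
    /// Overwrite an existing column (0-based index).
    Mutate(usize),
}

/// Evaluate `expr` on a row, then place the result according to `mode`.
fn apply<F>(row: &mut Vec<String>, mode: &Mode, expr: F)
where
    F: Fn(&[String]) -> String,
{
    // Evaluate against the row as it was before any modification.
    let value = expr(row.as_slice());
    match mode {
        Mode::Extend => row.push(value),
        // Indexing panics if the column does not exist (fail-fast).
        Mode::Mutate(col) => row[*col] = value,
    }
}
```

For example, with a row `apple, 3, 2.5` and an expression multiplying the last two fields, `Extend` yields `apple, 3, 2.5, 7.5`, while `Mutate(2)` yields `apple, 3, 7.5`.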
### [Data Organization](docs/organization.md)
Organize and combine multiple datasets.
- **`sort`**: Sort rows based on one or more key fields.
- **`reverse`**: Reverse the order of lines (like `tac`), optionally keeping the header at the top.
- **`join`**: Join two files based on common keys.
- **`append`**: Concatenate multiple TSV files, handling headers correctly.
- **`split`**: Split a file into multiple files (by size, by key, or randomly).
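A common way to join on a key is a hash join: load one input into an in-memory index keyed by the join column, then stream the other input and look each row up. The sketch below shows an inner join under that assumption; it is not a description of how tva's `join` is implemented, and options such as outer joins or multi-column keys are omitted.

```rust
use std::collections::HashMap;

/// Inner hash join: index `right` by its key column, then stream `left` and
/// emit each left row followed by the matching right row (minus its key).
fn hash_join(
    left: &[Vec<String>],
    lkey: usize,
    right: &[Vec<String>],
    rkey: usize,
) -> Vec<Vec<String>> {
    // Build phase: key -> all right rows sharing that key, in input order.
    let mut index: HashMap<&str, Vec<&Vec<String>>> = HashMap::new();
    for r in right {
        index.entry(r[rkey].as_str()).or_default().push(r);
    }
    // Probe phase: one pass over `left`, preserving its row order.
    let mut out = Vec::new();
    for l in left {
        if let Some(matches) = index.get(l[lkey].as_str()) {
            for r in matches {
                let mut row = l.clone();
                row.extend(
                    r.iter()
                        .enumerate()
                        .filter(|(i, _)| *i != rkey)
                        .map(|(_, v)| v.clone()),
                );
                out.push(row);
            }
        }
    }
    out
}
```

Only the indexed side must fit in memory, so the smaller file is the natural choice for it; the streamed side can be arbitrarily large.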
### [Statistics & Summary](docs/statistics.md)
Calculate statistics and summarize your data.
- **`stats`**: Calculate summary statistics (sum, mean, median, min, max, etc.) with grouping.
- **`bin`**: Discretize numeric values into bins (useful for histograms).
- **`uniq`**: Deduplicate rows or count unique occurrences (supports equivalence classes).
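Grouped summary statistics such as count, sum, mean, min, and max can be computed in one pass by keeping a small running accumulator per group. A minimal sketch, assuming rows are already split into cells; `Summary` and `group_stats` are hypothetical names, and a `BTreeMap` is used only to make group order deterministic.

```rust
use std::collections::BTreeMap;

/// Running summary for one group; memory grows with the number of groups,
/// not the number of rows.
#[derive(Debug, Default)]
struct Summary {
    count: usize,
    sum: f64,
    min: f64,
    max: f64,
}

impl Summary {
    fn add(&mut self, x: f64) {
        if self.count == 0 {
            self.min = x;
            self.max = x;
        } else {
            self.min = self.min.min(x);
            self.max = self.max.max(x);
        }
        self.count += 1;
        self.sum += x;
    }

    fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Group rows by column `key` and summarize numeric column `val`.
/// Fails on the first non-numeric value instead of skipping it.
fn group_stats(
    rows: &[Vec<String>],
    key: usize,
    val: usize,
) -> Result<BTreeMap<String, Summary>, String> {
    let mut groups: BTreeMap<String, Summary> = BTreeMap::new();
    for (n, row) in rows.iter().enumerate() {
        let x: f64 = row[val]
            .parse()
            .map_err(|_| format!("row {}: not a number: {:?}", n + 1, row[val]))?;
        groups.entry(row[key].clone()).or_default().add(x);
    }
    Ok(groups)
}
```

The median does not fit this pattern: it needs all values of a group (or a selection algorithm over them), so it costs memory proportional to the group size rather than a constant per group.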
### [Visualization](docs/plot.md)
Visualize your data in the terminal.
- **`plot point`**: Draw scatter plots or line charts in the terminal.
- **`plot box`**: Draw box plots (box-and-whisker plots) in the terminal.
- **`plot bin2d`**: Draw 2D histograms/heatmaps in the terminal.
### [Formatting & Utilities](docs/utilities.md)
Format and validate your data.
- **`check`**: Validate TSV file structure (column counts, encoding).
- **`nl`**: Add line numbers to rows.
- **`keep-header`**: Run a shell command on the body of a TSV file, preserving the header.
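The kind of validation `check` describes (consistent column counts, valid encoding) can be sketched as a single pass over raw bytes that stops at the first bad line. This is a hypothetical `check_tsv` helper, not tva's implementation; it assumes UTF-8 input with `\n` line endings and does not treat `\r\n` specially.

```rust
/// Check that every line of `input` is valid UTF-8 and has the same number of
/// tab-separated fields as the first line. Returns the field count, or an error
/// naming the first bad line (1-based).
fn check_tsv(input: &[u8]) -> Result<usize, String> {
    // Ignore a single trailing newline so it is not read as an empty last line.
    let body = input.strip_suffix(b"\n").unwrap_or(input);
    if body.is_empty() {
        return Ok(0);
    }
    let mut expected: Option<usize> = None;
    for (i, raw) in body.split(|&b| b == b'\n').enumerate() {
        let line_no = i + 1;
        let line = std::str::from_utf8(raw)
            .map_err(|e| format!("line {}: invalid UTF-8: {}", line_no, e))?;
        let fields = line.split('\t').count();
        match expected {
            // The first line (usually the header) sets the expected width.
            None => expected = Some(fields),
            Some(n) if n != fields => {
                return Err(format!(
                    "line {}: expected {} fields, found {}",
                    line_no, n, fields
                ));
            }
            Some(_) => {}
        }
    }
    Ok(expected.unwrap_or(0))
}
```

Reporting the first offending line and stopping, rather than counting every error, matches the fail-fast principle and keeps the pass cheap.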
### Import & Export
Convert data to and from TSV format.
- **[`from`](docs/from.md)**: Convert other formats to TSV (`csv`, `xlsx`, `html`).
- **[`to`](docs/to.md)**: Convert TSV to other formats (`csv`, `xlsx`, `md`).
## Author
Qiang Wang <wang-q@outlook.com>
## License
MIT.
Copyright (c) Qiang Wang.