# tva: Tab-separated Values Assistant

Fast, reliable TSV processing toolkit in Rust.

[![Build](https://github.com/wang-q/tva/actions/workflows/build.yml/badge.svg)](https://github.com/wang-q/tva/actions)
[![codecov](https://img.shields.io/codecov/c/github/wang-q/tva/master)](https://app.codecov.io/gh/wang-q/tva/tree/master)
[![Crates.io](https://img.shields.io/crates/v/tva.svg)](https://crates.io/crates/tva)
[![license](https://img.shields.io/github/license/wang-q/tva)](https://github.com/wang-q/tva)
[![Documentation](https://img.shields.io/badge/docs-online-blue)](https://wang-q.github.io/tva/)

## Overview

`tva` (pronounced "Tee-Va") is a high-performance command-line toolkit written in **Rust** for
processing tabular data. It brings the safety and speed of modern systems programming to the classic
Unix philosophy.

**Inspiration**

* [eBay's tsv-utils](https://github.com/eBay/tsv-utils) (discontinued): The primary reference for
  functionality and performance.
* [GNU Datamash](https://www.gnu.org/software/datamash/): Statistical operations.
* [R's tidyverse](https://tidyr.tidyverse.org/): Reshaping concepts and string manipulation.
* [xan](https://github.com/medialab/xan): DSL and terminal-based plotting.

**Use Cases**

* **"Middle Data"**: Files too large for Excel/Pandas but too small for distributed systems
  (Spark/Hadoop).
* **Data Pipelines**: Robust CLI-based ETL steps compatible with `awk`, `sort`, etc.
* **Exploration**: Fast summary statistics, sampling, and filtering on raw data.

**Design Principles**

* **Single Binary**: A standalone executable with no runtime dependencies, easy to deploy.
* **Header Aware**: Manipulate columns by name or index.
* **Fail-fast**: Strict validation ensures data integrity (no silent truncation).
* **Streaming**: Row-by-row processing wherever the operation allows, suited to unbounded streams and large files.
* **TSV-first**: Prioritizes the reliability and simplicity of tab-separated values.
* **Performance**: Single-pass execution with minimal memory overhead.
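To make the header-aware, fail-fast, and streaming principles concrete, here is a minimal Rust sketch. It is not tva's code: `select_by_name` and `write_picked` are hypothetical helpers that resolve column names against the header line, stream the remaining rows one at a time, and stop with an error on the first row whose field count differs from the header.

```rust
use std::io::{BufRead, Write};

/// Hypothetical helper: write the fields at `indices`, tab-joined, as one line.
fn write_picked<W: Write>(out: &mut W, row: &[&str], indices: &[usize]) -> Result<(), String> {
    let picked: Vec<&str> = indices.iter().map(|&i| row[i]).collect();
    writeln!(out, "{}", picked.join("\t")).map_err(|e| e.to_string())
}

/// Hypothetical helper: stream TSV from `input`, keeping only the columns
/// named in `names` (in that order). Not tva's implementation.
fn select_by_name<R: BufRead, W: Write>(input: R, mut output: W, names: &[&str]) -> Result<(), String> {
    let mut lines = input.lines();

    // Header aware: the first line names the columns; resolve each name to an index.
    let header = match lines.next() {
        Some(line) => line.map_err(|e| e.to_string())?,
        None => return Ok(()), // empty input: nothing to do
    };
    let columns: Vec<&str> = header.split('\t').collect();
    let indices = names
        .iter()
        .map(|name| {
            columns
                .iter()
                .position(|c| c == name)
                .ok_or_else(|| format!("unknown column: {name}"))
        })
        .collect::<Result<Vec<usize>, String>>()?;
    write_picked(&mut output, &columns, &indices)?;

    // Streaming: body rows are read and written one at a time.
    for (i, line) in lines.enumerate() {
        let line = line.map_err(|e| e.to_string())?;
        let row: Vec<&str> = line.split('\t').collect();
        // Fail fast: a ragged row is reported, never padded or truncated.
        if row.len() != columns.len() {
            return Err(format!(
                "line {}: expected {} fields, found {}",
                i + 2,
                columns.len(),
                row.len()
            ));
        }
        write_picked(&mut output, &row, &indices)?;
    }
    Ok(())
}
```

Only the current line is held in memory, so the sketch's footprint does not grow with input size.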

**[Read the documentation online](https://wang-q.github.io/tva/)**

## Installation

Current release: 0.3.2

```bash
# Clone the repository and install via cargo
git clone https://github.com/wang-q/tva.git
cd tva
cargo install --force --path .
```

Or install the pre-compiled binary via the cross-platform package
manager [cbp](https://github.com/wang-q/cbp) (supports older Linux systems with glibc 2.17+):

```bash
cbp install tva
```

You can also download the pre-compiled binaries from
the [Releases](https://github.com/wang-q/tva/releases) page.

## Running Examples

The examples in the documentation use sample data located in the `docs/data/` directory. To run
these examples yourself, we recommend cloning the repository:

```bash
git clone https://github.com/wang-q/tva.git
cd tva
```

Then you can run the commands exactly as shown in the docs (e.g.,
`tva select -f 1 docs/data/input.csv`).

Alternatively, you can download individual files from
the [docs/data](https://github.com/wang-q/tva/tree/master/docs/data) directory on GitHub.

## Commands

### [Subset Selection](docs/selection.md)

Select specific rows or columns from your data.

- **`select`**: Select and reorder columns.
- **`filter`**: Filter rows based on numeric, string, or regex conditions.
- **`slice`**: Slice rows by index (keep or drop). Supports multiple ranges and header preservation.
- **`sample`**: Randomly sample rows (Bernoulli, reservoir, weighted).
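Reservoir sampling is a classic one-pass technique for drawing a fixed-size sample from a stream of unknown length. The Rust sketch below shows Algorithm R and is not tva's code: it keeps `k` items in memory and lets item `i` (0-based) replace a random slot with probability k/(i+1), which leaves every item equally likely to end up in the sample. `XorShift64` is a hypothetical stand-in generator so the example needs no external crates; a real tool would use a proper RNG and avoid the modulo bias this sketch ignores.

```rust
/// Tiny xorshift64 PRNG (stand-in only; not suitable for serious use).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so substitute a fixed non-zero seed.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Integer in 0..n (modulo bias ignored in this sketch).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Algorithm R: a uniform random sample of `k` items in one pass with O(k) memory.
fn reservoir_sample<I: Iterator<Item = T>, T>(items: I, k: usize, rng: &mut XorShift64) -> Vec<T> {
    let mut reservoir = Vec::with_capacity(k);
    for (i, item) in items.enumerate() {
        if i < k {
            // Fill the reservoir with the first k items.
            reservoir.push(item);
        } else {
            // Item i replaces a random slot with probability k / (i + 1).
            let j = rng.below(i as u64 + 1) as usize;
            if j < k {
                reservoir[j] = item;
            }
        }
    }
    reservoir
}
```

If the stream has fewer than `k` items, the sketch simply returns all of them in input order.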

### [Data Transformation](docs/transformation.md)

Transform the structure or values of your data.

- **`longer`**: Reshape wide to long (unpivot). Requires a header row.
- **`wider`**: Reshape long to wide (pivot). Supports aggregation via `--op` (sum, count, etc.).
- **`fill`**: Fill missing values in selected columns (down/LOCF, const).
- **`blank`**: Replace consecutive identical values in selected columns with empty strings (sparsify).
- **`transpose`**: Swap rows and columns (matrix transposition).
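`fill` (down/LOCF) and `blank` are roughly inverse operations on a single column. A minimal Rust sketch of both ideas, treating a column as a slice of strings; `fill_down` and `blank` here are hypothetical functions, not tva's code:

```rust
/// Fill down (last observation carried forward): each empty cell takes the
/// most recent non-empty value above it in the same column.
fn fill_down(column: &[&str]) -> Vec<String> {
    let mut last: Option<&str> = None;
    column
        .iter()
        .map(|&cell| {
            if cell.is_empty() {
                // Leading empty cells stay empty: there is nothing to carry forward yet.
                last.unwrap_or("").to_string()
            } else {
                last = Some(cell);
                cell.to_string()
            }
        })
        .collect()
}

/// Blank (sparsify): replace a value with an empty string when it repeats
/// the value directly above it.
fn blank(column: &[&str]) -> Vec<String> {
    let mut prev: Option<&str> = None;
    column
        .iter()
        .map(|&cell| {
            let out = if prev == Some(cell) { String::new() } else { cell.to_string() };
            prev = Some(cell);
            out
        })
        .collect()
}
```

Filling a blanked column restores the original exactly only when the original had no genuinely empty cells. The `const` fill mode and multi-column selection are not shown.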

### [Expr Language](docs/expr.md)

Expression-based transformations for complex data manipulation.

- **`expr`**: Evaluate expressions and output results.
- **`extend`**: Add new columns to each row (alias for `expr -m extend`).
- **`mutate`**: Modify existing column values (alias for `expr -m mutate`).
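The difference between `extend` and `mutate` is where the computed value goes. In the Rust sketch below, plain closures stand in for expressions (this is not tva's expression syntax), and `extend`/`mutate` are hypothetical functions: `extend` appends one new field per row, while `mutate` overwrites an existing field in place.

```rust
/// A row is its list of fields (header handling is omitted in this sketch).
type Row = Vec<String>;

/// extend-style: compute a value from each row and append it as a new column.
fn extend(rows: &mut [Row], expr: impl Fn(&Row) -> String) {
    for row in rows.iter_mut() {
        let value = expr(&*row);
        row.push(value);
    }
}

/// mutate-style: compute a value from each row and overwrite column `col`.
fn mutate(rows: &mut [Row], col: usize, expr: impl Fn(&Row) -> String) {
    for row in rows.iter_mut() {
        let value = expr(&*row);
        row[col] = value;
    }
}
```

In both modes the expression sees the row as it was before the assignment, and the row count never changes.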

### [Data Organization](docs/organization.md)

Organize and combine multiple datasets.

- **`sort`**: Sort rows based on one or more key fields.
- **`reverse`**: Reverse the order of lines (like `tac`), optionally keeping the header at the top.
- **`join`**: Join two files based on common keys.
- **`append`**: Concatenate multiple TSV files, handling headers correctly.
- **`split`**: Split a file into multiple files (by size, key, or random).
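A common way to join on a key is a hash join: load one input into a hash map keyed on the join column, then stream the other input and look up each row's key. The Rust sketch below (an inner join on one key column, headers omitted) illustrates that technique with a hypothetical `hash_join` function; whether `tva join` works exactly this way, and which columns it outputs, are assumptions not covered here.

```rust
use std::collections::HashMap;

/// Inner hash join: index `lookup` by its key column, then stream `stream`.
/// Output rows are the streamed row followed by the lookup row's non-key fields.
fn hash_join(
    lookup: &[Vec<String>],
    lookup_key: usize,
    stream: &[Vec<String>],
    stream_key: usize,
) -> Vec<Vec<String>> {
    // Build phase: key -> every lookup row with that key (keys may repeat).
    let mut index: HashMap<&str, Vec<&Vec<String>>> = HashMap::new();
    for row in lookup {
        index.entry(row[lookup_key].as_str()).or_default().push(row);
    }

    // Probe phase: one pass over the streamed rows; rows without a match are dropped.
    let mut out = Vec::new();
    for row in stream {
        if let Some(matches) = index.get(row[stream_key].as_str()) {
            for m in matches {
                let mut joined = row.clone();
                joined.extend(
                    m.iter()
                        .enumerate()
                        .filter(|&(i, _)| i != lookup_key)
                        .map(|(_, f)| f.clone()),
                );
                out.push(joined);
            }
        }
    }
    out
}
```

Only the lookup side is held in memory, so it pays to make the smaller file the lookup table.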

### [Statistics & Summary](docs/statistics.md)

Calculate statistics and summarize your data.

- **`stats`**: Calculate summary statistics (sum, mean, median, min, max, etc.) with grouping.
- **`bin`**: Discretize numeric values into bins (useful for histograms).
- **`uniq`**: Deduplicate rows or count unique occurrences (supports equivalence classes).
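Grouped summaries reduce to collecting values per key and then reducing each group. The Rust sketch below uses hypothetical functions, not tva's code, to compute count, mean, and median per group; the median needs all of a group's values in memory, whereas count, sum, and mean could be updated incrementally. `bin` shows one common fixed-width convention that maps a value to the lower edge of its bin; tva's exact binning rules (bin origin, edge handling) are an assumption here.

```rust
use std::collections::BTreeMap;

/// Mean and median of a non-empty group (sorts `values` in place).
fn mean_median(values: &mut [f64]) -> (f64, f64) {
    let n = values.len();
    let mean = values.iter().sum::<f64>() / n as f64;
    // The median needs the whole group sorted; total_cmp gives a total order even with NaN.
    values.sort_by(|a, b| a.total_cmp(b));
    let median = if n % 2 == 1 {
        values[n / 2]
    } else {
        (values[n / 2 - 1] + values[n / 2]) / 2.0
    };
    (mean, median)
}

/// Group (key, value) pairs and return (count, mean, median) per key.
/// BTreeMap keeps the output sorted by key.
fn grouped_stats(rows: &[(&str, f64)]) -> BTreeMap<String, (usize, f64, f64)> {
    let mut groups: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for &(key, value) in rows {
        groups.entry(key.to_string()).or_default().push(value);
    }
    groups
        .into_iter()
        .map(|(key, mut values)| {
            let (mean, median) = mean_median(&mut values);
            (key, (values.len(), mean, median))
        })
        .collect()
}

/// Fixed-width binning: the lower edge of the bin containing `x`,
/// with bins of size `width` starting at `origin`.
fn bin(x: f64, width: f64, origin: f64) -> f64 {
    ((x - origin) / width).floor() * width + origin
}
```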

### [Visualization](docs/plot.md)

Visualize your data in the terminal.

- **`plot point`**: Draw scatter plots or line charts in the terminal.
- **`plot box`**: Draw box plots (box-and-whisker plots) in the terminal.
- **`plot bin2d`**: Draw 2D histograms/heatmaps in the terminal.
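Before a 2D histogram like the one `plot bin2d` draws can be rendered, points must be counted into grid cells. The Rust sketch below shows only that counting step, assuming equal-width cells that span the data's bounding box; `bin2d` is a hypothetical function, and how tva sizes cells and turns counts into terminal characters is not reproduced.

```rust
/// Count points into an `nx` x `ny` grid covering the data's bounding box.
/// grid[row][col]: row is the y cell (row 0 = smallest y), col is the x cell.
fn bin2d(points: &[(f64, f64)], nx: usize, ny: usize) -> Vec<Vec<usize>> {
    let mut grid = vec![vec![0usize; nx]; ny];
    if points.is_empty() || nx == 0 || ny == 0 {
        return grid;
    }
    // First pass: find the bounding box.
    let (mut xmin, mut xmax) = (f64::INFINITY, f64::NEG_INFINITY);
    let (mut ymin, mut ymax) = (f64::INFINITY, f64::NEG_INFINITY);
    for &(x, y) in points {
        xmin = xmin.min(x);
        xmax = xmax.max(x);
        ymin = ymin.min(y);
        ymax = ymax.max(y);
    }
    // Map a value to a cell index; the maximum lands in the last cell,
    // and a zero-width range puts everything in cell 0.
    let cell = |v: f64, lo: f64, hi: f64, n: usize| -> usize {
        if hi > lo {
            (((v - lo) / (hi - lo)) * n as f64).floor().min((n - 1) as f64) as usize
        } else {
            0
        }
    };
    // Second pass: count.
    for &(x, y) in points {
        let (cx, cy) = (cell(x, xmin, xmax, nx), cell(y, ymin, ymax, ny));
        grid[cy][cx] += 1;
    }
    grid
}
```

Because row 0 holds the smallest y values, a renderer would print the rows bottom-up.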

### [Formatting & Utilities](docs/utilities.md)

Format and validate your data.

- **`check`**: Validate TSV file structure (column counts, encoding).
- **`nl`**: Add line numbers to rows.
- **`keep-header`**: Run a shell command on the body of a TSV file, preserving the header.
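The structural checks listed for `check` come down to two rules per line: the bytes must decode as UTF-8, and the field count must match the first line. A minimal Rust sketch of those two rules; `check` here is a hypothetical function, and tva's actual checks and messages may differ.

```rust
/// Validate TSV bytes: every line is UTF-8 and has as many tab-separated
/// fields as the first line. Returns (lines, fields per line) or the first problem.
fn check(data: &[u8]) -> Result<(usize, usize), String> {
    // Ignore one trailing newline so it does not count as an empty final line.
    let data = data.strip_suffix(b"\n").unwrap_or(data);
    if data.is_empty() {
        return Ok((0, 0));
    }
    let mut expected = 0;
    let mut lines = 0;
    for (i, raw) in data.split(|&b| b == b'\n').enumerate() {
        let line_no = i + 1;
        let line = std::str::from_utf8(raw)
            .map_err(|e| format!("line {line_no}: invalid UTF-8 ({e})"))?;
        let fields = line.split('\t').count();
        if i == 0 {
            expected = fields;
        } else if fields != expected {
            return Err(format!("line {line_no}: expected {expected} fields, found {fields}"));
        }
        lines += 1;
    }
    Ok((lines, expected))
}
```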

### Import & Export

Convert data to and from TSV format.

- **[`from`](docs/from.md)**: Convert other formats to TSV (`csv`, `xlsx`, `html`).
- **[`to`](docs/to.md)**: Convert TSV to other formats (`csv`, `xlsx`, `md`).
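Converting TSV to CSV is mostly a matter of quoting. Under the RFC 4180 convention, a field containing a comma or a double quote is wrapped in double quotes and any embedded quotes are doubled. A Rust sketch for a single line; `tsv_line_to_csv` is a hypothetical function, not tva's `to csv` implementation, and because a TSV field cannot contain tabs or newlines those cases are not handled.

```rust
/// Convert one TSV line to a CSV line, quoting fields per RFC 4180.
fn tsv_line_to_csv(line: &str) -> String {
    line.split('\t')
        .map(|field| {
            if field.contains(|c: char| c == ',' || c == '"') {
                // Wrap in quotes and double any embedded quote characters.
                format!("\"{}\"", field.replace('"', "\"\""))
            } else {
                field.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(",")
}
```

Going the other way (`from csv`) is harder, because a quoted CSV field may legitimately contain tabs or newlines that have no direct TSV representation.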

## Author

Qiang Wang <wang-q@outlook.com>

## License

MIT.
Copyright (c) Qiang Wang.