# grab
`grab` is a high-performance, declarative stream processor for delimited text data.
It is designed to replace fragile shell pipelines (`awk`, `cut`, `sed`) with a structured approach to data extraction and manipulation. Instead of relying on complex, column-based syntax, `grab` lets you define your data schema up front, turning messy, brittle pipelines into readable, maintainable, and verifiable data flows.
## The UNIX Philosophy
`grab` is built to be a first-class citizen in the UNIX ecosystem. It adheres strictly to the principles of modularity and composability:
- **Everything is a stream**: `grab` reads from `stdin` and writes to `stdout`.
- **Composable by design**: Because it operates on streams, `grab` integrates into existing pipelines. It sits between your text sources and downstream processors like `jq` or `grep`.
- **Single responsibility**: It does one thing, transforming delimited text data, and does it well. It avoids feature bloat by focusing on high-speed, type-safe processing.
- **Transparent failure**: `grab` communicates errors via `stderr` and uses standard exit codes. If a pipeline breaks, you know exactly where and why without digging through opaque error messages.
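
A minimal sketch of how a script might lean on that behavior, assuming only what is stated above: diagnostics go to `stderr` and a failure produces a non-zero exit status. The function name `extract_or_report` and its file arguments are made up for illustration, and the specific exit code values are not documented here.

```bash
# Run grab on a file, keeping data and diagnostics in separate files.
# Usage: extract_or_report INPUT OUTPUT ERRLOG   (hypothetical helper)
extract_or_report() {
  if grab --mapping id,_,last,phones:2,_:g --json < "$1" > "$2" 2> "$3"; then
    echo "ok: wrote $2"
  else
    status=$?   # exit status of the failed grab invocation
    echo "grab failed with exit status $status; see $3" >&2
    return "$status"
  fi
}
```

Because the data stream and the error stream never mix, a partially written output file can be discarded without parsing it.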
## Why `grab` vs. shell tools?
| Feature | Shell tools (`awk`, `cut`, `sed`) | `grab` |
| --- | --- | --- |
| **Logic** | Cryptic column indexing (e.g., `$1`) | Readable, named field mapping (e.g., `name`) |
| **Error Handling** | Silent failures or cryptic errors | Strict validation (opt-out available) with clear error messages |
| **Complexity** | Exponential regex/string logic | Declarative schema definition with built-in transformations |
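
For comparison, here is roughly what the positional `awk` version of the mapping `id,_,last,phones:2,_:g` looks like. This is a sketch for illustration only: the function name `awk_users_json` is made up, and the script does no validation, no CSV quote handling, and no JSON escaping.

```bash
# Positional awk equivalent: every field is addressed by its column number.
# Input rows look like: 1,John,Doe,555-1234,555-5678,London,UK
awk_users_json() {
  awk -F, '{
    printf "{\"id\":\"%s\",\"last\":\"%s\",\"phones\":[\"%s\",\"%s\"]}\n", $1, $3, $4, $5
  }'
}

# Usage: awk_users_json < users.csv
```

If a column is inserted upstream, every `$N` in the script has to be renumbered by hand, and nothing warns you when a row is too short.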
## Mapping Syntax
| Syntax | Description |
| --- | --- |
| `name` | Maps the next input column to field `name` |
| `_:N` | Skips the next `N` input columns |
| `phones:N` | Maps the next `N` input columns to an array field `phones` |
| `data:g` | Maps the rest of the input columns to an array field `data` |
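
To make the column-consumption rules concrete, the sketch below emulates them in `awk`. It is not how `grab` is implemented. It assumes that a bare `_` skips one column and that `_:g` skips the rest (as the Quick Start mapping suggests), and it prints a made-up `name=value` format rather than JSON. The function name `explain_mapping` is hypothetical.

```bash
# Emulate how mapping tokens consume input columns (illustration only).
# Usage: explain_mapping MAPPING < file.csv
explain_mapping() {
  awk -F, -v mapping="$1" '
    BEGIN { ntok = split(mapping, tok, ",") }
    {
      col = 1        # next unconsumed input column
      line = ""
      for (t = 1; t <= ntok; t++) {
        n = split(tok[t], part, ":")
        name = part[1]
        # Width: 1 for a bare name, N for name:N, all remaining columns for name:g
        if (n == 1)              width = 1
        else if (part[2] == "g") width = NF - col + 1
        else                     width = part[2] + 0
        if (name != "_") {       # "_" consumes columns without emitting a field
          vals = ""
          for (i = 0; i < width; i++)
            vals = vals (i ? "," : "") $(col + i)
          line = line (line == "" ? "" : " ") name "=" (n == 1 ? vals : "[" vals "]")
        }
        col += width
      }
      print line
    }'
}
```

Each token advances a single cursor from left to right, so the order of tokens in the mapping mirrors the order of columns in the input.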
## Quick Start
To create JSON objects from a CSV file, you can use the following command:
```bash
# users.csv:
# 1,John,Doe,555-1234,555-5678,London,UK
# 2,Jane,Smith,555-8765,555-4321,New York,USA
grab --mapping id,_,last,phones:2,_:g --json < users.csv
# Output:
# {"id":"1","last":"Doe","phones":["555-1234","555-5678"]}
# {"id":"2","last":"Smith","phones":["555-8765","555-4321"]}
```
## Pipeline Integration
Filter for UK users and extract their IDs
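
One way such a pipeline might look, assuming the `users.csv` layout from the Quick Start (seven columns, country last) and that `jq` is installed. The function name `uk_user_ids` and the field name `country` are chosen for illustration.

```bash
# Print the IDs of UK users from a users.csv-style stream on stdin.
# Mapping: keep column 1 as "id", skip the next 5 columns, keep column 7 as "country".
uk_user_ids() {
  grab --mapping id,_:5,country --json |
    jq -r 'select(.country == "UK") | .id'
}

# Usage: uk_user_ids < users.csv
```

`grab` handles the positional part once, in the mapping, and `jq` then filters on named fields only.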
## Performance
When `grab` is used as intended, mapping only the fields we care about and skipping the rest, performance improves significantly. In this case, throughput reaches 12.8 million fields per second (including skipped ones).
```bash
hyperfine --warmup 3 --runs 5 "./grab --mapping _:2,first_name,last_name,_:3,phones:2,email,_:g --skip 1 --json < .demo/2mil.csv > /dev/null"
# Results
# Time (mean ± σ): 1.864 s ± 0.010 s [User: 1.835 s, System: 0.029 s]
# Range (min … max): 1.852 s … 1.878 s 5 runs
# Throughput: 12.8 million fields/s
```
#### Note
Profiling shows that a significant portion of the execution time is spent on system calls and kernel-space I/O. `grab` often operates at the theoretical limit of the system pipe.
### TL;DR
| Scenario | Throughput (fields/s) | Time |
| --- | --- | --- |
| All columns with full schema validation | 7.6 million | 3.15s |
| Partial map + greedy skip | **12.8 million** | **1.86s** |
## Installation
### Binaries
Precompiled binaries for Linux are available on the releases page.
### Cargo
You can also install `grab` using Cargo:
```bash
cargo install grab-cli
```
### Source
To build from source, clone the repository and run:
```bash
cargo build --release
```
## Contributing
As of now, `grab` is in early development and not yet accepting contributions. However, if you're interested in contributing or have ideas for features, please reach out to me directly.