# fastqrab [](https://tyberiusprime.github.io/fastqrab/main/docs/concepts/mascot/)
The multi-tool of FASTQ (pre-)processing.
It filters, samples, slices, dices, quantifies, demultiplexes and validates
FASTQ reads in any way you choose.
Define your own specific read transformation pipeline out of well-tested
building blocks, in a self-documenting, easily audited configuration file
format.
Read processing is fast, reliable and well tested (we have 100% test coverage
and more than 800 end-to-end test cases).
Supports input/output in FASTQ/FASTA/BAM.
## Getting started right away
### 1. Define temporary run command
`ABOVE="nix run github:TyberiusPrime/fastqrab"`
or
`ABOVE="docker run --rm ghcr.io/tyberiusprime/fastqrab:latest"`
### 2. Run Your First Pipeline
Generate a basic quality report configuration from our first example cookbook:
`$ABOVE cookbook 01 > my-first-pipeline.toml`
Edit the input section to point to your FASTQ files:
`nano my-first-pipeline.toml`:
Run it:
`$ABOVE my-first-pipeline.toml`
### 3. View your report
`xdg-open output_report.html`
## Documentation
We have [extensive documentation](https://tyberiusprime.github.io/fastqrab/main) following the Diátaxis framework.
Further examples can be found in the [cookbook section](https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/).
## Full list of FastQ manipulations supported
Please refer to the 'step' sections of our [reference
documentation](https://tyberiusprime.github.io/fastqrab/main/docs/reference/)
Briefly, you can
[extract information out of reads (into 'tags')](https://tyberiusprime.github.io/fastqrab/main/docs/reference/tag-steps/),
[filter reads](https://tyberiusprime.github.io/fastqrab/main/docs/reference/filter-steps/),
[modify their sequence and quality data](https://tyberiusprime.github.io/fastqrab/main/docs/reference/modification-steps/),
[validate them](https://tyberiusprime.github.io/fastqrab/main/docs/reference/validation-steps/),
[generate statistics on them](https://tyberiusprime.github.io/fastqrab/main/docs/reference/report-steps/),
and [split the output (demultiplex)](https://tyberiusprime.github.io/fastqrab/main/docs/reference/Demultiplex/).
## Status
It's in beta until the 1.0 release, but already quite usable.
All the major functionality and testing is in place, and I don't anticipate breaking changes.
## Installation
This repo is a [nix flake](https://nixos.wiki/wiki/flakes).
There are statically-linked binaries in the github releases section that will run on any linux with a recent enough glibc.
Currently not packaged by any distribution.
Windows and MacOS binaries are build for each release - be advised that these do not see much testing.
It's written in [rust](https://rust-lang.org/), so `cargo build --release`
should work as long as you have zstd and cmake around. Same goes for `cargo
install fastqrab`. The nix flake does offer
a fully reproducible build and development environment.
### Shell Completions
Shell completions are available for bash, fish, zsh, powershell, and elvish. After installation, generate completions for your shell:
```bash
# Bash - add to ~/.bashrc
source <(fastqrab completions bash)
# Fish - save to completions directory
fastqrab completions fish > ~/.config/fish/completions/fastqrab.fish
# Zsh - add to ~/.zshrc
source <(fastqrab completions zsh)
```
See the [CLI documentation](https://tyberiusprime.github.io/fastqrab/main/docs/reference/cli/) for more details.
### Container image
A ready-to-run OCI image is published with each tag at `ghcr.io/tyberiusprime/fastqrab`.
```bash
# Docker
docker pull ghcr.io/tyberiusprime/fastqrab:latest
docker run --rm ghcr.io/tyberiusprime/fastqrab:latest --help
# Podman
podman pull ghcr.io/tyberiusprime/fastqrab:latest
podman run --rm ghcr.io/tyberiusprime/fastqrab:latest --help
```
Mount your working directory to feed a pipeline configuration:
```bash
docker run --rm -v "$(pwd)":/work ghcr.io/tyberiusprime/fastqrab:latest process input.toml
```
## Usage
Refer to the [full documentation](https://tyberiusprime.github.io/fastqrab/) or the
binaries help page (shown when run without arguments) for details.
CLI: `fastqrab process input.toml`
We use a [TOML](https://toml.io/en/) file for configuration,
because command lines are too limited and prone to misunderstandings.
And you should be writing down what you are doing anyway.
Here's a brief example:
```toml
[input]
# supports multiple input files.
# in at least three autodetected formats.
read1 = ['fileA_1.fastq', 'fileB_1.fastq.gz', 'fileC_1.fastq.zstd']
read2 = ['fileA_2.fastq', 'fileB_2.fastq.gz', 'fileC_2.fastq.zstd']
index1 = ['index1_A.fastq', 'index1_B.fastq.gz', 'index1_C.fastq.zstd']
index2 = ['index2_A.fastq', 'index2_B.fastq.gz', 'index2_C.fastq.zstd']
[[step]]
# we can do a flexible report at any point in the pipeline
# filename is output.(html|json)
action = 'Report'
name = "initial"
duplicate_count_per_read = true
count = true
base_statistics = true
[[step]]
# take the first five thousand reads
action = "Head"
n = 5000
[[step]]
# extract UMI
action = "ExtractRegions"
out_label = "region"
# the umi is the first 8 bases of read1
regions = [{source = 'read1', start = 0, length = 8, anchor="Start"}]
[[step]]
#and place it in the read name
action = "StoreTagInComment"
in_label = "region"
[[step]]
# now remove the UMI from the read sequence
action = "CutStart"
segment = 'read1'
n = 8
[[step]]
action = "Report"
count = true # include read counts
name = "post_filter"
[output]
#generates output_1.fq and output_2.fq. For index reads see below.
prefix = "output"
# uncompressed. Suffix is determined from format
format = "FASTQ"
compression = "Raw"
report_json = true
report_html = true
```
### Canonical template
The repository ships an authoritative configuration scaffold at [`src/template.toml`](src/template.toml).
When prompting an LLM or drafting a new pipeline, point it to that file so it can reference
the full set of supported sections, comments, and examples.
### Cookbooks
Looking for practical examples? Check out the [`cookbooks/`](cookbooks/) directory for complete,
runnable examples demonstrating common use cases, or [visit them in the documentation](https://tyberiusprime.github.io/fastqrab/main/docs/how-to/cookbooks/):
- **Basic Quality Report** - Generate comprehensive quality metrics from FastQ files
- **UMI Extraction** - Extract and handle Unique Molecular Identifiers
- And many more...
Each cookbook includes:
- Sample input data
- Fully documented configuration files
- Expected output for verification
- Detailed README explaining the use case
Run any cookbook with:
```bash
git clone https://github.com/tyberiusprime/fastqrab
cd cookbooks/[cookbook-name]
fastqrab process input.toml
```
## Citations
A manuscript is being drafted.
## Contributions
PR's welcome.
If at any point you find the tool not doing what you expected it to,
please open an issue so we can discuss how to improve it!