barkit 0.1.1

BarKit — a cross-platform and ultrafast toolkit for barcodes manipulation in FASTQ files
# BarKit

BarKit (**Bar**codes Tool**Kit**) is a toolkit for manipulating barcodes in FASTQ files.

## Installation

### From crates.io

BarKit can be installed from [`crates.io`](https://crates.io/crates/barkit) using `cargo`:

```bash
cargo install barkit
```

### Build from source

1. Clone the repository:

```bash
git clone https://github.com/nsyzrantsev/barkit
cd barkit/
```

2. Build the release binary and move it to a directory on your `PATH`:

```bash
cargo build --release && sudo mv target/release/barkit /usr/local/bin/
```

## Extract subcommand

The `extract` subcommand parses barcode sequences from FASTQ reads using approximate regex matching against a user-provided pattern.

Each parsed barcode is moved to the read header together with its base qualities. The tag, sequence, and quality string are separated by colons:

```
@SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????
```

* **UMI**: Unique Molecular Identifier (Molecular Barcode)
* **CB**: Cell Barcode
* **SB**: Sample Barcode
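
The header layout above can be read back with a few lines of code. The following is a minimal Python sketch, not part of BarKit: the function name `parse_barcode_fields` is hypothetical, and it assumes the fields are separated by whitespace exactly as in the example header. Because `:` is itself a valid quality character, each field is split on the first two colons only.

```python
def parse_barcode_fields(header: str) -> dict:
    """Return {tag: (sequence, quality)} for TAG:SEQ:QUAL fields in a header.

    Hypothetical helper for illustration; assumes whitespace-separated fields.
    """
    fields = {}
    # The first token is the read ID; the remaining tokens may be barcode fields.
    for token in header.split()[1:]:
        # Split on the first two colons only: ':' can occur inside a quality string.
        parts = token.split(":", 2)
        if len(parts) != 3:
            continue
        tag, seq, qual = parts
        # A barcode has exactly one quality character per base; skip anything else.
        if len(seq) != len(qual):
            continue
        fields[tag] = (seq, qual)
    return fields


barcodes = parse_barcode_fields("@SEQ_ID UMI:ATGC:???? CB:ATGC:???? SB:ATGC:????")
print(barcodes["UMI"])
```

The length check also filters out unrelated header comments that happen to contain colons.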


### Examples

Parse the first twelve nucleotides as a UMI from each forward read:

```bash
barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -p "^(?P<UMI>[ATGCN]{12})" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
```
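
To illustrate what this pattern captures, here is a small Python sketch that applies the same regex to a single read. It is not BarKit's implementation: the function name `move_umi_to_header` is hypothetical, and the sketch assumes that "moved" means the matched bases and their qualities are trimmed from the read and appended to the header. What BarKit does with reads that do not match is not covered here, so the sketch returns `None` for them.

```python
import re
from typing import Optional, Tuple

# Same pattern as in the command above: twelve bases anchored at the read start.
UMI_PATTERN = re.compile(r"^(?P<UMI>[ATGCN]{12})")


def move_umi_to_header(header: str, seq: str, qual: str) -> Optional[Tuple[str, str, str]]:
    """Return (new_header, trimmed_seq, trimmed_qual), or None if there is no match."""
    match = UMI_PATTERN.match(seq)
    if match is None:
        return None
    start, end = match.span("UMI")
    umi_seq = seq[start:end]
    umi_qual = qual[start:end]  # qualities are aligned with bases position by position
    new_header = f"{header} UMI:{umi_seq}:{umi_qual}"
    # Remove the captured region from both the sequence and the quality string.
    return new_header, seq[:start] + seq[end:], qual[:start] + qual[end:]


record = move_umi_to_header("@SEQ_ID", "ACGTACGTACGTTTTTGGGG", "I" * 12 + "?" * 8)
print(record)
```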

Parse the first sixteen nucleotides of each reverse read as a cell barcode, requiring that they are followed by the `atgccat` adapter sequence:

```bash
barkit extract -1 <IN_FASTQ1> -2 <IN_FASTQ2> -P "^(?P<CB>[ATGCN]{16})atgccat" -o <OUT_FASTQ1> -O <OUT_FASTQ2>
```

> [!NOTE]
> Use lowercase letters for the parts of a pattern that should be matched approximately (fuzzy), such as the `atgccat` adapter above.
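
The idea behind a fuzzy adapter can be sketched in Python as follows. This is only an illustration under stated assumptions, not BarKit's algorithm: it tolerates substitutions only, and the names `hamming`, `extract_cell_barcode`, and the `max_mismatches` parameter are hypothetical. BarKit's actual error model and error budget are not described in this section.

```python
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def extract_cell_barcode(
    seq: str,
    adapter: str = "ATGCCAT",
    barcode_len: int = 16,
    max_mismatches: int = 1,  # hypothetical error budget, substitutions only
) -> Optional[str]:
    """Return the leading barcode if the adapter follows it approximately."""
    barcode = seq[:barcode_len]
    candidate = seq[barcode_len:barcode_len + len(adapter)]
    # The read must be long enough to hold both the barcode and the adapter.
    if len(barcode) < barcode_len or len(candidate) < len(adapter):
        return None
    # The barcode itself is matched exactly against the [ATGCN] character class.
    if any(base not in "ATGCN" for base in barcode):
        return None
    # The adapter is accepted when it differs in at most max_mismatches positions.
    if hamming(candidate, adapter) > max_mismatches:
        return None
    return barcode


print(extract_cell_barcode("AAAACCCCGGGGTTTT" + "ATGACAT" + "GGG"))
```

In this sketch the uppercase character class is matched exactly, while the adapter, written in lowercase in the pattern, is allowed a limited number of differences.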