![genemancer banner](images/genemancer_logo2.png)

# Genemancer

![Rust](https://img.shields.io/badge/Rust-edition%202024-orange?logo=rust)
![Version](https://img.shields.io/badge/version-0.2.2-blue)
![Noodles](https://img.shields.io/badge/noodles-powered-5c7cfa)
![WGPU](https://img.shields.io/badge/GPU-wgpu-2b8a3e)
![Status](https://img.shields.io/badge/commands-5%20implemented-brightgreen)
![Lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange)

Genemancer is a Rust CLI toolkit for genomics file processing, built primarily on the `noodles` ecosystem, with optional GPU acceleration (`wgpu`) for target-based variant aggregation.

## Toolkit

Current subcommands:

- `merge-bam` (implemented): merge multiple coordinate-sorted, indexed BAM files into one BAM, with optional BED filtering (`all|strict|trim`), read-group filtering, output index writing, and configurable compression level.
- `gff-to-gtf` (implemented): convert GFF3 annotations to GTF (stdin/stdout supported).
- `call-targets` (implemented): call simple SNVs from BAM inputs over BED target intervals and write bgzipped VCF output (`.vcf.gz`) with index (`csi` default, optional `tbi`).
- `call-targets-gpu` (implemented): same pipeline as `call-targets`, but attempts GPU initialization and falls back to CPU unless `--require-gpu` is set.
- `split-bam` (implemented): split one or more coordinate-sorted BAM files into per-region BAMs from a BED file, with optional unassigned-read output and optional output indexing.

Global options:

- `-v/--verbose` (repeatable) for log verbosity.
- `-t/--threads` to control worker threads.

## Installation

Install from crates.io:

```bash
cargo install genemancer
```

Or install from the local repository checkout:

```bash
cargo install --path .
```

## Build And Run

1. Install a Rust toolchain with edition 2024 support (Rust 1.85 or newer).
2. Build:
   ```bash
   cargo build
   ```
3. Show CLI help:
   ```bash
   cargo run -- --help
   ```

You can inspect any command with:

```bash
cargo run -- <subcommand> --help
```

## Usage Examples

Examples below assume you provide your own inputs. In this repository, `*.bam` and `/test_data` are gitignored.

Merge two BAMs into one BAM with index output:

```bash
cargo run -- merge-bam \
  -i /path/to/input1.bam \
  -i /path/to/input2.bam \
  -o test_data/merged.bam \
  --index
```

Convert GFF3 to GTF:

```bash
cargo run -- gff-to-gtf \
  -i input.gff3 \
  -o output.gtf
```

Call SNVs on target regions (CPU/streaming path):

```bash
cargo run -- call-targets \
  -i /path/to/bams_or_directory \
  -r /path/to/reference.fa.gz \
  -T /path/to/targets.bed \
  --rg-map references/rg_map.txt \
  -o test_data/out.vcf.gz
```
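To give a feel for what "simple SNV calling" can mean, here is a minimal sketch of a per-position decision over base counts. It is not taken from genemancer's source: the function name `call_snv`, the `min_depth` and `min_alt_fraction` thresholds, and the A/C/G/T count layout are all assumptions made for illustration.

```rust
/// Per-position base counts in A, C, G, T order (hypothetical layout).
type Counts = [u32; 4];

const BASES: [char; 4] = ['A', 'C', 'G', 'T'];

/// Returns the alternate allele when the most frequent non-reference base
/// passes the (assumed) depth and allele-fraction thresholds.
fn call_snv(reference: char, counts: Counts, min_depth: u32, min_alt_fraction: f64) -> Option<char> {
    let depth: u32 = counts.iter().sum();
    if depth < min_depth {
        return None; // not enough coverage to make a call
    }
    let reference = reference.to_ascii_uppercase();
    // Pick the non-reference base with the highest count.
    let (alt, alt_count) = BASES
        .iter()
        .zip(counts.iter())
        .filter(|(base, _)| **base != reference)
        .max_by_key(|(_, count)| **count)
        .map(|(base, count)| (*base, *count))?;
    // The `alt_count > 0` check also guards against dividing by a zero depth.
    if alt_count > 0 && f64::from(alt_count) / f64::from(depth) >= min_alt_fraction {
        Some(alt)
    } else {
        None
    }
}

fn main() {
    // Reference base A, observed counts: 2 x A, 8 x G.
    let call = call_snv('A', [2, 0, 8, 0], 5, 0.2);
    println!("{call:?}");
}
```

A real caller would also weigh base and mapping qualities; this sketch only shows the counting step.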

Run the GPU-enabled path (falls back to CPU by default):

```bash
cargo run -- call-targets-gpu \
  -i /path/to/bams_or_directory \
  -r /path/to/reference.fa.gz \
  -T /path/to/targets.bed \
  --rg-map references/rg_map.txt \
  --gpu-backend auto \
  -o test_data/out.vcf.gz
```
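The fallback rule described above (try the GPU, use the CPU on failure unless `--require-gpu` is set) can be sketched as a small decision function. The names `Backend` and `select_backend` are hypothetical, and the `gpu_init` argument stands in for whatever `wgpu` initialization the tool actually performs.

```rust
/// Execution path that ends up being used.
#[derive(Debug, PartialEq)]
enum Backend {
    Gpu,
    Cpu,
}

/// Decide which backend to run on.
/// `gpu_init` is a stand-in for the result of real GPU initialization.
fn select_backend(gpu_init: Result<(), String>, require_gpu: bool) -> Result<Backend, String> {
    match gpu_init {
        Ok(()) => Ok(Backend::Gpu),
        // With `--require-gpu`, an initialization failure is a hard error.
        Err(e) if require_gpu => Err(format!("GPU required but unavailable: {e}")),
        // Otherwise fall back to the CPU path.
        Err(_) => Ok(Backend::Cpu),
    }
}

fn main() {
    // Simulate a machine without a usable GPU adapter.
    let outcome = select_backend(Err("no adapter found".to_string()), false);
    println!("{outcome:?}");
}
```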

Split multiple BAMs by BED regions into an output folder:

```bash
cargo run -- split-bam \
  -i /path/to/input1.bam \
  -i /path/to/input2.bam \
  --bed /path/to/targets.bed \
  --out-dir test_data/splits \
  --output-prefix panel \
  --write-indices \
  --unassigned test_data/splits/unassigned.bam
```
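Conceptually, splitting by BED regions comes down to an interval-overlap test, with reads that hit no region going to the unassigned output. The sketch below illustrates that idea for a single chromosome. It is an assumption-laden illustration, not genemancer's implementation: the `Region` type and the `assign` function are hypothetical, and it reports every overlapping region without committing to how `split-bam` handles multi-overlap reads.

```rust
/// A BED interval on one chromosome: 0-based start, exclusive end.
struct Region {
    name: &'static str,
    start: u64,
    end: u64,
}

/// Overlap test for two half-open intervals.
fn overlaps(a_start: u64, a_end: u64, b_start: u64, b_end: u64) -> bool {
    a_start < b_end && b_start < a_end
}

/// Names of all regions the read overlaps; an empty result means "unassigned".
fn assign(regions: &[Region], read_start: u64, read_end: u64) -> Vec<&'static str> {
    regions
        .iter()
        .filter(|r| overlaps(r.start, r.end, read_start, read_end))
        .map(|r| r.name)
        .collect()
}

fn main() {
    let regions = [
        Region { name: "exon1", start: 100, end: 200 },
        Region { name: "exon2", start: 150, end: 300 },
    ];
    // A read spanning 160..180 falls inside both regions.
    println!("{:?}", assign(&regions, 160, 180));
    // A read ending exactly where exon1 starts does not overlap it.
    println!("{:?}", assign(&regions, 90, 100));
}
```

Because BED ends are exclusive, a read that ends at position 100 does not touch a region starting at 100. This is the kind of boundary case the TODO table flags for fixture coverage.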

## Repository Data

- `references`: helper scripts and a tracked sample RG map (`references/rg_map.txt`).
- `tests/data`: tracked `.bai` files only.
- Local working datasets are expected under `test_data/` (ignored by git).

## Ignored Paths

- `*.bam`: BAM files anywhere in the repository.
- `/test_data`: local working datasets.

## Notes

- `call-targets` may prepare a sorted/indexed BGZF FASTA companion (`*.sorted.fa.gz` plus indexes) when the provided reference is not already in an indexed form suitable for random access.

## TODO

| Area | Task | Status | Notes |
| --- | --- | --- | --- |
| `split-bam` | Add end-to-end fixture coverage for overlap edge cases | TODO | Validate multi-overlap and boundary behavior |
| `call-targets` | Add end-to-end integration tests on small fixture set | TODO | Validate VCF content + index generation |
| `call-targets-gpu` | Expand GPU backend validation matrix | TODO | Cover Vulkan/Metal/DX12 fallback behavior |
| `merge-bam` | Add CRAM input/output support | TODO | Current implementation is BAM-focused |
| Docs | Add example outputs and expected file artifacts per command | TODO | Make quick verification easier for users |