
# Genemancer

Genemancer is a Rust CLI toolkit for genomics file processing, built primarily on the `noodles` ecosystem, with optional GPU acceleration (`wgpu`) for target-based variant aggregation.
## Toolkit
Current subcommands:
- `merge-bam` (implemented): merge multiple coordinate-sorted, indexed BAM files into one BAM, with optional BED filtering (`all|strict|trim`), read-group filtering, output index writing, and configurable compression level.
- `gff-to-gtf` (implemented): convert GFF3 annotations to GTF (stdin/stdout supported).
- `call-targets` (implemented): call simple SNVs from BAM inputs over BED target intervals and write bgzipped VCF output (`.vcf.gz`) with index (`csi` default, optional `tbi`).
- `call-targets-gpu` (implemented): same pipeline as `call-targets`, but attempts GPU initialization and falls back to CPU unless `--require-gpu` is set.
- `split-bam` (implemented): split one or more coordinate-sorted BAM files into per-region BAMs from a BED file, with optional unassigned-read output and optional output indexing.

Global options:
- `-v/--verbose` (repeatable) for log verbosity.
- `-t/--threads` to control worker threads.
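
`call-targets` writes a CSI index by default and a TBI index on request. One practical difference between the two is that TBI uses a fixed binning scheme that addresses 0-based positions only up to 2^29 - 1 (about 512 Mbp), while CSI's binning parameters are configurable and can cover longer contigs. The sketch below is a hypothetical helper (`choose_index` and `IndexFormat` are illustrative names, not Genemancer's API). It shows how a caller might honour a TBI request only when every contig fits.

```rust
/// Longest contig (in bp) whose positions fit the fixed TBI binning scheme:
/// 0-based positions 0..=2^29 - 1.
const TBI_MAX_CONTIG_LEN: u64 = 1 << 29;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexFormat {
    Csi,
    Tbi,
}

/// Hypothetical helper: keep a TBI request only when every contig fits in
/// the TBI coordinate range; otherwise use CSI, which can span longer contigs.
fn choose_index(requested: IndexFormat, contig_lengths: &[u64]) -> IndexFormat {
    match requested {
        IndexFormat::Tbi if contig_lengths.iter().all(|&len| len <= TBI_MAX_CONTIG_LEN) => {
            IndexFormat::Tbi
        }
        _ => IndexFormat::Csi,
    }
}
```

Silently falling back is only one design choice. A CLI could equally reject `tbi` with an error when a contig is too long.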
## Installation
Install from crates.io:
```bash
cargo install genemancer
```
Or install from the local repository checkout:
```bash
cargo install --path .
```
## Build And Run
1. Install a Rust toolchain that supports the 2024 edition (Rust 1.85 or newer).
2. Build:
   ```bash
   cargo build
   ```
3. Show CLI help:
   ```bash
   cargo run -- --help
   ```

You can inspect any command with:
```bash
cargo run -- <subcommand> --help
```
## Usage Examples
The examples below assume you provide your own inputs. In this repository, `*.bam` files and the `test_data/` directory are gitignored.

Merge two BAMs into one BAM with index output:
```bash
cargo run -- merge-bam \
-i /path/to/input1.bam \
-i /path/to/input2.bam \
-o test_data/merged.bam \
--index
```
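
Merging coordinate-sorted inputs is classically done as a k-way merge. The head record of each input is kept in a min-heap keyed on (reference ID, position), and the smallest head is emitted repeatedly. The sketch below makes several assumptions:

- All inputs share one reference dictionary.
- A hypothetical `Rec` type stands in for real `noodles` BAM records.
- Unmapped reads are ignored. In real BAMs they sort last.

It illustrates the technique only and does not claim to reproduce `merge-bam`'s internals.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for an alignment record; only the sort key matters.
#[derive(Debug, PartialEq, Eq)]
struct Rec {
    ref_id: usize, // index into the shared reference dictionary
    pos: u32,      // 0-based leftmost aligned position
    name: String,
}

/// K-way merge of already coordinate-sorted inputs. The heap holds
/// (ref_id, pos, input index); the input index breaks ties so records with
/// equal coordinates come out in input order.
fn kway_merge(inputs: Vec<Vec<Rec>>) -> Vec<Rec> {
    let mut iters: Vec<std::vec::IntoIter<Rec>> =
        inputs.into_iter().map(|v| v.into_iter()).collect();
    let mut heads: Vec<Option<Rec>> = Vec::with_capacity(iters.len());
    let mut heap = BinaryHeap::new();

    // Seed the heap with the first record of every input.
    for (i, it) in iters.iter_mut().enumerate() {
        let head = it.next();
        if let Some(r) = &head {
            heap.push(Reverse((r.ref_id, r.pos, i)));
        }
        heads.push(head);
    }

    let mut out = Vec::new();
    while let Some(Reverse((_, _, i))) = heap.pop() {
        // Emit the smallest pending record, then refill from the same input.
        out.push(heads[i].take().expect("heap entry without a pending record"));
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next.ref_id, next.pos, i)));
            heads[i] = Some(next);
        }
    }
    out
}
```

Because only one record per input is buffered, memory stays proportional to the number of inputs rather than their size. This is why the inputs must already be coordinate-sorted.
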
Convert GFF3 to GTF:
```bash
cargo run -- gff-to-gtf \
-i input.gff3 \
-o output.gtf
```
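
A visible part of GFF3-to-GTF conversion is the ninth (attributes) column. GFF3 writes `tag=value` pairs separated by `;`, while GTF writes `tag "value";`. The sketch below converts that syntax only. It is a hypothetical helper, not `gff-to-gtf`'s code. It does not derive the `gene_id`/`transcript_id` attributes GTF expects from GFF3 `ID`/`Parent` relationships, and it does not percent-decode GFF3 escapes such as `%3B`.

```rust
/// Rewrite a GFF3 attribute column (`tag=value;tag=value`) into GTF syntax
/// (`tag "value"; tag "value";`). Entries without `=` (including the `.`
/// placeholder for an empty column) are dropped.
fn gff3_attrs_to_gtf(attrs: &str) -> String {
    attrs
        .split(';')
        .map(str::trim)
        .filter(|kv| !kv.is_empty())
        .filter_map(|kv| kv.split_once('='))
        .map(|(k, v)| format!("{k} \"{v}\";"))
        .collect::<Vec<_>>()
        .join(" ")
}
```
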
Call SNVs on target regions (CPU/streaming path):
```bash
cargo run -- call-targets \
-i /path/to/bams_or_directory \
-r /path/to/reference.fa.gz \
-T /path/to/targets.bed \
--rg-map references/rg_map.txt \
-o test_data/out.vcf.gz
```
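
As an illustration of what "simple SNV" calling over target positions can mean, the sketch below calls a variant when the most frequent non-reference base passes a minimum depth and a minimum allele fraction. `BaseCounts`, `call_snv`, and both thresholds are hypothetical. The README does not document `call-targets`' actual calling model or defaults.

```rust
/// Hypothetical per-position base counts aggregated from a pileup.
#[derive(Debug)]
struct BaseCounts {
    a: u32,
    c: u32,
    g: u32,
    t: u32,
}

impl BaseCounts {
    fn get(&self, base: u8) -> u32 {
        match base.to_ascii_uppercase() {
            b'A' => self.a,
            b'C' => self.c,
            b'G' => self.g,
            b'T' => self.t,
            _ => 0,
        }
    }
}

/// Return (alt base, allele fraction) if the top non-reference base has
/// enough support; otherwise `None`.
fn call_snv(ref_base: u8, counts: &BaseCounts, min_depth: u32, min_af: f64) -> Option<(u8, f64)> {
    let depth = counts.a + counts.c + counts.g + counts.t;
    if depth < min_depth {
        return None;
    }
    let ref_base = ref_base.to_ascii_uppercase();
    // Pick the non-reference base with the highest count.
    let (alt, alt_count) = [b'A', b'C', b'G', b'T']
        .into_iter()
        .filter(|&b| b != ref_base)
        .map(|b| (b, counts.get(b)))
        .max_by_key(|&(_, n)| n)?;
    let af = alt_count as f64 / depth as f64;
    (alt_count > 0 && af >= min_af).then_some((alt, af))
}
```
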
Run the GPU-enabled path (falls back to CPU by default):
```bash
cargo run -- call-targets-gpu \
-i /path/to/bams_or_directory \
-r /path/to/reference.fa.gz \
-T /path/to/targets.bed \
--rg-map references/rg_map.txt \
--gpu-backend auto \
-o test_data/out.vcf.gz
```
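
The fallback rule described for `call-targets-gpu` reduces to a small decision. A successful GPU initialization selects the GPU. A failure falls back to CPU, unless `--require-gpu` was given, in which case it is an error. In the sketch below, the initialization attempt is modelled as a plain `Result<String, String>` (adapter name or error message). `Backend` and `select_backend` are hypothetical names rather than `wgpu` or Genemancer types.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Backend {
    Gpu(String), // name of the adapter that initialized
    Cpu,
}

/// Choose a backend from the outcome of a GPU initialization attempt.
fn select_backend(gpu_init: Result<String, String>, require_gpu: bool) -> Result<Backend, String> {
    match gpu_init {
        Ok(adapter) => Ok(Backend::Gpu(adapter)),
        // `--require-gpu`: surface the failure instead of falling back.
        Err(e) if require_gpu => Err(format!("GPU required but unavailable: {e}")),
        Err(e) => {
            eprintln!("GPU initialization failed ({e}); falling back to CPU");
            Ok(Backend::Cpu)
        }
    }
}
```
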
Split multiple BAMs by BED regions into an output folder:
```bash
cargo run -- split-bam \
-i /path/to/input1.bam \
-i /path/to/input2.bam \
--bed /path/to/targets.bed \
--out-dir test_data/splits \
--output-prefix panel \
--write-indices \
--unassigned test_data/splits/unassigned.bam
```
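
BED intervals are 0-based and half-open, so a read spanning `[start, end)` overlaps a region exactly when `start < region.end && region.start < end` on the same contig. The sketch below uses that test to find every region a read touches, with an empty result standing for the unassigned output. Two points are assumptions rather than documented `split-bam` behaviour: `Region` and `assign` are hypothetical names, and a read that overlaps several regions is reported for each of them. A linear scan is used for clarity. A real implementation would more likely walk sorted regions alongside sorted reads.

```rust
/// A BED interval: 0-based, half-open [start, end) on a named contig.
#[derive(Debug)]
struct Region {
    contig: String,
    start: u64,
    end: u64,
}

/// Indices of every region overlapping the aligned span [start, end).
/// An empty vector means the read would go to the unassigned output.
fn assign(regions: &[Region], contig: &str, start: u64, end: u64) -> Vec<usize> {
    regions
        .iter()
        .enumerate()
        .filter(|(_, r)| r.contig == contig && start < r.end && r.start < end)
        .map(|(i, _)| i)
        .collect()
}
```
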
## Repository Data
- `references`: helper scripts and a tracked sample RG map (`references/rg_map.txt`).
- `tests/data`: tracked `.bai` files only.
- Local working datasets are expected under `test_data/` (ignored by git).
## Notes
- `call-targets` may prepare a sorted/indexed BGZF FASTA companion (`*.sorted.fa.gz` plus indexes) when the provided reference is not already in an indexed form suitable for random access.
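
For a BGZF-compressed FASTA, random access conventionally relies on two companion files: `<name>.fai` (the faidx index) and `<name>.gzi` (the BGZF block index), as written by `samtools faidx`. The sketch below checks for that pair next to a reference. `fasta_index_paths` and `has_random_access_indexes` are hypothetical helpers. The exact index files Genemancer writes alongside `*.sorted.fa.gz` are not specified here, so treat the `.fai`/`.gzi` pair as an assumption.

```rust
use std::path::{Path, PathBuf};

/// Expected companion index paths for a BGZF FASTA: `<fasta>.fai` and
/// `<fasta>.gzi` (the suffix is appended to the full file name).
fn fasta_index_paths(fasta: &Path) -> (PathBuf, PathBuf) {
    let with_suffix = |suffix: &str| {
        let mut s = fasta.as_os_str().to_owned();
        s.push(suffix);
        PathBuf::from(s)
    };
    (with_suffix(".fai"), with_suffix(".gzi"))
}

/// True when both companion indexes exist on disk.
fn has_random_access_indexes(fasta: &Path) -> bool {
    let (fai, gzi) = fasta_index_paths(fasta);
    fai.exists() && gzi.exists()
}
```
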
## TODO
| Area | Task | Status | Notes |
|------|------|--------|-------|
| `split-bam` | Add end-to-end fixture coverage for overlap edge cases | TODO | Validate multi-overlap and boundary behavior |
| `call-targets` | Add end-to-end integration tests on small fixture set | TODO | Validate VCF content + index generation |
| `call-targets-gpu` | Expand GPU backend validation matrix | TODO | Cover Vulkan/Metal/DX12 fallback behavior |
| `merge-bam` | Add CRAM input/output support | TODO | Current implementation is BAM-focused |
| Docs | Add example outputs and expected file artifacts per command | TODO | Make quick verification easier for users |