genemancer 0.2.1

Rust CLI toolkit for niche, optimized genomics file processing and target-based variant workflows

Genemancer

Genemancer is a Rust CLI toolkit for genomics file processing, built primarily on the noodles ecosystem, with optional GPU acceleration (wgpu) for target-based variant aggregation.

Toolkit

Current subcommands:

  • merge-bam (implemented): merge multiple coordinate-sorted, indexed BAM files into one BAM, with optional BED filtering (all|strict|trim), read-group filtering, output index writing, and configurable compression level.
  • gff-to-gtf (implemented): convert GFF3 annotations to GTF (stdin/stdout supported).
  • call-targets (implemented): call simple SNVs from BAM inputs over BED target intervals and write bgzipped VCF output (.vcf.gz) with an index (CSI by default, TBI optional).
  • call-targets-gpu (implemented): same pipeline as call-targets, but attempts GPU initialization and falls back to CPU unless --require-gpu is set.
  • split-bam (stub): the CLI and its arguments are present, but the command currently exits with the message "split_bam is not implemented yet".
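Two commands test alignments against BED targets: merge-bam uses them for its BED filter, and call-targets uses them for its target intervals. BED intervals are 0-based and half-open, whereas SAM-style alignment positions are 1-based and inclusive, so one convention has to be converted before comparing. The std-only sketch below shows that conversion and three interval tests a filter could use: any overlap, full containment, and clipping to the target. The type and function names are hypothetical. Which test each of the all, strict, and trim modes applies is an assumption; this README does not confirm it.

```rust
/// A BED target: 0-based start (inclusive), end (exclusive), as in BED files.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BedInterval {
    start: u64,
    end: u64,
}

/// An alignment span in 1-based, inclusive coordinates (SAM-style POS..=end).
#[derive(Debug, Clone, Copy, PartialEq)]
struct AlignmentSpan {
    start: u64,
    end: u64,
}

impl AlignmentSpan {
    /// Convert to 0-based half-open so both sides share one convention.
    fn to_half_open(self) -> (u64, u64) {
        debug_assert!(self.start >= 1, "1-based positions start at 1");
        (self.start - 1, self.end)
    }
}

/// True if the alignment shares at least one base with the target.
fn overlaps(target: BedInterval, aln: AlignmentSpan) -> bool {
    let (s, e) = aln.to_half_open();
    s < target.end && target.start < e
}

/// True if every aligned base lies inside the target.
fn contained(target: BedInterval, aln: AlignmentSpan) -> bool {
    let (s, e) = aln.to_half_open();
    target.start <= s && e <= target.end
}

/// Clip the alignment to the target; returns the 1-based inclusive remainder, if any.
fn clip(target: BedInterval, aln: AlignmentSpan) -> Option<AlignmentSpan> {
    let (s, e) = aln.to_half_open();
    let (cs, ce) = (s.max(target.start), e.min(target.end));
    (cs < ce).then(|| AlignmentSpan { start: cs + 1, end: ce })
}

fn main() {
    // BED line "chr1\t100\t200" covers 1-based positions 101..=200.
    let target = BedInterval { start: 100, end: 200 };
    let read = AlignmentSpan { start: 190, end: 240 };
    println!(
        "overlaps={} contained={} clipped={:?}",
        overlaps(target, read),
        contained(target, read),
        clip(target, read)
    );
}
```

The off-by-one at the boundary is the usual pitfall: a read ending at 1-based position 100 does not touch the BED interval starting at 0-based 100.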

Global options:

  • -v/--verbose (repeatable) for log verbosity.
  • -t/--threads to control worker threads.
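The sketch below shows one way the two global options could be resolved. It follows a common convention rather than genemancer's confirmed behavior. Each repeated -v raises the log level one step, and the warn/info/debug/trace mapping is hypothetical. A missing or zero -t falls back to the CPU count reported by std::thread::available_parallelism. The hand-rolled argument scan stands in for a real CLI parser.

```rust
use std::thread::available_parallelism;

/// Log levels from least to most verbose.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Level {
    Warn,
    Info,
    Debug,
    Trace,
}

/// Hypothetical mapping from the number of -v flags to a log level.
fn level_from_verbosity(count: u8) -> Level {
    match count {
        0 => Level::Warn,
        1 => Level::Info,
        2 => Level::Debug,
        _ => Level::Trace,
    }
}

/// Count verbosity flags: "--verbose", "-v", and stacked forms such as "-vvv".
fn count_verbose<I, S>(args: I) -> u8
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut n: u8 = 0;
    for arg in args {
        let arg = arg.as_ref();
        if arg == "--verbose" {
            n = n.saturating_add(1);
        } else if let Some(flags) = arg.strip_prefix('-') {
            // "-vv" counts twice; other flags such as "-t" or "--threads" are ignored.
            if !flags.is_empty() && flags.chars().all(|c| c == 'v') {
                n = n.saturating_add(u8::try_from(flags.len()).unwrap_or(u8::MAX));
            }
        }
    }
    n
}

/// An explicit positive -t wins; otherwise use all available CPUs (or 1 if unknown).
fn resolve_threads(requested: Option<usize>) -> usize {
    requested
        .filter(|&n| n > 0)
        .unwrap_or_else(|| available_parallelism().map(|n| n.get()).unwrap_or(1))
}

fn main() {
    let args = ["genemancer", "-vv", "--threads", "4", "merge-bam"];
    let level = level_from_verbosity(count_verbose(args.iter().skip(1)));
    println!("log level: {level:?}, threads: {}", resolve_threads(Some(4)));
}
```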

Build And Run

  1. Install a Rust toolchain that supports edition 2024 (Rust 1.85 or newer).
  2. Build:
    cargo build
    
  3. Show CLI help:
    cargo run -- --help
    

You can inspect any command with:

cargo run -- <subcommand> --help

Usage Examples

Examples below assume you provide your own inputs. In this repository, *.bam and /test_data are gitignored.

Merge two BAM files into one and write the output index:

cargo run -- merge-bam \
  -i /path/to/input1.bam \
  -i /path/to/input2.bam \
  -o test_data/merged.bam \
  --index
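Merging coordinate-sorted inputs is commonly done as a k-way merge. The current head record of every input sits in a min-heap keyed by (reference index, position), and the smallest is emitted repeatedly. The std-only sketch below uses a simplified Rec type in place of real BAM records. It is an assumption that merge-bam streams this way; it may instead walk regions through the indexes. A real merge would also have to reconcile headers and keep unmapped reads last.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified stand-in for a BAM record: only what coordinate order needs.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Rec {
    ref_id: u32, // index into the shared header's reference list
    pos: u64,    // 0-based leftmost position
    name: String,
}

fn rec(ref_id: u32, pos: u64, name: &str) -> Rec {
    Rec { ref_id, pos, name: name.to_string() }
}

/// Merge inputs that are each already coordinate-sorted into one sorted output.
fn merge_sorted(inputs: Vec<Vec<Rec>>) -> Vec<Rec> {
    let mut iters: Vec<std::vec::IntoIter<Rec>> = inputs.into_iter().map(Vec::into_iter).collect();
    // Min-heap of (ref_id, pos, input index); the index breaks ties deterministically.
    let mut heap = BinaryHeap::new();
    // One buffered head record per input; Some(..) exactly when that input has a heap entry.
    let mut heads: Vec<Option<Rec>> = Vec::with_capacity(iters.len());
    for (i, it) in iters.iter_mut().enumerate() {
        let head = it.next();
        if let Some(r) = &head {
            heap.push(Reverse((r.ref_id, r.pos, i)));
        }
        heads.push(head);
    }
    let mut out = Vec::new();
    while let Some(Reverse((_, _, i))) = heap.pop() {
        out.push(heads[i].take().expect("heap entry implies a buffered head"));
        // Refill from the same input so it keeps exactly one entry in the heap.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next.ref_id, next.pos, i)));
            heads[i] = Some(next);
        }
    }
    out
}

fn main() {
    let a = vec![rec(0, 10, "a1"), rec(0, 50, "a2"), rec(1, 5, "a3")];
    let b = vec![rec(0, 20, "b1"), rec(1, 1, "b2")];
    let names: Vec<String> = merge_sorted(vec![a, b]).into_iter().map(|r| r.name).collect();
    println!("{names:?}"); // ["a1", "b1", "a2", "b2", "a3"]
}
```

Memory stays proportional to the number of inputs rather than the number of records, which is why this pattern suits streaming large BAMs.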

Convert GFF3 to GTF:

cargo run -- gff-to-gtf \
  -i input.gff3 \
  -o output.gtf
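The main syntactic difference between the two formats is the ninth (attributes) column. GFF3 uses percent-encoded key=value pairs separated by semicolons, with commas separating multiple values. GTF uses key "value"; pairs. The std-only sketch below converts that one column. gff3_attrs_to_gtf and percent_decode are hypothetical names, not genemancer's API. A full converter would also have to derive GTF's required gene_id and transcript_id attributes from the GFF3 ID/Parent hierarchy, which is not shown.

```rust
/// Decode GFF3 percent-encoding (e.g. "%2C" -> ','); malformed escapes are kept verbatim.
fn percent_decode(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        let is_escape = bytes[i] == b'%'
            && i + 2 < bytes.len()
            && bytes[i + 1].is_ascii_hexdigit()
            && bytes[i + 2].is_ascii_hexdigit();
        if is_escape {
            // Both digits are ASCII hex, so the slice is valid UTF-8 and always parses.
            let hex = std::str::from_utf8(&bytes[i + 1..i + 3]).unwrap();
            out.push(u8::from_str_radix(hex, 16).unwrap());
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Convert a GFF3 attribute column to GTF syntax, one `key "value";` per value.
/// Embedded double quotes are not escaped in this sketch.
fn gff3_attrs_to_gtf(attrs: &str) -> String {
    let mut parts = Vec::new();
    for kv in attrs.split(';').map(str::trim).filter(|kv| !kv.is_empty()) {
        // Entries without '=' are not valid GFF3 attributes; skip them.
        let Some((key, values)) = kv.split_once('=') else { continue };
        let key = percent_decode(key);
        // Split on literal commas before decoding, so an encoded %2C stays inside its value.
        for value in values.split(',') {
            parts.push(format!("{} \"{}\";", key, percent_decode(value)));
        }
    }
    parts.join(" ")
}

fn main() {
    let gff3 = "ID=gene0;Name=ABC%2C1;Dbxref=GeneID:1,HGNC:2";
    println!("{}", gff3_attrs_to_gtf(gff3));
}
```

Emitting a repeated key for each multi-value entry is one reasonable choice. Joining the values into a single quoted string would be another.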

Call SNVs on target regions (CPU/streaming path):

cargo run -- call-targets \
  -i /path/to/bams_or_directory \
  -r /path/to/reference.fa.gz \
  -T /path/to/targets.bed \
  --rg-map references/rg_map.txt \
  -o test_data/out.vcf.gz
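The README describes call-targets as calling "simple" SNVs. One common simple approach, shown here only as an assumption about what "simple" means, works per target position. It counts A/C/G/T observations and reports the most-supported non-reference base when both depth and alternate-allele fraction pass fixed thresholds. The Thresholds values, call_snv, and the sites-only VCF line are all hypothetical. Base-quality and mapping-quality filtering, indels, and genotyping are left out.

```rust
/// Base order used for the count arrays.
const BASES: [char; 4] = ['A', 'C', 'G', 'T'];

/// Hypothetical calling thresholds (not genemancer's documented parameters).
struct Thresholds {
    min_depth: u32,
    min_alt_fraction: f64,
}

/// A called SNV; `pos` is 1-based, as written in the VCF POS column.
#[derive(Debug, PartialEq)]
struct Snv {
    pos: u64,
    ref_base: char,
    alt_base: char,
    depth: u32,
    alt_count: u32,
}

/// Call at most one SNV at a position from A/C/G/T observation counts.
fn call_snv(pos: u64, ref_base: char, counts: [u32; 4], t: &Thresholds) -> Option<Snv> {
    let ref_base = ref_base.to_ascii_uppercase();
    // Skip N and other non-ACGT reference bases.
    let ref_idx = BASES.iter().position(|&b| b == ref_base)?;
    let depth: u32 = counts.iter().sum();
    if depth < t.min_depth {
        return None;
    }
    // Most-supported non-reference base.
    let (alt_idx, &alt_count) = counts
        .iter()
        .enumerate()
        .filter(|&(i, _)| i != ref_idx)
        .max_by_key(|&(_, c)| *c)?;
    if alt_count == 0 || f64::from(alt_count) / f64::from(depth) < t.min_alt_fraction {
        return None;
    }
    Some(Snv { pos, ref_base, alt_base: BASES[alt_idx], depth, alt_count })
}

impl Snv {
    /// Minimal sites-only VCF data line with a DP (total depth) INFO field.
    fn to_vcf_line(&self, chrom: &str) -> String {
        format!("{chrom}\t{}\t.\t{}\t{}\t.\tPASS\tDP={}", self.pos, self.ref_base, self.alt_base, self.depth)
    }
}

fn main() {
    let t = Thresholds { min_depth: 10, min_alt_fraction: 0.2 };
    // Reference A at chr1:101 observed as 10 A, 0 C, 8 G, 2 T.
    if let Some(snv) = call_snv(101, 'A', [10, 0, 8, 2], &t) {
        println!("{}", snv.to_vcf_line("chr1"));
    }
}
```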

Run the GPU-enabled path (falls back to CPU by default):

cargo run -- call-targets-gpu \
  -i /path/to/bams_or_directory \
  -r /path/to/reference.fa.gz \
  -T /path/to/targets.bed \
  --rg-map references/rg_map.txt \
  --gpu-backend auto \
  -o test_data/out.vcf.gz
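The fallback rule for call-targets-gpu can be sketched as a small selection function: try GPU initialization, then fall back to CPU unless --require-gpu is set. The closure stands in for the real wgpu adapter and device setup. Backend, GpuInitError, and select_backend are hypothetical names, not genemancer's internals.

```rust
use std::fmt;

/// Which execution path the pipeline ends up on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Gpu,
    Cpu,
}

/// Hypothetical error for a failed GPU initialization.
#[derive(Debug)]
struct GpuInitError(String);

impl fmt::Display for GpuInitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "GPU initialization failed: {}", self.0)
    }
}

/// Try the GPU first; on failure, propagate the error if `require_gpu`, else fall back to CPU.
fn select_backend<F>(require_gpu: bool, try_init_gpu: F) -> Result<Backend, GpuInitError>
where
    F: FnOnce() -> Result<(), GpuInitError>,
{
    match try_init_gpu() {
        Ok(()) => Ok(Backend::Gpu),
        Err(e) if require_gpu => Err(e),
        Err(e) => {
            eprintln!("warning: {e}; falling back to CPU");
            Ok(Backend::Cpu)
        }
    }
}

fn main() {
    // Simulate a machine with no usable adapter.
    let no_adapter = || -> Result<(), GpuInitError> {
        Err(GpuInitError("no compatible adapter found".to_string()))
    };
    match select_backend(false, no_adapter) {
        Ok(b) => println!("running on {b:?}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Taking the initializer as a closure keeps the fallback policy testable without a GPU present.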

Repository Data

  • references: helper scripts and a tracked sample RG map (references/rg_map.txt).
  • tests/data: tracked .bai files only.
  • Local working datasets are expected under test_data/ (ignored by git).

Ignored Paths

Paths ignored by git include:

  • *.bam
  • /test_data (local working datasets)

Notes

  • call-targets may prepare a sorted/indexed BGZF FASTA companion (*.sorted.fa.gz plus indexes) when the provided reference is not already in an indexed form suitable for random access.
  • split-bam is intentionally listed in help but not yet implemented.
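The reference-preparation note can be sketched as a readiness check plus a companion path. The readiness rule below is an assumption: a bgzipped FASTA counts as ready when .fai and .gzi indexes sit next to it, and a plain FASTA when a .fai does. This follows samtools faidx conventions but is not confirmed for genemancer. companion_path only mirrors the *.sorted.fa.gz naming from the note, and all function names are hypothetical. The sketch only decides what to do; it does not build the companion.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Append a suffix to the whole file name, e.g. "ref.fa.gz" -> "ref.fa.gz.fai".
fn with_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut s: OsString = path.as_os_str().to_owned();
    s.push(suffix);
    PathBuf::from(s)
}

/// Hypothetical companion naming: strip a FASTA extension and append ".sorted.fa.gz".
fn companion_path(reference: &Path) -> PathBuf {
    let name = reference.file_name().and_then(|n| n.to_str()).unwrap_or("reference");
    let stem = [".fa.gz", ".fasta.gz", ".fa", ".fasta"]
        .iter()
        .find_map(|ext| name.strip_suffix(*ext))
        .unwrap_or(name);
    reference.with_file_name(format!("{stem}.sorted.fa.gz"))
}

/// Assumed readiness rule: BGZF FASTA needs .fai and .gzi; plain FASTA needs .fai.
fn is_random_access_ready(path: &Path) -> bool {
    let has = |suffix: &str| with_suffix(path, suffix).exists();
    let compressed = path.extension().is_some_and(|e| e == "gz");
    if compressed {
        has(".fai") && has(".gzi")
    } else {
        has(".fai")
    }
}

/// Decide which file to read and whether a companion must be prepared first.
fn plan_reference(reference: &Path) -> (PathBuf, bool) {
    if is_random_access_ready(reference) {
        return (reference.to_path_buf(), false);
    }
    let companion = companion_path(reference);
    let needs_build = !is_random_access_ready(&companion);
    (companion, needs_build)
}

fn main() {
    let (path, build) = plan_reference(Path::new("reference.fa.gz"));
    println!("read from {} (build companion first: {build})", path.display());
}
```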

TODO

Area              Task                                                         Status  Notes
split-bam         Implement streaming region split pipeline                    TODO    CLI exists; the command currently exits with "not implemented"
call-targets      Add end-to-end integration tests on small fixture set        TODO    Validate VCF content and index generation
call-targets-gpu  Expand GPU backend validation matrix                         TODO    Cover Vulkan/Metal/DX12 fallback behavior
merge-bam         Add CRAM input/output support                                TODO    Current implementation is BAM-focused
Docs              Add example outputs and expected file artifacts per command  TODO    Make quick verification easier for users