VarForge: synthetic cancer sequencing data generator.
Generates realistic FASTQ and BAM files with controlled mutations, tumour parameters, UMI tags, and cfDNA fragment profiles for benchmarking bioinformatics tools.
Modules
- artifacts
- Sequencing artifact simulation: FFPE deamination, oxidative damage, and PCR duplicates.
- cli
- Command-line interface definitions for VarForge.
- core
- Core simulation primitives: types, coverage, fragment sampling, quality models, and the read engine.
- editor
- BAM editing engine for spiking variants into existing sequencing data.
- io
- Input and output: YAML config parsing, FASTQ and BAM writers, reference genome access, VCF reading and writing, and the simulation manifest.
- seq_utils
- Sequence utility functions shared across modules.
- tumour
- Tumour model: clonal tree construction and cancer cell fraction assignment.
- umi
- UMI (unique molecular identifier) support: barcode generation and PCR family simulation.
- variants
- Variant generation and spike-in: SNVs, indels, MNVs, SVs, CNVs, and mutational signatures.
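As a rough illustration of why the tumour parameters and cancer cell fraction matter for spiked-in variants, the sketch below computes the allele fraction one would expect to observe for a somatic variant, using the standard purity and copy-number relationship. The function name `expected_vaf` and its parameters are hypothetical and are not taken from VarForge's API; the crate's own model may differ.

```rust
/// Expected variant allele fraction (VAF) of a somatic variant.
///
/// Hypothetical helper for illustration only; not part of VarForge's API.
/// - `purity`: fraction of cells in the sample that are tumour cells (0..=1)
/// - `ccf`: cancer cell fraction, the share of tumour cells carrying the variant (0..=1)
/// - `multiplicity`: number of variant copies per carrying tumour cell
/// - `tumour_cn`: total copy number at the locus in tumour cells
/// - `normal_cn`: total copy number at the locus in normal cells (usually 2)
fn expected_vaf(purity: f64, ccf: f64, multiplicity: f64, tumour_cn: f64, normal_cn: f64) -> f64 {
    // Variant-bearing copies contributed by tumour cells that carry the mutation.
    let variant_copies = purity * ccf * multiplicity;
    // All copies at the locus, from tumour and normal cells together.
    let total_copies = purity * tumour_cn + (1.0 - purity) * normal_cn;
    variant_copies / total_copies
}

fn main() {
    // Clonal heterozygous SNV in a diploid region at 60% purity.
    let clonal = expected_vaf(0.6, 1.0, 1.0, 2.0, 2.0);
    // Same locus, but the variant is present in only half of the tumour cells.
    let subclonal = expected_vaf(0.6, 0.5, 1.0, 2.0, 2.0);
    println!("clonal VAF = {clonal:.3}, subclonal VAF = {subclonal:.3}");
}
```

Under these assumptions a clonal variant is expected at a VAF of 0.30 and the subclonal one at 0.15, which is the kind of ground truth a benchmarking run would compare a caller's output against.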