# Changelog
All notable changes to VarForge are documented here.
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Version numbers follow semantic versioning (`X.Y.Z`): a Z-bump for fixes and
additions, a Y-bump for breaking changes.
---
## [0.2.0] — 2026-04-02
### Added
- **Sequencing error model (`ErrorOrchestrator`)**. A unified error-injection
pipeline replaces the old flat quality model. It composes all error sources
in a single pass: base substitutions, cycle-position decay, k-mer context
modifiers, sequencing indels, strand bias, and phasing bursts.
- **Cycle-position error rates**. Three cycle error curve models: `flat`
(uniform rate), `exponential` (tail rise), and `custom` (user-supplied TSV).
Configurable via `quality.sequencing_errors.cycle_error_model`.
- **k-mer context errors**. Context-dependent error multipliers keyed on
surrounding sequence context (k = 1–5). Rules are supplied inline or loaded
from an external JSON profile, and are configured via
`quality.sequencing_errors.kmer_length` and `context_rules`.
- **Sequencing indels**. Sequencing-level insertion and deletion errors
independent of somatic mutations, drawn from a geometric length distribution.
Configurable via `quality.sequencing_errors.indel_rate`.
- **Strand bias model**. Per-read error rate asymmetry between R1 and R2 via
`quality.sequencing_errors.r2_error_multiplier` and `r2_quality_offset`.
- **Correlated phasing burst errors**. Runs of correlated errors modelling
phasing failures, controlled by `quality.sequencing_errors.burst_rate` and
`burst_length_mean`.
- **ProfileLearner CIGAR-based indel extraction**. `varforge learn-profile`
now counts CIGAR `I` and `D` operations per cycle (MAPQ ≥ 30 only) and
exports `indel_error_profile` and `cycle_error_rates` fields to the profile
JSON. Loading these fields auto-configures the `ErrorOrchestrator`.
- **Three platform presets**: `illumina_novaseq`, `pacbio_hifi`, and
`nanopore_r10`. Each preset sets realistic error rates, indel rates, and
cycle models for the target platform.
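
The `flat` and `exponential` cycle models listed above can be pictured with a minimal sketch. Everything here is an assumption for illustration: the type and method names are hypothetical, and the exponential curve is modelled as a geometric interpolation between a start and an end rate, which may not match the curve VarForge actually uses.

```rust
/// Hypothetical cycle error curve. The names and the exponential form are
/// illustrative assumptions, not VarForge's actual types.
#[derive(Debug, Clone, Copy)]
pub enum CycleErrorModel {
    /// The same error rate at every cycle.
    Flat { rate: f64 },
    /// Rate rises from `start` at the first cycle to `end` at the last.
    /// `start` must be greater than zero.
    Exponential { start: f64, end: f64 },
}

impl CycleErrorModel {
    /// Error rate at 0-based `cycle` of a read with `read_len` cycles.
    pub fn rate_at(&self, cycle: usize, read_len: usize) -> f64 {
        match *self {
            CycleErrorModel::Flat { rate } => rate,
            CycleErrorModel::Exponential { start, end } => {
                if read_len <= 1 {
                    return start;
                }
                // Fraction of the read completed, clamped to [0, 1].
                let t = cycle.min(read_len - 1) as f64 / (read_len - 1) as f64;
                // Geometric interpolation: start * (end / start)^t.
                start * (end / start).powf(t)
            }
        }
    }
}
```

With this form the rate grows slowly over early cycles and fastest near the end of the read, which is one way to get a tail rise.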
### Fixed
- **`burst_length_mean` validation**. Values less than or equal to zero are
now rejected at config parse time with a descriptive error. Previously a
zero or negative value caused a geometric sampler panic at runtime.
- **Context error double-counting**. When both a cycle error curve and context
rules were configured, the context multiplier was applied twice: once when
computing the cycle rate and again when applying the context rule. The cycle
rate is now computed first and passed to the context stage unmodified, so
each error source is applied exactly once.
- **Spurious substitutions from `sequencing_error_config`**. A stale
`base_error_rate` field on the inner config struct was being applied as a
second independent substitution pass after the orchestrator had already
injected errors, producing roughly double the configured substitution rate.
The redundant field is now cleared before the orchestrator runs.
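
Two of the fixes above can be sketched in a few lines. The function names are hypothetical and this is not the VarForge source; rejecting non-finite values and clamping the result to `[0, 1]` are extra assumptions beyond what the entries state.

```rust
/// Reject non-positive burst length means at config parse time, so the
/// geometric sampler never sees an invalid mean at runtime.
/// Rejecting NaN and infinity as well is an assumption of this sketch.
pub fn validate_burst_length_mean(mean: f64) -> Result<f64, String> {
    if mean.is_finite() && mean > 0.0 {
        Ok(mean)
    } else {
        Err(format!(
            "burst_length_mean must be a positive number, got {mean}"
        ))
    }
}

/// Combine an already-computed cycle rate with a context multiplier.
/// The cycle rate is taken as-is, so the multiplier is applied exactly once.
pub fn substitution_rate(cycle_rate: f64, context_multiplier: f64) -> f64 {
    (cycle_rate * context_multiplier).clamp(0.0, 1.0)
}
```

Keeping the cycle stage free of any context logic makes the order of operations explicit: compute the cycle rate, then scale it by the context multiplier in a single place.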
---
## [Unreleased] — v0.1.1
### Added
- **Inline UMI FASTQ and BAM output** (`umi.inline: true`). The UMI sequence is
prepended to the read sequence so tools such as fgbio `ExtractUmisFromBam` can
strip it without a custom read-name parser.
- **Inline UMI spacer** (`umi.spacer`). An optional fixed nucleotide sequence
inserted between the inline UMI and the template sequence. Matches the
Twist Biosciences AT-spacer layout and any other chemistry that places a
fixed adapter between the UMI and template.
- **Configurable duplex conversion rate** (`umi.duplex_conversion_rate`). Controls
the fraction of duplex molecules for which both the AB and BA strands are
recovered. The remainder produce only an AB-strand family, simulating real
library preparation losses. Default: 1.0 (all molecules yield both strands).
- **UMI sequencing error injection** (`umi.error_rate`). Injects random base-call
errors into UMI sequences at a configurable per-base rate. Produces near-miss
UMI families to test the error-correction tolerance of deduplication tools
(fgbio, HUMID, UMI-tools). Default: 0.0.
- **Updated Twist preset** (`--preset twist` / `examples/twist_duplex_benchmark.yaml`).
Now uses a 5 bp UMI, an AT spacer, a 90 % duplex conversion rate, and a 0.1 %
UMI error rate, matching the Twist Biosciences Comprehensive Exome Panel layout.
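
The inline layout described above (UMI, then optional spacer, then template) can be illustrated with a minimal sketch. The helper names are hypothetical and the sequences in the usage note are made up; this only shows the read layout, not how VarForge builds or parses reads.

```rust
/// Build an inline-UMI read: UMI first, then the optional spacer, then the
/// template sequence. Pass an empty `spacer` when none is configured.
pub fn inline_umi_read(umi: &str, spacer: &str, template: &str) -> String {
    let mut read = String::with_capacity(umi.len() + spacer.len() + template.len());
    read.push_str(umi);
    read.push_str(spacer);
    read.push_str(template);
    read
}

/// Split an inline-UMI read back into (UMI, template) given fixed UMI and
/// spacer lengths. Returns `None` if the read is too short or not ASCII.
pub fn split_inline_umi(
    read: &str,
    umi_len: usize,
    spacer_len: usize,
) -> Option<(&str, &str)> {
    let prefix = umi_len.checked_add(spacer_len)?;
    if read.len() < prefix || !read.is_ascii() {
        return None;
    }
    Some((&read[..umi_len], &read[prefix..]))
}
```

For a 5 bp UMI with an AT spacer, `inline_umi_read("ACGTA", "AT", "GGCC")` yields `ACGTAATGGCC`, and `split_inline_umi` with lengths 5 and 2 recovers `ACGTA` and `GGCC`.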
### Fixed
- BA strand `ref_seq` was written as the forward-strand sequence instead of
the reverse complement. This caused incorrect base qualities and mismatches
for BA-strand reads in BAM output. BA-strand reads now carry the correct
reverse-complemented reference sequence.
- BAM R2 soft-clip position was calculated from the wrong read end, shifting
the soft-clip CIGAR operation by the clip length. BAM R2 records now have
the soft-clip at the correct position.
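
For reference, the reverse complement that the BA-strand fix relies on can be sketched as follows. The function name is hypothetical, and passing unknown bytes such as `N` through unchanged is an assumption of this sketch.

```rust
/// Reverse-complement a nucleotide sequence given as ASCII bytes.
/// A/C/G/T are complemented (case preserved); any other byte, such as `N`,
/// is passed through unchanged.
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&base| match base {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            b'a' => b't',
            b'c' => b'g',
            b'g' => b'c',
            b't' => b'a',
            other => other,
        })
        .collect()
}
```

Applying the function twice returns the original sequence, which makes a convenient sanity check in tests.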
---
## [0.1.0] — 2025-03-01
Initial public release.
### Added
- Somatic SNV, indel, and MNV simulation with configurable VAF ranges.
- Structural variants: DEL, INS, INV, DUP, TRA with HRD, TDP, and
CHROMOTHRIPSIS signatures.
- COSMIC SBS signature-weighted base selection.
- Tumour purity and multi-clone subclonal architecture.
- Paired tumour-normal simulation.
- Germline SNP/indel simulation with truth VCF output.
- cfDNA fragment model with mononucleosomal and dinucleosomal peaks and
end-motif rejection sampling.
- Long-read fragment model (log-normal sampler).
- Duplex UMI barcodes with PCR amplification families.
- FFPE deamination and OxoG artefact simulation.
- Hybrid-capture and amplicon enrichment model.
- GC bias model.
- Copy number alteration with depth scaling.
- Microsatellite instability (MSI) mode.
- Multi-sample / longitudinal series mode.
- Empirical quality profile learning from real BAM files.
- YAML configuration with variable substitution and CLI overrides.
- Named presets for common scenarios and cancer types.
- Truth VCF, BAM, FASTQ, and manifest outputs.
- Pure Rust dependency stack; single binary, no C libraries required.