Expand description
vareffect — Variant consequence prediction and HGVS notation, targeting
near-100% concordance with Ensembl VEP (release 115/116).
Consumers point the store loaders at whatever transcript and genome files
their build pipeline produces — vareffect ships no embedded reference
data and has no runtime dependency on an orchestrator CLI.
§Transcript model store
An in-memory store of MANE transcript models indexed by genomic interval
for O(log n + k) overlap queries. Each TranscriptModel carries
per-exon CdsSegments with the GFF3 column-8 phase captured, so
downstream codon walks and frameshift detection don’t have to re-derive
phase from scratch.
use std::path::Path;
use vareffect::{Biotype, TranscriptStore};
let store = TranscriptStore::load_from_path(
Path::new("data/vareffect/transcript_models.bin"),
)?;
// Overlap query: all transcripts whose tx_start..tx_end intersects the interval.
for (tx, _idx) in store.query_overlap("chr6", 33_409_450, 33_409_451) {
println!(
"{} ({}): cds [{:?}, {:?}), {} segments, biotype={:?}",
tx.accession,
tx.gene_symbol,
tx.cds_genomic_start,
tx.cds_genomic_end,
tx.cds_segments.len(),
tx.biotype,
);
// Walk CDS segments in transcript 5'→3' order (reversed for minus strand):
for seg in &tx.cds_segments {
println!(
" segment in exon[{}], phase {}: [{}, {})",
seg.exon_index, seg.phase, seg.genomic_start, seg.genomic_end
);
}
}
// `biotype` is an enum with `Other(String)` for unknown upstream labels.
let total_protein_coding = store
.transcripts()
.iter()
.filter(|t| matches!(t.biotype, Biotype::ProteinCoding))
.count();§Reference genome reader
Memory-mapped random access to the reference genome via FastaReader.
Pair it with TranscriptStore to extract codons, verify REF alleles, and
walk downstream for frameshift termination. See the fasta module for
the on-disk format, coordinate conventions, and chromosome-name handling.
The flat binary format stores uppercase IUPAC nucleotide codes, matching
GA4GH refget v2.0 conventions. Most bases are A/C/G/T/N; the
NCBI GRCh38.p14 assembly also uses ambiguity codes (M, R, Y, etc.)
in some patch-scaffold regions. Soft-mask information is not preserved.
§Variant consequence assignment
VarEffect::annotate takes a variant’s position and alleles, locates it
within every overlapping transcript, extracts the reference codon(s) from
FASTA, translates ref and alt codons, and assigns SO consequence term(s)
with VEP-concordant IMPACT ratings. The codon module provides the
standard and mitochondrial genetic code translation tables.
use std::path::Path;
use vareffect::VarEffect;
let ve = VarEffect::open(
Path::new("data/vareffect/transcript_models.bin"),
Path::new("data/vareffect/GRCh38.bin"),
)?;
// Annotate TP53 c.742C>T (p.R248W) — chr17, 0-based position 7,674,219.
let results = ve.annotate("chr17", 7_674_219, b"C", b"T")?;
for r in &results {
for csq in &r.consequences {
println!("{} ({})", csq.as_str(), r.impact);
}
}For lower-level building blocks (per-transcript annotation when you
already hold a &TranscriptModel), see annotate_snv,
annotate_deletion, and annotate_insertion.
§Coordinate convention
All coordinates in TranscriptModel are 0-based, half-open (BED/UCSC
style). GFF3 input (1-based, fully-closed) is converted at build time by
vareffect-cli. See transcript for the interval-tree indexing details.
cds_genomic_start / cds_genomic_end are the genomic min / max
coordinates across all CDS segments, not transcript-relative. For a
minus-strand gene, cds_genomic_start is biologically the 3’ end of the
protein in transcript order. Walk cds_segments (ordered 5’→3’ on the
transcript) when you need the true coding walk.
§Thread safety
Both TranscriptStore and FastaReader are Send + Sync (proven by
a compile-time assertion at the bottom of this file). TranscriptStore
is lock-free for reads. FastaReader is backed by a memory-mapped
&[u8] — inherently Send + Sync with zero contention. All threads
can read from the same FastaReader concurrently without cloning.
Re-exports§
pub use consequence::Consequence;pub use consequence::ConsequenceResult;pub use consequence::Impact;pub use consequence::annotate_deletion;pub use consequence::annotate_insertion;pub use consequence::annotate_snv;pub use error::VarEffectError;pub use fasta::FastaReader;pub use hgvs_reverse::GenomicVariant;pub use locate::IndelLocation;pub use locate::IndelRegion;pub use locate::LocateIndex;pub use locate::SpliceOverlapDetail;pub use locate::SpliceSide;pub use locate::VariantLocation;pub use locate::locate_indel;pub use locate::locate_variant;pub use transcript::TranscriptStore;pub use types::Biotype;pub use types::CdsSegment;pub use types::Exon;pub use types::Strand;pub use types::TranscriptModel;pub use types::TranscriptTier;
Modules§
- chrom
- Chromosome name conversion.
- codon
- Codon translation, DNA complement, and VEP-style display formatting.
- consequence
- Variant consequence assignment – determines SO consequence terms for SNVs, simple indels, boundary-spanning deletions, complex delins, and MNVs against transcript models.
- error
- Error type for
vareffect. - fasta
- Indexed reference genome reader backed by a flat memory-mapped binary.
- hgvs_c
- HGVS coding DNA notation (c. / n.) generation.
- hgvs_p
- HGVS protein notation (
p.) for variant consequences. - hgvs_
reverse - HGVS c. reverse mapper: parse HGVS c. notation and resolve to genomic VCF-style coordinates.
- locate
- Variant locator: classify where a genomic position falls within a transcript.
- transcript
- In-memory transcript store with per-chromosome interval tree indexing.
- types
- Runtime types for the transcript model store.
Structs§
- VarEffect
- Stateful entrypoint to vareffect: bundles a
TranscriptStoreand aFastaReaderso callers don’t have to thread both handles through every annotation call.