//! `vareffect` — Variant consequence prediction and HGVS notation, targeting
//! near-100% concordance with Ensembl VEP (release 115/116).
//!
//! Consumers point the store loaders at whatever transcript and genome files
//! their build pipeline produces — `vareffect` ships no embedded reference
//! data and has no runtime dependency on an orchestrator CLI.
//!
//! # Transcript model store
//!
//! An in-memory store of MANE transcript models indexed by genomic interval
//! for O(log n + k) overlap queries. Each [`TranscriptModel`] carries
//! per-exon [`CdsSegment`]s with the GFF3 column-8 phase captured, so
//! downstream codon walks and frameshift detection don't have to re-derive
//! phase from scratch.
//!
//! ```no_run
//! use std::path::Path;
//! use vareffect::{Biotype, TranscriptStore};
//!
//! let store = TranscriptStore::load_from_path(
//!     Path::new("data/vareffect/transcript_models.bin"),
//! )?;
//!
//! // Overlap query: all transcripts whose tx_start..tx_end intersects the interval.
//! for (tx, _idx) in store.query_overlap("chr6", 33_409_450, 33_409_451) {
//!     println!(
//!         "{} ({}): cds [{:?}, {:?}), {} segments, biotype={:?}",
//!         tx.accession,
//!         tx.gene_symbol,
//!         tx.cds_genomic_start,
//!         tx.cds_genomic_end,
//!         tx.cds_segments.len(),
//!         tx.biotype,
//!     );
//!
//!     // Walk CDS segments in transcript 5'→3' order (reversed for minus strand):
//!     for seg in &tx.cds_segments {
//!         println!(
//!             " segment in exon[{}], phase {}: [{}, {})",
//!             seg.exon_index, seg.phase, seg.genomic_start, seg.genomic_end
//!         );
//!     }
//! }
//!
//! // `biotype` is an enum with `Other(String)` for unknown upstream labels.
//! let total_protein_coding = store
//!     .transcripts()
//!     .iter()
//!     .filter(|t| matches!(t.biotype, Biotype::ProteinCoding))
//!     .count();
//! # let _ = total_protein_coding;
//! # Ok::<(), vareffect::VarEffectError>(())
//! ```
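//!
//! For reference, GFF3 defines phase as the number of bases to skip from the
//! start of a CDS feature (its 5' end in transcript order) to reach the first
//! base of the next complete codon. The sketch below shows one way a codon
//! walk could use that value; `split_segment` and `next_phase` are
//! hypothetical helpers for illustration and are not part of this crate.
//!
//! ```rust
//! /// Split a CDS segment of `len` bases with the given `phase` into
//! /// (leading partial bases, complete codons, trailing partial bases).
//! fn split_segment(len: u64, phase: u8) -> (u64, u64, u64) {
//!     let lead = u64::from(phase).min(len);
//!     let rest = len - lead;
//!     (lead, rest / 3, rest % 3)
//! }
//!
//! /// Phase expected on the following segment, given this one.
//! fn next_phase(len: u64, phase: u8) -> u8 {
//!     let p = u64::from(phase);
//!     if len < p {
//!         // Segment too short to finish the codon carried in from upstream.
//!         (p - len) as u8
//!     } else {
//!         ((3 - (len - p) % 3) % 3) as u8
//!     }
//! }
//!
//! fn main() {
//!     // 10 bases at phase 0: three codons, one base left over.
//!     assert_eq!(split_segment(10, 0), (0, 3, 1));
//!     // The leftover base needs two more, so the next segment has phase 2.
//!     assert_eq!(next_phase(10, 0), 2);
//!     // 8 bases at phase 2: skip two, then two codons, nothing left over.
//!     assert_eq!(split_segment(8, 2), (2, 2, 0));
//!     assert_eq!(next_phase(8, 2), 0);
//! }
//! ```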
//!
//! # Reference genome reader
//!
//! Memory-mapped random access to the reference genome via [`FastaReader`].
//! Pair it with `TranscriptStore` to extract codons, verify REF alleles, and
//! walk downstream for frameshift termination. See the `fasta` module for
//! the on-disk format, coordinate conventions, and chromosome-name handling.
//!
//! The flat binary format stores uppercase IUPAC nucleotide codes, matching
//! GA4GH refget v2.0 conventions. Most bases are `A`/`C`/`G`/`T`/`N`; the
//! NCBI GRCh38.p14 assembly also uses ambiguity codes (`M`, `R`, `Y`, etc.)
//! in some patch-scaffold regions. Soft-mask information is not preserved.
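//!
//! As a sketch of what that alphabet implies for callers, the helpers below
//! check that a byte slice contains only uppercase IUPAC nucleotide codes.
//! `is_iupac_upper` and `find_invalid` are hypothetical names, not part of
//! this crate, and the accepted set (the 15 DNA codes, without `U` or gap
//! characters) is an assumption about what a DNA reference contains.
//!
//! ```rust
//! /// True for the 15 uppercase IUPAC DNA codes (4 bases + 11 ambiguity codes).
//! fn is_iupac_upper(b: u8) -> bool {
//!     matches!(
//!         b,
//!         b'A' | b'C' | b'G' | b'T' | b'R' | b'Y' | b'S' | b'W'
//!             | b'K' | b'M' | b'B' | b'D' | b'H' | b'V' | b'N'
//!     )
//! }
//!
//! /// Offset of the first byte outside the alphabet, if any.
//! fn find_invalid(seq: &[u8]) -> Option<usize> {
//!     seq.iter().position(|&b| !is_iupac_upper(b))
//! }
//!
//! fn main() {
//!     assert_eq!(find_invalid(b"ACGTNRYM"), None);
//!     // Lowercase (soft-masked) bases fall outside the stored alphabet.
//!     assert_eq!(find_invalid(b"ACgT"), Some(2));
//! }
//! ```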
//!
//! # Variant consequence assignment
//!
//! [`VarEffect::annotate`] takes a variant's position and alleles, locates it
//! within every overlapping transcript, extracts the reference codon(s) from
//! FASTA, translates ref and alt codons, and assigns SO consequence term(s)
//! with VEP-concordant IMPACT ratings. The [`codon`] module provides the
//! standard and mitochondrial genetic code translation tables.
//!
//! ```no_run
//! use std::path::Path;
//! use vareffect::VarEffect;
//!
//! let ve = VarEffect::open(
//!     Path::new("data/vareffect/transcript_models.bin"),
//!     Path::new("data/vareffect/GRCh38.bin"),
//! )?;
//!
//! // Annotate TP53 c.742C>T (p.R248W) — chr17, 0-based position 7,674,219.
//! let results = ve.annotate("chr17", 7_674_219, b"C", b"T")?;
//! for r in &results {
//!     for csq in &r.consequences {
//!         println!("{} ({})", csq.as_str(), r.impact);
//!     }
//! }
//! # Ok::<(), vareffect::VarEffectError>(())
//! ```
//!
//! For lower-level building blocks (per-transcript annotation when you
//! already hold a `&TranscriptModel`), see [`annotate_snv`],
//! [`annotate_deletion`], and [`annotate_insertion`].
//!
//! # Coordinate convention
//!
//! All coordinates in [`TranscriptModel`] are **0-based, half-open** (BED/UCSC
//! style). GFF3 input (1-based, fully-closed) is converted at build time by
//! `vareffect-cli`. See [`transcript`] for the interval-tree indexing details.
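//!
//! The conversion described above can be sketched in a few lines. This is an
//! illustration only, not the `vareffect-cli` implementation;
//! `gff3_to_half_open` and `overlaps` are hypothetical helper names that are
//! not part of this crate.
//!
//! ```rust
//! /// Convert a GFF3 interval (1-based, fully-closed) to 0-based, half-open.
//! /// Only the start shifts; the end keeps its numeric value.
//! fn gff3_to_half_open(start: u64, end: u64) -> (u64, u64) {
//!     assert!(start >= 1 && start <= end, "invalid GFF3 interval");
//!     (start - 1, end)
//! }
//!
//! /// Two half-open intervals overlap iff each starts before the other ends.
//! fn overlaps(a: (u64, u64), b: (u64, u64)) -> bool {
//!     a.0 < b.1 && b.0 < a.1
//! }
//!
//! fn main() {
//!     // GFF3 bases 101..=200 (100 bases) become [100, 200).
//!     let (start, end) = gff3_to_half_open(101, 200);
//!     assert_eq!((start, end), (100, 200));
//!     assert_eq!(end - start, 100); // length is simply end - start
//!
//!     // Adjacent half-open intervals share no base.
//!     assert!(!overlaps((100, 200), (200, 300)));
//!     assert!(overlaps((100, 200), (199, 300)));
//! }
//! ```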
//!
//! `cds_genomic_start` / `cds_genomic_end` are the genomic `min` / `max`
//! coordinates across all CDS segments, **not** transcript-relative. For a
//! minus-strand gene, `cds_genomic_start` is biologically the 3' end of the
//! protein in transcript order. Walk `cds_segments` (ordered 5'→3' on the
//! transcript) when you need the true coding walk.
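//!
//! A minimal sketch of that distinction, using a stand-in `Seg` struct rather
//! than the real [`CdsSegment`]. The field names mirror the ones used in the
//! transcript store example; `cds_envelope`, `cds_len`, and the coordinates
//! are made up for illustration.
//!
//! ```rust
//! // Stand-in for a CDS segment; hypothetical, for illustration only.
//! struct Seg {
//!     genomic_start: u64,
//!     genomic_end: u64,
//! }
//!
//! /// Genomic envelope: min start / max end across all segments.
//! fn cds_envelope(segs: &[Seg]) -> Option<(u64, u64)> {
//!     let start = segs.iter().map(|s| s.genomic_start).min()?;
//!     let end = segs.iter().map(|s| s.genomic_end).max()?;
//!     Some((start, end))
//! }
//!
//! /// Total coding length comes from the segments, not from the envelope.
//! fn cds_len(segs: &[Seg]) -> u64 {
//!     segs.iter().map(|s| s.genomic_end - s.genomic_start).sum()
//! }
//!
//! fn main() {
//!     // Minus-strand transcript: segments are in transcript 5'→3' order, so
//!     // genomic coordinates decrease from one segment to the next.
//!     let segs = [
//!         Seg { genomic_start: 900, genomic_end: 1000 }, // first coding exon
//!         Seg { genomic_start: 500, genomic_end: 620 },
//!         Seg { genomic_start: 100, genomic_end: 150 }, // last coding exon
//!     ];
//!     let (start, end) = cds_envelope(&segs).unwrap();
//!     assert_eq!((start, end), (100, 1000));
//!     // The envelope start belongs to the last segment in transcript order.
//!     assert_eq!(segs.last().unwrap().genomic_start, start);
//!     assert_eq!(cds_len(&segs), 270);
//! }
//! ```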
//!
//! # Thread safety
//!
//! Both [`TranscriptStore`] and [`FastaReader`] are `Send + Sync` (proven by
//! a compile-time assertion at the bottom of this file). `TranscriptStore`
//! is lock-free for reads. `FastaReader` is backed by a memory-mapped
//! `&[u8]` — inherently `Send + Sync` with zero contention. All threads
//! can read from the same `FastaReader` concurrently without cloning.
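//!
//! The sharing pattern this enables can be sketched with a stand-in type.
//! `SharedBytes` and `count_base` below are hypothetical and only imitate a
//! read-only byte buffer; this is not how [`FastaReader`] is implemented.
//!
//! ```rust
//! use std::thread;
//!
//! // Stand-in for a resource that is immutable after loading.
//! struct SharedBytes {
//!     data: Vec<u8>,
//! }
//!
//! // Compiles only if `T` is `Send + Sync`.
//! fn assert_send_sync<T: Send + Sync>() {}
//!
//! /// Count `base` using up to `workers` scoped threads.
//! fn count_base(shared: &SharedBytes, base: u8, workers: usize) -> usize {
//!     let workers = workers.max(1);
//!     let len = shared.data.len();
//!     let chunk = ((len + workers - 1) / workers).max(1);
//!     thread::scope(|s| {
//!         let handles: Vec<_> = (0..len)
//!             .step_by(chunk)
//!             .map(|start| {
//!                 let end = (start + chunk).min(len);
//!                 // Every worker borrows the same `shared`; nothing is cloned.
//!                 s.spawn(move || {
//!                     shared.data[start..end].iter().filter(|&&b| b == base).count()
//!                 })
//!             })
//!             .collect();
//!         handles.into_iter().map(|h| h.join().unwrap()).sum()
//!     })
//! }
//!
//! fn main() {
//!     assert_send_sync::<SharedBytes>();
//!     let shared = SharedBytes { data: b"ACGTACGTNN".to_vec() };
//!     assert_eq!(count_base(&shared, b'A', 4), 2);
//! }
//! ```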
pub
pub use ;
pub use VarEffectError;
pub use FastaReader;
pub use ;
pub use ;
pub use TranscriptStore;
pub use ;
pub use VarEffect;
// Compile-time proof that the runtime types are thread-safe to share across
// worker tasks. A field that silently breaks `Send + Sync` in a future
// refactor will fail this check at build time instead of at deploy time.
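const _: fn() = || {
    fn assert_send_sync<T: Send + Sync>() {}
    assert_send_sync::<TranscriptStore>();
    assert_send_sync::<FastaReader>();
};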