//! # Rosalind — a deterministic, low-memory genomics engine
//!
//! Call variants across a whole genome on a laptop, with memory you can **predict
//! and verify**, and results that are **byte-for-byte reproducible**. Rosalind
//! treats memory as a *contract*: you declare a RAM budget, `rosalind plan` tells
//! you up front whether the job fits, the run honors it (it either fits or refuses
//! cleanly, never ending in a silent OOM-kill), and `rosalind verify` re-checks a
//! BLAKE3 receipt proving that the realized peak landed inside your budget.
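//!
//! A minimal sketch of the declare, plan, honor, verify loop described above, using only
//! `std`. Everything here is hypothetical: `Budget`, `Plan`, `plan`, `receipt_fields`,
//! `canonical` and `verify` are illustrative names, not Rosalind's API; the cost model
//! (coverage times a per-read byte cost plus fixed overhead) is an assumption; and the
//! BLAKE3 hashing of the canonical bytes is omitted so no external crate is needed.
//!
//! ```
//! use std::collections::BTreeMap;
//!
//! // Hypothetical RAM budget in bytes.
//! #[derive(Debug, Clone, Copy, PartialEq)]
//! struct Budget(u64);
//!
//! // Outcome of the up-front plan: the job either fits or is refused before it starts.
//! #[derive(Debug, PartialEq)]
//! enum Plan {
//!     Fits { estimated_peak: u64 },
//!     Refuses { estimated_peak: u64, budget: u64 },
//! }
//!
//! // Assumed cost model: the peak grows with local coverage, not with input size.
//! fn plan(budget: Budget, max_coverage: u64, bytes_per_read: u64, overhead: u64) -> Plan {
//!     // Saturating arithmetic so absurd inputs are refused instead of overflowing.
//!     let estimated_peak = max_coverage.saturating_mul(bytes_per_read).saturating_add(overhead);
//!     if estimated_peak <= budget.0 {
//!         Plan::Fits { estimated_peak }
//!     } else {
//!         Plan::Refuses { estimated_peak, budget: budget.0 }
//!     }
//! }
//!
//! // Hypothetical receipt fields; a BTreeMap iterates its keys in sorted order.
//! fn receipt_fields(budget: Budget, realized_peak: u64) -> BTreeMap<&'static str, u64> {
//!     let mut fields = BTreeMap::new();
//!     fields.insert("budget_bytes", budget.0);
//!     fields.insert("realized_peak_bytes", realized_peak);
//!     fields
//! }
//!
//! // Canonical rendering: sorted keys, no whitespace, so equal fields give equal bytes.
//! // A real receipt would hash these bytes with BLAKE3; omitted here to stay std-only.
//! fn canonical(fields: &BTreeMap<&'static str, u64>) -> String {
//!     let body: Vec<String> = fields.iter().map(|(k, v)| format!("\"{k}\":{v}")).collect();
//!     format!("{{{}}}", body.join(","))
//! }
//!
//! // Verify: re-check, from the receipt alone, that the realized peak stayed in budget.
//! fn verify(fields: &BTreeMap<&'static str, u64>) -> bool {
//!     match (fields.get("realized_peak_bytes"), fields.get("budget_bytes")) {
//!         (Some(peak), Some(budget)) => peak <= budget,
//!         _ => false,
//!     }
//! }
//! ```
//!
//! Under these assumed numbers, `plan(Budget(1_000), 100, 50, 200)` is refused up front
//! (an estimated 5,200 bytes) before any work starts, which is the fits-or-refuses shape
//! rather than an OOM-kill partway through the run.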
//!
//! The kernel is a streaming, CIGAR-aware **pileup column stream** whose memory is
//! bounded by local coverage, not input size: a substrate on which you can compute
//! arbitrary per-locus analytics. Variant calling is the first consumer, not the whole product.
//!
//! ```
//! use std::sync::Arc;
//! use rosalind::{PileupEngine, PileupParams, SliceSource};
//! use rosalind::core::{AlignedRead, CigarOp, CigarOpKind, Position, SamFlags};
//!
//! // One 4bp read "ACGT" aligned at chr0:0 over the reference "ACGT".
//! let read = AlignedRead {
//!     contig: 0,
//!     pos: Position(0),
//!     mapq: 60,
//!     flags: SamFlags(0),
//!     cigar: vec![CigarOp::new(CigarOpKind::Match, 4)],
//!     seq: Arc::from(b"ACGT".to_vec().into_boxed_slice()),
//!     qual: Arc::from(vec![40u8; 4].into_boxed_slice()),
//! };
//! let reference: Arc<[u8]> = Arc::from(b"ACGT".to_vec().into_boxed_slice());
//!
//! // The bounded pileup substrate: one PileupColumn per covered position.
//! let mut engine = PileupEngine::new(
//!     SliceSource::new(vec![read]),
//!     reference,
//!     0,
//!     0..4,
//!     PileupParams::default(),
//! );
//! let first = engine.next().unwrap().unwrap();
//! assert_eq!(first.depth(), 1);
//! ```
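//!
//! Why memory tracks local coverage rather than input size: a streaming pileup only holds
//! the reads that overlap the current column and drops each one once the cursor passes its
//! reference end. The following is a deliberately simplified sketch of that idea, not
//! Rosalind's engine. `ToyRead`, `Op` and `pileup` are hypothetical; reads must arrive
//! sorted by start; and only match (`M`) and deletion (`D`) CIGAR operations are modelled
//! (insertions, clips, filters and base qualities are ignored).
//!
//! ```
//! use std::collections::VecDeque;
//!
//! // Hypothetical CIGAR subset: M consumes read and reference, D consumes reference only.
//! #[derive(Clone, Copy)]
//! enum Op { M(u32), D(u32) }
//!
//! // Hypothetical read type; `start` is a 0-based reference position.
//! struct ToyRead { start: u32, cigar: Vec<Op>, seq: Vec<u8> }
//!
//! impl ToyRead {
//!     // Exclusive reference end (both modelled ops consume reference).
//!     fn end(&self) -> u32 {
//!         let span: u32 = self.cigar.iter().map(|op| match op { Op::M(n) | Op::D(n) => *n }).sum();
//!         self.start + span
//!     }
//!
//!     // Base this read shows at `pos`, or b'*' where its CIGAR has a deletion.
//!     fn base_at(&self, pos: u32) -> Option<u8> {
//!         let (mut ref_pos, mut read_pos) = (self.start, 0usize);
//!         if pos < ref_pos {
//!             return None;
//!         }
//!         for op in &self.cigar {
//!             match *op {
//!                 Op::M(n) if pos < ref_pos + n => {
//!                     return Some(self.seq[read_pos + (pos - ref_pos) as usize]);
//!                 }
//!                 Op::D(n) if pos < ref_pos + n => return Some(b'*'),
//!                 Op::M(n) => { ref_pos += n; read_pos += n as usize; }
//!                 Op::D(n) => ref_pos += n,
//!             }
//!         }
//!         None
//!     }
//! }
//!
//! // Streams (position, bases) columns and reports the peak number of reads held at once,
//! // which equals the maximum depth, not the total read count.
//! fn pileup(reads: &[ToyRead]) -> (Vec<(u32, Vec<u8>)>, usize) {
//!     let mut columns: Vec<(u32, Vec<u8>)> = Vec::new();
//!     let mut active: VecDeque<&ToyRead> = VecDeque::new();
//!     let (mut next, mut peak) = (0usize, 0usize);
//!     let mut pos = match reads.first() {
//!         Some(r) => r.start,
//!         None => return (columns, peak),
//!     };
//!     loop {
//!         // Admit every read that starts at or before the cursor.
//!         while next < reads.len() && reads[next].start <= pos {
//!             active.push_back(&reads[next]);
//!             next += 1;
//!         }
//!         // Drop reads whose reference span ended before the cursor.
//!         active.retain(|r| r.end() > pos);
//!         if active.is_empty() {
//!             // Coverage gap: jump to the next read's start, or stop if none remain.
//!             match reads.get(next) {
//!                 Some(r) => { pos = r.start; continue; }
//!                 None => break,
//!             }
//!         }
//!         peak = peak.max(active.len());
//!         let bases: Vec<u8> = active.iter().filter_map(|r| r.base_at(pos)).collect();
//!         columns.push((pos, bases));
//!         pos += 1;
//!     }
//!     (columns, peak)
//! }
//! ```
//!
//! Two reads overlapping at positions 2 and 3 give a peak of 2 held reads, however many
//! non-overlapping reads follow; that is the sense in which the working set is bounded.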
//!
//! ## Research direction (Phase D)
//!
//! Rosalind is also a research vehicle for **space-bounded genomics** — sublinear-space
//! index *construction* along a `~√t` space/time curve, extending the memory contract to the
//! index build itself (today's build is `O(reference)`). That is a direction, not yet shipped;
//! it is tracked in `docs/OPEN_PROBLEMS.md`.
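//!
//! One standard way to read a `~√t` space/time curve, shown on a generic sequential
//! computation rather than on index construction (the sublinear build is not shipped,
//! and this is not its method): to answer queries about any of `t` states of an iterated
//! step function, keep a checkpoint every `k = ⌈√t⌉` steps and replay from the nearest
//! one. That stores about `√t` states instead of `t`, at the cost of fewer than `k`
//! replayed steps per query. `Checkpoints`, `build` and `get` are hypothetical names.
//!
//! ```
//! // Keeps every k-th state of s_0, s_1 = step(s_0), ..., s_{t-1}, with k = ceil(sqrt(t)).
//! struct Checkpoints<S, F: Fn(&S) -> S> {
//!     k: usize,
//!     t: usize,
//!     saved: Vec<S>,
//!     step: F,
//! }
//!
//! impl<S: Clone, F: Fn(&S) -> S> Checkpoints<S, F> {
//!     fn build(start: S, t: usize, step: F) -> Self {
//!         // Smallest k with k * k >= t (at least 1), so about sqrt(t) checkpoints are kept.
//!         let mut k = 1;
//!         while k * k < t {
//!             k += 1;
//!         }
//!         let mut saved = Vec::new();
//!         let mut state = start;
//!         for i in 0..t {
//!             if i % k == 0 {
//!                 saved.push(state.clone());
//!             }
//!             if i + 1 < t {
//!                 state = step(&state);
//!             }
//!         }
//!         Checkpoints { k, t, saved, step }
//!     }
//!
//!     // Recomputes s_i from the nearest checkpoint at or before i: fewer than k steps.
//!     fn get(&self, i: usize) -> Option<S> {
//!         if i >= self.t {
//!             return None;
//!         }
//!         let mut state = self.saved[i / self.k].clone();
//!         for _ in 0..i % self.k {
//!             state = (self.step)(&state);
//!         }
//!         Some(state)
//!     }
//! }
//! ```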
// Each module is a layer of the genomics engine.
/// The calling layer: probabilistically-grounded, abstention-aware variant calls from pileup columns.
/// Core types: the lingua franca shared by every layer (io, index, align, pileup, call).
/// Genomics primitives: the FM-index, persisted memory-mapped index, alignment, sort, eval.
/// IO layer: spec-valid VCF writer + streaming FASTA/FASTQ/BAM readers.
/// The streaming pileup kernel: one CIGAR-aware, filtered, bounded-memory engine.
/// Reproducibility receipts: canonical-JSON BLAKE3 manifests for every run.
/// Extracted to the `rosalind-receipt` leaf crate (no htslib — wasm-friendly) and
/// re-exported here, so `rosalind::provenance::*` is unchanged.
pub use rosalind_receipt as provenance;
/// Third-party byte re-derivation from a receipt (the `reproduce` verb).
/// Helper utilities: read-only mmap + peak-RSS measurement.
// ── Genomics product surface — what builders compose on ───────────────────────
// The bounded streaming substrate:
pub use StreamingBamSource;
pub use ;
// The bounded whole-genome germline drive + calls:
pub use ;
// ColumnKit: implement one trait, inherit the bounded contract (SDK front door).
pub use ;
// The memory contract (declare → plan → honor → verify), incl. fleet packing:
pub use ;
pub use ;
// Build-once → mmap index + the reproducibility receipt:
pub use ;
pub use ;