// gtars_overlaprs/lib.rs
//! Core infrastructure for high-performance genomic interval overlap operations in Rust.
//!
//! This crate provides efficient data structures and algorithms for finding overlapping intervals
//! in genomic data. It is part of the [gtars](https://github.com/databio/gtars) project, which
//! provides tools for working with genomic interval data in Rust, Python, and R.
//!
//! ## Features
//!
//! - **Fast overlap queries**: Efficiently find all intervals that overlap with a query interval
//! - **Iterator-based API**: Memory-efficient iteration over overlapping intervals
//! - **Thread-safe**: All data structures implement `Send` and `Sync` for concurrent access
//!
//! All overlap computation logic should live here. Higher-level modules (scoring, tokenizers)
//! wrap this functionality for their specific use cases but should not reimplement overlap
//! algorithms.
//!
//! ## Quick Start
//!
//! ```rust
//! use gtars_overlaprs::{AIList, Overlapper, Interval};
//!
//! // create some genomic intervals (e.g., gene annotations)
//! let intervals = vec![
//!     Interval { start: 100u32, end: 200, val: "gene1" },
//!     Interval { start: 150, end: 300, val: "gene2" },
//!     Interval { start: 400, end: 500, val: "gene3" },
//! ];
//!
//! // build the AIList data structure
//! let ailist = AIList::build(intervals);
//!
//! // query for intervals overlapping 180-250
//! let overlaps = ailist.find(180, 250);
//! assert_eq!(overlaps.len(), 2); // gene1 and gene2 overlap
//!
//! // or use an iterator for memory-efficient processing
//! for interval in ailist.find_iter(180, 250) {
//!     println!("Found overlap: {:?}", interval);
//! }
//! ```
//!
//! ## Performance
//!
//! The [`AIList`] data structure is designed for overlap queries on genomic-scale datasets.
//! It sorts intervals by start coordinate and augments each position with the running maximum
//! end coordinate. This lets a query stop scanning as soon as no earlier interval can reach the
//! query region. To keep that early stop effective in the high-coverage regions common in
//! genomic data, it decomposes the list into components so that long intervals spanning many
//! others are stored separately.
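//!
//! The following is a minimal, self-contained sketch of an augmented interval list query.
//! It is an illustration, not the crate's [`AIList`] code. It keeps a single sorted component,
//! omits the decomposition step and assumes half-open `[start, end)` coordinates. The names
//! `AugmentedList`, `new`, and `query` are hypothetical.
//!
//! ```rust
//! // Hypothetical single-component augmented list (no decomposition step).
//! // Intervals are half-open `(start, end)` pairs kept sorted by start.
//! struct AugmentedList {
//!     ivs: Vec<(u32, u32)>,
//!     // `max_end[i]` is the largest `end` among `ivs[..=i]`.
//!     max_end: Vec<u32>,
//! }
//!
//! impl AugmentedList {
//!     fn new(mut ivs: Vec<(u32, u32)>) -> Self {
//!         ivs.sort_unstable_by_key(|&(start, _)| start);
//!         let mut max_end = Vec::with_capacity(ivs.len());
//!         let mut running: u32 = 0;
//!         for &(_, end) in &ivs {
//!             running = running.max(end);
//!             max_end.push(running);
//!         }
//!         AugmentedList { ivs, max_end }
//!     }
//!
//!     // Returns every interval overlapping `[qs, qe)`, in descending start order.
//!     fn query(&self, qs: u32, qe: u32) -> Vec<(u32, u32)> {
//!         // Only intervals that start before `qe` can overlap; find that prefix by binary search.
//!         let mut i = self.ivs.partition_point(|&(start, _)| start < qe);
//!         let mut hits = Vec::new();
//!         // Scan backwards. `max_end` never decreases, so once `max_end[i - 1] <= qs`
//!         // no earlier interval can extend past `qs` and the scan stops.
//!         while i > 0 && self.max_end[i - 1] > qs {
//!             i -= 1;
//!             let (start, end) = self.ivs[i];
//!             if end > qs {
//!                 hits.push((start, end));
//!             }
//!         }
//!         hits
//!     }
//! }
//! ```
//!
//! For example, with intervals `(100, 200)`, `(150, 300)` and `(400, 500)`, `query(180, 250)`
//! returns the first two. A single long interval near the front of the list raises every later
//! `max_end` entry, so the backward scan can no longer stop early. That is the problem the
//! decomposition step addresses.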
//!
//! ## Examples
//!
//! ### Finding all genes that overlap a query region
//!
//! ```rust
//! use gtars_overlaprs::{AIList, Overlapper, Interval};
//!
//! let genes = vec![
//!     Interval { start: 1000u32, end: 2000, val: "BRCA1" },
//!     Interval { start: 3000, end: 4000, val: "TP53" },
//!     Interval { start: 5000, end: 6000, val: "EGFR" },
//! ];
//!
//! let gene_index = AIList::build(genes);
//!
//! // query positions 1500-3500 (toy coordinates on a single chromosome)
//! let overlapping_genes: Vec<&str> = gene_index
//!     .find_iter(1500, 3500)
//!     .map(|interval| interval.val)
//!     .collect();
//!
//! println!("Genes in region: {:?}", overlapping_genes);
//! assert_eq!(overlapping_genes.len(), 2); // BRCA1 and TP53
//! ```

/// Augmented Interval List implementation.
///
/// See [`AIList`] for details.
pub mod ailist;

/// Binary Interval Search implementation.
///
/// See [`Bits`] for details.
pub mod bits;

/// Genome-wide interval indexing across multiple chromosomes.
pub mod multi_chrom_overlapper;

/// Core traits for overlap operations.
///
/// See [`Overlapper`] for the main trait.
pub mod traits;

// re-exports
pub use self::ailist::AIList;
pub use self::bits::Bits;
pub use self::traits::{Interval, Overlapper};

/// The type of overlap data structure to use.
///
/// This enum allows you to choose between different overlap query implementations,
/// each with different performance characteristics.
///
/// # Variants
///
/// * `AIList` - Use the Augmented Interval List implementation. Best for genomic data
///   with high-coverage regions (e.g., ChIP-seq peaks, dense annotations).
/// * `Bits` - Use the Binary Interval Search implementation. Best for general-purpose
///   overlap queries and sorted sequential queries.
///
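/// # Binary interval search sketch
///
/// This is a minimal sketch of the counting idea behind binary interval search, included for
/// intuition about the `Bits` variant. It is not necessarily how the crate's `Bits` type is
/// implemented. The `BinarySearchCounter` name and its methods are hypothetical. Intervals are
/// assumed to be half-open `[start, end)` with `start <= end`.
///
/// ```rust
/// // Hypothetical helper: start and end coordinates, each sorted independently.
/// struct BinarySearchCounter {
///     starts: Vec<u32>,
///     ends: Vec<u32>,
/// }
///
/// impl BinarySearchCounter {
///     fn new(intervals: &[(u32, u32)]) -> Self {
///         let mut starts: Vec<u32> = intervals.iter().map(|&(start, _)| start).collect();
///         let mut ends: Vec<u32> = intervals.iter().map(|&(_, end)| end).collect();
///         starts.sort_unstable();
///         ends.sort_unstable();
///         BinarySearchCounter { starts, ends }
///     }
///
///     // Number of intervals overlapping the non-empty query `[qs, qe)`.
///     fn count(&self, qs: u32, qe: u32) -> usize {
///         debug_assert!(qs < qe, "query must be non-empty");
///         // Intervals that start before the query ends...
///         let started = self.starts.partition_point(|&start| start < qe);
///         // ...minus those that ended at or before the query starts. Each such interval
///         // has start <= end <= qs < qe, so it is also in `started` and this cannot underflow.
///         let ended = self.ends.partition_point(|&end| end <= qs);
///         started - ended
///     }
/// }
/// ```
///
/// Counting needs only two binary searches, however many intervals overlap the query.
/// Listing the overlapping intervals themselves takes additional work beyond this sketch.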
pub enum OverlapperType {
    /// Use the Augmented Interval List implementation.
    AIList,
    /// Use the Binary Interval Search implementation.
    Bits,
}

/// Constants used throughout the crate.
pub mod consts {
    /// The command name for overlap operations.
    pub const OVERLAP_CMD: &str = "overlap";
}