pub struct Bits<I, T>{
pub intervals: Vec<Interval<I, T>>,
pub overlaps_merged: bool,
/* private fields */
}
A Binary Interval Search data structure for fast genomic interval overlap queries.
From the journal article: https://academic.oup.com/bioinformatics/article/29/1/1/273289
BITS (Binary Interval Search) is an efficient data structure for finding overlapping intervals using binary search. It maintains sorted lists of interval start and end positions, enabling fast identification of intervals that overlap with a query range.
§Examples
use gtars_overlaprs::{Bits, Overlapper, Interval};
// Create intervals for read alignments
let reads = vec![
Interval { start: 100u32, end: 150, val: "read1" },
Interval { start: 200, end: 250, val: "read2" },
Interval { start: 225, end: 275, val: "read3" },
];
let bits = Bits::build(reads);
// Query for reads overlapping position 210-240
let overlaps = bits.find(210, 240);
assert_eq!(overlaps.len(), 2); // read2 and read3
// Count overlaps without allocating
let count = bits.count(210, 240);
assert_eq!(count, 2);
§Advanced Features
§Sequential Queries with seek
For sorted queries, use seek with a cursor for better performance:
use gtars_overlaprs::{Bits, Overlapper, Interval};
let intervals = (0u32..100).step_by(5)
.map(|x| Interval { start: x, end: x + 2, val: true })
.collect::<Vec<_>>();
let bits = Bits::build(intervals);
let mut cursor = 0;
for i in 10u32..20 {
let overlaps: Vec<_> = bits.seek(i, i + 5, &mut cursor).collect();
// Process overlaps...
}
§See Also
- Overlapper - The trait that Bits implements
- crate::AIList - An alternative implementation optimized for high-coverage regions
Fields§
intervals: Vec<Interval<I, T>>
List of intervals
overlaps_merged: bool
Whether or not overlaps have been merged
Implementations§
impl<I, T> Bits<I, T>
pub fn insert(&mut self, elem: Interval<I, T>)
Insert a new interval after the Bits structure has been created. This is very inefficient and should be avoided if possible.
Side effects: this clears cov() and overlaps_merged, so both must be recomputed after an insert.
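To illustrate why inserting after construction is costly, here is a minimal sketch of insertion into a start-sorted vector. It is not the crate's implementation: `Iv`, `SortedIntervals`, and the `max_len` field are hypothetical names introduced for this example, and intervals are assumed to be half-open and ordered by `(start, stop)`.

```rust
/// Hypothetical half-open interval `[start, stop)` used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Iv {
    start: u32,
    stop: u32,
}

/// Hypothetical container: intervals kept sorted by `(start, stop)`,
/// plus cached values that depend on the whole set.
struct SortedIntervals {
    ivs: Vec<Iv>,
    max_len: u32,
    overlaps_merged: bool,
}

impl SortedIntervals {
    fn insert(&mut self, iv: Iv) {
        // Binary search for the position that keeps the vector sorted.
        let pos = self
            .ivs
            .partition_point(|x| (x.start, x.stop) < (iv.start, iv.stop));
        // The longest interval length may have grown.
        self.max_len = self.max_len.max(iv.stop.saturating_sub(iv.start));
        // Anything derived from the previous set of intervals is now stale.
        self.overlaps_merged = false;
        // Vec::insert shifts every later element, so each insert is O(n).
        self.ivs.insert(pos, iv);
    }
}
```

Finding the position is cheap, but shifting the tail of the vector and invalidating cached values on every call is what makes building once from a complete list preferable.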
use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;
let data: Vec<Interval<usize, usize>> = vec![
Interval{start:0, end:5, val:1},
Interval{start:6, end:10, val:2},
];
let mut bits = Bits::build(data);
bits.insert(Interval{start:0, end:20, val:5});
assert_eq!(bits.len(), 3);
assert_eq!(bits.find_iter(1,3).collect::<Vec<&Interval<usize,usize>>>(),
vec![
&Interval{start:0, end:5, val:1},
&Interval{start:0, end:20, val:5},
]
);
pub fn lower_bound(start: I, intervals: &[Interval<I, T>]) -> usize
Determine, via a binary search, the first index at which to start checking for overlaps.
Assumes that the maximum interval length in intervals has already been subtracted from
start; otherwise the result is undefined.
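The reason for subtracting the maximum interval length can be shown with a small standalone sketch. It is an illustration under stated assumptions, not the crate's code: `Iv` and `find_overlaps` are hypothetical names, intervals are taken to be half-open, and the standard library's `partition_point` stands in for the binary search.

```rust
/// Hypothetical half-open interval `[start, stop)` used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Iv {
    start: u32,
    stop: u32,
}

/// Return the intervals overlapping the query `[start, stop)`.
/// `ivs` must be sorted by `start`; `max_len` is the longest `stop - start`
/// of any interval in `ivs`.
fn find_overlaps(ivs: &[Iv], max_len: u32, start: u32, stop: u32) -> Vec<&Iv> {
    // An interval that begins before `start - max_len` must end at or before
    // `start`, so it cannot overlap the query and can be skipped outright.
    let floor = start.saturating_sub(max_len);
    let first = ivs.partition_point(|iv| iv.start < floor);
    ivs[first..]
        .iter()
        // Once an interval starts at or after `stop`, all later ones do too.
        .take_while(|iv| iv.start < stop)
        // Candidates between the two bounds still need an explicit check.
        .filter(|iv| iv.stop > start)
        .collect()
}
```

With intervals [100, 150), [200, 250), and [225, 275) and a `max_len` of 50, a query for 210 to 240 starts scanning at the second interval and returns the last two.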
pub fn bsearch_seq<K>(key: K, elems: &[K]) -> usize
where
    K: PartialEq + PartialOrd,
Binary search for the insertion position of a key in a sorted slice.
Returns the index where key should be inserted to maintain sort order.
This is a convenience wrapper around bsearch_seq_ref.
§Arguments
- key - The value to search for
- elems - A sorted slice to search in
§Returns
The index where key should be inserted.
pub fn bsearch_seq_ref<K>(key: &K, elems: &[K]) -> usize
where
    K: PartialEq + PartialOrd,
Binary search for the insertion position of a key reference in a sorted slice.
Returns the index where key should be inserted to maintain sort order.
Uses an efficient binary search algorithm optimized for branch prediction.
§Arguments
- key - A reference to the value to search for
- elems - A sorted slice to search in
§Returns
The index where key should be inserted to maintain sort order:
- 0 if the key should be inserted at the beginning
- elems.len() if the key should be inserted at the end
- Otherwise, the first index where elems[index] >= key
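The return contract above is that of a classic lower-bound search. The following is a minimal sketch of such a search, not the crate's actual code: `insertion_index` is a hypothetical name, and no claim is made about how the real function is tuned for branch prediction.

```rust
/// Return the first index `i` such that `elems[i] >= *key`, or `elems.len()`
/// if every element is smaller. `elems` must be sorted in ascending order.
fn insertion_index<K: PartialOrd>(key: &K, elems: &[K]) -> usize {
    let mut lo = 0; // start of the window still under consideration
    let mut len = elems.len(); // number of elements in that window
    while len > 0 {
        let half = len / 2;
        let mid = lo + half;
        if elems[mid] < *key {
            // Everything up to and including `mid` is too small.
            lo = mid + 1;
            len -= half + 1;
        } else {
            // `mid` may be the answer; keep only the left part.
            len = half;
        }
    }
    lo
}
```

For a sorted slice this gives the same result as the standard library's `elems.partition_point(|e| e < key)`.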
pub fn count(&self, start: I, stop: I) -> usize
Count all intervals that overlap start .. stop. This performs two binary searches to find all the excluded elements, and then deduces the size of the intersection from there. See the BITS article for more details.
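The counting idea can be sketched on its own with two independently sorted arrays. This is an illustration rather than the crate's implementation: `count_overlaps` and the `starts` / `ends` slices are hypothetical, intervals are assumed to be half-open, and the query is assumed to satisfy `start < stop`.

```rust
/// Count the intervals that overlap the half-open query `[start, stop)`.
/// `starts` holds every interval start and `ends` every interval end,
/// each sorted ascending on its own (they need not stay paired).
fn count_overlaps(starts: &[u32], ends: &[u32], start: u32, stop: u32) -> usize {
    // First binary search: intervals ending at or before the query start
    // lie entirely to the left of the query.
    let left_excluded = ends.partition_point(|&e| e <= start);
    // Second binary search: intervals starting at or after the query stop
    // lie entirely to the right of the query.
    let right_excluded = starts.len() - starts.partition_point(|&s| s < stop);
    // No interval can be in both groups, so the rest must overlap.
    starts.len() - left_excluded - right_excluded
}
```

For intervals [0, 2), [5, 7), [10, 12), and so on up to [95, 97), `count_overlaps(&starts, &ends, 5, 11)` gives 2 without ever visiting the overlapping intervals themselves.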
use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;
let bits = Bits::build((0..100).step_by(5)
.map(|x| Interval{start: x, end: x+2 , val: true})
.collect::<Vec<Interval<usize, bool>>>());
assert_eq!(bits.count(5, 11), 2);
pub fn seek<'a>(&'a self, start: I, stop: I, cursor: &mut usize) -> IterFind<'a, I, T>
Find all intervals that overlap start .. stop. This method works only when queries to this Bits arrive sorted by start. It uses a linear search from the position of the last query instead of a binary search. A mutable reference to a cursor must be passed in; it is updated by each call and should be reused in the next query. Because the state lives in the cursor, seek does not need the Bits object to be mutable, so the same Bits can be shared across threads.
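The cursor pattern can be sketched independently of the crate. This is a simplified illustration, not the actual `seek`: `Iv` and `seek_sorted` are hypothetical names, intervals are assumed to be half-open, and the sketch returns a `Vec` where the real method returns an iterator.

```rust
/// Hypothetical half-open interval `[start, stop)` used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Iv {
    start: u32,
    stop: u32,
}

/// Find overlaps with `[start, stop)`, assuming successive calls use
/// non-decreasing `start` values. `ivs` must be sorted by `start` and
/// `max_len` is the longest `stop - start` of any interval in `ivs`.
fn seek_sorted<'a>(
    ivs: &'a [Iv],
    max_len: u32,
    start: u32,
    stop: u32,
    cursor: &mut usize,
) -> Vec<&'a Iv> {
    let floor = start.saturating_sub(max_len);
    // Advance linearly from where the previous query stopped. Intervals
    // skipped here start too early to overlap this or any later query.
    while *cursor < ivs.len() && ivs[*cursor].start < floor {
        *cursor += 1;
    }
    ivs[*cursor..]
        .iter()
        .take_while(|iv| iv.start < stop)
        .filter(|iv| iv.stop > start)
        .collect()
}
```

The slice is only borrowed immutably, and all mutable state sits in the caller's `cursor`, so each thread can keep its own cursor while sharing one interval set.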
use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;
let bits = Bits::build((0..100).step_by(5)
.map(|x| Interval{start: x, end: x+2 , val: true})
.collect::<Vec<Interval<usize, bool>>>());
let mut cursor = 0;
for i in bits.iter() {
assert_eq!(bits.seek(i.start, i.end, &mut cursor).count(), 1);
}
Trait Implementations§
impl<'a, I, T> IntoIterator for &'a Bits<I, T>
impl<'a, I, T> IntoIterator for &'a mut Bits<I, T>
impl<I, T> IntoIterator for Bits<I, T>
impl<I, T> Overlapper<I, T> for Bits<I, T>
fn build(intervals: Vec<Interval<I, T>>) -> Self
where
    Self: Sized,
Create a new instance of Bits by passing in a vector of Intervals. This vector will immediately be sorted by start order.
use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;
let data = (0..20).step_by(5)
.map(|x| Interval{start: x, end: x + 10, val: true})
.collect::<Vec<Interval<usize, bool>>>();
let bits = Bits::build(data);
fn find(&self, start: I, stop: I) -> Vec<Interval<I, T>>
Find all intervals that overlap start .. stop
use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;
let bits = Bits::build((0..100).step_by(5)
.map(|x| Interval{start: x, end: x+2 , val: true})
.collect::<Vec<Interval<usize, bool>>>());
assert_eq!(bits.find_iter(5, 11).count(), 2);