Struct Bits 

Source
pub struct Bits<I, T>
where I: PrimInt + Unsigned + Send + Sync, T: Eq + Clone + Send + Sync,
{ pub intervals: Vec<Interval<I, T>>, pub overlaps_merged: bool, /* private fields */ }

A Binary Interval Search data structure for fast genomic interval overlap queries.

From the journal article: https://academic.oup.com/bioinformatics/article/29/1/1/273289

BITS (Binary Interval Search) is an efficient data structure for finding overlapping intervals using binary search. It maintains sorted lists of interval start and end positions, enabling fast identification of intervals that overlap with a query range.

§Examples

use gtars_overlaprs::{Bits, Overlapper, Interval};

// Create intervals for read alignments
let reads = vec![
    Interval { start: 100u32, end: 150, val: "read1" },
    Interval { start: 200, end: 250, val: "read2" },
    Interval { start: 225, end: 275, val: "read3" },
];

let bits = Bits::build(reads);

// Query for reads overlapping position 210-240
let overlaps = bits.find(210, 240);
assert_eq!(overlaps.len(), 2); // read2 and read3

// Count overlaps without allocating
let count = bits.count(210, 240);
assert_eq!(count, 2);

§Advanced Features

§Sequential Queries with seek

For sorted queries, use seek with a cursor for better performance:

use gtars_overlaprs::{Bits, Overlapper, Interval};

let intervals = (0u32..100).step_by(5)
    .map(|x| Interval { start: x, end: x + 2, val: true })
    .collect::<Vec<_>>();
let bits = Bits::build(intervals);

let mut cursor = 0;
for i in 10u32..20 {
    let overlaps: Vec<_> = bits.seek(i, i + 5, &mut cursor).collect();
    // Process overlaps...
}

§See Also

  • Overlapper - The trait that Bits implements
  • crate::AIList - An alternative implementation optimized for high-coverage regions

Fields§

§intervals: Vec<Interval<I, T>>

List of intervals

§overlaps_merged: bool

Whether or not overlaps have been merged

Implementations§

Source§

impl<I, T> Bits<I, T>
where I: PrimInt + Unsigned + Send + Sync, T: Eq + Clone + Send + Sync,

Source

pub fn insert(&mut self, elem: Interval<I, T>)

Insert a new interval after the Bits has been created. This is very inefficient and should be avoided if possible.

Side effects: this clears cov() and overlaps_merged, so both have to be recomputed after an insert.

use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;

let data: Vec<Interval<usize, usize>> = vec![
    Interval { start: 0, end: 5, val: 1 },
    Interval { start: 6, end: 10, val: 2 },
];
let mut bits = Bits::build(data);
bits.insert(Interval{start:0, end:20, val:5});
assert_eq!(bits.len(), 3);
assert_eq!(bits.find_iter(1,3).collect::<Vec<&Interval<usize,usize>>>(),
    vec![
        &Interval{start:0, end:5, val:1},
        &Interval{start:0, end:20, val:5},
    ]
);
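
To illustrate why inserting after construction is costly, here is a minimal sketch that does not use the crate. It assumes the intervals live in a `Vec` sorted by `start` and that a cached maximum interval length must be refreshed afterwards. `Iv` and `insert_sorted` are hypothetical names for this sketch, not part of gtars_overlaprs.

```rust
/// Hypothetical stand-in for the crate's `Interval`; not the real type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Iv {
    start: u32,
    end: u32,
    val: u32,
}

/// Insert `elem` into `ivs` (already sorted by `start`) and return the
/// recomputed maximum interval length. Assumes `start <= end` everywhere.
fn insert_sorted(ivs: &mut Vec<Iv>, elem: Iv) -> u32 {
    // Binary search for the first slot whose start is greater than the new
    // start, so intervals with equal starts keep their insertion order.
    let pos = ivs.partition_point(|iv| iv.start <= elem.start);
    // Vec::insert shifts every later element one slot to the right: O(n).
    ivs.insert(pos, elem);
    // Any cached summary (here the maximum length) has to be rebuilt.
    ivs.iter().map(|iv| iv.end - iv.start).max().unwrap_or(0)
}
```

Each call pays for a linear shift plus a linear rescan, which is why collecting all intervals first and building once is usually preferable.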
Source

pub fn len(&self) -> usize

Get the number of intervals in Bits

Source

pub fn is_empty(&self) -> bool

Check if BITS is empty (i.e. has no intervals)

Source

pub fn iter(&self) -> IterBits<'_, I, T>

Return an iterator over the intervals in Bits

Source

pub fn lower_bound(start: I, intervals: &[Interval<I, T>]) -> usize

Use a binary search to determine the first index at which to start checking for overlaps. Assumes that the maximum interval length in intervals has already been subtracted from start; otherwise the result is undefined.
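
The role of the maximum interval length can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: `Iv`, `max_len`, and `first_candidate` are hypothetical names, intervals are half-open, and the slice is sorted by `start`.

```rust
/// Hypothetical interval type used only for this sketch.
struct Iv {
    start: u32,
    end: u32,
}

/// Longest interval length in the slice (0 when the slice is empty).
fn max_len(ivs: &[Iv]) -> u32 {
    ivs.iter().map(|iv| iv.end - iv.start).max().unwrap_or(0)
}

/// First index in `ivs` (sorted by `start`) that could overlap a query
/// beginning at `query_start`.
fn first_candidate(query_start: u32, ivs: &[Iv]) -> usize {
    // An interval that starts before `query_start - max_len` must end before
    // `query_start`, so it cannot overlap. saturating_sub avoids underflow
    // on unsigned coordinates.
    let shifted = query_start.saturating_sub(max_len(ivs));
    ivs.partition_point(|iv| iv.start < shifted)
}
```

Here the subtraction happens inside the helper; the documented lower_bound instead expects the caller to have done it already.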

Source

pub fn bsearch_seq<K>(key: K, elems: &[K]) -> usize
where K: PartialEq + PartialOrd,

Binary search for the insertion position of a key in a sorted slice.

Returns the index where key should be inserted to maintain sort order. This is a convenience wrapper around bsearch_seq_ref.

§Arguments
  • key - The value to search for
  • elems - A sorted slice to search in
§Returns

The index where key should be inserted.

Source

pub fn bsearch_seq_ref<K>(key: &K, elems: &[K]) -> usize
where K: PartialEq + PartialOrd,

Binary search for the insertion position of a key reference in a sorted slice.

Returns the index where key should be inserted to maintain sort order. Uses an efficient binary search algorithm optimized for branch prediction.

§Arguments
  • key - A reference to the value to search for
  • elems - A sorted slice to search in
§Returns

The index where key should be inserted to maintain sort order:

  • 0 if the key should be inserted at the beginning
  • elems.len() if the key should be inserted at the end
  • Otherwise, the first index where elems[index] >= key
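
The documented contract (first index where `elems[index] >= key`) can be sketched with a fixed-step binary search. This is an assumed shape written to match the return values listed above, not the crate's actual implementation; `lower_bound_index` is a hypothetical name.

```rust
/// First index `i` with `elems[i] >= *key`, or `elems.len()` if none.
/// `elems` must be sorted in ascending order.
fn lower_bound_index<K: PartialOrd>(key: &K, elems: &[K]) -> usize {
    if elems.is_empty() {
        return 0;
    }
    let mut base = 0usize;
    let mut size = elems.len();
    // The window shrinks by `half` on every pass regardless of the
    // comparison; only `base` depends on it.
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        if elems[mid] < *key {
            base = mid;
        }
        size -= half;
    }
    // The answer is either `base` or the slot right after it.
    base + (elems[base] < *key) as usize
}
```

For sorted input this agrees with the standard library's `elems.partition_point(|e| e < key)`, which is a convenient way to cross-check results.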
Source

pub fn count(&self, start: I, stop: I) -> usize

Count all intervals that overlap start .. stop. This performs two binary searches to find all the excluded elements, and then deduces the size of the intersection from there. See BITS for more details.

use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;

let bits = Bits::build((0..100).step_by(5)
                                .map(|x| Interval{start: x, end: x+2 , val: true})
                                .collect::<Vec<Interval<usize, bool>>>());
assert_eq!(bits.count(5, 11), 2);
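
The counting idea can be sketched without the crate. The layout below (two separately sorted vectors of starts and ends) is an assumption for illustration, since the struct's private fields are not documented here; `Endpoints` is a hypothetical name and intervals are treated as half-open.

```rust
/// Sorted start and end coordinates of a set of half-open intervals.
/// Hypothetical layout for this sketch only.
struct Endpoints {
    starts: Vec<u32>,
    ends: Vec<u32>,
}

impl Endpoints {
    fn new(intervals: &[(u32, u32)]) -> Self {
        let mut starts: Vec<u32> = intervals.iter().map(|&(s, _)| s).collect();
        let mut ends: Vec<u32> = intervals.iter().map(|&(_, e)| e).collect();
        starts.sort_unstable();
        ends.sort_unstable();
        Endpoints { starts, ends }
    }

    /// Count intervals overlapping `[start, stop)`, assuming `start < stop`.
    fn count(&self, start: u32, stop: u32) -> usize {
        // Excluded on the left: intervals whose end is <= start.
        let left = self.ends.partition_point(|&e| e <= start);
        // Excluded on the right: intervals whose start is >= stop.
        let right = self.starts.len() - self.starts.partition_point(|&s| s < stop);
        // Everything that is not excluded overlaps the query.
        self.starts.len() - left - right
    }
}
```

No interval can be excluded on both sides when `start < stop`, so subtracting the two excluded counts from the total gives the overlap count without visiting any interval.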
Source

pub fn seek<'a>( &'a self, start: I, stop: I, cursor: &mut usize, ) -> IterFind<'a, I, T>

Find all intervals that overlap start .. stop. This method only works when queries to this Bits arrive in sorted (start) order. It uses a linear search from the position of the last query instead of a binary search. A mutable reference to a cursor must be passed in; it is updated by each call and should be reused in the next query. Because the state lives in the cursor, seek does not need the Bits object to be mutable, so the same Bits can be used across threads.

use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;

let bits = Bits::build((0..100).step_by(5)
                                .map(|x| Interval{start: x, end: x+2 , val: true})
                                .collect::<Vec<Interval<usize, bool>>>());
let mut cursor = 0;
for i in bits.iter() {
   assert_eq!(bits.seek(i.start, i.end, &mut cursor).count(), 1);
}
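
The cursor mechanism can be sketched as a forward-only scan. This is a simplified illustration, not the crate's implementation: `Iv` and `seek_sketch` are hypothetical, the result is collected into a `Vec` rather than returned as an iterator, and the maximum interval length is passed in explicitly.

```rust
/// Hypothetical interval type used only for this sketch.
struct Iv {
    start: u32,
    end: u32,
}

/// Intervals overlapping `[start, stop)`, scanning forward from `*cursor`.
/// `ivs` must be sorted by `start`, `max_len` must be the longest interval
/// length, and successive calls must use non-decreasing `start` values.
fn seek_sketch<'a>(
    ivs: &'a [Iv],
    max_len: u32,
    start: u32,
    stop: u32,
    cursor: &mut usize,
) -> Vec<&'a Iv> {
    // Skip intervals that start too early to reach `start`. Later queries
    // begin at or after `start`, so these are never needed again and the
    // cursor only moves forward.
    let floor = start.saturating_sub(max_len);
    while *cursor < ivs.len() && ivs[*cursor].start < floor {
        *cursor += 1;
    }
    ivs[*cursor..]
        .iter()
        .take_while(|iv| iv.start < stop) // sorted by start: stop early
        .filter(|iv| iv.end > start) // keep only true overlaps
        .collect()
}
```

Since the only mutable state is the caller's cursor, several threads could share one read-only interval slice as long as each keeps its own cursor.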

Trait Implementations§

Source§

impl<I, T> Clone for Bits<I, T>
where I: PrimInt + Unsigned + Send + Sync + Clone, T: Eq + Clone + Send + Sync + Clone,

Source§

fn clone(&self) -> Bits<I, T>

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<I, T> Debug for Bits<I, T>
where I: PrimInt + Unsigned + Send + Sync + Debug, T: Eq + Clone + Send + Sync + Debug,

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl<'a, I, T> IntoIterator for &'a Bits<I, T>
where T: Eq + Clone + Send + Sync + 'a, I: PrimInt + Unsigned + Send + Sync,

Source§

type Item = &'a Interval<I, T>

The type of the elements being iterated over.
Source§

type IntoIter = Iter<'a, Interval<I, T>>

Which kind of iterator are we turning this into?
Source§

fn into_iter(self) -> Iter<'a, Interval<I, T>>

Creates an iterator from a value. Read more
Source§

impl<'a, I, T> IntoIterator for &'a mut Bits<I, T>
where T: Eq + Clone + Send + Sync + 'a, I: PrimInt + Unsigned + Send + Sync,

Source§

type Item = &'a mut Interval<I, T>

The type of the elements being iterated over.
Source§

type IntoIter = IterMut<'a, Interval<I, T>>

Which kind of iterator are we turning this into?
Source§

fn into_iter(self) -> IterMut<'a, Interval<I, T>>

Creates an iterator from a value. Read more
Source§

impl<I, T> IntoIterator for Bits<I, T>
where T: Eq + Clone + Send + Sync, I: PrimInt + Unsigned + Send + Sync,

Source§

type Item = Interval<I, T>

The type of the elements being iterated over.
Source§

type IntoIter = IntoIter<<Bits<I, T> as IntoIterator>::Item>

Which kind of iterator are we turning this into?
Source§

fn into_iter(self) -> Self::IntoIter

Creates an iterator from a value. Read more
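
The three IntoIterator implementations above correspond to iterating by shared reference, by mutable reference, and by value. A minimal sketch of the same pattern on a hypothetical `Vec`-backed wrapper (`Store` is not a crate type) looks like this:

```rust
/// Hypothetical wrapper around a Vec, standing in for a Vec-backed container.
struct Store {
    items: Vec<u32>,
}

// `for x in &store` yields shared references.
impl<'a> IntoIterator for &'a Store {
    type Item = &'a u32;
    type IntoIter = std::slice::Iter<'a, u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.items.iter()
    }
}

// `for x in &mut store` yields mutable references.
impl<'a> IntoIterator for &'a mut Store {
    type Item = &'a mut u32;
    type IntoIter = std::slice::IterMut<'a, u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.items.iter_mut()
    }
}

// `for x in store` consumes the container and yields owned values.
impl IntoIterator for Store {
    type Item = u32;
    type IntoIter = std::vec::IntoIter<u32>;
    fn into_iter(self) -> Self::IntoIter {
        self.items.into_iter()
    }
}
```

With these in place, `for x in &store`, `for x in &mut store`, and `for x in store` all compile, each with the item type shown in the matching impl.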
Source§

impl<I, T> Overlapper<I, T> for Bits<I, T>
where I: PrimInt + Unsigned + Send + Sync, T: Eq + Clone + Send + Sync,

Source§

fn build(intervals: Vec<Interval<I, T>>) -> Self
where Self: Sized,

Create a new instance of Bits by passing in a vector of Intervals. This vector will immediately be sorted by start.

use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;

let data = (0..20).step_by(5)
                  .map(|x| Interval{start: x, end: x + 10, val: true})
                  .collect::<Vec<Interval<usize, bool>>>();
let bits = Bits::build(data);
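
What sorting at build time implies for callers can be sketched as follows. The names `Iv`, `Index`, and `build_sketch` are hypothetical, and both the stable sort on `start` and the cached maximum length are assumptions made for this illustration rather than documented behaviour of the crate.

```rust
/// Hypothetical stand-in for the crate's `Interval`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Iv {
    start: u32,
    end: u32,
    val: &'static str,
}

/// Hypothetical built index: sorted intervals plus a cached summary.
struct Index {
    intervals: Vec<Iv>,
    max_len: u32,
}

fn build_sketch(mut intervals: Vec<Iv>) -> Index {
    // Takes ownership of the vector and reorders it in place, so the
    // caller's original ordering is not preserved.
    intervals.sort_by_key(|iv| iv.start);
    // saturating_sub guards against a malformed interval with end < start.
    let max_len = intervals
        .iter()
        .map(|iv| iv.end.saturating_sub(iv.start))
        .max()
        .unwrap_or(0);
    Index { intervals, max_len }
}
```

Code that needs to relate results back to input positions should therefore carry an identifier in `val` instead of relying on input order.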
Source§

fn find(&self, start: I, stop: I) -> Vec<Interval<I, T>>

Find all intervals that overlap start .. stop

use gtars_overlaprs::{Bits, Overlapper};
use gtars_core::models::Interval;

let bits = Bits::build((0..100).step_by(5)
                                .map(|x| Interval{start: x, end: x+2 , val: true})
                                .collect::<Vec<Interval<usize, bool>>>());
assert_eq!(bits.find_iter(5, 11).count(), 2);
Source§

fn find_iter<'a>( &'a self, start: I, stop: I, ) -> Box<dyn Iterator<Item = &'a Interval<I, T>> + 'a>

Returns an iterator over all intervals that overlap with the query range [start, stop). Read more
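
The half-open query range has a practical consequence: two ranges that merely touch do not overlap. A minimal sketch of the predicate, assuming stored intervals are half-open as well (`overlaps` is a hypothetical helper, not a crate function):

```rust
/// True when half-open ranges `[a_start, a_end)` and `[b_start, b_end)`
/// share at least one position.
fn overlaps(a_start: u32, a_end: u32, b_start: u32, b_end: u32) -> bool {
    a_start < b_end && b_start < a_end
}
```

For example, `overlaps(0, 5, 5, 10)` is false because position 5 belongs only to the second range, while `overlaps(0, 6, 5, 10)` is true.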

Auto Trait Implementations§

§

impl<I, T> Freeze for Bits<I, T>
where I: Freeze,

§

impl<I, T> RefUnwindSafe for Bits<I, T>

§

impl<I, T> Send for Bits<I, T>

§

impl<I, T> Sync for Bits<I, T>

§

impl<I, T> Unpin for Bits<I, T>
where I: Unpin, T: Unpin,

§

impl<I, T> UnwindSafe for Bits<I, T>
where I: UnwindSafe, T: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.