Crate scailist

This crate provides an implementation of an AIList (Augmented Interval List) that dynamically scales the number of sublists.

Features

  • Consistently fast. The way the input intervals are decomposed diminishes the effects of super-containment.
  • Parallel-friendly. Queries operate on an immutable structure, even for seek.
  • Consumer/adapter paradigm: queries return an iterator.

Details

Please see the paper.

Most interaction with this crate will be through the ScAIList struct. The main method is ScAIList::find.

The overlap function assumes a zero-based, half-open genomic coordinate system: [start, stop) excludes the stop position for both the queries and the Intervals.
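The half-open rule can be expressed as a single predicate. This is a minimal sketch; the function name `overlaps` is hypothetical and not part of the crate's API. Two [start, stop) intervals overlap exactly when each one starts strictly before the other one stops, so intervals that merely touch do not overlap.

```rust
/// Hypothetical helper (not the crate's API): true when the half-open
/// interval [iv_start, iv_end) overlaps the half-open query [q_start, q_end).
fn overlaps(iv_start: u32, iv_end: u32, q_start: u32, q_end: u32) -> bool {
    // Each side must begin strictly before the other ends;
    // the stop coordinate itself is never covered.
    iv_start < q_end && iv_end > q_start
}
```

Under this rule the query [6, 11) used in the Examples section overlaps [5, 7) and [10, 12), while [5, 7) and [7, 9) only touch at 7 and do not overlap.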

ScAIList is composed of four primary parts:

  • A main interval list, which holds all the intervals after they have been decomposed.
  • A component index list, which holds the start index of each sublist after decomposition.
  • A component lengths list, which holds the length of each component.
  • A max_ends list, which holds, for each interval, the maximum end seen so far within its sublist, up to and including that interval.
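The four parts and how a query uses them can be pictured with a small sketch. The type `ScAIListSketch`, its field names, and the methods `from_components` and `find_overlaps` are hypothetical and use plain `(start, end)` tuples instead of the crate's Interval type. The query follows the general AIList approach: in each component, binary-search for the intervals that start before the query end, then scan backwards until the running max end shows that no earlier interval can reach the query.

```rust
/// Hypothetical layout of the four parts (not the crate's real fields).
struct ScAIListSketch {
    /// All intervals after decomposition; each component is contiguous
    /// and sorted by start.
    intervals: Vec<(u32, u32)>,
    /// Start index of each component within `intervals`.
    starts: Vec<usize>,
    /// Number of intervals in each component.
    lens: Vec<usize>,
    /// max_ends[i]: largest end in i's component, from the component's
    /// first interval up to and including interval i.
    max_ends: Vec<u32>,
}

impl ScAIListSketch {
    /// Build from components that are each already sorted by start.
    fn from_components(components: Vec<Vec<(u32, u32)>>) -> Self {
        let mut s = ScAIListSketch {
            intervals: Vec::new(),
            starts: Vec::new(),
            lens: Vec::new(),
            max_ends: Vec::new(),
        };
        for comp in components {
            s.starts.push(s.intervals.len());
            s.lens.push(comp.len());
            // The running max restarts at the beginning of every component.
            let mut running_max: u32 = 0;
            for (start, end) in comp {
                running_max = running_max.max(end);
                s.intervals.push((start, end));
                s.max_ends.push(running_max);
            }
        }
        s
    }

    /// Collect every interval overlapping the half-open query [q_start, q_end).
    fn find_overlaps(&self, q_start: u32, q_end: u32) -> Vec<(u32, u32)> {
        let mut hits = Vec::new();
        for (&first, &len) in self.starts.iter().zip(&self.lens) {
            let comp = &self.intervals[first..first + len];
            // Count of intervals in this component with start < q_end.
            let mut i = comp.partition_point(|&(s, _)| s < q_end);
            while i > 0 {
                i -= 1;
                // Nothing at or before i ends after q_start: stop scanning.
                if self.max_ends[first + i] <= q_start {
                    break;
                }
                let (s, e) = comp[i];
                if e > q_start {
                    hits.push((s, e));
                }
            }
        }
        hits
    }
}
```

Because max_ends never decreases along a component, the backwards scan can stop at the first position whose running max end does not exceed the query start. A single very long interval would keep that running max high and defeat the cutoff, which is why decomposition moves such intervals into a separate component.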

The decomposition step walks the list of intervals and recursively extracts intervals that overlap at least a given number of other intervals within a certain distance of them, up to a cap on the number of sublists. What is new in this implementation is that the cap is dynamic rather than fixed.
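A simplified sketch of a capped decomposition, in the spirit of the original AIList construction. The function `decompose` and its parameters `lookahead`, `min_cov`, and `max_components` are hypothetical; as an assumption, the extraction test here is containment of the following intervals (as in AIList), and the cap is a fixed argument, whereas ScAIList chooses it dynamically by a rule not shown here. An interval that contains at least `min_cov` of the next `lookahead` intervals (in start order) is moved to the next component; the leftovers are processed again until the cap is reached, and the last component absorbs whatever remains.

```rust
/// Hypothetical, simplified decomposition (not the crate's exact rule).
/// Returns components of half-open (start, end) intervals, each sorted by start.
fn decompose(
    mut remaining: Vec<(u32, u32)>,
    lookahead: usize,
    min_cov: usize,
    max_components: usize,
) -> Vec<Vec<(u32, u32)>> {
    // With min_cov >= 1 the last interval of each pass is always kept,
    // so no component is ever empty.
    assert!(min_cov >= 1, "min_cov must be at least 1");
    // Sorting by start keeps every component sorted for the query scan.
    remaining.sort_unstable();
    let mut components = Vec::new();
    // Leave room for the final "leftovers" component under the cap.
    while components.len() + 1 < max_components && !remaining.is_empty() {
        let mut kept = Vec::new();
        let mut extracted = Vec::new();
        for (i, &(start, end)) in remaining.iter().enumerate() {
            // Later intervals start at or after `start`, so end <= `end`
            // means this interval contains them.
            let covered = remaining[i + 1..]
                .iter()
                .take(lookahead)
                .filter(|&&(_, e)| e <= end)
                .count();
            if covered >= min_cov {
                extracted.push((start, end));
            } else {
                kept.push((start, end));
            }
        }
        components.push(kept);
        remaining = extracted;
    }
    if !remaining.is_empty() {
        components.push(remaining);
    }
    components
}
```

For example, decomposing [0, 100), [1, 3), [4, 6), [7, 9), [10, 12) with a lookahead of 3 and a minimum coverage of 2 moves the long [0, 100) into its own component and leaves the short intervals together, so the running max end of the main component stays small.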

Examples

   use scailist::{Interval, ScAIList};
   use std::cmp;
   type Iv = Interval<u32>;

   // create some fake data
   let data: Vec<Iv> = (0..20).step_by(5).map(|x| Iv{start: x, end: x + 2, val: 0}).collect();
   println!("{:#?}", data);

   // build the ScAIList structure
   let laps = ScAIList::new(data, None);
   assert_eq!(laps.find(6, 11).next(), Some(&Iv{start: 5, end: 7, val: 0}));

   let mut sim: u32 = 0;
   // Calculate the overlap between the query and the found intervals, sum total overlap
   for i in (0..10).step_by(3) {
       sim += laps
           .find(i, i + 2)
           .map(|iv| cmp::min(i + 2, iv.end) - cmp::max(i, iv.start))
           .sum::<u32>();
   }
   assert_eq!(sim, 4);

Structs

Interval

Holds the start and stop of each sublist.

IterFind

Find Iterator

IterScAIList

ScAIList Iterator

ScAIList

The main object of this crate; see its associated methods.