rust-lapper 0.4.2

A fast and easy interval overlap library
Documentation

docs crates.io

rust-lapper

Documentation

Crates.io

This is a rust port of Brent Pendersen's nim-lapper. It has a few notable differences, mostly that the find and seek methods both return iterators, so all adaptor methods may be used normally.

This crate works well for most interval data that does not include very long intervals that engulf a majority of other intervals. It is still fairly comporable to other methods, even in it's worst case. See benchmarks below. If you absolutely need more guarentees around worst case I would recommend AIList which handles large overlaps better at the cost of most normal cases.

However, on more typical datasets, this crate is between 4-10x faster than other interval overlap methods.

It should also be noted that the count method is agnostic to data type, and should be about as fast as it is possible to be on any dataset. It is an implementation of the BITS algorithm

Benchmarks

Benchmarking interval tree-ish datastructures is hard Please see the interval_bakeoff project for details on how the benchmarks were run... It's not fully baked yet though, and is finiky to run.

Command to run:

./target/release/interval_bakeoff fake -a -l ScAIList -l RustLapper -l
RustBio -l NestedInterval -n50000 -u100000

# This equates to the following params:
# num_intervals	50000
# universe_size	100000
# min_interval_size	500
# max_interval_size	80000
# add_large_span	true (universe spanning)

Set A / b Creation Times

crate/method A time B time
rust_lapper 15.625ms 31.25ms
scailist 15.625ms 15.625ms
nested_intervals 15.625ms 15.625ms
bio 15.625ms 31.25ms

100% hit rate (A vs A)

crate/method mean time intersection
rust_lapper/find 4.78125s 1469068763
rust_lapper/count 15.625ms 1469068763
scailist/find 4.640625s 1469068763
nested_intervals/query_overlapping 157.4375s 1469068763
bio/find 33.296875s 1469068763

Sub 100% hit rate (A vs B)

crate/method mean time intersection
rust_lapper/find 531.25ms 176488436
rust_lapper/count 15.625ms 176488436
scailist/find 671.875ms 176488436
nested_intervals/query_overlapping 11.109375s 196090092
bio/find 4.3125s 176488436

nested_intervals

rust-bio

ScAIList

Example

use rust_lapper::{Interval, Lapper};

type Iv = Interval<u32>;
fn main() {
    // create some fake data
    let data: Vec<Iv> = vec![
        Iv{start: 70, stop: 120, val: 0}, // max_len = 50
        Iv{start: 10, stop: 15, val: 0},
        Iv{start: 10, stop: 15, val: 0}, // exact overlap
        Iv{start: 12, stop: 15, val: 0}, // inner overlap
        Iv{start: 14, stop: 16, val: 0}, // overlap end
        Iv{start: 40, stop: 45, val: 0},
        Iv{start: 50, stop: 55, val: 0},
        Iv{start: 60, stop: 65, val: 0},
        Iv{start: 68, stop: 71, val: 0}, // overlap start
        Iv{start: 70, stop: 75, val: 0},
    ];

    // Make lapper structure
    let mut lapper = Lapper::new(data);
    
    // Iterator based find to extract all intervals that overlap 6..7
    // If your queries are coming in start sorted order, use the seek method to retain a cursor for
    // a big speedup.
    assert_eq!(
        lapper.find(11, 15).collect::<Vec<&Iv>>(), vec![
            &Iv{start: 10, stop: 15, val: 0},
            &Iv{start: 10, stop: 15, val: 0}, // exact overlap
            &Iv{start: 12, stop: 15, val: 0}, // inner overlap
            &Iv{start: 14, stop: 16, val: 0}, // overlap end
        ]
    );

    // Merge overlapping regions within the lapper to simlify and speed up queries that only depend
    // on 'any' overlap and not how many
    lapper.merge_overlaps();
    assert_eq!(
        lapper.find(11, 15).collect::<Vec<&Iv>>(), vec![
            &Iv{start: 10, stop: 16, val: 0},
        ]
    );

    // Get the number of positions covered by the lapper tree:
    assert_eq!(lapper.cov(), 73);

    // Get the union and intersect of two different lapper trees
    let data = vec![
        Iv{start: 5, stop: 15, val: 0},
        Iv{start: 48, stop: 80, val: 0},
    ];
    let (union, intersect) = lapper.union_and_intersect(&Lapper::new(data));
    assert_eq!(union, 88);
    assert_eq!(intersect, 27);
}

Release Notes

-0.4.0: Addition of the BITS count algorithm. -0.4.2: Bugfix in to update starts/stops vectors when overlaps merged