pub struct PathIndex { /* private fields */ }
An index for random access to reference and generic paths in a GBZ graph.
Indexed paths are identified by their offsets in the index.
The offsets range from 0 to path_count() - 1.
The combination of GBZ and PathIndex is functionally similar to crate::GraphInterface but tens of times faster.
An in-memory graph (GBZ with PathIndex) is better for batch operations, where the loading time is a negligible fraction of the total running time.
The database (crate::GraphInterface) is better for interactive applications, where the user works with relatively small subgraphs.
For a human graph, the database should be faster than the in-memory graph for subgraphs of up to 1 Mbp.
§Examples
```rust
use gbz_base::PathIndex;
use gbz::{GBZ, FullPathName, Pos};
use gbz::support;
use simple_sds::serialize;
let filename = support::get_test_data("example.gbz");
let graph: GBZ = serialize::load_from(&filename).unwrap();
// Create a path index with 3 bp intervals.
let path_index = PathIndex::new(&graph, 3, false);
assert!(path_index.is_ok());
let path_index = path_index.unwrap();
// We have two components with one generic path in each.
assert_eq!(path_index.path_count(), 2);
assert_eq!(path_index.path_length(0), Some(5));
assert_eq!(path_index.path_length(1), Some(4));
// Consider the generic path in component A.
let path_name = FullPathName::generic("A");
let index_offset = path_index.find_path(&graph, &path_name);
assert_eq!(index_offset, Some(0));
let index_offset = index_offset.unwrap();
// There should be two indexed positions for the path.
let first_sample = path_index.indexed_position(index_offset, 2);
assert_eq!(first_sample, Some((0, Pos::new(22, 0))));
let second_sample = path_index.indexed_position(index_offset, 5);
assert_eq!(second_sample, Some((3, Pos::new(30, 0))));
let next_sample = path_index.indexed_position(index_offset, 100);
assert_eq!(next_sample, second_sample);
```
Implementations
impl PathIndex
pub fn new(graph: &GBZ, interval: usize, verbose: bool) -> Result<Self, String>
Creates a new path index for the given GBZ graph.
The index is built for all reference and generic paths.
§Arguments
- graph: A GBZ graph.
- interval: Approximate distance between indexed positions (in bp).
- verbose: Print progress information to stderr.
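The interval parameter controls how densely each path is sampled: fewer samples take less memory, but the nearest sample before a query position may lie farther away. The following is a minimal sketch of interval-based sampling under stated assumptions: a path is modelled as a list of hypothetical (node id, length) pairs, and a sample is taken at the start of a node once at least interval bp have passed since the previous sample. The exact rule PathIndex uses is not documented here, so treat this as an illustration of the trade-off, not the crate's algorithm.

```rust
/// Hypothetical sampling rule (not the gbz_base implementation):
/// `nodes` lists the node visits of one path as (node id, sequence length).
/// Returns (sequence offset, node id) pairs for the sampled node starts.
fn sample_path(nodes: &[(usize, usize)], interval: usize) -> Vec<(usize, usize)> {
    let mut samples = Vec::new();
    let mut offset = 0;
    // Sequence offset of the most recent sample; None before the first one.
    let mut last_sample: Option<usize> = None;
    for &(node_id, len) in nodes {
        // Sample the first node, then any node starting at least `interval` bp
        // after the previous sample.
        let due = match last_sample {
            None => true,
            Some(prev) => offset >= prev + interval,
        };
        if due {
            samples.push((offset, node_id));
            last_sample = Some(offset);
        }
        offset += len;
    }
    samples
}

fn main() {
    // A toy 5 bp path with hypothetical node ids and lengths 1, 2, 1, 1.
    let path = [(1, 1), (2, 2), (3, 1), (4, 1)];
    let samples = sample_path(&path, 3);
    println!("{:?}", samples); // [(0, 1), (3, 3)]
}
```

Sampling only at node starts keeps every sample expressible as a position at offset 0 within a node; the price is that the actual spacing between samples can exceed interval when nodes are long.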
pub fn path_count(&self) -> usize
Returns the number of indexed paths.
pub fn path_to_offset(&self, path_id: usize) -> Option<usize>
Returns the index offset for the indexed path with the given identifier.
Returns None if the path does not exist or it has not been indexed.
pub fn offset_to_path(&self, index_offset: usize) -> Option<usize>
Returns the path identifier for the indexed path with the given index offset.
Returns None if the path does not exist.
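Index offsets form a dense range from 0 to path_count() - 1, while GBZ path identifiers cover every path in the graph, including the ones that are not indexed. A minimal sketch of such a two-way mapping, assuming a Vec for offset-to-identifier lookups and a HashMap for the reverse direction; the OffsetMap type is hypothetical and not part of gbz_base.

```rust
use std::collections::HashMap;

/// Hypothetical dense numbering of the indexed paths.
struct OffsetMap {
    // Position in the vector = index offset; value = GBZ path identifier.
    offset_to_path: Vec<usize>,
    // Reverse lookup; paths that were not indexed are absent.
    path_to_offset: HashMap<usize, usize>,
}

impl OffsetMap {
    fn new(indexed_paths: &[usize]) -> Self {
        let offset_to_path = indexed_paths.to_vec();
        let path_to_offset: HashMap<usize, usize> = offset_to_path
            .iter()
            .enumerate()
            .map(|(offset, &path_id)| (path_id, offset))
            .collect();
        OffsetMap { offset_to_path, path_to_offset }
    }

    fn path_count(&self) -> usize {
        self.offset_to_path.len()
    }

    /// None if the path does not exist or was not indexed.
    fn path_to_offset(&self, path_id: usize) -> Option<usize> {
        self.path_to_offset.get(&path_id).copied()
    }

    /// None if the offset is outside 0..path_count().
    fn offset_to_path(&self, index_offset: usize) -> Option<usize> {
        self.offset_to_path.get(index_offset).copied()
    }
}

fn main() {
    // Suppose only paths 1 and 4 are reference or generic paths.
    let map = OffsetMap::new(&[1, 4]);
    assert_eq!(map.path_count(), 2);
    assert_eq!(map.path_to_offset(4), Some(1));
    assert_eq!(map.path_to_offset(2), None);
    assert_eq!(map.offset_to_path(0), Some(1));
    assert_eq!(map.offset_to_path(2), None);
}
```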
pub fn find_path(&self, graph: &GBZ, path_name: &FullPathName) -> Option<usize>
Returns the index offset for the path with the given metadata.
Returns None if the path does not exist or it has not been indexed.
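A plausible reading of the signature is that find_path combines two lookups that can each fail: resolving the metadata to a GBZ path identifier using the graph, and mapping that identifier to an index offset. The sketch below shows that composition with Option::and_then; ToyPathName and both maps are hypothetical stand-ins for the graph metadata and the index, not the real FullPathName or PathIndex internals.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for path metadata (the real FullPathName has more fields).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ToyPathName {
    sample: String,
    contig: String,
}

fn name(sample: &str, contig: &str) -> ToyPathName {
    ToyPathName { sample: sample.to_string(), contig: contig.to_string() }
}

/// Name -> path identifier (may fail: no such path),
/// then path identifier -> index offset (may fail: path not indexed).
fn find_path(
    name_to_path: &HashMap<ToyPathName, usize>,
    path_to_offset: &HashMap<usize, usize>,
    path_name: &ToyPathName,
) -> Option<usize> {
    name_to_path
        .get(path_name)
        .and_then(|path_id| path_to_offset.get(path_id))
        .copied()
}

fn main() {
    // Path 0 exists and was indexed; path 2 exists but was not indexed.
    let name_to_path = HashMap::from([(name("ref", "A"), 0), (name("hap1", "A"), 2)]);
    let path_to_offset = HashMap::from([(0, 0)]);
    assert_eq!(find_path(&name_to_path, &path_to_offset, &name("ref", "A")), Some(0));
    assert_eq!(find_path(&name_to_path, &path_to_offset, &name("hap1", "A")), None);
    assert_eq!(find_path(&name_to_path, &path_to_offset, &name("ref", "Z")), None);
}
```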
pub fn path_length(&self, index_offset: usize) -> Option<usize>
Returns the length of the indexed path with the given index offset.
Returns None if the path does not exist.
pub fn indexed_position(&self, index_offset: usize, query_offset: usize) -> Option<(usize, Pos)>
Returns the last indexed position at or before query_offset on the path with index offset index_offset.
The return value consists of the sequence offset of that indexed position and the corresponding GBWT position.
Returns None if there is no indexed path with the given index offset.
This is similar to crate::GraphInterface::find_path followed by crate::GraphInterface::indexed_position.
§Arguments
- index_offset: Offset of the path in the index.
- query_offset: Sequence position in the path (in bp).
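Because only sampled positions are stored, indexed_position returns the closest sample at or before query_offset rather than the exact position; in the example above, querying offset 100 on a 5 bp path still returns the sample at offset 3. A minimal sketch of that lookup, assuming the samples of one path are kept sorted by sequence offset and searched with slice::partition_point. The storage layout and the (node, offset) tuple standing in for gbz::Pos are assumptions, not the crate's internals.

```rust
/// Hypothetical stand-in for gbz::Pos: (node id, offset within node).
type Position = (usize, usize);

/// Returns the last sample with sequence offset <= `query_offset`.
/// `samples` must be sorted by sequence offset for the binary search to be valid.
fn indexed_position(samples: &[(usize, Position)], query_offset: usize) -> Option<(usize, Position)> {
    // Number of samples whose sequence offset is at or before the query.
    let count = samples.partition_point(|&(offset, _)| offset <= query_offset);
    // The answer is the last of those samples; there is none if count == 0.
    count.checked_sub(1).map(|i| samples[i])
}

fn main() {
    // Sample values for path A from the example above (3 bp interval).
    let samples: [(usize, Position); 2] = [(0, (22, 0)), (3, (30, 0))];
    println!("{:?}", indexed_position(&samples, 2)); // Some((0, (22, 0)))
    println!("{:?}", indexed_position(&samples, 5)); // Some((3, (30, 0)))
    println!("{:?}", indexed_position(&samples, 100)); // Some((3, (30, 0)))
}
```

Binary search keeps each lookup at O(log n) in the number of samples on the path, which matters for long reference paths with small intervals.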