pub struct ReadSet { /* private fields */ }Expand description
A set of reads extracted from GAFBase.
This is a counterpart to Subgraph.
Sets of reads fully contained in a subgraph or overlapping with it can be created using ReadSet::new.
The reads can be iterated over with ReadSet::iter and converted to GAF lines with ReadSet::to_gaf.
The reads will appear in the same order as in the database.
§Examples
use gbz_base::{Subgraph, SubgraphQuery, HaplotypeOutput};
use gbz_base::{GAFBase, GAFBaseParams, ReadSet, AlignmentOutput, GraphReference};
use gbz_base::utils;
use gbz::GBZ;
use simple_sds::serialize;
// Get an in-memory graph.
let gbz_file = utils::get_test_data("micb-kir3dl1.gbz");
let graph = serialize::load_from(&gbz_file).unwrap();
// Extract a 100 bp subgraph around node 150.
let nodes = vec![150];
let query = SubgraphQuery::nodes(nodes).with_output(HaplotypeOutput::Distinct);
let mut subgraph = Subgraph::new();
let _ = subgraph.from_gbz(&graph, None, None, &query).unwrap();
// Create a database of reads aligned to the graph.
let gaf_file = utils::get_test_data("micb-kir3dl1_HG003.gaf");
let gbwt_file = None; // Build a new GBWT index.
let db_file = serialize::temp_file_name("gaf-base");
let graph_ref = GraphReference::None; // Do not store sequences in the database.
let params = GAFBaseParams::default();
let db = GAFBase::create_from_files(&gaf_file, gbwt_file, &db_file, graph_ref, ¶ms);
assert!(db.is_ok());
// Extract all reads fully within the subgraph.
let db = GAFBase::open(&db_file);
assert!(db.is_ok());
let db = db.unwrap();
let read_set = ReadSet::new(GraphReference::Gbz(&graph), &subgraph, &db, AlignmentOutput::Contained);
assert!(read_set.is_ok());
let read_set = read_set.unwrap();
assert_eq!(read_set.len(), 148);
// The extracted reads are aligned and fully within the subgraph.
for aln in read_set.iter() {
for handle in aln.target_path().unwrap() {
assert!(subgraph.has_handle(*handle));
}
}
drop(db);
let _ = std::fs::remove_file(&db_file);Implementations§
Source§impl ReadSet
impl ReadSet
Sourcepub const CLUSTER_GAP_THRESHOLD: usize = 1000
pub const CLUSTER_GAP_THRESHOLD: usize = 1000
Gap length threshold for clustering node ids.
Sourcepub fn new(
graph: GraphReference<'_, '_>,
subgraph: &Subgraph,
database: &GAFBase,
output: AlignmentOutput,
) -> Result<Self, String>
pub fn new( graph: GraphReference<'_, '_>, subgraph: &Subgraph, database: &GAFBase, output: AlignmentOutput, ) -> Result<Self, String>
Extracts a set of reads overlapping with the subgraph.
The extracted reads will be in the same order as in the database. That corresponds to the order in the original GAF file.
§Arguments
graph: A GBZ-compatible graph for querying a reference-based GAF-base, or no graph for a reference-free one.subgraph: The subgraph used as the query region.database: A database storing reads aligned to the graph.output: Which reads to include in the read set.
§Errors
Passes through any database errors. Returns an error if an alignment cannot be decompressed.
Sourcepub fn from_rows(
database: &GAFBase,
row_range: Range<usize>,
graph: Option<&GBZ>,
) -> Result<Self, String>
pub fn from_rows( database: &GAFBase, row_range: Range<usize>, graph: Option<&GBZ>, ) -> Result<Self, String>
Extracts all reads from the given range of row ids.
The extracted reads will be in the same order as in the database. That corresponds to the order in the original GAF file.
§Arguments
database: A database storing reads aligned to the graph.row_range: The range of row ids to extract.graph: A GBZ graph if the database is reference-based, orNonefor a reference-free one.
§Errors
Passes through any database errors. Returns an error if an alignment cannot be decompressed.
Sourcepub fn unclipped(&self) -> usize
pub fn unclipped(&self) -> usize
Returns the original number of alignments (before clipping) in the set.
Sourcepub fn blocks(&self) -> usize
pub fn blocks(&self) -> usize
Returns the number of alignment blocks decompressed when creating the read set.
Sourcepub fn node_records(&self) -> usize
pub fn node_records(&self) -> usize
Returns the number of node records in the read set.
Each record corresponds to an oriented node, and the opposite orientation may not be present. This includes all node records encountered while tracing the alignments, even when the alignment was not included in the read set.
Sourcepub fn iter(&self) -> impl Iterator<Item = &Alignment>
pub fn iter(&self) -> impl Iterator<Item = &Alignment>
Returns an iterator over the reads in the set.
Sourcepub fn to_gaf<W: Write>(&self, writer: &mut W) -> Result<(), String>
pub fn to_gaf<W: Write>(&self, writer: &mut W) -> Result<(), String>
Serializes the read set in the GAF format.
The output does not include any header lines, as the GAF file may consist of multiple read sets. Returns an error if the target sequence for a read is invalid or cannot be determined. Passes through any I/O errors.