ReadThreadingGraph

Struct ReadThreadingGraph 

Source
pub struct ReadThreadingGraph {
    pub reference_path: Vec<NodeIndex>,
    /* private fields */
}
Expand description

Note: not final but only intended to be subclassed for testing.

Fields§

§reference_path: Vec<NodeIndex>

Implementations§

Source§

impl ReadThreadingGraph

Source

pub const ANONYMOUS_SAMPLE: &'static str = "XXX_UNNAMED_XXX"

Source

pub const MAX_CIGAR_COMPLEXITY: usize = 3usize

Source

pub fn new( kmer_size: usize, _debug_graph_transformations: bool, min_base_quality_to_use_in_assembly: u8, min_pruning_samples: usize, min_matching_bases_to_dangling_end_recovery: i32, avx_mode: AVXMode, ) -> Self

Source

pub fn default_with_kmer_size(kmer_size: usize) -> Self

Source

pub fn determine_non_unique_kmers<'a>( seq_for_kmers: &SequenceForKmers<'a>, kmer_size: usize, ) -> Vec<Kmer>

Get the collection of non-unique kmers from sequence for kmer size kmerSize @param seqForKmers a sequence to get kmers from @param kmerSize the size of the kmers @return a non-null collection of non-unique kmers in sequence

Source

pub fn get_non_uniques(&mut self) -> &HashSet<Kmer>

Get the set of non-unique kmers in this graph. For debugging purposes @return a non-null set of kmers

Trait Implementations§

Source§

impl AbstractReadThreadingGraph for ReadThreadingGraph

Source§

fn is_threading_start(&self, kmer: &Kmer) -> bool

Checks whether a kmer can be the threading start based on the current threading start location policy.

@param kmer the query kmer. @return {@code true} if we can start thread the sequence at this kmer, {@code false} otherwise. @see #setThreadingStartOnlyAtExistingVertex(boolean)

Source§

fn is_low_quality_graph(&self) -> bool

Does the graph not have enough complexity? We define low complexity as a situation where the number of non-unique kmers is more than 20% of the total number of kmers.

@return true if the graph has low complexity, false otherwise

Source§

fn preprocess_reads<'a>( &mut self, pending: &LinkedHashMap<usize, Vec<SequenceForKmers<'a>>>, )

Since we want to duplicate non-unique kmers in the graph code we must determine what those kmers are

Source§

fn add_sequence<'a>( &self, pending: &mut LinkedHashMap<usize, Vec<SequenceForKmers<'a>>>, seq_name: String, sample_index: usize, sequence: &'a [u8], start: usize, stop: usize, count: usize, is_ref: bool, )

Add bases in sequence to this graph

@param seqName a useful seqName for this read, for debugging purposes @param sequence non-null sequence of bases @param start the first base offset in sequence that we should use for constructing the graph using this sequence, inclusive @param stop the last base offset in sequence that we should use for constructing the graph using this sequence, exclusive @param count the representative count of this sequence (to use as the weight) @param isRef is this the reference sequence.

Source§

fn add_read<'a>( &mut self, read: &'a BirdToolRead, _sample_names: &[String], count: &mut usize, pending: &mut LinkedHashMap<usize, Vec<SequenceForKmers<'a>>>, )

Add a read to the sequence graph. Finds maximal consecutive runs of bases with sufficient quality and applies {@see addSequence} to these subreads if they are longer than the kmer size.

@param read a non-null read

Source§

fn base_is_usable_for_assembly(&self, base: u8, qual: u8) -> bool

Determines whether a base can safely be used for assembly. Currently disallows Ns and/or those with low quality

@param base the base under consideration @param qual the quality of that base @return true if the base can be used for assembly, false otherwise

Source§

fn build_graph_if_necessary<'a>( &mut self, pending: &mut LinkedHashMap<usize, Vec<SequenceForKmers<'a>>>, )

Build the read threaded assembly graph if it hasn’t already been constructed from the sequences that have been added to the graph.

Source§

fn thread_sequence(&mut self, seq_for_kmers: &SequenceForKmers<'_>)

Thread sequence seqForKmers through the current graph, updating the graph as appropriate

@param seqForKmers a non-null sequence

Source§

fn find_start(&self, seq_for_kmers: &SequenceForKmers<'_>) -> Option<usize>

Find vertex and its position in seqForKmers where we should start assembling seqForKmers

@param seqForKmers the sequence we want to thread into the graph @return the position of the starting vertex in seqForKmer, or -1 if it cannot find one

Source§

fn get_or_create_kmer_vertex( &mut self, sequence: &[u8], start: usize, ) -> NodeIndex

Get the vertex for the kmer in sequence starting at start

@param sequence the sequence @param start the position of the kmer start @return a non-null vertex

Source§

fn get_kmer_vertex( &self, kmer: &Kmer, allow_ref_source: bool, ) -> Option<&NodeIndex>

Get the unique vertex for kmer, or null if not possible.

@param allowRefSource if true, we will allow kmer to match the reference source vertex @return a vertex for kmer, or null (either because it doesn’t exist or is non-unique for graphs that have such a distinction)

Source§

fn create_vertex(&mut self, sequence: &[u8], kmer: Kmer) -> NodeIndex

Create a new vertex for kmer. Add it to the kmerToVertexMap map if appropriate.

@param kmer the kmer we want to create a vertex for @return the non-null created vertex

Source§

fn extend_chain_by_one( &mut self, prev_vertex: NodeIndex, sequence: &[u8], kmer_start: usize, count: usize, is_ref: bool, ) -> NodeIndex

Workhorse routine of the assembler. Given a sequence whose last vertex is anchored in the graph, extend the graph one bp according to the bases in sequence.

@param prevVertex a non-null vertex where sequence was last anchored in the graph @param sequence the sequence we’re threading through the graph @param kmerStart the start of the current kmer in graph we’d like to add @param count the number of observations of this kmer in graph (can be > 1 for GGA) @param isRef is this the reference sequence? @return a non-null vertex connecting prevVertex to in the graph based on sequence

Source§

fn recover_dangling_tails( &mut self, prune_factor: usize, min_dangling_branch_length: i32, recover_all: bool, dangling_end_sw_parameters: &Parameters, )

Try to recover dangling tails

@param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param minDanglingBranchLength the minimum length of a dangling branch for us to try to merge it @param recoverAll recover even branches with forks @param aligner

Source§

fn recover_dangling_heads( &mut self, prune_factor: usize, min_dangling_branch_length: i32, recover_all: bool, dangling_head_sw_parameters: &Parameters, )

Try to recover dangling heads

@param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param minDanglingBranchLength the minimum length of a dangling branch for us to try to merge it @param recoverAll recover even branches with forks @param aligner

Source§

fn recover_dangling_tail( &mut self, vertex: NodeIndex, prune_factor: usize, min_dangling_branch_length: i32, recover_all: bool, dangling_tail_sw_parameters: &Parameters, ) -> usize

Attempt to attach vertex with out-degree == 0 to the graph

@param vertex the vertex to recover @param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param minDanglingBranchLength the minimum length of a dangling branch for us to try to merge it @param aligner @return 1 if we successfully recovered the vertex and 0 otherwise

Source§

fn recover_dangling_head( &mut self, vertex: NodeIndex, prune_factor: usize, min_dangling_branch_length: i32, recover_all: bool, dangling_head_sw_parameters: &Parameters, ) -> usize

Attempt to attach vertex with in-degree == 0, or a vertex on its path, to the graph

@param vertex the vertex to recover @param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param minDanglingBranchLength the minimum length of a dangling branch for us to try to merge it @param recoverAll recover even branches with forks @param aligner @return 1 if we successfully recovered a vertex and 0 otherwise

Source§

fn merge_dangling_tail( &mut self, dangling_tail_merge_result: DanglingChainMergeHelper, ) -> usize

Actually merge the dangling tail if possible

@param danglingTailMergeResult the result from generating a Cigar for the dangling tail against the reference @return 1 if merge was successful, 0 otherwise

Source§

fn merge_dangling_head_legacy( &mut self, dangling_head_merge_result: DanglingChainMergeHelper, ) -> usize

Actually merge the dangling head if possible, this is the old codepath that does not handle indels

@param danglingHeadMergeResult the result from generating a Cigar for the dangling head against the reference @return 1 if merge was successful, 0 otherwise

Source§

fn best_prefix_match_legacy( &self, path1: &[u8], path2: &[u8], max_index: usize, ) -> Option<usize>

Finds the index of the best extent of the prefix match between the provided paths, for dangling head merging. Assumes that path1.length >= maxIndex and path2.length >= maxIndex.

@param path1 the first path @param path2 the second path @param maxIndex the maximum index to traverse (not inclusive) @return the index of the ideal prefix match or -1 if it cannot find one, must be less than maxIndex

Source§

fn get_max_mismatches_legacy(&self, length_of_dangling_branch: usize) -> usize

NOTE: this method is only used for dangling heads and not tails.

Determine the maximum number of mismatches permitted on the branch. Unless it’s preset (e.g. by unit tests) it should be the length of the branch divided by the kmer size.

@param lengthOfDanglingBranch the length of the branch itself @return positive integer

Source§

fn merge_dangling_head( &mut self, dangling_head_merge_result: DanglingChainMergeHelper, ) -> usize

Actually merge the dangling head if possible

@param danglingHeadMergeResult the result from generating a Cigar for the dangling head against the reference @return 1 if merge was successful, 0 otherwise

Source§

fn best_prefix_match( &self, cigar_elements: &Vec<Cigar>, path1: &[u8], path2: &[u8], ) -> (i64, i64)

Finds the index of the best extent of the prefix match between the provided paths, for dangling head merging. Requires that at a minimum there are at least #getMinMatchingBases() matches between the reference and the read at the end in order to emit an alignment offset.

@param cigarElements cigar elements corresponding to the alignment between path1 and path2 @param path1 the first path @param path2 the second path @return an integer pair object where the key is the offset into path1 and the value is offset into path2 (both -1 if no path is found)

Source§

fn get_min_matching_bases(&self) -> i32

The minimum number of matches to be considered allowable for recovering dangling ends

Source§

fn generate_cigar_against_downwards_reference_path( &self, vertex: NodeIndex, prune_factor: usize, min_dangling_branch_length: i32, recover_all: bool, dangling_tail_sw_parameters: &Parameters, ) -> Option<DanglingChainMergeHelper>

Generates the CIGAR string from the Smith-Waterman alignment of the dangling path (where the provided vertex is the sink) and the reference path.

@param aligner @param vertex the sink of the dangling chain @param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param recoverAll recover even branches with forks @return a SmithWaterman object which can be null if no proper alignment could be generated

Source§

fn generate_cigar_against_upwards_reference_path( &self, vertex: NodeIndex, prune_factor: usize, min_dangling_branch_length: i32, recover_all: bool, dangling_head_sw_parameters: &Parameters, ) -> Option<DanglingChainMergeHelper>

Generates the CIGAR string from the Smith-Waterman alignment of the dangling path (where the provided vertex is the source) and the reference path.

@param aligner @param vertex the source of the dangling head @param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param recoverAll recover even branches with forks @return a SmithWaterman object which can be null if no proper alignment could be generated

Source§

fn find_path_downwards_to_highest_common_desendant_of_reference( &self, vertex: NodeIndex, prune_factor: usize, give_up_at_branch: bool, ) -> Option<Vec<NodeIndex>>

Finds the path downwards in the graph from this vertex to the reference sequence, including the highest common descendant vertex. However note that the path is reversed so that this vertex ends up at the end of the path. Also note that nodes are excluded if their pruning weight is less than the pruning factor.

@param vertex the original vertex @param pruneFactor the prune factor to use in ignoring chain pieces @param giveUpAtBranch stop trying to find a path if a vertex with multiple incoming or outgoing edge is found @return the path if it can be determined or null if this vertex either doesn’t merge onto the reference path or has a descendant with multiple outgoing edges before hitting the reference path

Source§

fn get_reference_path( &self, start: NodeIndex, direction: TraversalDirection, blacklisted_edge: Option<EdgeIndex>, ) -> Vec<NodeIndex>

Finds the path in the graph from this vertex to the reference sink, including this vertex

@param start the reference vertex to start from @param direction describes which direction to move in the graph (i.e. down to the reference sink or up to the source) @param blacklistedEdge edge to ignore in the traversal down; useful to exclude the non-reference dangling paths @return the path (non-null, non-empty)

Source§

fn find_path_upwards_to_lowest_common_ancestor( &self, vertex: NodeIndex, prune_factor: usize, give_up_at_branch: bool, ) -> Option<Vec<NodeIndex>>

Finds the path upwards in the graph from this vertex to the first diverging node, including that (lowest common ancestor) vertex. Note that nodes are excluded if their pruning weight is less than the pruning factor.

@param vertex the original vertex @param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param giveUpAtBranch stop trying to find a path if a vertex with multiple incoming or outgoing edge is found @return the path if it can be determined or null if this vertex either doesn’t merge onto another path or has an ancestor with multiple incoming edges before hitting the reference path

Source§

fn find_path( &self, vertex: NodeIndex, prune_factor: usize, done: &dyn Fn(NodeIndex) -> bool, return_path: &dyn Fn(NodeIndex) -> bool, next_edge: &dyn Fn(NodeIndex) -> EdgeIndex, next_node: &dyn Fn(EdgeIndex) -> NodeIndex, ) -> Option<Vec<NodeIndex>>

Finds a path starting from a given vertex and satisfying various predicates

@param vertex the original vertex @param pruneFactor the prune factor to use in ignoring chain pieces if edge multiplicity is < pruneFactor @param done test for whether a vertex is at the end of the path @param returnPath test for whether to return a found path based on its terminal vertex @param nextEdge function on vertices returning the next edge in the path @param nextNode function of edges returning the next vertex in the path @return a path, if one satisfying all predicates is found, {@code null} otherwise

Source§

fn get_bases_for_path( &self, path: &Vec<NodeIndex>, expand_source: bool, ) -> String

The base sequence for the given path.

@param path the list of vertexes that make up the path @param expandSource if true and if we encounter a source node, then expand (and reverse) the character sequence for that node @return non-null sequence of bases corresponding to the given path

Source§

fn get_base_graph(&self) -> &BaseGraph<MultiDeBruijnVertex, MultiSampleEdge>

Source§

fn get_base_graph_mut( &mut self, ) -> &mut BaseGraph<MultiDeBruijnVertex, MultiSampleEdge>

Source§

fn find_kmer(&self, kmer: &Kmer) -> Option<&NodeIndex>

Source§

fn set_increase_counts_through_branches( &mut self, increase_counts_through_branches: bool, )

Source§

fn set_min_matching_bases_to_dangling_end_recovery( &mut self, min_matching_bases: i32, )

Source§

fn has_cycles(&self) -> bool

Source§

fn get_next_kmer_vertex_for_chain_extension( &self, kmer: &Kmer, is_ref: bool, _prev_vertex: NodeIndex, ) -> Option<&NodeIndex>

Source§

fn should_remove_reads_after_graph_construction(&self) -> bool

Source§

fn set_threading_start_only_at_existing_vertex(&mut self, value: bool)

Changes the threading start location policy. Read more
Source§

fn print_graph(&self, file_name: String, _prune_factor: usize)

Source§

fn track_kmer(&mut self, kmer: Kmer, new_vertex: NodeIndex)

Source§

fn increase_counts_in_matched_kmers( &mut self, seqs_for_kmers: &SequenceForKmers<'_>, vertex: NodeIndex, original_kmer: &[u8], offset: Option<usize>, )

Source§

fn extend_dangling_path_against_reference( &mut self, dangling_head_merge_result: &mut DanglingChainMergeHelper, num_nodes_to_extend: usize, ) -> bool

Source§

fn has_incident_ref_edge(&self, v: NodeIndex) -> bool

Source§

fn get_heaviest_edge(&self, v: NodeIndex, direction: Direction) -> EdgeIndex

Source§

fn remove_paths_not_connected_to_ref(&mut self)

Source§

fn to_sequence_graph(&self) -> SeqGraph<BaseEdgeStruct>

Source§

fn get_kmer_size(&self) -> usize

Source§

fn get_reference_source_vertex(&self) -> Option<NodeIndex>

Source§

fn get_reference_sink_vertex(&self) -> Option<NodeIndex>

Source§

fn post_process_for_haplotype_finding<L: Locatable>( &mut self, _debug_graph_output_path: Option<&String>, _ref_haplotype: &L, )

Source§

fn cigar_is_okay_to_merge( cigar: &CigarString, require_first_element_m: bool, require_last_element_m: bool, ) -> bool

Determine whether the provided cigar is okay to merge into the reference path Read more
Source§

fn longest_suffix_match(seq: &[u8], kmer: &[u8], seq_start: i64) -> usize

calculates the longest suffix match between a sequence and a smaller kmer Read more
Source§

impl Clone for ReadThreadingGraph

Source§

fn clone(&self) -> ReadThreadingGraph

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for ReadThreadingGraph

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> IntoEither for T

Source§

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise. Read more
Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer. Read more
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer. Read more
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer. Read more
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer. Read more
Source§

impl<T> Same for T

Source§

type Output = T

Should always be Self
Source§

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

Source§

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset. Read more
Source§

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).
Source§

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.
Source§

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> Ungil for T
where T: Send,