pub struct SeedExtend {
    pub min_seed_size: usize,
    pub max_gap_size: usize,
    pub ranked: Option<PathBuf>,
    pub penalty: usize,
}

Selects promising regions in sequences of taxon IDs

The umgap seedextend command takes one or more sequences of taxon IDs and selects regions of consecutive predictions. It can be used to filter out accidental matches of incorrect taxa.

The input is given in FASTA format on standard input. It should consist of taxon IDs separated by newlines, and the order of these taxa should reflect their location on a peptide, such as output by the umgap prot2kmer2lca -o command. As such, 3 consecutive equal IDs representing 9-mers, for instance, indicate an 11-mer match. This so-called seed could still be extended with other taxa, forming an extended seed. The command writes all taxa in any of these extended seeds to standard output.
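The 11-mer figure follows from the overlap between neighbouring k-mers: each additional consecutive k-mer adds one residue to the covered stretch. A minimal sketch of that arithmetic, where `covered_length` is a hypothetical helper name and not part of umgap:

```rust
/// Number of residues covered by `count` consecutive k-mers of length `k`,
/// where each k-mer starts one position after the previous one.
/// Hypothetical helper for illustration; not part of umgap.
fn covered_length(k: usize, count: usize) -> usize {
    if count == 0 {
        // No k-mers cover nothing.
        0
    } else {
        // The first k-mer covers k residues, every further one adds 1.
        k + count - 1
    }
}
```

With `k = 9` and `count = 3` this gives 11, matching the example in the text.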

$ cat dna.fa
>header1
CGCAGAGACGGGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCCGGGCACTACAGGACCC
$ umgap translate -n -a < dna.fa | umgap prot2kmer2lca 9mer.index | tee input.fa
>header1|1
9606 9606 2759 9606 9606 9606 9606 9606 9606 9606 8287
>header1|2
2026807 888268 186802 1598 1883
>header1|3
1883
>header1|1R
27342 2759 155619 1133106 38033 2
>header1|2R
>header1|3R
2951
$ umgap seedextend < input.fa
>header1|1
9606 9606 2759 9606 9606 9606 9606 9606 9606 9606 8287
>header1|2
>header1|3
>header1|1R
>header1|2R
>header1|3R

Taxon IDs are separated by newlines in the actual output, but are separated by spaces in this example.
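To make the record layout concrete, the following sketch reads such input into headers and taxon IDs. It is an illustration only: the `Record` type and `parse_records` function are hypothetical names, taxon IDs are assumed to fit in a `u32`, and lines holding several space-separated IDs are accepted as a convenience so that the example above parses as printed.

```rust
/// One FASTA-like record: a header and the taxon IDs listed under it.
/// Hypothetical type for illustration; not part of umgap.
#[derive(Debug, PartialEq)]
struct Record {
    header: String,
    taxa: Vec<u32>,
}

/// Parses headers (lines starting with '>') and the taxon IDs that follow.
/// IDs that appear before the first header are ignored.
fn parse_records(input: &str) -> Result<Vec<Record>, std::num::ParseIntError> {
    let mut records: Vec<Record> = Vec::new();
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A new header starts a new, initially empty record.
            records.push(Record {
                header: header.to_string(),
                taxa: Vec::new(),
            });
        } else if let Some(record) = records.last_mut() {
            // Normally one ID per line; whitespace-separated IDs also work.
            for token in line.split_whitespace() {
                record.taxa.push(token.parse()?);
            }
        }
    }
    Ok(records)
}
```

A record without any IDs, such as `>header1|2R` above, simply yields an empty `taxa` vector.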

The number of consecutive equal IDs needed to start a seed is 2 by default and can be changed with the -s option. The maximum length of gaps between seeds to join in an extension can be set with -g; by default, no gaps are allowed.
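The interplay of the two options can be sketched as follows. This is a simplified illustration under stated assumptions, not umgap's actual implementation: it assumes that taxon ID 0 marks a position without a prediction (a gap), that a region ends at the first run of more than `max_gap_size` such positions, and that a region is kept when it contains at least `min_seed_size` consecutive equal IDs. The function name `extended_seeds` is hypothetical.

```rust
/// Returns half-open index ranges `(start, end)` of candidate extended seeds.
/// Simplified sketch; assumes taxon ID 0 means "no prediction" (a gap).
fn extended_seeds(taxa: &[u32], min_seed_size: usize, max_gap_size: usize) -> Vec<(usize, usize)> {
    let mut regions = Vec::new();
    let mut i = 0;
    while i < taxa.len() {
        // A region never starts on a gap position.
        if taxa[i] == 0 {
            i += 1;
            continue;
        }
        let start = i;
        let mut end = i; // one past the last predicted position in the region
        let mut gap = 0; // length of the current run of gap positions
        let mut run = 0; // length of the current run of equal IDs
        let mut longest = 0; // longest run of equal IDs in the region
        let mut prev = 0;
        while i < taxa.len() {
            let id = taxa[i];
            if id == 0 {
                gap += 1;
                if gap > max_gap_size {
                    break; // gap too long: the region ends before it
                }
            } else {
                gap = 0;
                run = if id == prev { run + 1 } else { 1 };
                longest = longest.max(run);
                end = i + 1;
            }
            prev = id;
            i += 1;
        }
        // Keep the region only if it contains a long enough seed.
        if longest >= min_seed_size {
            regions.push((start, end));
        }
    }
    regions
}
```

Under these assumptions, the `header1|1` record from the example is kept in full with the defaults (`-s 2`, no gaps), because it contains runs of equal IDs and no gap positions, while `header1|2` is dropped because no ID occurs twice in a row.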

The command can be altered to print only the extended seed with the highest score among all extended seeds. Pass a taxonomy using the -r taxon.tsv option to activate this. In this scored mode, extended seeds with gaps are given a penalty of 5, which can be made more or less severe (higher or lower) with the -p option.

Fields

min_seed_size: usize

The minimum number of consecutive equal taxa that counts as a seed

max_gap_size: usize

The maximum length of a gap between seeds in an extension

ranked: Option<PathBuf>

Use the taxon ranks in the given NCBI taxonomy TSV file to pick the extended seed with the highest score

penalty: usize

The score penalty for gaps in extended seeds
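As a quick reference, the fields and the defaults stated in the description can be summarised in a stand-in type. `SeedExtendOptions` is a hypothetical name used only for this sketch; the real struct is built by StructOpt from the command line, and the flag letters in the comments are the ones named in the description above.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in mirroring the documented fields of `SeedExtend`.
#[derive(Debug, Clone, PartialEq)]
struct SeedExtendOptions {
    min_seed_size: usize,    // -s: consecutive equal IDs needed to start a seed
    max_gap_size: usize,     // -g: longest gap allowed between joined seeds
    ranked: Option<PathBuf>, // -r: taxonomy file; enables the scored mode
    penalty: usize,          // -p: score penalty for gaps in the scored mode
}

impl Default for SeedExtendOptions {
    /// Defaults as stated in the description: seeds of 2, no gaps,
    /// no taxonomy (unscored mode), and a gap penalty of 5.
    fn default() -> Self {
        SeedExtendOptions {
            min_seed_size: 2,
            max_gap_size: 0,
            ranked: None,
            penalty: 5,
        }
    }
}
```

Struct update syntax then expresses a call such as `-g 1 -r taxon.tsv` as `SeedExtendOptions { max_gap_size: 1, ranked: Some(PathBuf::from("taxon.tsv")), ..Default::default() }`.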

Trait Implementations

impl Debug for SeedExtend

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl StructOpt for SeedExtend

fn clap<'a, 'b>() -> App<'a, 'b>

Returns clap::App corresponding to the struct.

fn from_clap(matches: &ArgMatches<'_>) -> Self

Builds the struct from clap::ArgMatches. It’s guaranteed to succeed if matches originates from an App generated by StructOpt::clap called on the same type, otherwise it must panic.

fn from_args() -> Self
where Self: Sized,

Builds the struct from the command line arguments (std::env::args_os). Calls clap::Error::exit on failure, printing the error message and aborting the program.

fn from_args_safe() -> Result<Self, Error>
where Self: Sized,

Builds the struct from the command line arguments (std::env::args_os). Unlike StructOpt::from_args, returns clap::Error on failure instead of aborting the program, so calling .exit is up to you.

fn from_iter<I>(iter: I) -> Self
where Self: Sized, I: IntoIterator, <I as IntoIterator>::Item: Into<OsString> + Clone,

Gets the struct from any iterator such as a Vec of your making. Print the error message and quit the program in case of failure.

fn from_iter_safe<I>(iter: I) -> Result<Self, Error>
where Self: Sized, I: IntoIterator, <I as IntoIterator>::Item: Into<OsString> + Clone,

Gets the struct from any iterator such as a Vec of your making.

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.