pub struct TaxaToAgg {
    pub scored: bool,
    pub ranked_only: bool,
    pub method: Method,
    pub strategy: Strategy,
    pub factor: f32,
    pub lower_bound: f32,
    pub taxon_file: PathBuf,
}
Aggregates taxon IDs in a FASTA stream

The umgap taxa2agg command takes one or more lists of taxon IDs and aggregates each list into a single consensus taxon.

The input is given in a FASTA format on standard input. Each FASTA record contains a list of taxon IDs, separated by newlines. The output is written to standard output, also in a FASTA format, each record containing a single taxon ID, which is the consensus taxon resulting from aggregation of the given list.

The taxonomy to be used is passed as an argument to this command. This is a preprocessed version of the NCBI taxonomy.

$ cat input.fa
>header1
571525
571525
6920
6920
1
6920
$ umgap taxa2agg taxons.tsv < input.fa
>header1
571525
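
As a rough illustration of the input format described above, the following sketch reads FASTA records whose bodies are newline-separated taxon IDs. The `Record` type and the `read_records` function are hypothetical names made up for this example and are not part of the umgap API. Skipping lines that do not parse as integers is likewise an assumption of the sketch, not documented behaviour of the command.

```rust
use std::io::{BufRead, Cursor};

/// One FASTA record: a header and the taxon IDs listed under it.
/// (Hypothetical type for this sketch, not a umgap type.)
#[derive(Debug, PartialEq)]
struct Record {
    header: String,
    taxa: Vec<u32>,
}

/// Reads records from any buffered reader. A line starting with '>' opens a
/// new record; every following line that parses as an integer is added to it.
/// Other lines are silently skipped (an assumption of this sketch).
fn read_records<R: BufRead>(reader: R) -> Vec<Record> {
    let mut records: Vec<Record> = Vec::new();
    for line in reader.lines().map_while(Result::ok) {
        let line = line.trim();
        if let Some(header) = line.strip_prefix('>') {
            records.push(Record {
                header: header.to_string(),
                taxa: Vec::new(),
            });
        } else if let (Some(record), Ok(id)) = (records.last_mut(), line.parse::<u32>()) {
            record.taxa.push(id);
        }
    }
    records
}

fn main() {
    // Same shape as the input.fa example above.
    let input = ">header1\n571525\n571525\n6920\n6920\n1\n6920\n";
    for record in read_records(Cursor::new(input)) {
        println!(">{} ({} taxon IDs)", record.header, record.taxa.len());
    }
}
```

In a real pipeline the reader would be standard input rather than an in-memory `Cursor`.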

By default, the aggregation used is the maximum root-to-leaf path (MRTL). A variant of the lowest common ancestor (LCA*) aggregation is also available via the -a and -m options, as is a hybrid approach.

  • -m rmq -a mrtl is the default aggregation strategy. It selects the taxon from the given list which has the highest frequency of ancestors in the list (including its own frequency). A range-minimum-query (RMQ) algorithm is used.

  • -m tree -a lca\* returns the taxon (possibly not from the list) of lowest rank without contradicting taxa in the list. The non-contradicting taxa of a taxon are the taxon itself, its ancestors, and its descendants. A tree-based algorithm is used.

  • -m tree -a hybrid mixes the two strategies above, resulting in a taxon that might not have the highest frequency of ancestors in the list but has fewer contradicting taxa. Use the -f option to select a hybrid close to the MRTL (-f 0.0) or to the LCA* (-f 1.0).
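
To make the difference between the MRTL and LCA* definitions above concrete, here is a naive sketch on a small toy tree. It is not the RMQ or tree-based implementation the command uses. The taxon IDs 1 to 5, the `toy_tree` parent map, and the helper names are all hypothetical. Breaking MRTL ties in favour of the smallest ID is an arbitrary choice of this sketch, and the hybrid strategy is not covered.

```rust
use std::cmp::Reverse;
use std::collections::HashMap;

/// Toy taxonomy as a child -> parent map (hypothetical IDs, root 1 is its own parent):
/// 1 -> {2, 5}, 2 -> {3, 4}
fn toy_tree() -> HashMap<u32, u32> {
    HashMap::from([(1, 1), (2, 1), (5, 1), (3, 2), (4, 2)])
}

/// Path from `taxon` up to the root, starting with the taxon itself.
fn lineage(parents: &HashMap<u32, u32>, taxon: u32) -> Vec<u32> {
    let mut path = vec![taxon];
    let mut current = taxon;
    while let Some(&parent) = parents.get(&current) {
        if parent == current {
            break;
        }
        path.push(parent);
        current = parent;
    }
    path
}

/// Frequency of each taxon in the list.
fn count(taxa: &[u32]) -> HashMap<u32, usize> {
    let mut counts = HashMap::new();
    for &taxon in taxa {
        *counts.entry(taxon).or_insert(0) += 1;
    }
    counts
}

/// MRTL: the listed taxon whose lineage (itself included) has the highest
/// summed frequency. Ties go to the smallest ID (arbitrary, for determinism).
fn mrtl(parents: &HashMap<u32, u32>, taxa: &[u32]) -> Option<u32> {
    let counts = count(taxa);
    counts.keys().copied().max_by_key(|&taxon| {
        let score: usize = lineage(parents, taxon)
            .iter()
            .map(|ancestor| counts.get(ancestor).copied().unwrap_or(0))
            .sum();
        (score, Reverse(taxon))
    })
}

/// LCA*: drop every listed taxon that has a descendant in the list, then take
/// the ordinary lowest common ancestor of what remains.
fn lca_star(parents: &HashMap<u32, u32>, taxa: &[u32]) -> Option<u32> {
    let counts = count(taxa);
    let lineages: Vec<Vec<u32>> = counts.keys().map(|&t| lineage(parents, t)).collect();
    let leaves: Vec<&Vec<u32>> = lineages
        .iter()
        .filter(|l| !lineages.iter().any(|other| other[0] != l[0] && other.contains(&l[0])))
        .collect();
    let first = leaves.first()?;
    // Lineages run from taxon to root, so the first shared node is the deepest one.
    first.iter().copied().find(|node| leaves.iter().all(|l| l.contains(node)))
}

fn main() {
    let parents = toy_tree();
    let list = [3, 3, 3, 4, 2];
    println!("MRTL: {:?}", mrtl(&parents, &list));
    println!("LCA*: {:?}", lca_star(&parents, &list));
}
```

For the list 3, 3, 3, 4, 2 the two strategies disagree: MRTL follows the most frequent root-to-leaf path and picks 3, while LCA* backs off to 2 because 3 and 4 contradict each other. On a list whose taxa all lie on one lineage, both return the deepest taxon.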

Fields§

§scored: bool

Each taxon is followed by a score between 0 and 1

§ranked_only: bool

Let all taxa snap to taxa with a named rank (such as species) during calculations

§method: Method

The method to use for aggregation

§strategy: Strategy

The strategy to use for aggregation

§factor: f32

The factor for the hybrid aggregation, from 0.0 (MRTL) to 1.0 (LCA*)

§lower_bound: f32

The smallest input frequency for a taxon to be included in the aggregation

§taxon_file: PathBuf

An NCBI taxonomy TSV-file as processed by Unipept

Trait Implementations§

impl Debug for TaxaToAgg

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl StructOpt for TaxaToAgg

fn clap<'a, 'b>() -> App<'a, 'b>

Returns clap::App corresponding to the struct.

fn from_clap(matches: &ArgMatches<'_>) -> Self

Builds the struct from clap::ArgMatches. It’s guaranteed to succeed if matches originates from an App generated by StructOpt::clap called on the same type, otherwise it must panic.

fn from_args() -> Self
where Self: Sized,

Builds the struct from the command line arguments (std::env::args_os). Calls clap::Error::exit on failure, printing the error message and aborting the program.

fn from_args_safe() -> Result<Self, Error>
where Self: Sized,

Builds the struct from the command line arguments (std::env::args_os). Unlike StructOpt::from_args, returns clap::Error on failure instead of aborting the program, so calling .exit is up to you.

fn from_iter<I>(iter: I) -> Self
where Self: Sized, I: IntoIterator, <I as IntoIterator>::Item: Into<OsString> + Clone,

Gets the struct from any iterator such as a Vec of your making. Prints the error message and quits the program in case of failure.

fn from_iter_safe<I>(iter: I) -> Result<Self, Error>
where Self: Sized, I: IntoIterator, <I as IntoIterator>::Item: Into<OsString> + Clone,

Gets the struct from any iterator such as a Vec of your making.

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.