pub struct TaxaToAgg {
    pub scored: bool,
    pub ranked_only: bool,
    pub method: Method,
    pub strategy: Strategy,
    pub factor: f32,
    pub lower_bound: f32,
    pub taxon_file: PathBuf,
}
Aggregates taxon IDs in a FASTA stream
The umgap taxa2agg command takes one or more lists of taxon IDs and aggregates them into a single consensus taxon.
The input is given in a FASTA format on standard input. Each FASTA record contains a list of taxon IDs, separated by newlines. The output is written to standard output, also in a FASTA format, each record containing a single taxon ID, which is the consensus taxon resulting from aggregation of the given list.
The taxonomy to be used is passed as an argument to this command. This is a preprocessed version of the NCBI taxonomy.
$ cat input.fa
>header1
571525
571525
6920
6920
1
6920
$ umgap taxa2agg taxons.tsv < input.fa
>header1
571525
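The record layout described above can be sketched with a small reader. This is an illustrative sketch only, not umgap's own parser: the `Record` type and the `parse_records` function are hypothetical names, and the sketch assumes unscored input with exactly one taxon ID per line.

```rust
/// One FASTA record: a header plus the taxon IDs listed under it.
/// (Hypothetical type for illustration; not part of umgap's API.)
#[derive(Debug, PartialEq)]
struct Record {
    header: String,
    taxa: Vec<u32>,
}

/// Parses FASTA-formatted text in which every non-header line holds one taxon ID.
/// Blank lines are skipped; a taxon ID before the first header is an error.
fn parse_records(input: &str) -> Result<Vec<Record>, String> {
    let mut records: Vec<Record> = Vec::new();
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some(header) = line.strip_prefix('>') {
            // A '>' line starts a new record.
            records.push(Record {
                header: header.to_string(),
                taxa: Vec::new(),
            });
        } else {
            let taxon: u32 = line
                .parse()
                .map_err(|e| format!("invalid taxon ID {:?}: {}", line, e))?;
            match records.last_mut() {
                Some(record) => record.taxa.push(taxon),
                None => return Err(format!("taxon ID {:?} appears before any header", line)),
            }
        }
    }
    Ok(records)
}
```

Applied to the contents of input.fa shown above, this yields a single record with header header1 and the six taxon IDs in their original order.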
By default, the aggregation used is the maximum root-to-leaf path (MRTL). A variant of the lowest common ancestor (LCA*) aggregation is also available via the -a and -m options, as is a hybrid approach.
- -m rmq -a mrtl is the default aggregation strategy. It selects the taxon from the given list which has the highest frequency of ancestors in the list (including its own frequency). A range-minimum-query (RMQ) algorithm is used.
- -m tree -a lca\* returns the taxon (possibly not from the list) of lowest rank without contradicting taxa in the list. The non-contradicting taxa of a taxon are the taxon itself, its ancestors, and its descendants. A tree-based algorithm is used.
- -m tree -a hybrid mixes the above two strategies, which results in a taxon that might not have the highest frequency of ancestors in the list, but would have fewer contradicting taxa. Use the -f option to select a hybrid close to the MRTL (-f 0.0) or to the LCA* (-f 1.0).
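The difference between MRTL and LCA* can be sketched naively on a toy taxonomy. The code below is an assumption-laden illustration, not umgap's implementation: it uses neither the RMQ nor the tree-based algorithm, the names `lineage`, `mrtl`, `lca_star` and `toy_taxonomy` are hypothetical, and the parent relations in the toy taxonomy (including the made-up taxon 42) are invented for the example rather than taken from the NCBI taxonomy. LCA* candidates are restricted here to the listed taxa and their ancestors.

```rust
use std::collections::HashMap;

type Taxon = u32;

/// Returns `taxon` followed by its ancestors up to the root.
/// Assumes `parents` maps child -> parent and the root has no entry.
fn lineage(parents: &HashMap<Taxon, Taxon>, taxon: Taxon) -> Vec<Taxon> {
    let mut path = vec![taxon];
    let mut current = taxon;
    while let Some(&parent) = parents.get(&current) {
        path.push(parent);
        current = parent;
    }
    path
}

/// Naive MRTL: score each listed taxon by the summed frequency of itself and
/// its ancestors in the list; the first taxon with the highest score wins.
fn mrtl(parents: &HashMap<Taxon, Taxon>, taxa: &[Taxon]) -> Option<Taxon> {
    let mut counts: HashMap<Taxon, usize> = HashMap::new();
    for &taxon in taxa {
        *counts.entry(taxon).or_insert(0) += 1;
    }
    let mut best: Option<(usize, Taxon)> = None;
    for &taxon in taxa {
        let score: usize = lineage(parents, taxon)
            .iter()
            .map(|ancestor| counts.get(ancestor).copied().unwrap_or(0))
            .sum();
        if best.map_or(true, |(top, _)| score > top) {
            best = Some((score, taxon));
        }
    }
    best.map(|(_, taxon)| taxon)
}

/// Naive LCA*: among the listed taxa and their ancestors, pick the deepest
/// candidate for which every listed taxon is the candidate itself, one of its
/// ancestors, or one of its descendants.
fn lca_star(parents: &HashMap<Taxon, Taxon>, taxa: &[Taxon]) -> Option<Taxon> {
    let mut best: Option<(usize, Taxon)> = None; // (depth, taxon)
    for &taxon in taxa {
        for candidate in lineage(parents, taxon) {
            let candidate_line = lineage(parents, candidate);
            let consistent = taxa.iter().all(|&other| {
                candidate_line.contains(&other)
                    || lineage(parents, other).contains(&candidate)
            });
            let depth = candidate_line.len();
            if consistent && best.map_or(true, |(top, _)| depth > top) {
                best = Some((depth, candidate));
            }
        }
    }
    best.map(|(_, taxon)| taxon)
}

/// Hypothetical toy taxonomy: 1 is the root, 6920 its child, and both
/// 571525 and the made-up taxon 42 are children of 6920.
fn toy_taxonomy() -> HashMap<Taxon, Taxon> {
    HashMap::from([(6920, 1), (571525, 6920), (42, 6920)])
}
```

With this toy taxonomy, the list from the example above (571525, 571525, 6920, 6920, 1, 6920) lies on a single root-to-leaf path, so both functions return 571525. For the list 571525, 571525, 42, 6920, `mrtl` still returns 571525, while `lca_star` returns 6920, because 571525 and 42 contradict each other.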
Fields
scored: bool
Each taxon is followed by a score between 0 and 1
ranked_only: bool
Let all taxa snap to taxa with a named rank (such as species) during calculations
method: Method
The method to use for aggregation
strategy: Strategy
The strategy to use for aggregation
factor: f32
The factor for the hybrid aggregation, from 0.0 (MRTL) to 1.0 (LCA*)
lower_bound: f32
The smallest input frequency for a taxon to be included in the aggregation
taxon_file: PathBuf
An NCBI taxonomy TSV-file as processed by Unipept
Trait Implementations
impl StructOpt for TaxaToAgg
fn from_clap(matches: &ArgMatches<'_>) -> Self
Builds the struct from clap::ArgMatches. It's guaranteed to succeed if matches originates from an App generated by StructOpt::clap called on the same type, otherwise it must panic.
fn from_args() -> Self where Self: Sized,
Builds the struct from the command line arguments (std::env::args_os). Calls clap::Error::exit on failure, printing the error message and aborting the program.
fn from_args_safe() -> Result<Self, Error> where Self: Sized,
Builds the struct from the command line arguments (std::env::args_os). Unlike StructOpt::from_args, returns clap::Error on failure instead of aborting the program, so calling .exit is up to you.
fn from_iter<I>(iter: I) -> Self
Gets the struct from any iterator such as a Vec of your making. Prints the error message and quits the program in case of failure.