Struct umgap::commands::prot2kmer2lca::ProtToKmerToLca
pub struct ProtToKmerToLca {
pub length: usize,
pub one_on_one: bool,
pub fst_file: PathBuf,
pub socket: Option<PathBuf>,
pub fst_in_memory: bool,
pub chunk_size: usize,
}
Maps all k-mers from a FASTA stream of peptides to taxon IDs.
The umgap prot2kmer2lca command takes one or more peptides as input and outputs the lowest common ancestors of all their k-mers. It is a combination of the umgap prot2kmer and umgap pept2lca commands to allow more efficient parallel computing.
The input is given in FASTA format on standard input, with a single peptide per FASTA header, which may be hardwrapped with newlines. All overlapping k-mers in these peptides (k configurable via the -k option, and 9 by default) are searched for in the index (as built by the umgap buildindex command) passed as argument. The results are printed on standard output in FASTA format.
$ cat input.fa
>header1
DAIGDVAKAYKKAG*S
$ umgap prot2kmer2lca -k9 uniprot-2020-04-9mer.index < input.fa
>header1
571525
571525
6920
6920
1
6920
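The overlapping k-mers that are looked up can be enumerated with a sliding window. The following is a minimal sketch, not umgap's own implementation: the function name kmers is hypothetical, and it assumes the peptide is ASCII so that byte offsets coincide with residue positions.

```rust
/// Returns every overlapping k-mer of `peptide`, from left to right.
/// Hypothetical helper for illustration; assumes ASCII input.
fn kmers(peptide: &str, k: usize) -> Vec<&str> {
    // A peptide shorter than k (or k == 0) has no k-mers.
    if k == 0 || peptide.len() < k {
        return Vec::new();
    }
    // One window per start position: len - k + 1 windows in total.
    (0..=peptide.len() - k)
        .map(|i| &peptide[i..i + k])
        .collect()
}
```

For the 16-residue peptide in the example above and k = 9, this yields 8 k-mers, the last two of which contain the * character.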
Add the -o option to print out 0 for k-mers not found in the index.
$ umgap prot2kmer2lca -o uniprot-2020-04-9mer.index < input.fa
>header1
571525
571525
6920
6920
1
6920
0
0
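The difference between the default behaviour and -o can be sketched as follows. This is an illustration under stated assumptions, not the command's actual code: the function lookup_kmers is hypothetical, a plain HashMap stands in for the real index, and the peptide is assumed to be ASCII.

```rust
use std::collections::HashMap;

/// Looks up each overlapping k-mer of `peptide` in `index`.
/// With `one_on_one` set, k-mers missing from the index are reported as
/// taxon 0; otherwise they are skipped.
/// `index` is a HashMap standing in for the real index (assumption).
fn lookup_kmers(
    peptide: &str,
    k: usize,
    index: &HashMap<String, u32>,
    one_on_one: bool,
) -> Vec<u32> {
    if k == 0 || peptide.len() < k {
        return Vec::new();
    }
    (0..=peptide.len() - k)
        .map(|i| &peptide[i..i + k])
        .filter_map(|kmer| match index.get(kmer) {
            Some(&taxon) => Some(taxon),   // k-mer found: report its taxon ID
            None if one_on_one => Some(0), // not found, -o given: report 0
            None => None,                  // not found, default: skip it
        })
        .collect()
}
```

With -o the output has exactly one line per k-mer, so positions in the output can be matched back to positions in the peptide.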
This command also allows an alternative mode of operation. When memory mapped, it can take some time for the index to be searched. With the -m flag, the complete index will be loaded in memory before operation. This, too, takes some time, but for a single large analysis, this cost is negligible compared to the time of analysis. When processing many short files, however, the index would need to be loaded again and again. Instead of using this command as part of a pipeline, ... | umgap prot2kmer2lca index | ..., it can run in a separate (and persistent) process, reusing the same loaded index. Run umgap prot2kmer2lca -m -s umgap-socket index as a service, and when the index is loaded, change your original pipeline(s) to communicate with the socket using OpenBSD’s netcat: ... | nc -NU /path/to/umgap-socket | ....
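A client other than netcat can talk to the service socket in the same way. The sketch below is an assumption-laden illustration rather than part of umgap: the function query_socket is hypothetical, and it assumes the service accepts a raw FASTA stream and replies on the same connection, as the nc -NU pipeline suggests. It writes all input before reading, so for large inputs the reading and writing would need to be interleaved.

```rust
use std::io::{self, Read, Write};
use std::net::Shutdown;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Sends `input` to the Unix socket at `socket_path`, closes the write half
/// (as `nc -N` does at end of input), and returns everything sent back.
/// Hypothetical helper; the wire format is assumed to be plain FASTA.
fn query_socket(socket_path: &Path, input: &[u8]) -> io::Result<Vec<u8>> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(input)?;
    // Signal end of input while keeping the read half open for the reply.
    stream.shutdown(Shutdown::Write)?;
    let mut output = Vec::new();
    stream.read_to_end(&mut output)?;
    Ok(output)
}
```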
Fields
length: usize
The length of the k-mers in the index
one_on_one: bool
Map unknown sequences to 0 instead of ignoring them
fst_file: PathBuf
An index that maps k-mers to taxon IDs
socket: Option<PathBuf>
Instead of reading from stdin and writing to stdout, create a Unix socket to communicate with using OpenBSD’s netcat (nc -NU <socket>). This is especially useful in combination with the --in-memory flag: you only have to load the index in memory once, after which you can query it without having the loading time overhead each time.
fst_in_memory: bool
Load index in memory instead of memory mapping the file contents. This makes querying significantly faster, but requires some initialization time.
chunk_size: usize
Number of reads grouped into one chunk. Bigger chunks decrease the overhead caused by multithreading. Because the output order is not necessarily the same as the input order, having a chunk size which is a multiple of 12 (all 6 translations multiplied by the two paired-end reads) will keep FASTA records that originate from the same reads together.
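Why a multiple of 12 keeps records together can be checked with a little arithmetic. The sketch below is illustrative only: the function read_pairs_stay_together is hypothetical, and it assumes the 12 records of each read pair are consecutive in the input, with the first group starting at record 0.

```rust
/// Records per read pair: 6 translations times 2 paired-end reads.
const RECORDS_PER_READ_PAIR: usize = 12;

/// Returns true if, when `n_records` consecutive records are cut into chunks
/// of `chunk_size`, every group of 12 records lands in a single chunk.
/// Hypothetical helper for illustration.
fn read_pairs_stay_together(n_records: usize, chunk_size: usize) -> bool {
    if chunk_size == 0 {
        return false;
    }
    (0..n_records).all(|i| {
        // Index of the first record in the group that record i belongs to.
        let group_start = i - i % RECORDS_PER_READ_PAIR;
        // Record i must share a chunk with the first record of its group.
        group_start / chunk_size == i / chunk_size
    })
}
```

Chunk sizes of 12 or 24 satisfy the check, whereas sizes such as 10 or 18 split at least one group across two chunks.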
Trait Implementations
impl Debug for ProtToKmerToLca
impl StructOpt for ProtToKmerToLca
fn from_clap(matches: &ArgMatches<'_>) -> Self
Builds the struct from clap::ArgMatches. It’s guaranteed to succeed if matches originates from an App generated by StructOpt::clap called on the same type, otherwise it must panic.
fn from_args() -> Self where Self: Sized,
Builds the struct from the command line arguments (std::env::args_os). Calls clap::Error::exit on failure, printing the error message and aborting the program.
fn from_args_safe() -> Result<Self, Error> where Self: Sized,
Builds the struct from the command line arguments (std::env::args_os). Unlike StructOpt::from_args, returns clap::Error on failure instead of aborting the program, so calling .exit is up to you.
fn from_iter<I>(iter: I) -> Self
Gets the struct from any iterator such as a Vec of your making. Prints the error message and quits the program in case of failure.