pub struct ProtToKmerToLca {
    pub length: usize,
    pub one_on_one: bool,
    pub fst_file: PathBuf,
    pub socket: Option<PathBuf>,
    pub fst_in_memory: bool,
    pub chunk_size: usize,
}

Maps all k-mers from a FASTA stream of peptides to taxon IDs

The umgap prot2kmer2lca command takes one or more peptides as input and outputs the lowest common ancestors of all their k-mers. It is a combination of the umgap prot2kmer and umgap pept2lca commands to allow more efficient parallel computing.

The input is given in FASTA format on standard input, with a single peptide per FASTA header; the peptide may be hard-wrapped with newlines. All overlapping k-mers in these peptides (k is configurable via the -k option and defaults to 9) are searched for in the index (as built by the umgap buildindex command) passed as argument. The results are printed on standard output in FASTA format.

$ cat input.fa
>header1
DAIGDVAKAYKKAG*S
$ umgap prot2kmer2lca -k9 uniprot-2020-04-9mer.index < input.fa
>header1
571525
571525
6920
6920
1
6920
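
The k-mer extraction described above can be sketched as follows. This is a minimal illustration and not the command's actual implementation: the helper names unwrap_peptide and overlapping_kmers are hypothetical, and the input is assumed to be ASCII.

```rust
/// Joins a hard-wrapped peptide back into a single line.
/// (Hypothetical helper, not part of the umgap API.)
fn unwrap_peptide(wrapped: &str) -> String {
    wrapped.lines().map(str::trim).collect()
}

/// Returns every overlapping window of `k` residues in `peptide`.
/// (Hypothetical helper; assumes ASCII input so byte offsets equal
/// residue offsets.)
fn overlapping_kmers(peptide: &str, k: usize) -> Vec<&str> {
    // No k-mers exist for k = 0, for peptides shorter than k,
    // or (in this sketch) for non-ASCII input.
    if k == 0 || peptide.len() < k || !peptide.is_ascii() {
        return Vec::new();
    }
    // A peptide of n residues has n - k + 1 overlapping k-mers.
    (0..=peptide.len() - k)
        .map(|start| &peptide[start..start + k])
        .collect()
}
```

For the 16-residue peptide in the example, overlapping_kmers(peptide, 9) yields 16 - 9 + 1 = 8 k-mers, the last two of which contain the * character.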

Add the -o option to print out 0 for k-mers not found in the index.

$ umgap prot2kmer2lca -o uniprot-2020-04-9mer.index < input.fa
>header1
571525
571525
6920
6920
1
6920
0
0
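
The effect of the -o option on the output can be sketched as below. This is an illustration under assumptions: the function name taxon_lines is hypothetical, and each index lookup is modelled as an Option<u32>, with None standing for a k-mer that was not found.

```rust
/// Turns per-k-mer lookup results into the taxon IDs that are printed.
/// (Hypothetical helper; `None` models a k-mer missing from the index.)
fn taxon_lines(lookups: &[Option<u32>], one_on_one: bool) -> Vec<u32> {
    lookups
        .iter()
        .filter_map(|hit| match hit {
            // A k-mer found in the index is always printed.
            Some(taxon) => Some(*taxon),
            // With -o, a missing k-mer is printed as taxon ID 0.
            None if one_on_one => Some(0),
            // Without -o, a missing k-mer produces no output line.
            None => None,
        })
        .collect()
}
```

With the lookup results of the example (six hits followed by two misses), one_on_one = false gives the six lines of the first run and one_on_one = true gives the eight lines of the second, so only the latter keeps a one-to-one correspondence between k-mers and output lines.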

This command also allows an alternative mode of operation. By default the index is memory mapped, and searching it can take some time. With the -m flag, the complete index is loaded in memory before operation. This, too, takes some time, but for a single large analysis the loading time is negligible compared to the time of the analysis itself.

When processing many short files, however, the index would need to be loaded again and again. Instead of using this command as part of a pipeline, ... | umgap prot2kmer2lca index | ..., it can run in a separate (and persistent) process, reusing the same loaded index. Run umgap prot2kmer2lca -m -s umgap-socket index as a service, and once the index is loaded, change your original pipeline(s) to communicate with the socket using OpenBSD’s netcat: ... | nc -NU /path/to/umgap-socket | ....
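
A client for the socket mode can also be written directly in Rust instead of going through netcat. The sketch below uses only the standard library and mimics what nc -NU does: it sends the input, shuts down the writing half of the connection, and reads the reply until the service closes the connection. The function name query_socket is hypothetical, and the assumption that the service answers once it sees the end of the input is inferred from the use of nc -N above.

```rust
use std::io::{self, Read, Write};
use std::net::Shutdown;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Sends FASTA text to a Unix socket and returns everything sent back.
/// (Hypothetical helper, not part of the umgap API.)
fn query_socket(socket: &Path, fasta: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    stream.write_all(fasta.as_bytes())?;
    // Like `nc -N`: signal end of input while keeping the reading half open.
    stream.shutdown(Shutdown::Write)?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Because this sketch writes the whole input before reading anything, it is only suitable for small inputs; for large inputs, writing from a separate thread avoids stalling when the socket buffers fill up.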

Fields§

§length: usize

The length of the k-mers in the index

§one_on_one: bool

Map unknown sequences to 0 instead of ignoring them

§fst_file: PathBuf

An index that maps k-mers to taxon IDs

§socket: Option<PathBuf>

Instead of reading from stdin and writing to stdout, create a Unix socket to communicate with using OpenBSD’s netcat (nc -NU <socket>). This is especially useful in combination with the --in-memory flag: you only have to load the index in memory once, after which you can query it without having the loading time overhead each time.

§fst_in_memory: bool

Load index in memory instead of memory mapping the file contents. This makes querying significantly faster, but requires some initialization time.

§chunk_size: usize

Number of reads grouped into one chunk. Bigger chunks decrease the overhead caused by multithreading. Because the output order is not necessarily the same as the input order, having a chunk size which is a multiple of 12 (all 6 translations multiplied by the two paired-end reads) will keep FASTA records that originate from the same reads together.
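
The reasoning about multiples of 12 can be checked with a small sketch. It assumes that the 12 records derived from one read pair are consecutive in the input, and it represents each FASTA record only by a read-pair ID; the function name split_read_pairs is hypothetical.

```rust
use std::collections::HashMap;

/// Counts the read pairs whose records end up in more than one chunk.
/// `record_pair_ids` holds one read-pair ID per FASTA record, in input order.
/// (Hypothetical helper, not part of the umgap API.)
fn split_read_pairs(record_pair_ids: &[u32], chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk size must be positive");
    // For every read pair, remember the chunk indices its records land in.
    let mut chunks_per_pair: HashMap<u32, Vec<usize>> = HashMap::new();
    for (chunk_index, chunk) in record_pair_ids.chunks(chunk_size).enumerate() {
        for id in chunk {
            let seen = chunks_per_pair.entry(*id).or_default();
            if seen.last() != Some(&chunk_index) {
                seen.push(chunk_index);
            }
        }
    }
    // A pair is split when its records are spread over several chunks.
    chunks_per_pair.values().filter(|c| c.len() > 1).count()
}
```

With three read pairs of 12 records each, chunk sizes of 12 or 24 split no pair, whereas a chunk size of 10 splits all three. Since chunks may finish in any order, records of a split pair can end up far apart in the output.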

Trait Implementations§

impl Debug for ProtToKmerToLca

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl StructOpt for ProtToKmerToLca

fn clap<'a, 'b>() -> App<'a, 'b>

Returns clap::App corresponding to the struct.

fn from_clap(matches: &ArgMatches<'_>) -> Self

Builds the struct from clap::ArgMatches. It’s guaranteed to succeed if matches originates from an App generated by StructOpt::clap called on the same type, otherwise it must panic.

fn from_args() -> Self
where Self: Sized,

Builds the struct from the command line arguments (std::env::args_os). Calls clap::Error::exit on failure, printing the error message and aborting the program.

fn from_args_safe() -> Result<Self, Error>
where Self: Sized,

Builds the struct from the command line arguments (std::env::args_os). Unlike StructOpt::from_args, returns clap::Error on failure instead of aborting the program, so calling .exit is up to you.

fn from_iter<I>(iter: I) -> Self
where Self: Sized, I: IntoIterator, <I as IntoIterator>::Item: Into<OsString> + Clone,

Gets the struct from any iterator such as a Vec of your making. Print the error message and quit the program in case of failure.

fn from_iter_safe<I>(iter: I) -> Result<Self, Error>
where Self: Sized, I: IntoIterator, <I as IntoIterator>::Item: Into<OsString> + Clone,

Gets the struct from any iterator such as a Vec of your making.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> Pointable for T

const ALIGN: usize = _

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.