//! The `umgap bestof` command.
use std::io;

use crate::errors;
use crate::io::fasta;
use crate::taxon::TaxonId;

/// Selects the best read of every fixed-size group
///
/// The `umgap bestof` command takes groups of taxon IDs as input and outputs for each group the
/// taxon ID with the most non-root identifications.
///
/// The input is given in FASTA format on *standard input*. Per FASTA header, there should be
/// multiple numbers (taxon IDs). Per 6 FASTA records (or whichever number you specify with `-f`),
/// the best record is selected and written to *standard output*. If the input is a series of
/// identified taxon IDs for each of the 6 translations of a read, the output will most likely come
/// from the actual coding frame.
///
/// ```sh
/// $ cat dna.fa
/// >header1
/// CGCAGAGACGGGTAGAACCTCAGTAATCCGAAAAGCCGGGATCGACCGCCCCTTGCTTGCAGCCGGGCACTACAGGACCC
/// $ umgap translate -n -a < dna.fa | umgap prot2kmer2lca 9mer.index | tee input.fa
/// >header1|1
/// 9606 9606 2759 9606 9606 9606 9606 9606 9606 9606 8287
/// >header1|2
/// 2026807 888268 186802 1598 1883
/// >header1|3
/// 1883
/// >header1|1R
/// 27342 2759 155619 1133106 38033 2
/// >header1|2R
/// >header1|3R
/// 2951
/// $ umgap bestof < input.fa
/// >header1|1
/// 9606 9606 2759 9606 9606 9606 9606 9606 9606 9606 8287
/// ```
///
/// Taxon IDs are separated by newlines in the actual output, but are separated by spaces in this
/// example.
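///
/// As a rough illustration of the selection rule applied to the example above, the following
/// sketch scores each frame by its number of non-root identifications and keeps the best frame
/// of every group. It is not this module's implementation: `Frame`, `score` and `best_of` are
/// hypothetical names, and treating `1` as the root taxon and `0` as "no identification" is an
/// assumption.
///
/// ```rust
/// // Hypothetical stand-in for one FASTA record: a header and its taxon IDs.
/// struct Frame {
///     header: String,
///     taxa: Vec<u32>,
/// }
///
/// // Counts the IDs that are neither 0 (assumed: unidentified) nor 1 (assumed: root).
/// fn score(frame: &Frame) -> usize {
///     frame.taxa.iter().filter(|&&t| t != 0 && t != 1).count()
/// }
///
/// // Returns the highest-scoring frame of each complete group of `size` frames.
/// // A trailing incomplete group is ignored; `size` must be non-zero.
/// fn best_of(frames: &[Frame], size: usize) -> Vec<&Frame> {
///     frames
///         .chunks_exact(size)
///         .filter_map(|group| group.iter().max_by_key(|f| score(f)))
///         .collect()
/// }
/// ```
///
/// With the six frames of the example and a group size of 6, this sketch selects `header1|1`,
/// which has 11 qualifying IDs. On ties, `max_by_key` keeps the last of the equally scored
/// frames; whether the command behaves the same way is not stated here.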
/// Implements the bestof command.