pub struct Filter { /* private fields */ }
Filter reads from FASTQ files based on kraken2 classification results.
Extracts reads classified to one or more taxon IDs from FASTQ files, using the kraken2 report (taxonomy tree) and per-read classification output. Supports both single-end and paired-end reads, and writes bgzf-compressed output.
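The core selection step described above can be sketched as follows. This is a minimal illustration, not the crate's actual implementation: the helper names `kraken_taxon_id` and `reads_to_keep` are hypothetical, and the sketch assumes the standard kraken2 per-read output layout, where the third tab-separated column holds a plain numeric taxon ID.

```rust
use std::collections::HashSet;

/// Taxon ID from one line of kraken2 per-read output.
/// Assumption: third tab-separated column, plain numeric ID.
fn kraken_taxon_id(line: &str) -> Option<u64> {
    line.split('\t').nth(2)?.trim().parse::<u64>().ok()
}

/// Hypothetical helper: indices of the reads whose taxon ID is wanted.
/// Lines that cannot be parsed are skipped in this sketch.
fn reads_to_keep(kraken_lines: &[&str], wanted: &HashSet<u64>) -> Vec<usize> {
    let mut keep = Vec::new();
    for (i, line) in kraken_lines.iter().enumerate() {
        if let Some(id) = kraken_taxon_id(line) {
            if wanted.contains(&id) {
                keep.push(i);
            }
        }
    }
    keep
}
```

Because the result is a list of record indices, the same indices can be applied to both mate files of a paired-end run, which is one reason the inputs need to be in the same order.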
§Required inputs
The command needs three pieces of data that must all come from the same kraken2 run:
--kraken-report (-r): The kraken2 report file containing the taxonomy tree and per-taxon read counts. It is used to resolve taxon IDs, expand descendants, and estimate the expected number of matching reads.
--kraken-output (-k): The per-read classification output from kraken2 (generated with --output). Each line maps a read name to a taxon ID.
--input (-i): One FASTQ file for single-end data, or two for paired-end. Gzip- and bgzf-compressed inputs are detected and handled automatically.
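To illustrate how a report can yield both tree structure and an expected read count, here is a hedged sketch. It assumes the standard six-column kraken2 report (percentage, clade reads, direct reads, rank code, taxon ID, name indented two spaces per tree level). The `ReportRow` type and both function names are hypothetical and may differ from what the tool does internally.

```rust
use std::collections::HashSet;

/// One row of a kraken2 report (standard six-column layout assumed).
#[derive(Debug, PartialEq)]
struct ReportRow {
    taxon_id: u64,
    direct_reads: u64, // reads assigned directly to this taxon
    depth: usize,      // tree depth, derived from the name's indentation
}

/// Parse a single tab-separated report line; returns None if it is malformed.
fn parse_report_line(line: &str) -> Option<ReportRow> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 6 {
        return None;
    }
    let direct_reads = fields[2].trim().parse::<u64>().ok()?;
    let taxon_id = fields[4].trim().parse::<u64>().ok()?;
    let name = fields[5];
    // kraken2 indents the name by two spaces per level of the tree.
    let indent = name.len() - name.trim_start_matches(' ').len();
    Some(ReportRow { taxon_id, direct_reads, depth: indent / 2 })
}

/// Expected number of matching reads: the sum of direct read counts
/// over every selected taxon.
fn expected_reads(rows: &[ReportRow], selected: &HashSet<u64>) -> u64 {
    rows.iter()
        .filter(|row| selected.contains(&row.taxon_id))
        .map(|row| row.direct_reads)
        .sum()
}
```

Summing direct counts rather than clade counts avoids double counting when a taxon and its descendants are both in the selected set.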
The kraken output and FASTQ file(s) must contain the same reads in the same order. The command verifies read name agreement and will error if the files are mismatched or have different numbers of records.
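A lockstep check of this kind might look like the sketch below. The function names are hypothetical, and two details are assumptions rather than documented behavior: the read name is taken from the second tab-separated column of the kraken2 output, and FASTQ names are normalized by dropping the leading `@`, any description after whitespace, and a trailing `/1` or `/2`.

```rust
/// Normalize a FASTQ header to a bare read name.
/// Assumption: the description and a trailing /1 or /2 are ignored.
fn fastq_read_name(header: &str) -> &str {
    let name = header.strip_prefix('@').unwrap_or(header);
    let name = name.split_whitespace().next().unwrap_or("");
    name.strip_suffix("/1")
        .or_else(|| name.strip_suffix("/2"))
        .unwrap_or(name)
}

/// Read name from a kraken2 per-read output line (second column).
fn kraken_read_name(line: &str) -> Option<&str> {
    line.split('\t').nth(1)
}

/// Walk both inputs in lockstep; return the number of records on success,
/// or an error on the first name mismatch or on a record count mismatch.
fn check_same_reads<'a>(
    kraken_lines: impl IntoIterator<Item = &'a str>,
    fastq_headers: impl IntoIterator<Item = &'a str>,
) -> Result<usize, String> {
    let mut kraken = kraken_lines.into_iter();
    let mut fastq = fastq_headers.into_iter();
    let mut count: usize = 0;
    loop {
        match (kraken.next(), fastq.next()) {
            (None, None) => return Ok(count),
            (Some(k), Some(f)) => {
                let k_name = kraken_read_name(k)
                    .ok_or_else(|| format!("record {}: malformed kraken line", count + 1))?;
                let f_name = fastq_read_name(f);
                if k_name != f_name {
                    return Err(format!(
                        "record {}: kraken has '{}' but FASTQ has '{}'",
                        count + 1, k_name, f_name
                    ));
                }
                count += 1;
            }
            // One input ended before the other.
            _ => return Err(format!("record count mismatch after {} records", count)),
        }
    }
}
```

Checking while streaming, rather than loading all names first, keeps memory use constant and reports the first offending record number.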
§Taxon selection
At least one of --taxon-ids or --include-unclassified must be specified.
--taxon-ids (-t): One or more NCBI taxon IDs to extract. By default, only reads classified directly to these exact taxon IDs are included.
--include-descendants (-d): Expand each taxon ID to include all of its descendants in the taxonomy tree. For example, specifying a genus-level taxon ID with -d will also extract reads classified to any species or strain within that genus.
--include-unclassified (-u): Include reads that kraken2 could not classify (taxon ID 0). Can be combined with --taxon-ids to extract both classified and unclassified reads in a single pass.
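The selection rules above can be sketched as a single function. This is an illustration under stated assumptions, not the tool's code: `select_taxa` is a hypothetical name, and the taxonomy is represented as `(taxon_id, depth)` pairs in report order, where a taxon's descendants are the contiguous run of deeper rows that follows it.

```rust
use std::collections::HashSet;

/// Build the set of taxon IDs to extract.
/// `rows` holds (taxon_id, depth) pairs in report (pre-order) order.
fn select_taxa(
    rows: &[(u64, usize)],
    requested: &[u64],
    include_descendants: bool,
    include_unclassified: bool,
) -> Result<HashSet<u64>, String> {
    // At least one of the two selection options must be given.
    if requested.is_empty() && !include_unclassified {
        return Err("specify --taxon-ids and/or --include-unclassified".to_string());
    }
    let mut selected: HashSet<u64> = requested.iter().copied().collect();
    if include_descendants {
        for (i, &(taxon_id, depth)) in rows.iter().enumerate() {
            if !requested.contains(&taxon_id) {
                continue;
            }
            // The subtree ends at the first later row that is not deeper.
            for &(child_id, child_depth) in &rows[i + 1..] {
                if child_depth <= depth {
                    break;
                }
                selected.insert(child_id);
            }
        }
    }
    if include_unclassified {
        selected.insert(0); // kraken2 reports unclassified reads as taxon 0
    }
    Ok(selected)
}
```

With a toy tree where 543 contains 561, 562, and 570, requesting 543 with descendants yields all four IDs, while requesting it without descendants yields only 543.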
§Output
--output (-o): Output FASTQ file path(s). Provide the same number of output files as input files (one for single-end, two for paired-end). Outputs are always bgzf-compressed regardless of file extension.
--threads: Number of threads used for bgzf compression (default: 4).
--compression-level: Bgzf compression level from 0 (fastest) to 9 (smallest); default: 5.
§Examples
Extract all reads classified as E. coli (taxon 562):
k2tools filter -r report.txt -k output.txt -i reads.fq.gz -o ecoli.fq.gz -t 562
Extract all Enterobacteriaceae (taxon 543), including every species and strain beneath it in the taxonomy:
k2tools filter -r report.txt -k output.txt \
-i reads.fq.gz -o entero.fq.gz -t 543 -d
Extract unclassified reads from a paired-end run:
k2tools filter -r report.txt -k output.txt \
-i r1.fq.gz r2.fq.gz -o unclass_r1.fq.gz unclass_r2.fq.gz -u
Extract human reads plus unclassified reads in a single pass:
k2tools filter -r report.txt -k output.txt \
-i reads.fq.gz -o host_and_unclass.fq.gz -t 9606 -d -u
§Trait Implementations
impl Args for Filter
fn augment_args<'b>(__clap_app: Command) -> Command
fn augment_args_for_update<'b>(__clap_app: Command) -> Command
Append to Command so it can instantiate self via FromArgMatches::update_from_arg_matches_mut.
impl FromArgMatches for Filter
fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>
fn from_arg_matches_mut(__clap_arg_matches: &mut ArgMatches) -> Result<Self, Error>
fn update_from_arg_matches(&mut self, __clap_arg_matches: &ArgMatches) -> Result<(), Error>
Assign values from ArgMatches to self.
fn update_from_arg_matches_mut(&mut self, __clap_arg_matches: &mut ArgMatches) -> Result<(), Error>
Assign values from ArgMatches to self.