Struct Filter 

pub struct Filter { /* private fields */ }

Filter reads from FASTQ files based on kraken2 classification results.

Extracts reads classified to one or more taxon IDs from FASTQ files, using the kraken2 report (taxonomy tree) and per-read classification output. Supports both single-end and paired-end reads, and writes bgzf-compressed output.

§Required inputs

The command needs three pieces of data that must all come from the same kraken2 run:

  • --kraken-report (-r): The kraken2 report file containing the taxonomy tree and per-taxon read counts. This is used to resolve taxon IDs, expand descendants, and estimate the expected number of matching reads.
  • --kraken-output (-k): The per-read classification output from kraken2 (generated with --output). Each line maps a read name to a taxon ID.
  • --input (-i): One FASTQ file for single-end data, or two for paired-end. Gzip- and bgzf-compressed inputs are detected and handled automatically.

The kraken output and FASTQ file(s) must contain the same reads in the same order. The command verifies read name agreement and will error if the files are mismatched or have different numbers of records.
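The agreement check described above can be sketched roughly as follows. This is a minimal illustration, not the crate's implementation: the function names are hypothetical, and it assumes the standard tab-separated kraken2 output layout (status, read name, taxon ID, length, LCA mapping) and that a trailing /1 or /2 mate suffix should be ignored when comparing names.

```rust
/// Read name from one line of kraken2 per-read output.
/// Assumed columns (tab-separated): status, read name, taxon ID, length, LCA mapping.
fn kraken_read_name(line: &str) -> Option<&str> {
    line.split('\t').nth(1)
}

/// Read name from a FASTQ header line: drop the leading '@', any comment after
/// whitespace, and (as an assumption) a trailing /1 or /2 mate suffix.
fn fastq_read_name(header: &str) -> Option<&str> {
    let name = header.strip_prefix('@')?.split_whitespace().next()?;
    Some(
        name.strip_suffix("/1")
            .or_else(|| name.strip_suffix("/2"))
            .unwrap_or(name),
    )
}

/// Walk both streams in lockstep. Returns the number of records checked, or an
/// error describing the first name mismatch or a difference in record counts.
fn check_agreement<'a>(
    kraken_lines: impl IntoIterator<Item = &'a str>,
    fastq_headers: impl IntoIterator<Item = &'a str>,
) -> Result<usize, String> {
    let mut kraken = kraken_lines.into_iter();
    let mut fastq = fastq_headers.into_iter();
    let mut checked = 0usize;
    loop {
        match (kraken.next(), fastq.next()) {
            // Both streams ended together: the files agree.
            (None, None) => return Ok(checked),
            (Some(k), Some(f)) => {
                let kraken_name = kraken_read_name(k)
                    .ok_or_else(|| format!("record {}: malformed kraken line", checked + 1))?;
                let fastq_name = fastq_read_name(f)
                    .ok_or_else(|| format!("record {}: malformed FASTQ header", checked + 1))?;
                if kraken_name != fastq_name {
                    return Err(format!(
                        "record {}: kraken read '{}' does not match FASTQ read '{}'",
                        checked + 1,
                        kraken_name,
                        fastq_name
                    ));
                }
                checked += 1;
            }
            // One stream ended before the other.
            _ => return Err(format!("record count mismatch after {} records", checked)),
        }
    }
}
```

Comparing the two streams in lockstep keeps memory use constant and reports the first disagreement by record number, which is usually enough to spot files taken from different runs.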

§Taxon selection

At least one of --taxon-ids or --include-unclassified must be specified.

  • --taxon-ids (-t): One or more NCBI taxon IDs to extract. By default, only reads classified directly to these exact taxon IDs are included.
  • --include-descendants (-d): Expand each taxon ID to include all of its descendants in the taxonomy tree. For example, specifying a genus-level taxon ID with -d will also extract reads classified to any species or strain within that genus.
  • --include-unclassified (-u): Include reads that kraken2 could not classify (taxon ID 0). Can be combined with --taxon-ids to extract both classified and unclassified reads in a single pass.
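One way descendant expansion could work is sketched below. It is an illustration under stated assumptions rather than the crate's actual code: `parse_report_line` and `expand_descendants` are hypothetical names, and the sketch assumes the standard six-column kraken2 report, in which rows appear in depth-first order and the name column is indented two spaces per taxonomy level.

```rust
use std::collections::HashSet;

/// Parse one line of a six-column kraken2 report into (depth, taxon ID).
/// Assumed columns: percent, clade reads, direct reads, rank code, taxon ID, name.
/// Depth is taken from the name's leading indentation (two spaces per level).
fn parse_report_line(line: &str) -> Option<(usize, u64)> {
    let fields: Vec<&str> = line.split('\t').collect();
    if fields.len() < 6 {
        return None;
    }
    let taxon_id: u64 = fields[4].trim().parse().ok()?;
    let name = fields[5];
    let indent = name.len() - name.trim_start_matches(' ').len();
    Some((indent / 2, taxon_id))
}

/// Return the requested taxon IDs plus every taxon listed beneath them in the report.
fn expand_descendants(report: &str, requested: &[u64]) -> HashSet<u64> {
    let mut selected: HashSet<u64> = requested.iter().copied().collect();
    // Depth of the requested taxon whose subtree is currently being read, if any.
    let mut active_depth: Option<usize> = None;
    for (depth, taxon_id) in report.lines().filter_map(parse_report_line) {
        if let Some(d) = active_depth {
            if depth > d {
                // Still inside the subtree of a requested taxon.
                selected.insert(taxon_id);
                continue;
            }
            // Reached a row at the same or a shallower depth: the subtree has ended.
            active_depth = None;
        }
        if requested.contains(&taxon_id) {
            active_depth = Some(depth);
        }
    }
    selected
}
```

Because the report is already in depth-first order, a single pass that tracks the depth of the current requested ancestor is enough; no explicit tree needs to be built. Without -d, the selection would simply be the requested IDs themselves.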

§Output

  • --output (-o): Output FASTQ file path(s). The number of output files must match the number of input files (one for single-end, two for paired-end). Outputs are always bgzf-compressed regardless of file extension.
  • --threads: Number of threads used for bgzf compression (default: 4).
  • --compression-level: Bgzf compression level from 0 (fastest) to 9 (smallest) (default: 5).
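The constraints stated in the sections above can be gathered into one up-front check. The sketch below is purely illustrative: `FilterOptions` and `validate` are hypothetical names that mirror the documented options, not the private fields of `Filter`.

```rust
/// Hypothetical, simplified view of the documented options.
struct FilterOptions {
    inputs: usize,              // number of --input files
    outputs: usize,             // number of --output files
    taxon_ids: Vec<u64>,        // values given to --taxon-ids
    include_unclassified: bool, // --include-unclassified
    compression_level: u8,      // --compression-level
}

/// Check the documented constraints before any file is opened.
fn validate(opts: &FilterOptions) -> Result<(), String> {
    if opts.taxon_ids.is_empty() && !opts.include_unclassified {
        return Err("specify --taxon-ids and/or --include-unclassified".to_string());
    }
    if !(1..=2).contains(&opts.inputs) {
        return Err("provide one FASTQ (single-end) or two (paired-end)".to_string());
    }
    if opts.outputs != opts.inputs {
        return Err(format!(
            "expected {} output file(s), got {}",
            opts.inputs, opts.outputs
        ));
    }
    if opts.compression_level > 9 {
        return Err("compression level must be between 0 and 9".to_string());
    }
    Ok(())
}
```

Failing early on these checks avoids reading large FASTQ files only to discover a mismatched argument afterwards.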

§Examples

Extract reads classified directly to E. coli (taxon 562), without descendant strains:

k2tools filter -r report.txt -k output.txt -i reads.fq.gz -o ecoli.fq.gz -t 562

Extract all Enterobacteriaceae (taxon 543) including every species and strain beneath it in the taxonomy:

k2tools filter -r report.txt -k output.txt \
    -i reads.fq.gz -o entero.fq.gz -t 543 -d

Extract unclassified reads from a paired-end run:

k2tools filter -r report.txt -k output.txt \
    -i r1.fq.gz r2.fq.gz -o unclass_r1.fq.gz unclass_r2.fq.gz -u

Extract human reads plus unclassified in a single pass:

k2tools filter -r report.txt -k output.txt \
    -i reads.fq.gz -o host_and_unclass.fq.gz -t 9606 -d -u

Trait Implementations§

impl Args for Filter

fn group_id() -> Option<Id>

Report the ArgGroup::id for this set of arguments

fn augment_args<'b>(__clap_app: Command) -> Command

Append to Command so it can instantiate Self via FromArgMatches::from_arg_matches_mut.

fn augment_args_for_update<'b>(__clap_app: Command) -> Command

Append to Command so it can instantiate self via FromArgMatches::update_from_arg_matches_mut.

impl Command for Filter

fn execute(&self) -> Result<()>

Execute the command.

impl FromArgMatches for Filter

fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>

Instantiate Self from ArgMatches, parsing the arguments as needed.

fn from_arg_matches_mut( __clap_arg_matches: &mut ArgMatches, ) -> Result<Self, Error>

Instantiate Self from ArgMatches, parsing the arguments as needed.

fn update_from_arg_matches( &mut self, __clap_arg_matches: &ArgMatches, ) -> Result<(), Error>

Assign values from ArgMatches to self.

fn update_from_arg_matches_mut( &mut self, __clap_arg_matches: &mut ArgMatches, ) -> Result<(), Error>

Assign values from ArgMatches to self.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.