Crate kmerust


§kmerust

A fast, parallel k-mer counter for DNA sequences in FASTA files.

§Features

  • Parallel processing using rayon and dashmap
  • Outputs canonical k-mers (the lexicographically smaller of each k-mer and its reverse complement)
  • Supports k-mer lengths from 1 to 32
  • Handles sequences with N bases (skips invalid k-mers)
  • Output format compatible with Jellyfish
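
To make the canonical k-mer and N-handling features concrete, here is a minimal string-based sketch. It is an illustration only, not kmerust's implementation: the function names `complement`, `canonical`, and `canonical_kmers` are hypothetical, and the crate itself may work on a packed representation instead of strings.

```rust
/// Complement of a single DNA base; `None` for anything other than A, C, G, T.
fn complement(base: u8) -> Option<u8> {
    match base.to_ascii_uppercase() {
        b'A' => Some(b'T'),
        b'C' => Some(b'G'),
        b'G' => Some(b'C'),
        b'T' => Some(b'A'),
        _ => None,
    }
}

/// Canonical form of a k-mer: the lexicographically smaller of the k-mer
/// and its reverse complement. Returns `None` if any base is invalid (e.g. N).
fn canonical(kmer: &str) -> Option<String> {
    let forward = kmer.to_ascii_uppercase();
    // Reverse the bases, then complement each one; any invalid base yields None.
    let revcomp: Option<Vec<u8>> = forward.bytes().rev().map(complement).collect();
    let revcomp = String::from_utf8(revcomp?).ok()?;
    Some(if forward <= revcomp { forward } else { revcomp })
}

/// Canonical k-mers for every window of length `k` in `seq`,
/// skipping windows that contain an invalid base.
fn canonical_kmers(seq: &str, k: usize) -> Vec<String> {
    if k == 0 {
        return Vec::new(); // `windows(0)` would panic
    }
    seq.as_bytes()
        .windows(k)
        .filter_map(|w| std::str::from_utf8(w).ok().and_then(canonical))
        .collect()
}
```

Under these assumptions, `canonical("GAT")` yields `ATC` (its reverse complement sorts first), and `canonical_kmers("GATNACA", 3)` drops the three windows that overlap the N.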

§CLI Usage

# Count 21-mers in a FASTA file
kmerust 21 sequences.fa > kmers.txt

# Count 5-mers
kmerust 5 sequences.fa > kmers.txt

§Output Format

Output is written to stdout in a FASTA-like format, with one two-line record per k-mer:

>{count}
{canonical_kmer}
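
For downstream tooling, the record layout above can be read back with a few lines of Rust. This is a sketch under stated assumptions: `parse_records` is a hypothetical helper, not part of kmerust, and it assumes counts fit in a `u64` and that blank lines can be ignored.

```rust
/// Parses text made of two-line records: a `>{count}` header followed by
/// the k-mer on the next line. Returns `(kmer, count)` pairs in input order.
fn parse_records(text: &str) -> Result<Vec<(String, u64)>, String> {
    let mut records = Vec::new();
    // Blank lines are skipped (an assumption, not documented behavior).
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    while let Some(header) = lines.next() {
        let count = header
            .trim()
            .strip_prefix('>')
            .ok_or_else(|| format!("expected '>' header, got {header:?}"))?
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("bad count in {header:?}: {e}"))?;
        let kmer = lines
            .next()
            .ok_or_else(|| format!("missing k-mer line after {header:?}"))?
            .trim()
            .to_string();
        records.push((kmer, count));
    }
    Ok(records)
}
```

Feeding it the text `>3`, `ACGT`, `>1`, `AAAA` (one item per line) would produce the pairs `("ACGT", 3)` and `("AAAA", 1)`.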

§Library Usage

The builder API provides a fluent interface for configuring k-mer counting:

use kmerust::builder::KmerCounter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Simple usage
    let counts = KmerCounter::new()
        .k(21)?
        .count("sequences.fa")?;

    // With options
    let counts = KmerCounter::new()
        .k(21)?
        .min_count(5)
        .count("sequences.fa")?;

    for (kmer, count) in counts {
        println!("{kmer}: {count}");
    }
    Ok(())
}

§Direct API

For simpler use cases, the direct API is also available:

use kmerust::run::count_kmers;
use std::path::PathBuf;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let path = PathBuf::from("sequences.fa");
    let counts = count_kmers(&path, 21)?;
    for (kmer, count) in counts {
        println!("{kmer}: {count}");
    }
    Ok(())
}

§Limitations

  • K-mer length: limited to 1-32 bases, because each k-mer is packed into a 64-bit integer at 2 bits per base (64 / 2 = 32)
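
The 32-base ceiling follows directly from the packing arithmetic. The sketch below shows one way 2-bit packing into a `u64` can work. The functions `pack` and `unpack` and the encoding A=00, C=01, G=10, T=11 are assumptions for illustration; kmerust's actual encoding may differ.

```rust
/// Packs a DNA k-mer into a u64 using 2 bits per base.
/// Assumed encoding: A=00, C=01, G=10, T=11.
/// Returns `None` for an empty k-mer, more than 32 bases, or an invalid base.
fn pack(kmer: &str) -> Option<u64> {
    if kmer.is_empty() || kmer.len() > 32 {
        return None; // 32 bases * 2 bits = 64 bits, the full width of a u64
    }
    let mut packed = 0u64;
    for b in kmer.bytes() {
        let code: u64 = match b.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        // Shift previous bases left and append the new base in the low 2 bits.
        packed = (packed << 2) | code;
    }
    Some(packed)
}

/// Unpacks the low `2 * k` bits of `packed` back into a string of `k` bases.
/// Panics if `k` is outside 1..=32.
fn unpack(packed: u64, k: usize) -> String {
    assert!((1..=32).contains(&k), "k must be in 1..=32");
    (0..k)
        .rev()
        .map(|i| match (packed >> (2 * i)) & 0b11 {
            0 => 'A',
            1 => 'C',
            2 => 'G',
            _ => 'T',
        })
        .collect()
}
```

With this encoding a 32-base k-mer uses every bit of the integer, so a 33rd base has nowhere to go without a wider type.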

§Modules

  • builder: Builder pattern API for ergonomic k-mer counting.
  • cli: Command-line interface definition.
  • error: Error types for kmerust.
  • kmer: K-mer representation and manipulation.
  • progress: Progress tracking for k-mer counting operations.
  • run: K-mer counting and output.
  • streaming: Streaming k-mer counting for memory-efficient processing of large files.