//! # kmerust
//!
//! A fast, parallel [k-mer](https://en.wikipedia.org/wiki/K-mer) counter for DNA sequences in FASTA and FASTQ files.
//!
//! ## Features
//!
//! - Parallel processing using [rayon](https://docs.rs/rayon) and [dashmap](https://docs.rs/dashmap)
//! - Outputs canonical k-mers (the lexicographically smaller of each k-mer and its reverse complement)
//! - Supports k-mer lengths from 1 to 32
//! - Handles sequences containing N bases by skipping any k-mer that includes an invalid base
//! - Output format compatible with [Jellyfish](https://github.com/gmarcais/Jellyfish)
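//!
//! The canonical-k-mer and N-handling rules above can be illustrated with a small,
//! sequential sketch. It is not kmerust's internal code (which packs bases into 64-bit
//! integers and counts in parallel); `complement`, `canonical`, and `count_canonical`
//! are hypothetical helpers, and the sketch assumes uppercase input, so any byte other
//! than `A`, `C`, `G`, or `T` (including lowercase letters and `N`) makes a k-mer invalid.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! // Complement of one base; `None` for anything other than uppercase A, C, G, T.
//! fn complement(base: u8) -> Option<u8> {
//!     match base {
//!         b'A' => Some(b'T'),
//!         b'C' => Some(b'G'),
//!         b'G' => Some(b'C'),
//!         b'T' => Some(b'A'),
//!         _ => None,
//!     }
//! }
//!
//! // Lexicographically smaller of `kmer` and its reverse complement,
//! // or `None` if the k-mer contains an invalid base such as `N`.
//! fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
//!     // Reverse the k-mer and complement each base; a single invalid base yields `None`.
//!     let rc = kmer.iter().rev().map(|&b| complement(b)).collect::<Option<Vec<u8>>>()?;
//!     Some(if kmer <= rc.as_slice() { kmer.to_vec() } else { rc })
//! }
//!
//! // Counts canonical k-mers in a single sequence, skipping windows with invalid bases.
//! fn count_canonical(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u64> {
//!     let mut counts = HashMap::new();
//!     if k == 0 {
//!         return counts; // `slice::windows(0)` would panic
//!     }
//!     for window in seq.windows(k) {
//!         if let Some(kmer) = canonical(window) {
//!             *counts.entry(kmer).or_insert(0) += 1;
//!         }
//!     }
//!     counts
//! }
//! ```
//!
//! For example, `count_canonical(b"ACGTNACG", 3)` yields a single entry, `ACG` with a
//! count of 3: `CGT` is the reverse complement of `ACG`, so both collapse to `ACG`, and
//! the three windows that overlap `N` are skipped.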
//!
//! ## CLI Usage
//!
//! ```bash
//! # Count 21-mers in a FASTA file
//! kmerust 21 sequences.fa > kmers.txt
//!
//! # Count 5-mers
//! kmerust 5 sequences.fa > kmers.txt
//! ```
//!
//! ## Output Format
//!
//! Output is written to stdout in a FASTA-like format: each record is a header line holding the count, followed by the canonical k-mer.
//!
//! ```text
//! >{count}
//! {canonical_kmer}
//! ```
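//!
//! As a rough illustration of this format, the sketch below writes a map of counts as
//! `>{count}` / `{kmer}` record pairs. The function name `write_counts` and the
//! `HashMap<String, u64>` input are assumptions, and sorting by k-mer is done here only to
//! make the output deterministic; kmerust's own output order is not specified above.
//!
//! ```rust
//! use std::collections::HashMap;
//! use std::io::{self, Write};
//!
//! // Writes one two-line record per k-mer: the count as a header, then the k-mer.
//! fn write_counts<W: Write>(counts: &HashMap<String, u64>, out: &mut W) -> io::Result<()> {
//!     // Sort by k-mer so repeated runs produce identical output.
//!     let mut entries: Vec<(&String, &u64)> = counts.iter().collect();
//!     entries.sort();
//!     for (kmer, count) in entries {
//!         writeln!(out, ">{count}\n{kmer}")?;
//!     }
//!     Ok(())
//! }
//! ```
//!
//! Passing `&mut std::io::stdout().lock()` as `out` sends the records to standard output.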
//!
//! ## Library Usage
//!
//! ### Builder API (Recommended)
//!
//! The builder API provides a fluent interface for configuring k-mer counting:
//!
//! ```rust,no_run
//! use kmerust::builder::KmerCounter;
//!
//! fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     // Simple usage: count every 21-mer
//!     let _all_counts = KmerCounter::new()
//!         .k(21)?
//!         .count("sequences.fa")?;
//!
//!     // With options: drop k-mers below a minimum count
//!     let counts = KmerCounter::new()
//!         .k(21)?
//!         .min_count(5)
//!         .count("sequences.fa")?;
//!
//!     for (kmer, count) in counts {
//!         println!("{kmer}: {count}");
//!     }
//!     Ok(())
//! }
//! ```
//!
//! ### Direct API
//!
//! For simple cases, `count_kmers` takes a file path and a k-mer length and returns the counts directly:
//!
//! ```rust,no_run
//! use kmerust::run::count_kmers;
//! use std::path::PathBuf;
//!
//! fn main() -> Result<(), Box<dyn std::error::Error>> {
//!     let path = PathBuf::from("sequences.fa");
//!     let counts = count_kmers(&path, 21)?;
//!     for (kmer, count) in counts {
//!         println!("{kmer}: {count}");
//!     }
//!     Ok(())
//! }
//! ```
//!
//! ## Limitations
//!
//! - **K-mer length:** Limited to 1-32 bases, because each k-mer is packed into a 64-bit integer at 2 bits per base (32 bases × 2 bits = 64 bits)
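//!
//! The 32-base ceiling is packing arithmetic: 32 bases at 2 bits each fill all 64 bits.
//! The sketch below assumes the encoding `A = 0`, `C = 1`, `G = 2`, `T = 3` with the
//! first base in the highest-order bits used; kmerust's actual bit layout may differ, and
//! `encode_base`, `pack`, and `unpack` are hypothetical helpers.
//!
//! ```rust
//! // Assumed 2-bit encoding; any other byte (e.g. `N`) is rejected.
//! fn encode_base(base: u8) -> Option<u64> {
//!     match base {
//!         b'A' => Some(0),
//!         b'C' => Some(1),
//!         b'G' => Some(2),
//!         b'T' => Some(3),
//!         _ => None,
//!     }
//! }
//!
//! // Packs a k-mer into a u64, shifting earlier bases toward the high bits.
//! // Returns `None` for an empty k-mer, one longer than 32 bases, or one with an invalid base.
//! fn pack(kmer: &[u8]) -> Option<u64> {
//!     if kmer.is_empty() || kmer.len() > 32 {
//!         return None; // more than 32 bases would need more than 64 bits
//!     }
//!     let mut packed = 0u64;
//!     for &base in kmer {
//!         packed = (packed << 2) | encode_base(base)?;
//!     }
//!     Some(packed)
//! }
//!
//! // Recovers the k bases from a packed value; expects 1 <= k <= 32.
//! fn unpack(packed: u64, k: usize) -> Vec<u8> {
//!     (0..k)
//!         .map(|i| {
//!             let shift = 2 * (k - 1 - i);
//!             b"ACGT"[((packed >> shift) & 0b11) as usize]
//!         })
//!         .collect()
//! }
//! ```
//!
//! For instance, `ACGT` packs to `0b00_01_10_11` (27), while a 33-base k-mer is rejected
//! because it would need 66 bits.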
pub