rustkmer 0.5.2 - Docs.rs

# KmerCounter API

The `KmerCounter` class provides high-performance k-mer counting for genomic sequences.

## Class Overview

```python
from pyrustkmer import KmerCounter

counter = KmerCounter(21, canonical=True)
```

## Constructor

### `__init__(k, canonical=False)`

Initialize a new k-mer counter.

**Parameters:**
- `k` (int): The length of k-mers to count
- `canonical` (bool): Whether to count canonical k-mers, so that a k-mer and its reverse complement are counted as the same k-mer. Defaults to `False`.

**Example:**
```python
# Standard k-mer counting
counter = KmerCounter(21)

# Canonical k-mer counting (recommended)
counter = KmerCounter(21, canonical=True)
```
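
To make the `canonical` option concrete, the following pure-Python sketch shows one common way to pick a single representative for a k-mer and its reverse complement. It does not call the library, and the choice of the lexicographically smaller string is an assumption; the helper names `reverse_complement` and `canonical_form` are hypothetical.

```python
# Illustration only: not the library's implementation.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def canonical_form(kmer: str) -> str:
    """Pick one representative for a k-mer and its reverse complement.

    Assumption: the representative is the lexicographically smaller string.
    """
    return min(kmer, reverse_complement(kmer))


# Both strands map to the same representative.
print(canonical_form("TTGCA") == canonical_form("TGCAA"))
```

With a rule like this, both strands of the same locus contribute to a single count, which is why canonical counting is usually preferred when read orientation is unknown.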

## Methods

### `count_file(filename)`

Count k-mers from a FASTA/FASTQ file.

**Parameters:**
- `filename` (str): Path to the input file

**Example:**
```python
counter = KmerCounter(21, canonical=True)
counter.count_file("genome.fa.gz")
```

### `count_sequence(sequence)`

Count k-mers from a sequence string.

**Parameters:**
- `sequence` (str): DNA sequence string

**Example:**
```python
counter = KmerCounter(7)
counter.count_sequence("ATGCGATCGATCG")
```
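
For intuition about what counting a sequence involves, here is a minimal pure-Python reference that slides a window of length `k` over the string. It is a sketch, not the library's implementation; the function name `count_kmers` is hypothetical, and it counts non-canonical k-mers only.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of `sequence` (no canonicalization)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A sequence of length n has n - k + 1 windows; none if n < k.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATGCGATCGATCG", 7)
print(sum(counts.values()))  # 13 - 7 + 1 = 7 windows
```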

### `get_total_count()`

Get the total number of k-mers counted.

**Returns:**
- `int`: Total k-mer count

**Example:**
```python
total = counter.get_total_count()
print(f"Total k-mers: {total:,}")
```

### `get_unique_count()`

Get the number of unique k-mers.

**Returns:**
- `int`: Number of unique k-mers

**Example:**
```python
unique = counter.get_unique_count()
print(f"Unique k-mers: {unique:,}")
```
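
The difference between the total and unique counts can be sketched in pure Python. This assumes that "total" includes repeated occurrences and that "unique" means the number of distinct k-mers; neither the helper below nor these definitions are taken from the library itself.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int) -> Counter:
    """Hypothetical helper: count every length-k window of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = kmer_counts("ACGACGACG", 3)  # windows: ACG CGA GAC ACG CGA GAC ACG

total = sum(counts.values())  # every occurrence counts
unique = len(counts)          # each distinct k-mer counts once

print(total, unique)
```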

### `get_top_kmers(n)`

Get the most frequent k-mers.

**Parameters:**
- `n` (int): Number of top k-mers to return

**Returns:**
- `List[Tuple[str, int]]`: List of (k-mer, count) tuples

**Example:**
```python
top_10 = counter.get_top_kmers(10)
for kmer, count in top_10:
    print(f"{kmer}: {count}")
```
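
As an illustration of the returned shape, the sketch below ranks a plain dictionary of counts and returns `(k-mer, count)` tuples. The function `top_kmers` is hypothetical, and the tie-breaking rule (alphabetical order among equal counts) is an assumption; the ordering the library uses for ties is not stated here.

```python
from typing import Dict, List, Tuple


def top_kmers(counts: Dict[str, int], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent k-mers as (k-mer, count) tuples.

    Assumption: ties are broken alphabetically to keep the output deterministic.
    """
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


example = {"ACG": 3, "GAC": 2, "CGA": 2, "TTT": 1}
print(top_kmers(example, 2))
```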

### `save_to_file(filename)`

Save the k-mer database to a file.

**Parameters:**
- `filename` (str): Path to output file

**Example:**
```python
counter.save_to_file("output.rkdb")
```

### `load_from_file(filename)`

Load k-mer database from a file.

**Parameters:**
- `filename` (str): Path to input file

**Example:**
```python
counter.load_from_file("database.rkdb")
```

## Properties

### `k`

Get the k-mer size.

**Returns:**
- `int`: k-mer size

### `canonical`

Get whether canonical k-mer counting is enabled.

**Returns:**
- `bool`: Canonical counting status

### `is_empty`

Check if the counter has no k-mers.

**Returns:**
- `bool`: True if no k-mers have been counted

## Error Handling

The KmerCounter may raise the following exceptions:

- `ValueError`: Invalid k-mer size or sequence
- `FileNotFoundError`: Input file does not exist
- `IOError`: File I/O error
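
When handling these exceptions, the order of the `except` clauses matters: in Python 3, `IOError` is an alias of `OSError`, and `FileNotFoundError` is a subclass of it, so the more specific handler has to come first. The sketch below demonstrates this with a hypothetical wrapper, `describe_failure`, that accepts any zero-argument callable; in real code the callable would wrap a counter method.

```python
def describe_failure(action) -> str:
    """Run a zero-argument callable and report which error class it raised."""
    try:
        action()
    except FileNotFoundError as exc:  # must precede the OSError handler
        return f"missing input: {exc}"
    except ValueError as exc:
        return f"invalid k-mer size or sequence: {exc}"
    except OSError as exc:  # IOError is the same class in Python 3
        return f"I/O failure: {exc}"
    return "ok"


def fail_with(exc: Exception):
    """Build a callable that raises `exc` (stand-in for a failing call)."""
    def action():
        raise exc
    return action


print(describe_failure(fail_with(FileNotFoundError("genome.fa"))))
print(describe_failure(fail_with(IOError("disk error"))))
```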

## Complete Example

```python
from pyrustkmer import KmerCounter

# Initialize counter
counter = KmerCounter(21, canonical=True)

# Count k-mers from file
counter.count_file("genome.fa.gz")

# Get statistics
print(f"Total k-mers: {counter.get_total_count():,}")
print(f"Unique k-mers: {counter.get_unique_count():,}")

# Get top 10 most frequent k-mers
top_kmers = counter.get_top_kmers(10)
print("\nTop 10 k-mers:")
for kmer, count in top_kmers:
    print(f"{kmer}: {count}")

# Save database
counter.save_to_file("genome_k21.rkdb")
```

## Performance Tips

1. **Use canonical k-mers** for most applications
2. **Choose appropriate k-mer size** (k=21-31 for most genomic analysis)
3. **Enable sorting** for better query performance
4. **Use memory mapping** for very large datasets
5. **Process in batches** for large input files
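
The batching tip can be sketched in pure Python: consume sequences lazily and hold only one fixed-size batch in memory at a time. This is an illustration under stated assumptions, not the library's mechanism; `batched` and `count_in_batches` are hypothetical helpers, and a `Counter` stands in for the k-mer counter.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def count_in_batches(sequences: Iterable[str], k: int, batch_size: int) -> Counter:
    """Accumulate k-mer counts one batch at a time."""
    totals: Counter = Counter()
    for batch in batched(sequences, batch_size):
        for seq in batch:
            totals.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return totals


reads = ["ACGTA", "CGTAC", "GTACG"]
counts = count_in_batches(reads, k=3, batch_size=2)
print(counts["GTA"])
```

Because `batched` accepts any iterable, `reads` could equally be a generator that streams records from a large file.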