# NanoCov

[![Version](https://img.shields.io/badge/version-0.1.0-blue.svg)](https://github.com/geonic/nanocov)
[![Status: Experimental](https://img.shields.io/badge/status-experimental-orange.svg)](https://github.com/geonic/nanocov)
[![License: MIT](https://img.shields.io/badge/license-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Rust Edition](https://img.shields.io/badge/rust-2024-orange.svg)](https://doc.rust-lang.org/edition-guide/rust-2024/index.html)

NanoCov is a high-performance tool for calculating and visualizing genomic coverage from BAM files. It's designed to efficiently process large datasets and generate publication-quality plots of per-base coverage across chromosomes.

## Coverage Visualization

NanoCov provides high-quality coverage visualization for individual chromosomes with features including:

- Per-base coverage plots with adaptive binning
- Support for logarithmic and linear scaling
- Multiple color themes (Catppuccin, Nord, Gruvbox)
- Publication-ready PNG and SVG output formats
- Visual detection of chromosomal aneuploidy or large structural variations

The genome-wide overview plot (`--overview-plot`) sorts chromosomes in natural order (1-22, X, Y, MT) and color-codes coverage levels with a gradient that highlights variation.
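
The natural ordering described above can be sketched in a few lines of Python. This is an illustration only, not NanoCov's own code: the function name `chrom_sort_key`, the handling of a `chr` prefix, and the placement of other contigs after MT are assumptions.

```python
import re

def chrom_sort_key(name: str) -> tuple[int, int, str]:
    """Sort key: autosomes numerically, then X, Y, MT, then anything else by name."""
    # Strip an optional "chr" prefix so "chr2" and "2" sort identically.
    core = re.sub(r"^chr", "", name, flags=re.IGNORECASE)
    if core.isdigit():
        return (0, int(core), "")
    # Assumption: "M" and "MT" both denote the mitochondrial genome.
    special = {"X": 1, "Y": 2, "M": 3, "MT": 3}
    rank = special.get(core.upper())
    if rank is not None:
        return (1, rank, "")
    # Unplaced or non-canonical contigs go last, in lexical order.
    return (2, 0, core)

names = ["chr10", "chrX", "chr2", "chrMT", "chr1", "chrY"]
print(sorted(names, key=chrom_sort_key))
# ['chr1', 'chr2', 'chr10', 'chrX', 'chrY', 'chrMT']
```

A plain lexical sort would place `chr10` before `chr2`, which is why a numeric key is needed for the autosomes.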

### Per-Chromosome Detailed View

For in-depth analysis, NanoCov generates detailed per-chromosome plots with comprehensive statistics:

![Single Chromosome Coverage Plot](test-out/coverage.chr7.png)

These detailed plots include:
- Per-base coverage profile with color gradient indicating coverage depth
- Read statistics panel showing N50, quality metrics, and length distribution
- Coverage statistics including mean, median, and standard deviation
- Automatic binning for optimal visualization of regions of any size
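
To give a feel for what binning does to a per-base profile, here is a minimal Python sketch. It assumes the simplest possible rule (pick the smallest bin width that keeps the bin count under a cap, then average each bin); the cap `max_bins` and the function `bin_coverage` are hypothetical, and NanoCov's actual adaptive strategy may differ.

```python
import math

def bin_coverage(depths: list[int], max_bins: int = 1000) -> tuple[int, list[float]]:
    """Average per-base depths into at most `max_bins` equal-width bins."""
    if not depths:
        return 1, []
    # Smallest bin width (in bp) that keeps the number of bins <= max_bins.
    bin_size = max(1, math.ceil(len(depths) / max_bins))
    means = []
    for start in range(0, len(depths), bin_size):
        window = depths[start:start + bin_size]
        # The last bin may be shorter than bin_size; divide by its real length.
        means.append(sum(window) / len(window))
    return bin_size, means

bin_size, means = bin_coverage([10, 12, 8, 10, 0, 0, 30, 30, 30, 30], max_bins=5)
print(bin_size, means)
# 2 [11.0, 9.0, 0.0, 30.0, 30.0]
```

Passing `--plot-bin-size` on the command line corresponds to fixing the bin width instead of deriving it from the region length.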

## Features

- **Fast**: Parallel processing of BAM files for high throughput
- **Flexible**: Analyze whole genomes or specific regions defined by BED files
- **Informative**: Detailed statistics including N50 and mean/median coverage
- **Visual**: Publication-quality plots with multiple color themes
- **Scalable**: Handles large genomes with adaptive binning
- **Versatile**: Supports both per-chromosome and genome-wide visualizations

## Installation

### From Source

```bash
cargo install --git https://github.com/geonic/nanocov
```

### Requirements

- Rust 1.85 or newer (edition 2024)
- Samtools (for BAM indexing)

## Quick Start

```bash
# Index your BAM file (if not already indexed)
samtools index your_data.bam

# Run basic coverage analysis
nanocov -i your_data.bam --output-dir results

# Analyze specific regions from a BED file
nanocov -i your_data.bam -b regions.bed --output-dir results

# Generate per-chromosome plots (disabled by default)
nanocov -i your_data.bam --per-chromosome-plot --output-dir results

# Use linear scale for plots (log scale is default)
nanocov -i your_data.bam --per-chromosome-plot --linear --output-dir results

# Statistics are written on every run; no extra flag is needed
nanocov -i your_data.bam --output-dir results
```

## Usage

```
USAGE:
    nanocov (--input <FILE> | --batch-tsv <FILE>) --output-dir <DIR> [OPTIONS]

OPTIONS:
    -i, --input <FILE>         Input BAM file
    --batch-tsv <FILE>         TSV with BAM path, BED path, and optional prefix columns
    --batch-output <FILE>      Output file for aggregated batch statistics [default: <output-dir>/batch.statistics.tsv]
    -b, --bed <FILE>           BED file with regions to include (0-based, end-exclusive)
    -o, --output-dir <DIR>     Output directory for generated files
    --prefix <STRING>          Output file prefix [default: input file stem]
    -t, --threads <NUM>        Worker threads (alias for --async-tasks)
    --async-tasks <NUM>        Number of async tasks [default: half of available cores, minimum 2]
    --io-buffer-size <KB>      Async I/O buffer size in KB [default: 64]
    -c, --chunk-size <NUM>     Chunk size for region processing in base pairs [default: 100000]
    --adaptive-chunks          Use adaptive chunk sizing based on region length
    --mmap                     Enable memory-mapped file I/O
    --theme <THEME>            Color theme [latte, frappe, nord, gruvbox]
    --svg                      Use SVG output format for plots
    --plot-bin-size <NUM>      Fixed plot bin size in bp (overrides adaptive plot binning)
    --show-zeros               Show regions with zero coverage in plots
    --linear                   Use linear scale for coverage plots
    --overview-plot            Generate the overview plot
    --per-chromosome-plot      Generate per-chromosome plots
    --non-canonical            Include non-canonical chromosomes in coverage statistics (default: canonical + MT/EBV)
    --invert-regions           Invert BED regions (analyze the complement within each chromosome)
    --sequential-plots         Generate plots sequentially (concurrent is default)
    -h, --help                 Print help information
```
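
The effect of `--invert-regions` (analyzing the complement of the BED regions within each chromosome) can be illustrated with a short Python sketch. This is not NanoCov's implementation; the function name `invert_regions` is hypothetical, and the sketch assumes 0-based, end-exclusive intervals on a single chromosome of known length.

```python
def invert_regions(regions: list[tuple[int, int]], chrom_length: int) -> list[tuple[int, int]]:
    """Return the complement of 0-based, end-exclusive intervals within [0, chrom_length)."""
    complement = []
    cursor = 0  # first position not yet covered by any input region
    for start, end in sorted(regions):
        if start > cursor:
            # Gap between the previous region and this one.
            complement.append((cursor, start))
        # max() handles overlapping or nested regions.
        cursor = max(cursor, end)
    if cursor < chrom_length:
        complement.append((cursor, chrom_length))
    return complement

print(invert_regions([(100, 200), (150, 300), (500, 600)], chrom_length=1000))
# [(0, 100), (300, 500), (600, 1000)]
```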

## Outputs

For an input file `example.bam` and output option `-o results`, NanoCov always produces:

- `results/example.tsv`: Tab-separated coverage data for each position
- `results/example.statistics`: Statistics output file

Optional plot outputs:
- `results/example.<chrom>.png` (or `.svg`) when `--per-chromosome-plot` is enabled
- `results/example.overview.png` (or `.svg`) when `--overview-plot` is enabled
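
For scripting around NanoCov, the naming scheme above can be reproduced with a small Python helper. This is a sketch under assumptions: `expected_outputs` is a hypothetical function, and it assumes the default prefix is the input file name with its final extension removed, which matches the `example.bam` case shown here but has not been checked against other file names.

```python
from pathlib import Path

def expected_outputs(bam, output_dir, prefix=None, chroms=(), overview=False, svg=False):
    """List the output paths documented above for one NanoCov run."""
    out = Path(output_dir)
    # Assumption: the default prefix is the input file stem ("example" for "example.bam").
    stem = prefix if prefix is not None else Path(bam).stem
    ext = "svg" if svg else "png"
    # Always produced: per-position coverage table and statistics file.
    paths = [out / f"{stem}.tsv", out / f"{stem}.statistics"]
    # Produced only with --per-chromosome-plot: one plot per chromosome.
    paths += [out / f"{stem}.{chrom}.{ext}" for chrom in chroms]
    # Produced only with --overview-plot.
    if overview:
        paths.append(out / f"{stem}.overview.{ext}")
    return [p.as_posix() for p in paths]

for path in expected_outputs("example.bam", "results", chroms=["chr7"], overview=True):
    print(path)
```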

## Color Themes

NanoCov supports multiple color themes to create publication-quality figures that match your preferences:

### Available Themes

Below is a representation of how the different themes appear in the plots:

| Theme | Example |
|-------|---------|
| **Catppuccin Latte (Default)** | ![Latte Theme](test-out/coverage.test.latte.png) |
| **Catppuccin Frappe (Dark)** | ![Frappe Theme](test-out/coverage.test.frappe.png) |
| **Nord** | ![Nord Theme](test-out/coverage.test.nord.png) |
| **Gruvbox** | ![Gruvbox Theme](test-out/coverage.test.gruvbox.png) |

To use a specific theme:
```bash
nanocov -i your_data.bam --per-chromosome-plot --theme frappe --output-dir results
```

The example images above show the visual differences between the themes. Choose the one that best fits your needs.

### Generating Theme Examples

You can easily generate examples with all four themes using these commands:

```bash
# Generate examples for all themes.
# A distinct --prefix per run keeps later runs from overwriting earlier outputs.
nanocov -i your_data.bam --per-chromosome-plot --theme latte --prefix your_data.latte --output-dir results
nanocov -i your_data.bam --per-chromosome-plot --theme frappe --prefix your_data.frappe --output-dir results
nanocov -i your_data.bam --per-chromosome-plot --theme nord --prefix your_data.nord --output-dir results
nanocov -i your_data.bam --per-chromosome-plot --theme gruvbox --prefix your_data.gruvbox --output-dir results
```

Each theme creates distinct visualizations that may be better suited for different publication contexts or personal preferences.

## Advanced Usage

### Analyzing Specific Regions

Use a BED file to limit coverage analysis to regions of interest (0-based, end-exclusive):

```bash
nanocov -i sample.bam -b targets.bed --output-dir results
```
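
Because BED coordinates are 0-based and end-exclusive, they are easy to get off by one when compared with 1-based positions from other tools. The following Python sketch shows the conversion; `parse_bed_line` is a hypothetical helper that reads only the first three BED columns.

```python
def parse_bed_line(line: str) -> tuple[str, int, int]:
    """Parse chrom, start, end from a BED line (0-based start, exclusive end)."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)

chrom, start, end = parse_bed_line("chr7\t999\t2000\n")

# Length is simply end - start because the end coordinate is exclusive.
length = end - start
# The same interval in 1-based, inclusive coordinates (as used in SAM/VCF positions).
one_based = (start + 1, end)

print(chrom, length, one_based)
# chr7 1001 (1000, 2000)
```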

### Generating Statistics Output

Generate detailed statistics for quality control:

```bash
nanocov -i sample.bam --output-dir results
```

This creates `results/sample.statistics`, which includes:
- Number of alignments/reads
- Yield statistics (total and >25kb)
- N50 and N75 values
- Length statistics (mean, median)
- Mean coverage
- File metadata (path, creation time)

Statistics output is always written to `<output-dir>/<prefix>.statistics`.
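
For readers unfamiliar with N50 and N75, the sketch below computes them from a list of read lengths using the common definition: the length L such that reads of length L or longer contain at least 50% (or 75%) of all bases. The function `nx` is hypothetical, and NanoCov's exact tie-breaking or rounding conventions are not documented here, so small differences from the tool's output are possible.

```python
def nx(lengths: list[int], fraction: float) -> int:
    """Length L such that reads of length >= L hold at least `fraction` of all bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until the target is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0  # empty input

read_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # 54 bases in total
print(nx(read_lengths, 0.50))  # 8: reads of length 10, 9, 8 sum to 27 bases
print(nx(read_lengths, 0.75))  # 5: reads of length 10 down to 5 sum to 45 bases
```

By this definition N75 is never larger than N50, since more reads are needed to reach the higher fraction.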

### Logarithmic Scale (Default)

Logarithmic scale is the default for plots and works well for highly variable coverage:

```bash
nanocov -i sample.bam --per-chromosome-plot --output-dir results
```

This is particularly useful when analyzing regions with very different coverage depths (e.g., comparing high-coverage exons to lower-coverage introns).

### Custom Themes

Select a theme that works best for your publication or presentation:

```bash
nanocov -i sample.bam --per-chromosome-plot --theme nord --output-dir results
```

## Batch Mode (TSV)

Run multiple BAM/BED pairs in one command using `--batch-tsv`. The TSV has three columns: BAM path, BED path, and an optional prefix to use for that sample's outputs (use `-`/`none`/`NA` to omit a BED for a sample).

Example `samples.tsv`:
```
/data/sample1.bam	/data/targets1.bed	sample1
/data/sample2.bam	-	sample2
```

Command:
```bash
nanocov --batch-tsv samples.tsv --output-dir results
```

NanoCov will process each row, using the row-specific BED (or none if omitted), and write a combined table to `results/batch.statistics.tsv` with `prefix` and `bed_path` columns prepended. Individual outputs use the row prefix if provided; otherwise a unique prefix is derived from the BAM name (with numeric suffixes to avoid collisions).
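
The row handling described above can be sketched in Python as follows. This is an illustration, not NanoCov's parser: `read_batch` is a hypothetical function, the "no BED" markers are assumed to be matched case-insensitively, and the `_1`, `_2` suffix format for colliding prefixes is an assumption (the sketch also applies it to explicit prefixes, which the tool may not do).

```python
import csv
import io
from pathlib import Path

# Assumption: markers for "no BED file" are compared case-insensitively.
NO_BED = {"-", "none", "na"}

def read_batch(handle):
    """Return (bam, bed_or_None, prefix) tuples from a batch TSV."""
    rows, seen = [], set()
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        bam = fields[0].strip()
        bed = fields[1].strip() if len(fields) > 1 else ""
        if not bed or bed.lower() in NO_BED:
            bed = None
        # Use the row prefix if given, otherwise derive one from the BAM file name.
        base = fields[2].strip() if len(fields) > 2 and fields[2].strip() else Path(bam).stem
        prefix, n = base, 1
        while prefix in seen:
            # Assumption: collisions are resolved with a numeric suffix.
            prefix = f"{base}_{n}"
            n += 1
        seen.add(prefix)
        rows.append((bam, bed, prefix))
    return rows

tsv = (
    "/data/sample1.bam\t/data/targets1.bed\tsample1\n"
    "/data/sample2.bam\t-\tsample2\n"
    "/runs/sample2.bam\tNA\n"
)
for row in read_batch(io.StringIO(tsv)):
    print(row)
```

In this example the third row has no prefix column, so its prefix is derived from the BAM name and made unique because `sample2` is already taken.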

## Examples

### Basic Coverage Analysis

```bash
nanocov -i your_data.bam --output-dir results
```

This generates:
- A tabular TSV file with per-base coverage data for downstream analysis
- A statistics summary file (`.statistics`)

### Targeted Analysis with Enhanced Visualization

```bash
nanocov -i your_data.bam -b regions.bed --per-chromosome-plot --theme frappe --show-zeros --output-dir results
```

This analyzes only the regions in your BED file, generates per-chromosome plots in the Frappe theme, and includes zero-coverage regions in those plots (`--show-zeros`). Add `--overview-plot` to also generate the overview plot.

## License

MIT

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.