rustyomestats 0.2.0

Fast genome statistics: length, GC, N/L, 6-frame and FragGeneScan codon density, plus Castro U50 assembly metrics.

🦀 RustyOmeStats

Blazing-fast assembly statistics for genomes, metagenomes, transcriptomes, and beyond.


⚡ Modern Assembly Metrics • 📊 Interactive Reports • 🧬 Codon Analytics • 🚀 Parallelized Rust


🔬 What is RustyOmeStats?

RustyOmeStats is a high-performance bioinformatics toolkit written in Rust for calculating assembly statistics from:

  • 🧬 Genomes
  • 🌍 Metagenomes
  • 🧫 MAGs
  • 🧪 Transcriptomes
  • 🧠 Metatranscriptomes
  • 📖 Reference-guided assemblies

Designed for speed, reproducibility, and publication-ready outputs, RustyOmeStats combines modern Rust parallelism with rich plotting/report generation.


✨ Features

🧬 Genome / Metagenome Analytics

  • GC%
  • Sequence length statistics
  • N/L metrics (N25–N90)
  • 6-frame codon density
  • FragGeneScan predicted codon usage
  • Parallel FASTA processing
  • Folder-wide assembly analysis
  • Polars-backed tabular outputs
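
To make the 6-frame codon density item concrete, here is a minimal sketch of six-frame codon counting using only the standard library. The function names, the frame labels (1 to 3 for the forward strand, -1 to -3 for the reverse complement), and the choice to skip codons containing ambiguous bases are assumptions made for illustration; they are not taken from the RustyOmeStats source.

```rust
use std::collections::HashMap;

/// Reverse-complement a DNA sequence; anything other than A/C/G/T becomes N.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Count codons in all six reading frames.
/// Keys are (frame, codon). Frames 1..=3 are the forward strand at offsets 0..=2;
/// frames -1..=-3 are the reverse complement at offsets 0..=2 (hypothetical labels).
/// Codons containing anything other than A/C/G/T are skipped.
fn six_frame_codon_counts(seq: &[u8]) -> HashMap<(i8, String), usize> {
    let forward: Vec<u8> = seq.iter().map(|b| b.to_ascii_uppercase()).collect();
    let reverse = reverse_complement(&forward);
    let mut counts = HashMap::new();
    for (strand, sign) in [(&forward, 1i8), (&reverse, -1i8)] {
        for offset in 0..3usize {
            let frame = sign * (offset as i8 + 1);
            // get() returns None when the sequence is shorter than the offset.
            let tail = strand.get(offset..).unwrap_or(&[]);
            // chunks_exact(3) drops a trailing partial codon.
            for codon in tail.chunks_exact(3) {
                if codon.iter().all(|b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
                    let key = (frame, String::from_utf8_lossy(codon).into_owned());
                    *counts.entry(key).or_insert(0) += 1;
                }
            }
        }
    }
    counts
}
```

Dividing each count by the number of codons in its frame would give a per-frame density; whether the crate normalizes this way is not stated here.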

📊 Modern Assembly Metrics

  • N50 / L50
  • NG50 / LG50
  • U50 / UL50
  • UG50 / ULG50
  • Gap interval detection
  • Overlap interval detection
  • Coverage visualization
  • Reference-aware assembly evaluation
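
As a reference point for the N50 / L50 and NG50 / LG50 items, the sketch below computes them from a list of contig lengths using the standard definitions. The function names are hypothetical and the code is not the crate's implementation. The only difference between the N and NG variants is the total that the 50% threshold is measured against.

```rust
/// Return (Nx, Lx) for a set of contig lengths, measured against `target_total`.
/// Nx is the length of the shortest contig in the smallest set of longest contigs
/// whose lengths sum to at least `percent`% of `target_total`; Lx is the size of that set.
/// Returns None when the contigs cannot reach the threshold (possible for NGx).
fn nx_lx(lengths: &[u64], percent: u64, target_total: u64) -> Option<(u64, usize)> {
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    // Integer comparison avoids float rounding: cumulative * 100 >= total * percent.
    let threshold = target_total as u128 * percent as u128;
    let mut cumulative: u128 = 0;
    for (i, &len) in sorted.iter().enumerate() {
        cumulative += len as u128;
        if cumulative * 100 >= threshold {
            return Some((len, i + 1));
        }
    }
    None
}

/// N50/L50: threshold relative to the assembly's own total length.
fn n50_l50(lengths: &[u64]) -> Option<(u64, usize)> {
    let total: u64 = lengths.iter().sum();
    nx_lx(lengths, 50, total)
}

/// NG50/LG50: threshold relative to a known or estimated genome size.
fn ng50_lg50(lengths: &[u64], genome_size: u64) -> Option<(u64, usize)> {
    nx_lx(lengths, 50, genome_size)
}
```

For contigs of lengths 2 through 10 (total 54 bp), the three longest contigs sum to 27 bp, so N50 is 8 and L50 is 3. Against a 100 bp genome the same contigs give NG50 = 3 and LG50 = 8.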

🖼️ Example Outputs

| GC vs Length | Codon Heatmap | Coverage |
| --- | --- | --- |
| 📈 Publication-ready | 🔥 Frame-aware | 🧬 Reference-guided |

✔ Interactive HTML reports
✔ Self-contained PNG figures
✔ Polars DataFrames
✔ Parallelized Rust backend
✔ Reproducible workflows

⚡ Why RustyOmeStats?

| Feature | RustyOmeStats |
| --- | --- |
| 🚀 Multi-threaded Rust core | ✅ |
| 🧬 Codon density analysis | ✅ |
| 📊 Automated visualization | ✅ |
| 🌍 Metagenome support | ✅ |
| 🧠 U50/UG50 implementation | ✅ |
| 🐍 Python plotting layer | ✅ |
| 📁 Batch assembly processing | ✅ |
| ⚙️ Polars DataFrames | ✅ |

🧱 Architecture

flowchart LR
    A[FASTA / BED Input] --> B[Rust Core]
    B --> C[Polars DataFrames]
    C --> D[CSV Outputs]
    D --> E[Python Plotter]
    E --> F[PNG Figures]
    E --> G[Interactive HTML Report]

🦀 Tech Stack

| Component | Technology |
| --- | --- |
| Core engine | Rust |
| Parallelism | rayon |
| DataFrames | polars |
| FASTA/BED parsing | rust-bio |
| CLI | clap |
| Error handling | anyhow |
| Plotting | seaborn + matplotlib |
| ORF prediction | FragGeneScanRs |

🚀 Installation

1️⃣ Install Rust

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable

2️⃣ Install RustyOmeStats

From crates.io

cargo install rustyomestats

From source

git clone https://github.com/raw937/rustyomestats
cd rustyomestats

cargo install --path .

3️⃣ Optional: FragGeneScanRs

Required only for predicted codon density.

cargo install frag_gene_scan_rs

or

conda install -c bioconda fraggenescanrs

4️⃣ Install Plotting Dependencies

pip install polars seaborn matplotlib

⚡ Quick Start

🧬 Analyze a Genome

rustyomestats genome \
    -f my_genome.fna \
    -o out/ \
    -t 8

Generate plots + HTML report:

python scripts/plot_stats.py -d out/

📦 Output Files

RustyOmeStats generates rich tabular outputs, publication-ready figures, and a fully self-contained interactive HTML report.

summary_stats.csv
├── Global assembly statistics
├── Total sequences / total bp
├── GC%
└── N25–N90 and L25–L90 assembly metrics

per_sequence.csv
├── Per-contig / per-sequence statistics
├── Sequence ID
├── Length distribution
└── GC composition for every record

length_intervals.csv
├── Length-bin frequency table
├── Histogram-ready interval counts
└── Used for contig size distribution plots

codon_absolute.csv
├── Raw 6-frame codon statistics
├── Codon counts and densities
├── Frame-specific measurements
└── Long-format analytics table

codon_absolute_aggregate.csv
├── Global codon usage profile
├── Aggregated across all sequences
└── 64-codon genome-wide abundance table

codon_predicted.csv
├── FragGeneScan-predicted ORF codons
├── Per-gene codon frequencies
└── Coding-region codon density statistics

codon_predicted_aggregate.csv
├── Aggregate predicted ORF codon usage
├── Genome-wide predicted CDS codon profile
└── Useful for translational bias analyses

codon_comparison.csv
├── Absolute vs predicted codon usage
├── Enrichment/depletion statistics
├── Translational bias comparisons
└── Predicted-over-absolute enrichment metrics
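
One plausible form for a predicted-over-absolute enrichment metric is a log2 ratio of relative codon frequencies, sketched below. The exact statistic written to codon_comparison.csv is not specified here, so the formula, the function name, and the handling of zero counts are all assumptions for illustration.

```rust
use std::collections::BTreeMap;

/// log2 ratio of predicted-ORF codon frequency over absolute (6-frame) codon frequency.
/// Positive values mean the codon is enriched in predicted coding regions.
/// Codons with a zero count on either side are omitted rather than given infinite scores.
fn codon_log2_enrichment(
    absolute: &BTreeMap<String, u64>,
    predicted: &BTreeMap<String, u64>,
) -> BTreeMap<String, f64> {
    let abs_total: u64 = absolute.values().sum();
    let pred_total: u64 = predicted.values().sum();
    let mut out = BTreeMap::new();
    if abs_total == 0 || pred_total == 0 {
        return out;
    }
    for (codon, &abs_count) in absolute {
        let pred_count = predicted.get(codon).copied().unwrap_or(0);
        if abs_count == 0 || pred_count == 0 {
            continue;
        }
        // Normalize each count by its own table total before comparing.
        let abs_freq = abs_count as f64 / abs_total as f64;
        let pred_freq = pred_count as f64 / pred_total as f64;
        out.insert(codon.clone(), (pred_freq / abs_freq).log2());
    }
    out
}
```

Under this definition a codon that makes up half of the absolute table but only a quarter of the predicted table scores -1, meaning it is twofold depleted in predicted coding regions.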

fgs_predicted.{ffn,faa,out,gff}
├── Raw FragGeneScanRs outputs
├── Predicted nucleotide ORFs (.ffn)
├── Predicted proteins (.faa)
├── Gene annotations (.gff)
└── Raw model output/log files

plot_length_histogram.png
├── Contig/scaffold size distribution
└── Publication-ready histogram visualization

plot_gc_distribution.png
├── GC variability across sequences
└── Detects compositional heterogeneity

plot_gc_vs_length.png
├── GC% versus sequence length
├── Detects assembly structure patterns
└── Useful for MAG/metagenome exploration

plot_codon_usage_bar.png
├── Genome-wide codon abundance plots
├── Absolute vs predicted codon usage
└── Translational preference visualization

plot_codon_heatmap_by_frame.png
├── 6-frame codon density heatmap
├── Frame-aware codon visualization
└── High-dimensional codon pattern analysis

plot_codon_enrichment.png
├── Codon enrichment/depletion analysis
├── Predicted vs absolute codon shifts
└── Translational bias visualization

report.html
├── Fully self-contained interactive report
├── All plots embedded inline
├── Metric summaries + tables
├── Portable single-file visualization dashboard
└── Shareable publication-ready report

🌍 Analyze Multiple Assemblies

rustyomestats genome \
    -f assemblies/ \
    -o out/ \
    -t 32

🧠 U50 / UG50 Assembly Metrics

RustyOmeStats implements the modern metrics proposed in:

Castro CJ, Ng TFF. U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs. J Comput Biol. 2017;24(11):1071-1080.

Including:

  • U50
  • UL50
  • UG50
  • ULG50
  • Gap-aware assembly evaluation
  • Overlap-aware assembly evaluation
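
The idea behind these metrics can be sketched as greedy masking: visit contigs from longest to shortest, credit each one only with reference bases that no longer contig has already covered, then take the 50% point over those unique lengths. The code below is one reading of that idea under simplifying assumptions (a single reference sequence, 0-based half-open coordinates as in BED, hypothetical function names). It is not the crate's `u50` module.

```rust
/// Unique (non-overlapping) length of each contig after greedy masking.
/// `intervals` are (start, end) alignments on one reference of length `ref_len`.
/// Contigs that add no new reference bases are dropped.
fn unique_lengths(intervals: &[(usize, usize)], ref_len: usize) -> Vec<u64> {
    let mut sorted = intervals.to_vec();
    // Stable sort, longest alignment first.
    sorted.sort_by_key(|&(s, e)| std::cmp::Reverse(e.saturating_sub(s)));
    let mut covered = vec![false; ref_len];
    let mut unique = Vec::new();
    for (start, end) in sorted {
        // Clamp to the reference so malformed intervals cannot index out of bounds.
        let end = end.min(ref_len);
        let start = start.min(end);
        let mut fresh = 0u64;
        for slot in &mut covered[start..end] {
            if !*slot {
                *slot = true;
                fresh += 1;
            }
        }
        if fresh > 0 {
            unique.push(fresh);
        }
    }
    unique
}

/// Length and count at which the longest unique lengths reach half of `target_total`.
fn half_point(unique: &[u64], target_total: u64) -> Option<(u64, usize)> {
    let mut sorted = unique.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let mut cumulative = 0u128;
    for (i, &len) in sorted.iter().enumerate() {
        cumulative += len as u128;
        if cumulative * 2 >= target_total as u128 {
            return Some((len, i + 1));
        }
    }
    None
}

/// (U50, UL50): threshold is half of the summed unique lengths.
fn u50_ul50(intervals: &[(usize, usize)], ref_len: usize) -> Option<(u64, usize)> {
    let unique = unique_lengths(intervals, ref_len);
    let total: u64 = unique.iter().sum();
    half_point(&unique, total)
}

/// (UG50, ULG50): threshold is half of the reference length.
fn ug50_ulg50(intervals: &[(usize, usize)], ref_len: usize) -> Option<(u64, usize)> {
    half_point(&unique_lengths(intervals, ref_len), ref_len as u64)
}
```

A boolean mask per reference base keeps the sketch short; for large references an interval set would use far less memory.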

🔬 Example U50 Workflow

rustyomestats u50 \
    --reference ref.fa \
    --bed contigs.sorted.bed \
    --outdir out/
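
Because the BED input is sorted, gap and overlap intervals can be found in a single pass by tracking the furthest reference position covered so far. The sketch below illustrates that sweep for one reference sequence with 0-based, half-open coordinates. The function name and the choice to report overlaps per contig (unmerged) are assumptions, not the behavior of the `u50` subcommand.

```rust
/// Given alignments sorted by start on a reference of length `ref_len`,
/// return (gaps, overlaps) as lists of (start, end) intervals.
/// A gap is a stretch no alignment covers; an overlap is the part of an
/// alignment that earlier alignments already cover.
fn gaps_and_overlaps(
    sorted: &[(usize, usize)],
    ref_len: usize,
) -> (Vec<(usize, usize)>, Vec<(usize, usize)>) {
    let mut gaps = Vec::new();
    let mut overlaps = Vec::new();
    let mut reach = 0usize; // furthest reference position covered so far
    for &(start, end) in sorted {
        if start > reach {
            gaps.push((reach, start));
        } else if start < reach && end > start {
            // Only the part up to `reach` is already covered.
            overlaps.push((start, end.min(reach)));
        }
        reach = reach.max(end);
    }
    if reach < ref_len {
        gaps.push((reach, ref_len));
    }
    (gaps, overlaps)
}
```

For alignments (0,50), (40,70), (60,65), (80,90) on a 100 bp reference this reports gaps at 70..80 and 90..100 and overlaps at 40..50 and 60..65.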

Generate figures:

python scripts/plot_stats.py -d out/

📊 Generated Visualizations

| Plot | Description |
| --- | --- |
| 📈 GC Distribution | GC variability across sequences |
| 🔥 Codon Heatmap | Frame-specific codon usage |
| 📉 Length Histogram | Assembly contig distributions |
| 🧬 Coverage Plot | Reference coverage structure |
| 📊 U50 Summary | Modern assembly metric overview |

Interactive HTML Reports

RustyOmeStats automatically generates:

✅ Self-contained HTML reports
✅ Inline PNG visualizations
✅ Portable single-file reports
✅ Publication-ready figures

Open directly in your browser:

firefox report.html

📚 Library Usage

RustyOmeStats can also be embedded as a Rust crate.

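use rustyomestats::{io_utils, stats, u50};
use std::path::Path;

// `?` needs a function that returns a Result, so the calls are wrapped in `main`.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Genome stats: collect FASTA files, load their records, compute basic statistics.
    let files = io_utils::collect_fasta_files(Path::new("genome.fna"))?;
    let recs = io_utils::load_all_records(&files)?;
    let basic = stats::compute_basic(&recs);

    println!("{} sequences", basic.num_seq);

    // U50 stats: reference FASTA, BED of contig alignments, output directory.
    let res = u50::compute_u50(
        Path::new("ref.fa"),
        Path::new("contigs.bed"),
        Path::new("out"),
    )?;

    println!("UG50 = {}", res.ug50);
    Ok(())
}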

🧪 Testing

cargo test

Covers:

  • N50/U50 correctness
  • Greedy masking
  • BED deduplication
  • Reverse complements
  • 6-frame codon indexing
  • Hand-validated toy assemblies

📄 License

Creative Commons Attribution-NonCommercial (CC BY-NC 4.0)

See the LICENSE file for details.


📖 Citation

If you use RustyOmeStats in published work, please cite:

White III RA et al.
RustyOmeStats: High-performance genome and metagenome assembly statistics in Rust.

🤝 Contributing

We welcome:

  • 🧬 New assembly metrics
  • ⚡ Performance optimizations
  • 📊 Visualization improvements
  • 🐍 Python plotting extensions
  • 🦀 Rust ecosystem integrations

Pull requests and issues are encouraged.


📞 Support


🦀 RustyOmeStats

Fast. Parallel. Modern Bioinformatics.

Built with ❤️ in Rust.