# 🦀 RustyOmeStats
### *Blazing-fast assembly statistics for genomes, metagenomes, transcriptomes, and beyond.*
<div align="center">

### ⚡ Modern Assembly Metrics • 📊 Interactive Reports • 🧬 Codon Analytics • 🚀 Parallelized Rust

</div>

---
# 🔬 What is RustyOmeStats?
**RustyOmeStats** is a high-performance bioinformatics toolkit written in **Rust** for calculating assembly statistics from:
* 🧬 Genomes
* 🌍 Metagenomes
* 🧫 MAGs
* 🧪 Transcriptomes
* 🧠 Metatranscriptomes
* 📍 Reference-guided assemblies

Designed for **speed**, **reproducibility**, and **publication-ready outputs**, RustyOmeStats combines modern Rust parallelism with rich plotting and report generation.

---
# ✨ Features
<table>
<tr>
<td width="50%">

## 🧬 Genome / Metagenome Analytics

* GC%
* Sequence length statistics
* N/L metrics (N25–N90)
* 6-frame codon density
* FragGeneScan-predicted codon usage
* Parallel FASTA processing
* Folder-wide assembly analysis
* Polars-backed tabular outputs

</td>
<td width="50%">

## 📏 Modern Assembly Metrics

* N50 / L50
* NG50 / LG50
* U50 / UL50
* UG50 / ULG50
* Gap interval detection
* Overlap interval detection
* Coverage visualization
* Reference-aware assembly evaluation

</td>
</tr>
</table>

---
# 🖼️ Example Outputs

<div align="center">

| 📈 Publication-ready | 🔥 Frame-aware | 🧬 Reference-guided |
|:---:|:---:|:---:|

</div>

```text
✔ Interactive HTML reports
✔ Self-contained PNG figures
✔ Polars DataFrames
✔ Parallelized Rust backend
✔ Reproducible workflows
```
---
# ⚡ Why RustyOmeStats?

| Feature | Supported |
|---|:---:|
| 🚀 Multi-threaded Rust core | ✅ |
| 🧬 Codon density analysis | ✅ |
| 📊 Automated visualization | ✅ |
| 🌍 Metagenome support | ✅ |
| 🧠 U50/UG50 implementation | ✅ |
| 🐍 Python plotting layer | ✅ |
| 📁 Batch assembly processing | ✅ |
| ⚙️ Polars DataFrames | ✅ |

---
# 🧱 Architecture
```mermaid
flowchart LR
A[FASTA / BED Input] --> B[Rust Core]
B --> C[Polars DataFrames]
C --> D[CSV Outputs]
D --> E[Python Plotter]
E --> F[PNG Figures]
E --> G[Interactive HTML Report]
```
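The following minimal, dependency-free sketch makes the first hops of the diagram concrete: FASTA input, then per-sequence statistics, then CSV rows. It is an illustration under stated assumptions, not the crate's code. RustyOmeStats itself parses with rust-bio and tabulates with Polars. The names `Record`, `parse_fasta`, `gc_percent` and `per_sequence_csv`, and the `id,length,gc_percent` columns, are hypothetical.

```rust
/// A minimal FASTA record: header ID plus the concatenated sequence (hypothetical type).
#[derive(Debug)]
struct Record {
    id: String,
    seq: String,
}

/// Parse FASTA text into records. The ID is the first whitespace-delimited token
/// after '>'; any lines before the first header are ignored.
fn parse_fasta(text: &str) -> Vec<Record> {
    let mut records = Vec::new();
    let mut current: Option<Record> = None;
    for line in text.lines() {
        let line = line.trim_end();
        if let Some(header) = line.strip_prefix('>') {
            if let Some(rec) = current.take() {
                records.push(rec);
            }
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            current = Some(Record { id, seq: String::new() });
        } else if let Some(rec) = current.as_mut() {
            rec.seq.push_str(line);
        }
    }
    if let Some(rec) = current {
        records.push(rec);
    }
    records
}

/// GC% over all bases, case-insensitive. Ambiguity codes such as N count in the
/// denominator here; that is an assumption, and tools differ on it.
fn gc_percent(seq: &str) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .bytes()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    100.0 * gc as f64 / seq.len() as f64
}

/// Render one CSV row per record: id, length, GC% rounded to two decimals.
fn per_sequence_csv(records: &[Record]) -> String {
    let mut out = String::from("id,length,gc_percent\n");
    for r in records {
        out.push_str(&format!("{},{},{:.2}\n", r.id, r.seq.len(), gc_percent(&r.seq)));
    }
    out
}
```

Counting `N` in the GC denominator lowers GC% for gappy scaffolds. Excluding ambiguous bases is an equally valid convention.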
---
# 📦 Tech Stack

| Component | Technology |
|---|---|
| Core engine | Rust |
| Parallelism | rayon |
| DataFrames | polars |
| FASTA/BED parsing | rust-bio |
| CLI | clap |
| Error handling | anyhow |
| Plotting | seaborn + matplotlib |
| ORF prediction | FragGeneScanRs |

---
# 🚀 Installation
## 1️⃣ Install Rust
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
---
## 2️⃣ Install RustyOmeStats
### From crates.io
```bash
cargo install rustyomestats
```
### From source
```bash
git clone https://github.com/raw937/rustyomestats
cd rustyomestats
cargo install --path .
```
---
## 3️⃣ Optional: FragGeneScanRs
Required only for predicted codon density.
```bash
cargo install fraggenescanrs
```
or
```bash
conda install -c bioconda fraggenescanrs
```
---
## 4️⃣ Install Plotting Dependencies
```bash
pip install polars seaborn matplotlib
```
---
# ⚡ Quick Start
## 🧬 Analyze a Genome
```bash
rustyomestats genome \
-f my_genome.fna \
-o out/ \
-t 8
```
Generate plots + HTML report:
```bash
python scripts/plot_stats.py -d out/
```
---
# 📦 Output Files
RustyOmeStats generates rich tabular outputs, publication-ready figures, and a fully self-contained interactive HTML report.
```text
summary_stats.csv
├── Global assembly statistics
├── Total sequences / total bp
├── GC%
└── N25–N90 and L25–L90 assembly metrics

per_sequence.csv
├── Per-contig / per-sequence statistics
├── Sequence ID
├── Length distribution
└── GC composition for every record

length_intervals.csv
├── Length-bin frequency table
├── Histogram-ready interval counts
└── Used for contig size distribution plots

codon_absolute.csv
├── Raw 6-frame codon statistics
├── Codon counts and densities
├── Frame-specific measurements
└── Long-format analytics table

codon_absolute_aggregate.csv
├── Global codon usage profile
├── Aggregated across all sequences
└── 64-codon genome-wide abundance table

codon_predicted.csv
├── FragGeneScan-predicted ORF codons
├── Per-gene codon frequencies
└── Coding-region codon density statistics

codon_predicted_aggregate.csv
├── Aggregate predicted ORF codon usage
├── Genome-wide predicted CDS codon profile
└── Useful for translational bias analyses

codon_comparison.csv
├── Absolute vs predicted codon usage
├── Enrichment/depletion statistics
├── Translational bias comparisons
└── Predicted-over-absolute enrichment metrics

fgs_predicted.{ffn,faa,out,gff}
├── Raw FragGeneScanRs outputs
├── Predicted nucleotide ORFs (.ffn)
├── Predicted proteins (.faa)
├── Gene annotations (.gff)
└── Raw model output/log files

plot_length_histogram.png
├── Contig/scaffold size distribution
└── Publication-ready histogram visualization

plot_gc_distribution.png
├── GC variability across sequences
└── Detects compositional heterogeneity

plot_gc_vs_length.png
├── GC% versus sequence length
├── Detects assembly structure patterns
└── Useful for MAG/metagenome exploration

plot_codon_usage_bar.png
├── Genome-wide codon abundance plots
├── Absolute vs predicted codon usage
└── Translational preference visualization

plot_codon_heatmap_by_frame.png
├── 6-frame codon density heatmap
├── Frame-aware codon visualization
└── High-dimensional codon pattern analysis

plot_codon_enrichment.png
├── Codon enrichment/depletion analysis
├── Predicted vs absolute codon shifts
└── Translational bias visualization

report.html
├── Fully self-contained interactive report
├── All plots embedded inline
├── Metric summaries + tables
├── Portable single-file visualization dashboard
└── Shareable publication-ready report
```
---
# 📁 Analyze Multiple Assemblies
```bash
rustyomestats genome \
-f assemblies/ \
-o out/ \
-t 32
```
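Passing a directory to `-f` analyzes every assembly inside it. The library exposes `io_utils::collect_fasta_files` for this step, but its exact filtering rules are not documented here. The sketch below therefore assumes a simple rule: a non-recursive scan that keeps common FASTA extensions, sorted by path for reproducible output. The function name `collect_fasta_like` and the extension list are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extensions treated as nucleotide FASTA in this sketch (an assumption, not the crate's list).
const FASTA_EXTS: &[&str] = &["fa", "fasta", "fna", "fas"];

/// If `input` is a file, return it alone; if it is a directory, return the
/// FASTA-like files directly inside it (non-recursive), sorted by path.
fn collect_fasta_like(input: &Path) -> io::Result<Vec<PathBuf>> {
    if input.is_file() {
        return Ok(vec![input.to_path_buf()]);
    }
    let mut files = Vec::new();
    for entry in fs::read_dir(input)? {
        let path = entry?.path();
        // Case-insensitive extension match, so "a.FASTA" is accepted too.
        let is_fasta = path
            .extension()
            .and_then(|e| e.to_str())
            .map(|e| FASTA_EXTS.contains(&e.to_ascii_lowercase().as_str()))
            .unwrap_or(false);
        if path.is_file() && is_fasta {
            files.push(path);
        }
    }
    files.sort(); // stable ordering keeps batch outputs reproducible
    Ok(files)
}
```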
---
# 🧠 U50 / UG50 Assembly Metrics
RustyOmeStats implements the modern metrics proposed in:
> [Castro & Ng, 2017](https://journals.sagepub.com/doi/abs/10.1089/cmb.2017.0013). Castro CJ, Ng TFF. U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs. *J Comput Biol*. 2017;24(11):1071-1080.

Including:
* U50
* UL50
* UG50
* ULG50
* Gap-aware assembly evaluation
* Overlap-aware assembly evaluation
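
U50 applies an N50-style calculation to *unique*, target-specific contig lengths: reference bases already covered by another contig are not counted twice. The sketch below assumes greedy longest-first masking, consistent with the "Greedy masking" item under Testing. It also assumes the reported U50 is the masked (unique) length. UG50/ULG50 use the reference length as the total, the way NG50 does. `Placement`, `unique_lengths` and `u50_like` are hypothetical names, and the real tool reads placements from a BED file against a reference FASTA.

```rust
/// A contig placed on the reference as a half-open, BED-style interval [start, end).
struct Placement {
    start: usize,
    end: usize,
}

/// Greedy masking (assumed): longest placement first; each contributes only the
/// reference bases not already claimed by a longer one. Zero-length results are dropped.
fn unique_lengths(placements: &[Placement], ref_len: usize) -> Vec<u64> {
    let mut order: Vec<&Placement> = placements.iter().collect();
    // Stable sort, so equally long placements keep their input order.
    order.sort_by(|a, b| b.end.saturating_sub(b.start).cmp(&a.end.saturating_sub(a.start)));
    let mut covered = vec![false; ref_len];
    let mut out = Vec::new();
    for p in order {
        let mut unique = 0u64;
        for pos in p.start..p.end.min(ref_len) {
            if !covered[pos] {
                covered[pos] = true;
                unique += 1;
            }
        }
        if unique > 0 {
            out.push(unique);
        }
    }
    out
}

/// (U50, UL50) when `ref_len` is None; (UG50, ULG50) when Some(reference length).
/// Returns None if the unique lengths never reach half of the total.
fn u50_like(unique: &[u64], ref_len: Option<u64>) -> Option<(u64, usize)> {
    let mut sorted = unique.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let total = ref_len.unwrap_or_else(|| sorted.iter().sum());
    let mut cumulative = 0u64;
    for (i, &len) in sorted.iter().enumerate() {
        cumulative += len;
        if 2 * cumulative >= total {
            return Some((len, i + 1));
        }
    }
    None
}
```

A contig fully contained in a longer one contributes nothing, which is exactly the redundancy that inflates plain N50 on overlapping assemblies.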
---
# 🔬 Example U50 Workflow
```bash
rustyomestats u50 \
--reference ref.fa \
--bed contigs.sorted.bed \
--outdir out/
```
Generate figures:
```bash
python scripts/plot_stats.py -d out/
```
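The `u50` workflow takes contig placements as BED, and the test suite mentions BED deduplication. Here is a sketch of that input step, under assumptions. It skips header and comment lines, validates the three required columns (0-based, half-open coordinates), and treats entries with identical `chrom`, `start` and `end` as duplicates, ignoring the optional name column. The dedup rule and the names `BedInterval` and `parse_bed_dedup` are hypothetical, not the crate's API.

```rust
use std::collections::HashSet;

/// One BED interval: 0-based, half-open [start, end) on `chrom`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct BedInterval {
    chrom: String,
    start: u64,
    end: u64,
}

/// Parse tab-separated BED text, skipping blank, '#', "track" and "browser" lines,
/// and dropping exact duplicate (chrom, start, end) entries while keeping first-seen order.
fn parse_bed_dedup(text: &str) -> Result<Vec<BedInterval>, String> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for (lineno, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty()
            || line.starts_with('#')
            || line.starts_with("track")
            || line.starts_with("browser")
        {
            continue;
        }
        let fields: Vec<&str> = line.split('\t').collect();
        if fields.len() < 3 {
            return Err(format!("line {}: expected at least 3 columns", lineno + 1));
        }
        let parse = |s: &str| s.parse::<u64>().map_err(|e| format!("line {}: {}", lineno + 1, e));
        let (start, end) = (parse(fields[1])?, parse(fields[2])?);
        if end < start {
            return Err(format!("line {}: end < start", lineno + 1));
        }
        let iv = BedInterval { chrom: fields[0].to_string(), start, end };
        if seen.insert(iv.clone()) {
            out.push(iv);
        }
    }
    Ok(out)
}
```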
---
# 📊 Generated Visualizations

<div align="center">

| Plot | Description |
|---|---|
| 📈 GC Distribution | GC variability across sequences |
| 🔥 Codon Heatmap | Frame-specific codon usage |
| 📏 Length Histogram | Assembly contig distributions |
| 🧬 Coverage Plot | Reference coverage structure |
| 📐 U50 Summary | Modern assembly metric overview |

</div>

---
# 🌐 Interactive HTML Reports
The plotting script (`scripts/plot_stats.py`) generates:

* ✅ Self-contained HTML reports
* ✅ Inline PNG visualizations
* ✅ Portable single-file reports
* ✅ Publication-ready figures

Open directly in your browser:
```bash
firefox out/report.html
```
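A report can stay a single portable file by embedding each PNG as a base64 `data:` URI instead of linking to sibling image files. That is a common technique. Whether `plot_stats.py` does exactly this is an assumption, and the real report is produced in Python. The sketch below illustrates the idea in Rust. `base64_encode` and `inline_png_img` are hypothetical names, and the encoder is written out only to avoid extra crates.

```rust
/// Standard base64 (RFC 4648 alphabet) with '=' padding.
fn base64_encode(data: &[u8]) -> String {
    const ALPHABET: &[u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for chunk in data.chunks(3) {
        // Pack up to 3 bytes into 24 bits; missing bytes are zero and become '='.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Build an <img> tag whose image data lives inside the HTML itself.
/// `alt` is inserted unescaped, so escape it first if it comes from untrusted input.
fn inline_png_img(png_bytes: &[u8], alt: &str) -> String {
    format!(r#"<img alt="{}" src="data:image/png;base64,{}">"#, alt, base64_encode(png_bytes))
}
```

Base64 inflates each image by about a third, which is the usual price for a report that survives being emailed or archived as one file.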
---
# 📚 Library Usage
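RustyOmeStats can also be embedded as a Rust crate. The example assumes `anyhow` is also a dependency, so that `?` can be used inside `main`.
```rust
use rustyomestats::{io_utils, stats, u50};
use std::path::Path;

fn main() -> anyhow::Result<()> {
    // Genome stats: gather FASTA input, load all records, compute summary metrics.
    let files = io_utils::collect_fasta_files(Path::new("genome.fna"))?;
    let recs = io_utils::load_all_records(&files)?;
    let basic = stats::compute_basic(&recs);
    println!("{} sequences", basic.num_seq);

    // U50 stats: reference FASTA, contig BED, and output directory.
    let res = u50::compute_u50(
        Path::new("ref.fa"),
        Path::new("contigs.bed"),
        Path::new("out"),
    )?;
    println!("UG50 = {}", res.ug50);

    Ok(())
}
```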
---
# 🧪 Testing
```bash
cargo test
```
Covers:
* N50/U50 correctness
* Greedy masking
* BED deduplication
* Reverse complements
* 6-frame codon indexing
* Hand-validated toy assemblies
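
Reverse complements and 6-frame codon indexing can be pinned down with small, hand-checkable sequences like those used in the tests. One common convention is sketched below; it is an assumption, not necessarily the crate's. Frames 0–2 read the forward strand from offsets 0, 1, 2, and frames 3–5 read the reverse complement from the same offsets. Codons are indexed as 16·first + 4·second + third with A=0, C=1, G=2, T=3, and codons containing `N` or other ambiguity codes are skipped. All function names are hypothetical. A density would then be these counts divided by the number of codons read in each frame, or by sequence length.

```rust
/// Reverse complement; anything other than A/C/G/T (any case) becomes 'N'.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            _ => b'N',
        })
        .collect()
}

/// Two-bit base code (A=0, C=1, G=2, T=3); None for ambiguous bases.
fn base_code(b: u8) -> Option<usize> {
    match b.to_ascii_uppercase() {
        b'A' => Some(0),
        b'C' => Some(1),
        b'G' => Some(2),
        b'T' => Some(3),
        _ => None,
    }
}

/// Codon index in 0..64 (AAA = 0, TTT = 63); None if any base is ambiguous.
fn codon_index(codon: &[u8]) -> Option<usize> {
    Some(base_code(codon[0])? * 16 + base_code(codon[1])? * 4 + base_code(codon[2])?)
}

/// Codon counts per frame: 0..3 = forward offsets 0,1,2; 3..6 = the same offsets
/// on the reverse complement. Trailing partial codons are ignored.
fn six_frame_counts(seq: &[u8]) -> [[u64; 64]; 6] {
    let mut counts = [[0u64; 64]; 6];
    let rc = reverse_complement(seq);
    for (strand, s) in [seq, rc.as_slice()].iter().enumerate() {
        for offset in 0..3 {
            let frame = strand * 3 + offset;
            for codon in s.get(offset..).unwrap_or(&[]).chunks_exact(3) {
                if let Some(i) = codon_index(codon) {
                    counts[frame][i] += 1;
                }
            }
        }
    }
    counts
}
```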
---
# 📜 License
**Creative Commons Attribution-NonCommercial (CC BY-NC 4.0)**

See the `LICENSE` file for details.

---
# 📝 Citation
If you use **RustyOmeStats** in published work, please cite:
```text
White III RA et al.
RustyOmeStats: High-performance genome and metagenome assembly statistics in Rust.
```
---
# 🤝 Contributing
We welcome:
* 🧬 New assembly metrics
* ⚡ Performance optimizations
* 📊 Visualization improvements
* 🐍 Python plotting extensions
* 🦀 Rust ecosystem integrations

Pull requests and issues are encouraged.

---
# 📞 Support
* 🐛 GitHub Issues:
  - **Issues:** [RustyOmeStats Issues](https://github.com/raw937/rustyomestats/issues)
* 📧 Contact:
  - **Email:** [Dr. Richard Allen White III](mailto:rwhit101@uncc.edu)
  - If you have any questions or feedback, please feel free to get in touch by email.

---
<div align="center">

# 🦀 RustyOmeStats
### *Fast. Parallel. Modern Bioinformatics.*
Built with ❤️ in Rust.

</div>