RustyOmeStats
Blazing-fast assembly statistics for genomes, metagenomes, transcriptomes, and beyond.
Modern Assembly Metrics • Interactive Reports • Codon Analytics • Parallelized Rust
What is RustyOmeStats?
RustyOmeStats is a high-performance bioinformatics toolkit written in Rust for calculating assembly statistics from:
- Genomes
- Metagenomes
- MAGs
- Transcriptomes
- Metatranscriptomes
- Reference-guided assemblies
Designed for speed, reproducibility, and publication-ready outputs, RustyOmeStats combines modern Rust parallelism with rich plotting/report generation.
Features
Genome / Metagenome Analytics
- GC%
- Sequence length statistics
- N/L metrics (N25–N90)
- 6-frame codon density
- FragGeneScan predicted codon usage
- Parallel FASTA processing
- Folder-wide assembly analysis
- Polars-backed tabular outputs
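As a rough illustration of the GC% and 6-frame codon items above, here is a minimal standard-library sketch. The function names (`gc_percent`, `reverse_complement`, `six_frame_codon_counts`) are hypothetical and are not taken from the RustyOmeStats API. The frame layout (frames 0-2 forward, 3-5 on the reverse complement) and the choice to skip codons containing ambiguous bases are assumptions, and the sketch reports raw counts rather than densities.

```rust
use std::collections::HashMap;

/// GC percentage over unambiguous bases (A, C, G, T); 0.0 if there are none.
pub fn gc_percent(seq: &[u8]) -> f64 {
    let mut gc = 0usize;
    let mut acgt = 0usize;
    for b in seq {
        match b.to_ascii_uppercase() {
            b'G' | b'C' => {
                gc += 1;
                acgt += 1;
            }
            b'A' | b'T' => acgt += 1,
            _ => {} // N and other ambiguity codes are ignored (assumption)
        }
    }
    if acgt == 0 {
        0.0
    } else {
        100.0 * gc as f64 / acgt as f64
    }
}

/// Reverse complement; any non-ACGT byte becomes N.
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|b| match b.to_ascii_uppercase() {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            _ => b'N',
        })
        .collect()
}

/// Codon counts for six frames: indices 0..3 are forward offsets 0, 1, 2;
/// indices 3..6 are the same offsets on the reverse complement.
pub fn six_frame_codon_counts(seq: &[u8]) -> Vec<HashMap<String, usize>> {
    let forward: Vec<u8> = seq.iter().map(|b| b.to_ascii_uppercase()).collect();
    let reverse = reverse_complement(&forward);
    let mut frames = Vec::with_capacity(6);
    for strand in [&forward, &reverse] {
        for offset in 0..3 {
            let mut counts: HashMap<String, usize> = HashMap::new();
            let tail: &[u8] = strand.get(offset..).unwrap_or(&[]);
            for codon in tail.chunks_exact(3) {
                // Skip codons that contain anything other than A, C, G, T.
                if codon.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
                    let key = String::from_utf8_lossy(codon).into_owned();
                    *counts.entry(key).or_insert(0) += 1;
                }
            }
            frames.push(counts);
        }
    }
    frames
}
```

Dividing each count by the number of codons in its frame would turn these counts into per-frame frequencies, which is one plausible reading of "codon density".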
Modern Assembly Metrics
- N50 / L50
- NG50 / LG50
- U50 / UL50
- UG50 / ULG50
- Gap interval detection
- Overlap interval detection
- Coverage visualization
- Reference-aware assembly evaluation
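For readers unfamiliar with the N/L family listed above, the following sketch shows the standard definitions: Nx is the length of the contig at which the cumulative length of contigs, sorted longest first, reaches a fraction x of a target size, and Lx is how many contigs that takes. N50 uses the assembly size as the target, while NG50 uses an expected genome size. The function names are hypothetical and not part of the RustyOmeStats API.

```rust
/// Returns (Nx, Lx) for fraction `x` (for example 0.5) of `target` bases,
/// or None if the contigs never reach that fraction.
pub fn nx_lx(lengths: &[u64], x: f64, target: u64) -> Option<(u64, usize)> {
    let mut sorted = lengths.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a)); // longest first
    let threshold = x * target as f64;
    let mut cumulative = 0u64;
    for (i, len) in sorted.iter().enumerate() {
        cumulative += len;
        if cumulative as f64 >= threshold {
            return Some((*len, i + 1));
        }
    }
    None
}

/// N50 / L50: the target is the total assembly length.
pub fn n50_l50(lengths: &[u64]) -> Option<(u64, usize)> {
    nx_lx(lengths, 0.5, lengths.iter().sum())
}

/// NG50 / LG50: the target is an expected genome size supplied by the caller.
pub fn ng50_lg50(lengths: &[u64], genome_size: u64) -> Option<(u64, usize)> {
    nx_lx(lengths, 0.5, genome_size)
}
```

With contig lengths 80, 70, 50, 40, 30, 20 (290 bp in total), `n50_l50` returns `Some((70, 2))`. Against a 400 bp genome, `ng50_lg50` returns `Some((50, 3))`, and against a 1000 bp genome it returns `None` because the assembly never covers half of the genome.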
Example Outputs
| GC vs Length | Codon Heatmap | Coverage |
|---|---|---|
| Publication-ready | Frame-aware | Reference-guided |

- Interactive HTML reports
- Self-contained PNG figures
- Polars DataFrames
- Parallelized Rust backend
- Reproducible workflows

Why RustyOmeStats?
| Feature | RustyOmeStats |
|---|---|
| Multi-threaded Rust core | Yes |
| Codon density analysis | Yes |
| Automated visualization | Yes |
| Metagenome support | Yes |
| U50/UG50 implementation | Yes |
| Python plotting layer | Yes |
| Batch assembly processing | Yes |
| Polars DataFrames | Yes |
Architecture
flowchart LR
A[FASTA / BED Input] --> B[Rust Core]
B --> C[Polars DataFrames]
C --> D[CSV Outputs]
D --> E[Python Plotter]
E --> F[PNG Figures]
E --> G[Interactive HTML Report]
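The first three stages of the flow above (records in, per-sequence statistics computed in parallel, CSV out) can be sketched with the standard library alone. This is only an illustration of the data flow: the tool itself lists rayon and polars for these roles, whereas this sketch substitutes `std::thread::scope` and plain strings. The names `SeqStats`, `per_sequence_stats`, and `to_csv`, as well as the column names, are hypothetical.

```rust
use std::thread;

/// One row of a per-sequence table (hypothetical column set).
pub struct SeqStats {
    pub id: String,
    pub length: usize,
    pub gc_percent: f64,
}

fn stats_for(id: &str, seq: &[u8]) -> SeqStats {
    let gc = seq
        .iter()
        .filter(|&&b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    let gc_percent = if seq.is_empty() {
        0.0
    } else {
        100.0 * gc as f64 / seq.len() as f64
    };
    SeqStats { id: id.to_string(), length: seq.len(), gc_percent }
}

/// Splits the records into chunks, processes each chunk on its own scoped
/// thread, and returns the rows in the original record order.
pub fn per_sequence_stats(records: &[(String, Vec<u8>)], workers: usize) -> Vec<SeqStats> {
    let workers = workers.max(1);
    let chunk_size = ((records.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = records
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|(id, seq)| stats_for(id, seq))
                        .collect::<Vec<SeqStats>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<SeqStats>>()
    })
}

/// Serializes rows as CSV text; IDs are assumed to contain no commas or quotes.
pub fn to_csv(rows: &[SeqStats]) -> String {
    let mut out = String::from("id,length,gc_percent\n");
    for r in rows {
        out.push_str(&format!("{},{},{:.2}\n", r.id, r.length, r.gc_percent));
    }
    out
}
```

Joining the handles in spawn order keeps the output deterministic regardless of which thread finishes first, which matters for reproducible CSV files.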
Tech Stack
| Component | Technology |
|---|---|
| Core engine | Rust |
| Parallelism | rayon |
| DataFrames | polars |
| FASTA/BED parsing | rust-bio |
| CLI | clap |
| Error handling | anyhow |
| Plotting | seaborn + matplotlib |
| ORF prediction | FragGeneScanRs |
Installation
Step 1: Install Rust
Step 2: Install RustyOmeStats
From crates.io
From source
Step 3 (optional): FragGeneScanRs
Required only for predicted codon density.
or
Step 4: Install Plotting Dependencies
Quick Start
Analyze a Genome
Generate plots + HTML report:
Output Files
RustyOmeStats generates rich tabular outputs, publication-ready figures, and a fully self-contained interactive HTML report.
summary_stats.csv
├── Global assembly statistics
├── Total sequences / total bp
├── GC%
└── N25–N90 and L25–L90 assembly metrics
per_sequence.csv
├── Per-contig / per-sequence statistics
├── Sequence ID
├── Length distribution
└── GC composition for every record
length_intervals.csv
├── Length-bin frequency table
├── Histogram-ready interval counts
└── Used for contig size distribution plots
codon_absolute.csv
├── Raw 6-frame codon statistics
├── Codon counts and densities
├── Frame-specific measurements
└── Long-format analytics table
codon_absolute_aggregate.csv
├── Global codon usage profile
├── Aggregated across all sequences
└── 64-codon genome-wide abundance table
codon_predicted.csv
├── FragGeneScan-predicted ORF codons
├── Per-gene codon frequencies
└── Coding-region codon density statistics
codon_predicted_aggregate.csv
├── Aggregate predicted ORF codon usage
├── Genome-wide predicted CDS codon profile
└── Useful for translational bias analyses
codon_comparison.csv
├── Absolute vs predicted codon usage
├── Enrichment/depletion statistics
├── Translational bias comparisons
└── Predicted-over-absolute enrichment metrics
fgs_predicted.{ffn,faa,out,gff}
├── Raw FragGeneScanRs outputs
├── Predicted nucleotide ORFs (.ffn)
├── Predicted proteins (.faa)
├── Gene annotations (.gff)
└── Raw model output/log files
plot_length_histogram.png
├── Contig/scaffold size distribution
└── Publication-ready histogram visualization
plot_gc_distribution.png
├── GC variability across sequences
└── Detects compositional heterogeneity
plot_gc_vs_length.png
├── GC% versus sequence length
├── Detects assembly structure patterns
└── Useful for MAG/metagenome exploration
plot_codon_usage_bar.png
├── Genome-wide codon abundance plots
├── Absolute vs predicted codon usage
└── Translational preference visualization
plot_codon_heatmap_by_frame.png
├── 6-frame codon density heatmap
├── Frame-aware codon visualization
└── High-dimensional codon pattern analysis
plot_codon_enrichment.png
├── Codon enrichment/depletion analysis
├── Predicted vs absolute codon shifts
└── Translational bias visualization
report.html
├── Fully self-contained interactive report
├── All plots embedded inline
├── Metric summaries + tables
├── Portable single-file visualization dashboard
└── Shareable publication-ready report
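To make the idea behind a length-bin frequency table such as `length_intervals.csv` concrete, here is a small sketch that groups sequence lengths into fixed-width bins. The binning scheme RustyOmeStats actually uses is not documented here, so the fixed width, the half-open `[start, end)` intervals, and the function name `length_bins` are all assumptions.

```rust
use std::collections::BTreeMap;

/// Groups lengths into fixed-width, half-open bins [start, end) and returns
/// (start, end, count) rows in ascending order. Empty bins are omitted.
pub fn length_bins(lengths: &[u64], bin_width: u64) -> Vec<(u64, u64, usize)> {
    assert!(bin_width > 0, "bin_width must be positive");
    let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
    for &len in lengths {
        // Integer division maps every length to the index of its bin.
        *counts.entry(len / bin_width).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .map(|(bin, n)| (bin * bin_width, (bin + 1) * bin_width, n))
        .collect()
}
```

A `BTreeMap` keeps the bins sorted by start position, so the rows can be written out directly as a histogram-ready table.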
Analyze Multiple Assemblies
U50 / UG50 Assembly Metrics
RustyOmeStats implements the modern metrics proposed in:
Castro CJ, Ng TFF. U50: A New Metric for Measuring Assembly Output Based on Non-Overlapping, Target-Specific Contigs. J Comput Biol. 2017;24(11):1071-1080.
Including:
- U50
- UL50
- UG50
- ULG50
- Gap-aware assembly evaluation
- Overlap-aware assembly evaluation
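The sketch below shows one reading of how these metrics can be computed from contig alignments on a reference, given as half-open `(start, end)` intervals as in BED. Contigs are taken longest first, each contig is credited only with reference positions not already covered by a longer one (greedy masking), and an N50-style cutoff is then applied to the resulting unique lengths. This is a simplified illustration of the published idea rather than the RustyOmeStats implementation; all function names are hypothetical, and details such as tie-breaking may differ from the tool.

```rust
/// Unique (non-overlapping) length contributed by each contig, longest first.
/// Intervals are half-open (start, end) reference coordinates.
pub fn unique_lengths(ref_len: usize, intervals: &[(usize, usize)]) -> Vec<u64> {
    let mut sorted: Vec<(usize, usize)> = intervals
        .iter()
        .map(|&(s, e)| (s.min(ref_len), e.min(ref_len))) // clamp to the reference
        .filter(|&(s, e)| e > s)
        .collect();
    sorted.sort_by(|a, b| (b.1 - b.0).cmp(&(a.1 - a.0))); // longest first, stable
    let mut covered = vec![false; ref_len];
    let mut unique = Vec::new();
    for (s, e) in sorted {
        let mut fresh = 0u64;
        for slot in &mut covered[s..e] {
            if !*slot {
                *slot = true; // mask the position for all shorter contigs
                fresh += 1;
            }
        }
        if fresh > 0 {
            unique.push(fresh);
        }
    }
    unique
}

/// N50-style cutoff: first unique length at which the running sum reaches
/// half of `target`, plus the number of contigs needed to get there.
fn half_point(unique: &[u64], target: u64) -> Option<(u64, usize)> {
    let mut sorted = unique.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    let mut cumulative = 0u64;
    for (i, &len) in sorted.iter().enumerate() {
        cumulative += len;
        if 2 * cumulative >= target {
            return Some((len, i + 1));
        }
    }
    None
}

/// (U50, UL50): the target is the total of all unique lengths.
pub fn u50_ul50(ref_len: usize, intervals: &[(usize, usize)]) -> Option<(u64, usize)> {
    let unique = unique_lengths(ref_len, intervals);
    let total: u64 = unique.iter().sum();
    half_point(&unique, total)
}

/// (UG50, ULG50): the target is the reference length.
pub fn ug50_ulg50(ref_len: usize, intervals: &[(usize, usize)]) -> Option<(u64, usize)> {
    half_point(&unique_lengths(ref_len, intervals), ref_len as u64)
}
```

For a 120 bp reference with alignments `(0, 50)`, `(40, 70)`, `(45, 60)`, and `(80, 90)`, the unique lengths are 50, 20, and 10: the third contig is fully masked by longer ones and drops out. That gives a U50 of 50 with UL50 of 1, and a UG50 of 20 with ULG50 of 2. An exact duplicate of an earlier interval contributes no unique bases, so it is ignored without a separate deduplication step.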
Example U50 Workflow
Generate figures:
Generated Visualizations
| Plot | Description |
|---|---|
| GC Distribution | GC variability across sequences |
| Codon Heatmap | Frame-specific codon usage |
| Length Histogram | Assembly contig distributions |
| Coverage Plot | Reference coverage structure |
| U50 Summary | Modern assembly metric overview |
Interactive HTML Reports
RustyOmeStats automatically generates:
- Self-contained HTML reports
- Inline PNG visualizations
- Portable single-file reports
- Publication-ready figures

Open directly in your browser:
Library Usage
RustyOmeStats can also be embedded as a Rust crate.
use ;
use std::path::Path;
// genome stats
let files = collect_fasta_files?;
let recs = load_all_records?;
let basic = compute_basic;
println!;
// U50 stats
let res = compute_u50?;
println!;
Testing
Covers:
- N50/U50 correctness
- Greedy masking
- BED deduplication
- Reverse complements
- 6-frame codon indexing
- Hand-validated toy assemblies
License
Creative Commons Attribution-NonCommercial (CC BY-NC 4.0)
See the LICENSE file for details.
Citation
If you use RustyOmeStats in published work, please cite:
White III RA et al.
RustyOmeStats: High-performance genome and metagenome assembly statistics in Rust.
Contributing
We welcome:
- New assembly metrics
- Performance optimizations
- Visualization improvements
- Python plotting extensions
- Rust ecosystem integrations
Pull requests and issues are encouraged.
Support
- GitHub Issues: RustyOmeStats Issues
- Contact (email): Dr. Richard Allen White III

If you have any questions or feedback, please get in touch by email.
RustyOmeStats
Fast. Parallel. Modern Bioinformatics.
Built with ❤️ in Rust.