Orphos CLI
Command-line interface for Orphos, a fast, parallel Rust implementation of Prodigal for finding protein-coding genes in microbial genomes.
Features
- 🚀 High Performance: Multi-threaded processing using Rayon
- 💾 Memory Efficient: Optimized for large genomes and metagenomic assemblies
- 🔄 Compatible: Output format compatible with original Prodigal
- 🌍 Cross-Platform: Works on Linux, macOS, and Windows
- 📊 Multiple Output Formats: GenBank, GFF3, SCO, and GCA formats
- 🧬 Flexible Modes: Single genome and metagenomic analysis modes
Installation
Using Cargo
From Source
Homebrew (macOS/Linux)
Conda
Quick Start
Basic Usage
# Analyze a genome and output GenBank format
# Analyze with GFF3 output
# Metagenomic mode for short contigs
# Complete circular genome (closed ends)
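The four cases listed above can be sketched as small shell functions. This is a sketch under assumptions: the installed binary is taken to be `orphos` (the name is not stated here), `ORPHOS_BIN` is a hypothetical override introduced only for this example, and the file names in the usage note are placeholders. Only flags documented under Command-Line Options are used.

```shell
# Assumption: the binary is called `orphos`. ORPHOS_BIN is a hypothetical
# override introduced for this sketch; it is not an Orphos feature.
: "${ORPHOS_BIN:=orphos}"

# Analyze a genome and write GenBank output (gbk is the default format).
annotate_gbk()    { "$ORPHOS_BIN" -i "$1" -o "$2"; }

# Analyze with GFF3 output.
annotate_gff()    { "$ORPHOS_BIN" -i "$1" -o "$2" -f gff; }

# Metagenomic mode for short contigs.
annotate_meta()   { "$ORPHOS_BIN" -i "$1" -o "$2" -p meta; }

# Complete circular genome: closed ends, no genes running off the edges.
annotate_closed() { "$ORPHOS_BIN" -i "$1" -o "$2" -c; }
```

Under these assumptions, `annotate_gff genome.fasta genes.gff` expands to `orphos -i genome.fasta -o genes.gff -f gff`.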
Reading from stdin/stdout
# Input from stdin
# Output to stdout
# Pipe both
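Because the input defaults to stdin and the output defaults to stdout, the three cases above amount to leaving out `-i`, `-o`, or both. A minimal sketch, again assuming the binary is named `orphos` (overridable through the hypothetical `ORPHOS_BIN` variable):

```shell
# Assumption: the binary is called `orphos`; ORPHOS_BIN is a hypothetical override.
: "${ORPHOS_BIN:=orphos}"

# Input from stdin: omit -i and name only the output file.
from_stdin() { "$ORPHOS_BIN" -o "$1"; }

# Output to stdout: omit -o so the result can be redirected or piped.
to_stdout()  { "$ORPHOS_BIN" -i "$1"; }

# Pipe both: read FASTA on stdin and write GFF3 on stdout.
filter_gff() { "$ORPHOS_BIN" -f gff; }
```

For example, `gzip -dc genome.fasta.gz | from_stdin genes.gbk` annotates a compressed genome without unpacking it to disk, and `cat contigs.fasta | filter_gff | grep -v '^#'` drops the GFF3 comment lines from the stream. The file names are placeholders.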
Command-Line Options
Required/Input
| Option | Short | Long | Description |
|---|---|---|---|
| Input file | -i | --input | Input FASTA file (default: stdin) |
| Output file | -o | --output | Output file (default: stdout) |
Output Options
| Option | Short | Long | Default | Description |
|---|---|---|---|---|
| Format | -f | --format | gbk | Output format: gbk, gff, sco, gca |
Analysis Options
| Option | Short | Long | Default | Description |
|---|---|---|---|---|
| Mode | -p | --mode | single | Analysis mode: single or meta |
| Closed ends | -c | --closed | false | No genes off edges (for complete genomes) |
| Mask N's | -m | --mask | false | Mask runs of N's |
| Translation table | -g | --translation-table | auto | Translation table (1-25) |
| Training file | -t | --training | - | Use pre-trained parameters |
Other Options
| Option | Short | Long | Description |
|---|---|---|---|
| Quiet | -q | --quiet | Suppress progress messages |
| Help | -h | --help | Display help information |
| Version | -V | --version | Display version information |
Output Formats
GenBank (gbk)
Rich annotation format with gene features, translations, and metadata.
GFF3 (gff)
General Feature Format version 3, widely used in genomics pipelines.
Simple Coordinate Output (sco)
Tab-delimited gene coordinates for easy parsing.
Gene Coordinate Annotation (gca)
Compact coordinate format.
Analysis Modes
Single Genome Mode (default)
Use for complete or near-complete genomes (>100kb). Orphos will train on the genome to optimize gene prediction accuracy.
Best for:
- Complete bacterial genomes
- Complete archaeal genomes
- Large contigs or chromosomes
- Closed genomes
Metagenomic Mode
Use for short contigs or mixed metagenomic assemblies. Uses pre-trained parameters instead of training on the input.
Best for:
- Metagenomic assemblies
- Short contigs (<100kb)
- Mixed-species samples
- Fragmented sequences
Advanced Examples
Complete Circular Genome
For complete circular genomes (chromosomes, plasmids), pass the -c flag (long form --closed) so that no genes are called running off the sequence edges.
Custom Translation Table
Specify a custom genetic code (translation table):
# Use translation table 4 (Mycoplasma/Spiroplasma)
# Use translation table 11 (Bacterial and Archaeal)
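The two cases above can be sketched with a small wrapper around the `-g` option. The wrapper name, its argument order, and the `ORPHOS_BIN` override are hypothetical, and the binary name `orphos` is an assumption. The up-front check mirrors the supported range given under Translation Tables (1-25, excluding 7, 8, and 17-20).

```shell
# Assumption: the binary is called `orphos`; ORPHOS_BIN is a hypothetical override.
: "${ORPHOS_BIN:=orphos}"

# usage: annotate_with_table TABLE INPUT OUTPUT
annotate_with_table() {
  table=$1
  case "$table" in
    # Reject empty or non-numeric values.
    ''|*[!0-9]*) echo "translation table must be a number" >&2; return 2 ;;
    # Tables listed as unsupported under Translation Tables.
    7|8|17|18|19|20) echo "translation table $table is not supported" >&2; return 2 ;;
  esac
  if [ "$table" -lt 1 ] || [ "$table" -gt 25 ]; then
    echo "translation table must be between 1 and 25" >&2
    return 2
  fi
  "$ORPHOS_BIN" -i "$2" -o "$3" -g "$table"
}
```

With placeholder file names, `annotate_with_table 4 mycoplasma.fasta mycoplasma.gbk` selects the Mycoplasma/Spiroplasma code and `annotate_with_table 11 genome.fasta genome.gbk` selects the bacterial and archaeal code.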
Masking Low-Quality Regions
Mask runs of N's in low-quality sequences:
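A minimal sketch of a masked run, assuming the binary is named `orphos` (overridable through the hypothetical `ORPHOS_BIN` variable). The `count_n_lines` helper is not part of Orphos; it is an illustrative way to check whether a FASTA file contains any N's before deciding to pass `-m`.

```shell
# Assumption: the binary is called `orphos`; ORPHOS_BIN is a hypothetical override.
: "${ORPHOS_BIN:=orphos}"

# Illustrative helper: count sequence lines (headers skipped) containing N or n.
count_n_lines() { grep -v '^>' "$1" | grep -c '[Nn]'; }

# Annotate with runs of N's masked.
annotate_masked() { "$ORPHOS_BIN" -i "$1" -o "$2" -m; }
```

For a draft assembly in a placeholder file `draft.fasta`, `count_n_lines draft.fasta` prints the number of affected lines, and `annotate_masked draft.fasta draft.gbk` runs the annotation with masking enabled.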
Batch Processing
Process multiple genomes:
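# Assumes the binary is named orphos and that inputs match genomes/*.fasta (both illustrative)
for genome in genomes/*.fasta; do
  base=$(basename "$genome" .fasta)
  orphos -i "$genome" -o "${base}.gbk"
done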
Pipeline Integration
Integrate with other bioinformatics tools:
# Find genes and extract protein sequences
# ... then use genes.gff with other tools
# Combine with annotation pipelines
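One way to chain Orphos with standard text tools is to stream GFF3 to stdout and summarize it on the fly. The sketch below counts predicted genes per contig. It assumes the binary is named `orphos` (overridable through the hypothetical `ORPHOS_BIN` variable) and that genes appear as `CDS` features in column 3, as they do in Prodigal's GFF3 output; both function names are illustrative.

```shell
# Assumption: the binary is called `orphos`; ORPHOS_BIN is a hypothetical override.
: "${ORPHOS_BIN:=orphos}"

# Read GFF3 on stdin and print "<seqid><TAB><number of CDS features>" per contig.
# Assumption: genes are reported as CDS features (GFF3 column 3).
count_cds_per_contig() {
  awk -F '\t' '!/^#/ && $3 == "CDS" { n[$1]++ }
               END { for (s in n) print s "\t" n[s] }' | sort
}

# Predict genes in metagenomic mode and summarize without an intermediate file.
genes_per_contig() {
  "$ORPHOS_BIN" -i "$1" -f gff -p meta | count_cds_per_contig
}
```

`genes_per_contig contigs.fasta` (placeholder file name) prints one line per contig that has at least one predicted gene. Contigs without predictions do not appear, because the summary is built only from feature lines.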
Performance Tips
- Use multiple cores: Orphos automatically uses all available CPU cores via Rayon
- Metagenomic mode for many small contigs: Faster than single mode for fragmented assemblies
- Batch processing: Process multiple files in parallel using shell scripting
- Large files: Orphos handles multi-GB files efficiently
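The batch-processing tip above can be sketched with `find` and `xargs -P` (GNU or BSD versions; `-P`, `-0`, and `-maxdepth` are not in POSIX). Assumptions: the binary is named `orphos` (overridable through the hypothetical `ORPHOS_BIN` variable), inputs end in `.fasta`, and each output is written next to its input with a `.gbk` suffix. `RAYON_NUM_THREADS` is the environment variable Rayon's default thread pool reads; it caps threads per job only if Orphos keeps that default configuration, which is not confirmed here.

```shell
# Assumption: the binary is called `orphos`; ORPHOS_BIN is a hypothetical override.
: "${ORPHOS_BIN:=orphos}"

# usage: annotate_dir_parallel DIR JOBS THREADS_PER_JOB
# Annotates DIR/*.fasta, running up to JOBS Orphos processes at once.
annotate_dir_parallel() {
  find "$1" -maxdepth 1 -name '*.fasta' -print0 |
    RAYON_NUM_THREADS="$3" xargs -0 -P "$2" -I {} \
      sh -c '"$0" -i "$1" -o "${1%.fasta}.gbk"' "$ORPHOS_BIN" {}
}
```

For example, `annotate_dir_parallel genomes 4 2` would run four jobs of two threads each on an eight-core machine. Since Orphos already uses every core by default, running several unrestricted jobs at once tends to oversubscribe the CPU, so this pattern mainly pays off for many small inputs.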
Translation Tables
Orphos supports NCBI translation tables 1-25 (excluding 7, 8, 17-20). Common tables:
| Table | Name | Organisms |
|---|---|---|
| 1 | Standard | Most eukaryotes |
| 4 | Mycoplasma/Spiroplasma | Mycoplasma, Spiroplasma |
| 11 | Bacterial, Archaeal, Plant Plastid | Most bacteria and archaea (default) |
| 25 | Candidate Division SR1, Gracilibacteria | Certain bacteria |
Related Projects
- orphos-core: Rust library for gene prediction
- orphos-python: Python bindings
- orphos-wasm: WebAssembly module for browser/Node.js
Contributing
We welcome contributions! Please see the main repository for contribution guidelines.
License
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Citation
If you use Orphos in your research, please cite:
# TODO: Add citation information
Acknowledgments
This project is inspired by the original Prodigal by Doug Hyatt. We thank the authors for their groundbreaking work in prokaryotic gene prediction.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs.rs/orphos-cli