§Orphos Gene Finder - Rust Implementation
A high-performance Rust implementation of the Orphos prokaryotic gene-finding algorithm. This library provides both single-genome and metagenomic gene prediction, with support for parallel processing.
§Overview
Orphos (Prokaryotic Dynamic Programming Gene-finding Algorithm) is an unsupervised machine learning method for finding genes in prokaryotic genomes. This Rust implementation maintains compatibility with the original C version while offering improved performance and safety.
§Features
- Single Genome Mode: Train on a complete genome for optimal gene prediction
- Metagenomic Mode: Predict genes in fragmented or mixed sequences
- Multiple Output Formats: Support for GenBank, GFF, GCA, and SCO formats
- Parallel Processing: Multi-threaded execution using Rayon
- Type Safety: Compile-time guarantees for training states
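The "compile-time guarantees for training states" feature can be illustrated with a small, self-contained type-state sketch. Everything in it (`Finder`, the `Untrained` and `Trained` markers defined here, `train`, `count_start_codons`, and the GC-content "model") is hypothetical and chosen only for illustration; it is not the crate's actual API or training logic.

```rust
// Hypothetical state types: these are NOT the crate's real definitions.
struct Untrained;

// The trained state carries whatever was learned (here, only GC content).
struct Trained {
    gc_content: f64,
}

// A finder parameterised by its training state.
struct Finder<S> {
    state: S,
}

impl Finder<Untrained> {
    fn new() -> Self {
        Finder { state: Untrained }
    }

    // Consumes the untrained finder and returns a trained one, so the
    // untrained value cannot be reused after training.
    fn train(self, seq: &[u8]) -> Finder<Trained> {
        let gc = seq
            .iter()
            .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
            .count();
        let gc_content = if seq.is_empty() {
            0.0
        } else {
            gc as f64 / seq.len() as f64
        };
        Finder { state: Trained { gc_content } }
    }
}

// Prediction-style methods exist only on the trained state.
impl Finder<Trained> {
    fn gc_content(&self) -> f64 {
        self.state.gc_content
    }

    // Toy stand-in for prediction: counts ATG triplets in any frame.
    fn count_start_codons(&self, seq: &[u8]) -> usize {
        seq.windows(3)
            .filter(|w| w.eq_ignore_ascii_case(b"ATG"))
            .count()
    }
}

fn example() -> usize {
    let finder = Finder::<Untrained>::new();
    // finder.count_start_codons(b"ATG"); // would not compile: no such method on Finder<Untrained>
    let trained = finder.train(b"ATGAAATGA");
    trained.count_start_codons(b"ATGAAATGA")
}
```

Because `train` takes `self` by value and the prediction methods are defined only for `Finder<Trained>`, calling them before training is rejected by the compiler rather than detected at run time.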
§Quick Start
use orphos_core::{OrphosAnalyzer, config::OrphosConfig};
// Create analyzer with default configuration
let mut analyzer = OrphosAnalyzer::new(OrphosConfig::default());
// Analyze a genome sequence
let results = analyzer.analyze_sequence(
    "ATGCGATCGATCG...",
    Some("MyGenome".to_string())
)?;
println!("Found {} genes", results.genes.len());
§Architecture
The library uses a type-state pattern to ensure training is performed before gene prediction:
use orphos_core::engine::{UntrainedOrphos, Orphos, Untrained};
use orphos_core::config::OrphosConfig;
use orphos_core::sequence::encoded::EncodedSequence;
// Create an untrained analyzer
let mut untrained = UntrainedOrphos::with_config(OrphosConfig::default())?;
// Encode the sequence
let encoded = EncodedSequence::without_masking(b"ATGCGATCGATCG...");
// Train on the genome (type changes to TrainedOrphos)
let trained = untrained.train_single_genome(&encoded)?;
// Use the higher-level API to find genes
use orphos_core::OrphosAnalyzer;
let mut analyzer = OrphosAnalyzer::new(OrphosConfig::default());
let results = analyzer.analyze_sequence("ATGCGATCGATCG...", None)?;
println!("Found {} genes", results.genes.len());
§Module Organization
- config: Configuration options for analysis
- engine: Main analysis engine and training logic
- types: Core data types and structures
- results: Gene prediction results
- sequence: Sequence encoding and manipulation
- training: Training algorithms for gene models
- node: Gene node management and scoring
- algorithms: Core gene-finding algorithms
- output: Output formatting for various file types
- bitmap: Efficient sequence encoding utilities
§Output Formats
The library supports multiple output formats configured via config::OutputFormat:
- GenBank (GBK): Rich feature annotation format
- GFF3: General Feature Format version 3
- GCA: Gene coordinate annotation
- SCO: Simple coordinate output
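To make the GFF3 option more concrete, here is a minimal sketch of how one predicted gene could be serialised as a single GFF3 feature line (nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes). The `GeneRecord` struct, the `Strand` enum, and `to_gff3_line` are hypothetical names invented for this example; the crate's own output types and the exact attributes it writes may differ.

```rust
// Hypothetical types for illustration; not the crate's real output structures.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strand {
    Forward,
    Reverse,
}

struct GeneRecord<'a> {
    seq_id: &'a str,
    start: usize, // 1-based, inclusive (GFF3 convention)
    end: usize,   // 1-based, inclusive
    strand: Strand,
    score: f64,
    id: &'a str,
}

// Formats one gene as a GFF3 feature line of type CDS with phase 0.
fn to_gff3_line(gene: &GeneRecord, source: &str) -> String {
    let strand = match gene.strand {
        Strand::Forward => '+',
        Strand::Reverse => '-',
    };
    format!(
        "{}\t{}\tCDS\t{}\t{}\t{:.1}\t{}\t0\tID={}",
        gene.seq_id, source, gene.start, gene.end, gene.score, strand, gene.id
    )
}

fn example_line() -> String {
    let gene = GeneRecord {
        seq_id: "MyGenome",
        start: 1,
        end: 300,
        strand: Strand::Forward,
        score: 12.5,
        id: "gene_1",
    };
    to_gff3_line(&gene, "sketch")
}
```

A complete GFF3 file would also begin with a `##gff-version 3` directive line before any feature lines.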
§Error Handling
All fallible operations return Result<T, OrphosError>, providing detailed error information for:
- Invalid sequences (too short, invalid characters)
- I/O errors during file operations
- Training failures
- Configuration errors
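The pattern of returning a `Result` with a descriptive error type can be sketched as follows. The `SketchError` enum, its variants, and the `validate` function are hypothetical stand-ins covering only the "invalid sequence" category above; the real `OrphosError` variants and validation rules are not shown in this documentation and may look different.

```rust
use std::fmt;

// Hypothetical error type for illustration; not the crate's OrphosError.
#[derive(Debug, PartialEq)]
enum SketchError {
    SequenceTooShort { len: usize, min: usize },
    InvalidCharacter { byte: u8, position: usize },
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::SequenceTooShort { len, min } => {
                write!(f, "sequence too short: {} bases, need at least {}", len, min)
            }
            SketchError::InvalidCharacter { byte, position } => {
                write!(f, "invalid character '{}' at position {}", *byte as char, position)
            }
        }
    }
}

impl std::error::Error for SketchError {}

// Checks length and alphabet (A, C, G, T, N; case-insensitive).
// `min_len` is an assumed parameter, not a documented crate setting.
fn validate(seq: &[u8], min_len: usize) -> Result<(), SketchError> {
    if seq.len() < min_len {
        return Err(SketchError::SequenceTooShort { len: seq.len(), min: min_len });
    }
    for (position, &byte) in seq.iter().enumerate() {
        let upper = byte.to_ascii_uppercase();
        if !matches!(upper, b'A' | b'C' | b'G' | b'T' | b'N') {
            return Err(SketchError::InvalidCharacter { byte, position });
        }
    }
    Ok(())
}

// Callers can branch on the specific failure or simply display it.
fn describe(seq: &[u8]) -> String {
    match validate(seq, 3) {
        Ok(()) => "ok".to_string(),
        Err(e) => e.to_string(),
    }
}
```

Implementing `std::error::Error` lets such an error be propagated with `?` into functions that return `Box<dyn std::error::Error>`.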
Re-exports§
pub use engine::OrphosAnalyzer;
Modules§
- algorithms: Core gene-finding algorithms.
- bitmap
- config
- constants
- engine
- metagenomic
- node: Node management and scoring for gene prediction.
- output: Output formatting for gene prediction results.
- results
- sequence: Sequence encoding and manipulation utilities.
- training: Training algorithms for gene prediction models.
- types