Module training


Training algorithms for gene prediction models.

This module implements the unsupervised machine learning algorithms that train Orphos’s statistical models from genome sequences.

§Overview

Training extracts statistical patterns from genes predicted in an initial pass:

Initial gene finding: Find high-confidence genes using basic models
Codon usage: Calculate dicodon frequencies in predicted genes
Start codon preference: Learn ATG/GTG/TTG usage patterns
RBS detection: Identify ribosome binding site motifs (Shine-Dalgarno)
Upstream composition: Analyze nucleotide patterns near start codons
GC bias: Detect reading frame preferences based on GC content
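The sketch below illustrates the codon-usage and start-codon steps. It tallies in-frame dicodons (two adjacent codons, stepped forward one codon at a time) and start-codon usage over a set of predicted genes. This is a minimal sketch that assumes each gene is a coding-strand byte slice beginning at its start codon. `GeneStats`, `collect_stats`, and `start_fraction` are hypothetical names and are not part of the `orphos_core` API. A full trainer would normally also turn these raw counts into scores against a background model.

```rust
use std::collections::HashMap;

/// Hypothetical container for statistics gathered from predicted genes
/// (illustrative only; not an `orphos_core` type).
#[derive(Debug, Default)]
pub struct GeneStats {
    /// Counts of in-frame dicodons: 6-mers that start on a codon boundary.
    pub dicodon_counts: HashMap<[u8; 6], u32>,
    /// Counts of the first codon of each gene (e.g. ATG, GTG, TTG).
    pub start_counts: HashMap<[u8; 3], u32>,
}

/// Tallies dicodon and start-codon usage. Each gene is assumed to be a
/// coding-strand sequence that begins at its start codon.
pub fn collect_stats(genes: &[&[u8]]) -> GeneStats {
    let mut stats = GeneStats::default();
    for gene in genes {
        if gene.len() < 3 {
            continue; // too short to contain even a start codon
        }
        let mut start = [0u8; 3];
        start.copy_from_slice(&gene[..3]);
        *stats.start_counts.entry(start).or_insert(0) += 1;

        // Slide a 6-base window forward one codon (3 bases) at a time,
        // so every window covers two adjacent in-frame codons.
        let mut i = 0;
        while i + 6 <= gene.len() {
            let mut dicodon = [0u8; 6];
            dicodon.copy_from_slice(&gene[i..i + 6]);
            *stats.dicodon_counts.entry(dicodon).or_insert(0) += 1;
            i += 3;
        }
    }
    stats
}

/// Fraction of genes whose start codon equals `codon`.
pub fn start_fraction(stats: &GeneStats, codon: &[u8; 3]) -> f64 {
    let total: u32 = stats.start_counts.values().sum();
    if total == 0 {
        return 0.0;
    }
    let hits = stats.start_counts.get(codon).copied().unwrap_or(0);
    f64::from(hits) / f64::from(total)
}
```

For example, three genes starting with ATG, GTG, and ATG give an ATG start fraction of 2/3. Because the window steps by 3 rather than 1, only in-frame hexamers are counted, which is what distinguishes coding-frame statistics from plain k-mer composition.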

§Training Modes

Shine-Dalgarno (SD): For organisms with canonical RBS motifs
Non-SD: For organisms without RBS or with alternative start recognition

The mode is auto-detected based on the strength of SD signals in the training data.
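One simple way to picture this auto-detection is to measure how often the region upstream of each predicted start contains a purine-rich Shine-Dalgarno core, and then compare that hit fraction to a threshold. The sketch below does this. It is a hypothetical heuristic and not the criterion used by `should_use_sd`. The motif list, the helpers `sd_fraction` and `looks_like_sd`, and the threshold are all illustrative assumptions.

```rust
/// Hypothetical SD-like core motifs (fragments of the AGGAGG consensus).
const SD_CORES: [&str; 3] = ["GGAGG", "AGGAG", "GAGG"];

/// True if `needle` occurs anywhere in `haystack`.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty()
        && needle.len() <= haystack.len()
        && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Fraction of upstream windows (e.g. the bases just before each predicted
/// start codon) that contain at least one SD-like core.
pub fn sd_fraction(upstream_windows: &[&[u8]]) -> f64 {
    if upstream_windows.is_empty() {
        return 0.0;
    }
    let hits = upstream_windows
        .iter()
        .filter(|&&w| SD_CORES.iter().any(|core| contains(w, core.as_bytes())))
        .count();
    hits as f64 / upstream_windows.len() as f64
}

/// Chooses SD-style training when the hit fraction reaches `threshold`.
pub fn looks_like_sd(upstream_windows: &[&[u8]], threshold: f64) -> bool {
    sd_fraction(upstream_windows) >= threshold
}
```

The threshold is a free parameter in this sketch. Any real cutoff would need calibration against genomes whose SD usage is already known.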

§Modules

sd_training: Shine-Dalgarno motif training
non_sd_training: Alternative start recognition training
common: Shared training utilities

§Examples

Training is normally performed automatically by OrphosAnalyzer, but it can also be run manually for advanced use cases:

```rust
use orphos_core::engine::UntrainedOrphos;
use orphos_core::config::OrphosConfig;
use orphos_core::sequence::encoded::EncodedSequence;

let mut orphos = UntrainedOrphos::new();
// Truncated placeholder; substitute a complete genome sequence.
let sequence = b"ATGAAACGCATTAGCACCACCATT...";
let encoded = EncodedSequence::without_masking(sequence);

// Train on the genome. The `?` operator requires an enclosing function
// that returns a compatible `Result`.
let trained = orphos.train_single_genome(&encoded)?;

// Training data is now stored in the TrainedOrphos instance
```

Modules§

common
non_sd_training
sd_training

Functions§

load_training_file
should_use_sd: Checks if training should use Shine-Dalgarno motifs.
write_training_file
