orfm-rs
A pure-Rust port of OrfM, a simple and not slow open reading frame (ORF) caller. It finds all ORFs in FASTA/FASTQ sequences by searching all six reading frames for continuous stretches of codons that contain no stop codon, then translates them to amino acid sequences.
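To make the six-frame search concrete, here is a minimal sketch of the idea, not the orfm-rs implementation. It assumes the standard stop codons (TAA, TAG, TGA) and assumes that runs truncated by the ends of the sequence are reported as ORFs. The `Orf` struct and the function names are hypothetical and exist only for this illustration.

```rust
/// One ORF found by the sketch (hypothetical type, not the orfm-rs API).
#[derive(Debug, PartialEq)]
pub struct Orf {
    /// 1..=3 for the forward strand, 4..=6 for the reverse complement.
    pub frame: u8,
    /// 0-based offset of the first codon within the scanned strand.
    pub start: usize,
    /// Nucleotides of the ORF, excluding any terminating stop codon.
    pub nucleotides: String,
}

/// Stop codons of the standard genetic code (assumed: NCBI table 1).
fn is_stop(codon: &[u8]) -> bool {
    matches!(codon, b"TAA" | b"TAG" | b"TGA")
}

/// Reverse-complement a DNA sequence; anything other than A/C/G/T becomes N.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&base| match base {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            _ => b'N',
        })
        .collect()
}

/// Scan the three frames of one strand and collect stop-free runs.
fn scan_strand(strand: &[u8], first_frame: u8, min_len: usize, out: &mut Vec<Orf>) {
    for offset in 0..3usize {
        let mut start = offset; // start of the current stop-free run
        let mut pos = offset; // start of the codon being examined
        loop {
            let has_codon = pos + 3 <= strand.len();
            let at_stop = has_codon && is_stop(&strand[pos..pos + 3]);
            // A run ends at a stop codon or where no complete codon remains.
            if at_stop || !has_codon {
                if pos > start && pos - start >= min_len {
                    out.push(Orf {
                        frame: first_frame + offset as u8,
                        start,
                        nucleotides: String::from_utf8_lossy(&strand[start..pos]).into_owned(),
                    });
                }
                if !has_codon {
                    break;
                }
                start = pos + 3; // the next run begins after the stop codon
            }
            pos += 3;
        }
    }
}

/// Find stop-free runs of at least `min_len` nucleotides in all six frames.
pub fn six_frame_orfs(seq: &str, min_len: usize) -> Vec<Orf> {
    let forward = seq.to_ascii_uppercase().into_bytes();
    let reverse = reverse_complement(&forward);
    let mut out = Vec::new();
    scan_strand(&forward, 1, min_len, &mut out);
    scan_strand(&reverse, 4, min_len, &mut out);
    out
}
```

Calling `six_frame_orfs("ATGAAATAGCCCGGG", 6)` splits frame 1 at the `TAG` stop into `ATGAAA` and `CCCGGG`, while the other five frames each yield a single run.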
Install
From source (requires Rust toolchain)
Or build without installing:
cargo build --release
# binary at target/release/orfm
Usage
orfm [OPTIONS] [INPUT]
Reads from stdin if no input file is given. Accepts FASTA or FASTQ, gzipped or uncompressed.
Options
| Flag | Description | Default |
|---|---|---|
| `-m <LENGTH>` | Minimum ORF length in nucleotides (must be a multiple of 3) | 96 |
| `-c <TABLE_ID>` | NCBI codon table for translation (1–25) | 1 |
| `-l <LENGTH>` | Ignore sequence beyond this position | none |
| `-t <FILE>` | Write nucleotide transcripts to this file | none |
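As an illustration of what the codon table option controls, the sketch below translates nucleotides with NCBI table 1 (the default), encoded as the 64-character amino acid string NCBI publishes for codons in TCAG order. The `translate` function is hypothetical and not part of orfm-rs. Emitting `X` for codons with ambiguous bases and dropping a trailing partial codon are assumptions made for this example only.

```rust
/// NCBI translation table 1 (standard code), one amino acid per codon,
/// with codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG (base order T, C, A, G).
const TABLE_1: &[u8] = b"FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/// Map a base to its position in the T, C, A, G ordering used by NCBI tables.
fn base_index(base: u8) -> Option<usize> {
    match base.to_ascii_uppercase() {
        b'T' | b'U' => Some(0),
        b'C' => Some(1),
        b'A' => Some(2),
        b'G' => Some(3),
        _ => None,
    }
}

/// Translate complete codons; stop codons become '*', ambiguous codons 'X'.
/// A trailing partial codon (1 or 2 bases) is ignored.
pub fn translate(nucleotides: &str) -> String {
    nucleotides
        .as_bytes()
        .chunks_exact(3)
        .map(|codon| {
            match (base_index(codon[0]), base_index(codon[1]), base_index(codon[2])) {
                (Some(a), Some(b), Some(c)) => TABLE_1[16 * a + 4 * b + c] as char,
                _ => 'X',
            }
        })
        .collect()
}
```

Other NCBI tables differ only in the 64-character string, so supporting another table in this sketch would mean swapping `TABLE_1` for the corresponding string.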
Examples
# Basic usage
# From gzipped FASTQ, shorter minimum ORF length
# Pipe from stdin, write transcripts
# Use mitochondrial codon table
Library usage
orfm-rs can be used as a Rust library. Add to your Cargo.toml:
[]
= '*'
use OrfCaller;
let caller = OrfCaller::new(table_id, min_length, position_limit).unwrap();
// Iterate over ORFs from a file
for orf in caller.call_from_file
// Or call on a single sequence
let orfs = caller.find_orfs;
for orf in &orfs
Benchmarking
A Snakefile is included for comparing performance against the original C OrfM. It generates random sequences, runs both tools, checks output correctness, and collects wall-clock time and memory usage.
When I ran it, orfm-rs was about 5% faster in wall-clock time, with similar peak memory:
| tool | replicate | wall_clock_s | max_rss_kb |
|------|-----------|--------------|------------|
| orfm_c | 1 | 2.11 | 16088 |
| orfm_rs | 1 | 1.99 | 15572 |
| orfm_c | 2 | 2.1 | 15432 |
| orfm_rs | 2 | 2.0 | 16116 |
| orfm_c | 3 | 2.12 | 16164 |
| orfm_rs | 3 | 2.02 | 16148 |
Requires the C OrfM binary at ~/git/OrfM/orfm (this path can be changed in the Snakefile).
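The roughly 5% figure can be checked from the wall-clock column of the table above. The helper below is a hypothetical illustration, not part of the Snakefile: it averages the three replicates per tool and reports how much smaller the orfm-rs mean is, relative to the C OrfM mean.

```rust
/// Arithmetic mean of a non-empty slice.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

/// Percentage by which the candidate's mean time is below the baseline's mean time.
pub fn percent_faster(baseline: &[f64], candidate: &[f64]) -> f64 {
    let base = mean(baseline);
    (base - mean(candidate)) / base * 100.0
}

/// Wall-clock seconds copied from the benchmark table.
pub const ORFM_C_WALL_S: [f64; 3] = [2.11, 2.10, 2.12];
pub const ORFM_RS_WALL_S: [f64; 3] = [1.99, 2.00, 2.02];
```

With these inputs the means are 2.11 s for C OrfM and about 2.003 s for orfm-rs, so `percent_faster(&ORFM_C_WALL_S, &ORFM_RS_WALL_S)` comes out at roughly 5.1.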
Differences from OrfM
- Pure Rust, no C dependencies
- Exposes a library API with an iterator over translated ORFs
- Supports all codon tables that OrfM supports (NCBI tables 1–25)
- Uses needletail for sequence parsing and aho-corasick for stop codon detection
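To show why multi-pattern matching suits stop codon detection, the sketch below finds stop codons for both strands in a single pass over the forward sequence and buckets each hit by its position modulo 3. It uses a plain sliding window from the standard library as a stand-in for aho-corasick, so it is not how orfm-rs does it. The `StopHits` type and the function name are hypothetical, and the standard stop codons are assumed.

```rust
/// Stop codon start offsets, bucketed by `position % 3` on the forward sequence.
#[derive(Debug, Default, PartialEq)]
pub struct StopHits {
    /// Hits for TAA, TAG, TGA (stops on the forward strand).
    pub forward: [Vec<usize>; 3],
    /// Hits for TTA, CTA, TCA (reverse complements, i.e. stops on the reverse strand).
    pub reverse: [Vec<usize>; 3],
}

/// Scan every 3-base window once and record which strand it terminates, if any.
/// The input is expected to be uppercase DNA.
pub fn find_stops(seq: &[u8]) -> StopHits {
    let mut hits = StopHits::default();
    for (pos, window) in seq.windows(3).enumerate() {
        if matches!(window, b"TAA" | b"TAG" | b"TGA") {
            hits.forward[pos % 3].push(pos);
        } else if matches!(window, b"TTA" | b"CTA" | b"TCA") {
            hits.reverse[pos % 3].push(pos);
        }
    }
    hits
}
```

Once the hits are bucketed this way, the ORFs in a given frame are the gaps between consecutive entries of that frame's bucket, which avoids re-scanning the sequence once per frame.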
License
orfm-rs is licensed under the GNU Lesser General Public License v3.0 (LGPL-3.0).
Citation
Because the algorithm was devised for the original OrfM, please cite the original paper:
Ben J. Woodcroft, Joel A. Boyd, and Gene W. Tyson. OrfM: A fast open reading frame predictor for metagenomic data. (2016). Bioinformatics. doi:10.1093/bioinformatics/btw241.