GECCO: Gene Cluster prediction with Conditional Random Fields.
This crate provides both a CLI tool and a library API for identifying putative Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data.
§Library usage
The easiest way to use GECCO as a library is through the Gecco struct:
use gecco::Gecco;
use gecco::orf::SeqRecord;

let pipeline = Gecco::builder()
    .data_dir("gecco_data")
    .threshold(0.8)
    .build()
    .unwrap();

let records = vec![SeqRecord {
    id: "contig_1".into(),
    seq: "ATGCCC...".into(),
}];

let results = pipeline.scan(&records).unwrap();
for cluster in &results.clusters {
    println!("{}: {} genes", cluster.id, cluster.genes.len());
}

For finer control, individual pipeline stages are also available as public
methods on Gecco, or through the lower-level modules directly.
Re-exports§
pub use model::Cluster;
pub use model::Domain;
pub use model::Gene;
pub use model::Protein;
pub use orf::SeqRecord;
pub use pipeline::Gecco;
pub use pipeline::GeccoBuilder;
pub use pipeline::GeccoResults;
Modules§
- cli
- CLI commands for GECCO.
- crf
- CRF-based gene cluster prediction.
- data_dir
- Resolve the GECCO data directory containing HMM, CRF model, and InterPro files.
- hmmer
- HMMER domain annotation module.
- interpro
- InterPro metadata for domain annotations.
- io
- I/O utilities: TSV tables, compression, FASTA/GenBank.
- model
- Data layer types for gene cluster detection.
- orf
- ORF finding: gene prediction from DNA sequences.
- output
- pipeline
- High-level pipeline API for library consumers.
- refine
- Cluster refinement: extract contiguous gene clusters from CRF predictions.
- sklearn_rf
- Evaluator and small trainer for scikit-learn-shaped random forests.
- types
- Supervised classifier to predict the type of a gene cluster.
- util
- Shared utility functions.
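
The refine module is described above as extracting contiguous gene clusters from CRF predictions. The following is a minimal, self-contained sketch of that general idea and does not reproduce GECCO's actual implementation or API. It assumes the CRF yields one probability per gene in contig order. The names `Segment`, `contiguous_segments`, and `min_genes` are hypothetical, and the 0.8 cutoff is borrowed from the builder example purely for illustration.

```rust
/// Half-open range of gene indices `[start, end)` forming one putative cluster.
/// Hypothetical type for illustration; not part of the gecco crate.
#[derive(Debug, PartialEq)]
struct Segment {
    start: usize,
    end: usize,
}

/// Group consecutive genes whose probability is at least `threshold`,
/// keeping only runs that contain at least `min_genes` genes.
/// (Assumed behaviour: the real refinement step may apply other criteria.)
fn contiguous_segments(probs: &[f64], threshold: f64, min_genes: usize) -> Vec<Segment> {
    let mut segments = Vec::new();
    // Index of the first gene of the run currently being extended, if any.
    let mut start: Option<usize> = None;

    for (i, &p) in probs.iter().enumerate() {
        match (p >= threshold, start) {
            // A gene above the threshold opens a new run.
            (true, None) => start = Some(i),
            // A gene below the threshold closes the current run.
            (false, Some(s)) => {
                if i - s >= min_genes {
                    segments.push(Segment { start: s, end: i });
                }
                start = None;
            }
            // Otherwise: keep extending the run, or stay outside any run.
            _ => {}
        }
    }

    // Close a run that extends to the end of the contig.
    if let Some(s) = start {
        if probs.len() - s >= min_genes {
            segments.push(Segment { start: s, end: probs.len() });
        }
    }
    segments
}

fn main() {
    // Made-up per-gene probabilities for a single contig.
    let probs = [0.1, 0.9, 0.95, 0.85, 0.2, 0.9, 0.1, 0.8, 0.99];
    for seg in contiguous_segments(&probs, 0.8, 2) {
        println!("genes {}..{}", seg.start, seg.end);
    }
}
```

With these inputs the single high-probability gene at index 5 is dropped because its run is shorter than `min_genes`, while the runs at indices 1..4 and 7..9 are kept.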