Crate gecco

GECCO: Gene Cluster prediction with Conditional Random Fields.

This crate provides both a CLI tool and a library API for identifying putative Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data.

Library usage

The easiest way to use GECCO as a library is through the Gecco struct:

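use gecco::Gecco;
use gecco::orf::SeqRecord;

// Configure and build the pipeline; unwrap() panics if building fails.
let pipeline = Gecco::builder()
    .data_dir("gecco_data")
    .threshold(0.8)
    .build()
    .unwrap();

// One input record per contig: an identifier plus its DNA sequence.
let records = vec![SeqRecord {
    id: "contig_1".into(),
    seq: "ATGCCC...".into(),
}];

// Scan the records and report each predicted cluster with its gene count.
let results = pipeline.scan(&records).unwrap();
for cluster in &results.clusters {
    println!("{}: {} genes", cluster.id, cluster.genes.len());
}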
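
The loop above prints every predicted cluster. As a minimal sketch of post-processing such results, the snippet below keeps only clusters with a minimum number of genes. It does not depend on the gecco crate: `Cluster` and `Gene` here are stand-in types that mirror only the `id` and `genes` fields used in the example above, and `filter_by_gene_count` and `toy_cluster` are hypothetical helpers, not part of the GECCO API.

```rust
// Stand-in types for illustration only; the real `gecco::Cluster` and
// `gecco::Gene` may carry different or additional fields.
#[derive(Debug, Clone, PartialEq)]
struct Gene {
    id: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Cluster {
    id: String,
    genes: Vec<Gene>,
}

/// Keep only clusters with at least `min_genes` genes (hypothetical helper).
fn filter_by_gene_count(clusters: &[Cluster], min_genes: usize) -> Vec<&Cluster> {
    clusters
        .iter()
        .filter(|c| c.genes.len() >= min_genes)
        .collect()
}

/// Build a toy cluster with `n` placeholder genes (hypothetical helper).
fn toy_cluster(id: &str, n: usize) -> Cluster {
    Cluster {
        id: id.to_string(),
        genes: (0..n)
            .map(|i| Gene { id: format!("{}_{}", id, i + 1) })
            .collect(),
    }
}
```

With the real crate, the same filter could presumably be applied to `results.clusters`, assuming that field is a collection of clusters as the loop above suggests.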

For finer control, individual pipeline stages are also available as public methods on Gecco, or through the lower-level modules directly.

Re-exports

pub use model::Cluster;
pub use model::Domain;
pub use model::Gene;
pub use model::Protein;
pub use orf::SeqRecord;
pub use pipeline::Gecco;
pub use pipeline::GeccoBuilder;
pub use pipeline::GeccoResults;

Modules

cli
    CLI commands for GECCO.
crf
    CRF-based gene cluster prediction.
data_dir
    Resolve the GECCO data directory containing HMM, CRF model, and InterPro files.
hmmer
    HMMER domain annotation module.
interpro
    InterPro metadata for domain annotations.
io
    I/O utilities: TSV tables, compression, FASTA/GenBank.
model
    Data layer types for gene cluster detection.
orf
    ORF finding: gene prediction from DNA sequences.
output
pipeline
    High-level pipeline API for library consumers.
refine
    Cluster refinement: extract contiguous gene clusters from CRF predictions.
sklearn_rf
    Evaluator and small trainer for scikit-learn-shaped random forests.
types
    Supervised classifier to predict the type of a gene cluster.
util
    Shared utility functions.
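
The refine module is described as extracting contiguous gene clusters from CRF predictions. The sketch below illustrates one simple way such a step could work; it is not GECCO's actual algorithm. It assumes the input is one probability per gene in genomic order, and it returns the index ranges of consecutive genes whose probability is at or above a threshold. The function name `contiguous_runs` and the input representation are assumptions made for illustration.

```rust
use std::ops::Range;

/// Return half-open index ranges of consecutive entries with
/// probability >= `threshold` (hypothetical helper, not the GECCO API).
fn contiguous_runs(probs: &[f64], threshold: f64) -> Vec<Range<usize>> {
    let mut runs = Vec::new();
    // Start index of the run currently being extended, if any.
    let mut start: Option<usize> = None;

    for (i, &p) in probs.iter().enumerate() {
        if p >= threshold {
            // Open a new run only if one is not already in progress.
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            // The run ended at the previous gene; record it.
            runs.push(s..i);
        }
    }

    // Close a run that extends to the last gene.
    if let Some(s) = start {
        runs.push(s..probs.len());
    }
    runs
}
```

For example, with probabilities `[0.1, 0.9, 0.95, 0.2, 0.85, 0.9]` and a threshold of 0.8, the function yields the ranges `1..3` and `4..6`. A real refinement step would likely apply further criteria that this sketch omits.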