Crate exon


Exon is a library to facilitate open-ended analysis of scientific data, ease the application of ML models, and provide a common data interface for science and engineering teams.


The main user-facing interface is DataFusion’s SessionContext combined with the ExonSessionExt extension trait, which adds a number of convenience methods for loading data from various sources.
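For readers new to the extension-trait pattern, the following is a minimal sketch of how a trait can add methods to a type defined in another crate. Everything here is hypothetical: `Context`, `ContextExt`, `new_with_defaults`, and `register_fastq` are illustrative stand-ins, not DataFusion's or Exon's actual types or methods.

```rust
// Hypothetical stand-in for a context type owned by another crate.
struct Context {
    tables: Vec<String>,
}

impl Context {
    fn new() -> Self {
        Context { tables: Vec::new() }
    }

    fn register(&mut self, name: &str) {
        self.tables.push(name.to_string());
    }
}

// Extension trait (hypothetical): declares extra convenience methods.
trait ContextExt {
    fn new_with_defaults() -> Self;
    fn register_fastq(&mut self, path: &str) -> usize;
}

// Implementing the trait for the existing type makes the new methods
// callable on it wherever the trait is in scope.
impl ContextExt for Context {
    fn new_with_defaults() -> Self {
        let mut ctx = Context::new();
        ctx.register("defaults");
        ctx
    }

    fn register_fastq(&mut self, path: &str) -> usize {
        self.register(path);
        // Return the number of registered tables after the insert.
        self.tables.len()
    }
}
```

The same mechanism explains why the trait has to be imported before its methods resolve: without `use exon::ExonSessionExt;`, calls such as `SessionContext::new_exon()` would not compile.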

See the read_* methods on ExonSessionExt, such as read_fasta or read_gff, for more information. There’s also a read_inferred_exon_table method that attempts to infer the data type and compression from the file extension.
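To illustrate what inferring a data type and compression from a file extension can look like, here is a minimal standalone sketch. It is not Exon's implementation: the `Inferred` struct, the `infer_from_path` function, and the list of recognized compression suffixes are assumptions made for this example.

```rust
use std::path::Path;

/// Hypothetical result type for this sketch; not part of Exon's API.
#[derive(Debug, PartialEq)]
struct Inferred {
    file_type: String,
    compression: Option<String>,
}

/// Infer a file type and optional compression from the extension(s) of `path`.
/// Returns `None` when no file type can be determined.
fn infer_from_path(path: &str) -> Option<Inferred> {
    let file_name = Path::new(path).file_name()?.to_str()?;
    let mut parts: Vec<&str> = file_name.split('.').collect();

    // A name without any '.' has no extension to inspect.
    if parts.len() < 2 {
        return None;
    }

    let last = parts.pop()?.to_ascii_lowercase();

    // Assumed set of compression suffixes, chosen for illustration only.
    let compression = match last.as_str() {
        "gz" | "zst" | "bz2" => Some(last.clone()),
        _ => None,
    };

    let file_type = if compression.is_some() {
        // A compressed file needs a stem plus a type extension before the suffix.
        if parts.len() < 2 {
            return None;
        }
        parts.pop()?.to_ascii_lowercase()
    } else {
        last
    };

    Some(Inferred { file_type, compression })
}
```

Under these assumptions, a path such as `data/reads.fastq.gz` yields the file type `fastq` with compression `gz`, while `genes.gff` yields `gff` with no compression.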

To support those methods, Exon implements a number of DataFusion traits that serve as a good base for scientific data work. See the datasources module for more information.


§Loading a FASTQ file

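use exon::ExonSessionExt;

use datafusion::prelude::*;
use datafusion::error::Result;

// Assumes a Tokio runtime: the `.await?` below needs an async context that returns a Result.
#[tokio::main]
async fn main() -> Result<()> {
    // Build a SessionContext through the Exon extension trait.
    let ctx = SessionContext::new_exon();

    // Read the FASTQ file into a DataFrame, passing None for the options.
    let df = ctx.read_fastq("test-data/datasources/fastq/test.fastq", None).await?;

    // The resulting schema has four columns, in this order.
    assert_eq!(df.schema().fields().len(), 4);
    assert_eq!(df.schema().field(0).name(), "name");
    assert_eq!(df.schema().field(1).name(), "description");
    assert_eq!(df.schema().field(2).name(), "sequence");
    assert_eq!(df.schema().field(3).name(), "quality_scores");

    Ok(())
}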