Crate oxbow

§oxbow

oxbow reads genomic data formats 🧬 as Apache Arrow 🏹.

With the oxbow Rust library, you can serialize native formats into Arrow IPC, stream larger-than-memory files as Arrow RecordBatches with zero-copy over FFI, and more!

⚠️ The Rust API is under active development and is not yet stable. The API may change in future releases.

Source on GitHub.

§Features

  • 🚀 Support for commonly used file formats from the htslib/GA4GH and UCSC ecosystems.
  • 🔍 Support for compression, indexing, column projection, and genomic range querying.
  • 🔧 Support for nested fields and complex, typed schemas (e.g., SAM tags, VCF INFO and FORMAT fields, AutoSql, etc.).
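To make "genomic range querying" concrete: a range query is usually expressed as a contig name plus a coordinate interval, often written as a region string such as chr1:1,000-2,000. The sketch below uses only the Rust standard library and is not oxbow's API. The `Region` type and the `parse_region` and `overlaps` helpers are hypothetical names introduced for illustration, and the 1-based inclusive coordinate convention is an assumption borrowed from samtools-style region strings.

```rust
/// Hypothetical region type: a contig name plus a 1-based, inclusive interval.
#[derive(Debug, PartialEq)]
struct Region {
    chrom: String,
    start: u64,
    end: u64,
}

/// Parse a region string of the form "chrom:start-end".
/// Returns `None` for anything that does not match that shape.
fn parse_region(s: &str) -> Option<Region> {
    // Split on the last ':' so contig names that contain ':' still parse.
    let (chrom, range) = s.rsplit_once(':')?;
    let (start, end) = range.split_once('-')?;
    // Tolerate thousands separators such as "1,000".
    let start: u64 = start.replace(',', "").parse().ok()?;
    let end: u64 = end.replace(',', "").parse().ok()?;
    // Reject empty contig names, zero starts (1-based), and inverted intervals.
    if chrom.is_empty() || start == 0 || start > end {
        return None;
    }
    Some(Region { chrom: chrom.to_string(), start, end })
}

/// True if a feature on `chrom` spanning [start, end] (1-based, inclusive)
/// overlaps the query region.
fn overlaps(region: &Region, chrom: &str, start: u64, end: u64) -> bool {
    region.chrom == chrom && start <= region.end && end >= region.start
}
```

An indexed reader would typically use such a region to seek to the relevant file blocks and then apply an overlap test like the one above to each decoded record.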

§Scanners

Scanners are the main interface for reading files. Each scanner is a parser for a specific format and provides scanning methods that return an iterator implementing the arrow::record_batch::RecordBatchReader trait.

§Sequence formats

  • fasta: Scan FASTA files as Arrow RecordBatches.
  • fastq: Scan FASTQ files as Arrow RecordBatches.

§Alignment formats

  • sam: Scan SAM files as Arrow RecordBatches.
  • bam: Scan BAM files as Arrow RecordBatches.

§Variant formats

  • vcf: Scan VCF files as Arrow RecordBatches.
  • bcf: Scan BCF files as Arrow RecordBatches.

§Interval feature formats

  • bed: Scan BED files as Arrow RecordBatches.
  • gtf: Scan GTF files as Arrow RecordBatches.
  • gff: Scan GFF files as Arrow RecordBatches.

§UCSC Big Binary Indexed (BBI) formats

  • bigbed: Scan BigBed files as Arrow RecordBatches.
  • bigwig: Scan BigWig files as Arrow RecordBatches.
  • BBI zoom: Scan zoom-level summary statistics from BigWig/BigBed files as Arrow RecordBatches.

§License

Licensed under MIT or Apache-2.0.

Modules§

alignment
bbi
bed
gxf
sequence
util
variant