§oxbow
oxbow reads genomic data formats 🧬 as Apache Arrow 🏹.
With the oxbow Rust library, you can serialize native formats into Arrow IPC, stream larger-than-memory files as Arrow RecordBatches with zero-copy over FFI, and more!
⚠️ The Rust API is under active development and is not yet stable. The API may change in future releases.
§Features
- 🚀 Supports commonly used file formats from the htslib/GA4GH and the UCSC ecosystems.
- 🔍 Support for compression, indexing, column projection, and genomic range querying.
- 🔧 Support for nested fields and complex, typed schemas (e.g., SAM tags, VCF INFO and FORMAT fields, AutoSql, etc.).
§Scanners
Scanners are the main interface for reading files. Each scanner is a parser for a specific format and provides scanning methods that return an iterator implementing the arrow::record_batch::RecordBatchReader trait.
§Sequence formats
§Alignment formats
§Variant formats
§Interval feature formats
- bed: Scan BED files as Arrow RecordBatches.
- gtf: Scan GTF files as Arrow RecordBatches.
- gff: Scan GFF files as Arrow RecordBatches.
§UCSC Big Binary Indexed (BBI) formats
- bigbed: Scan BigBed files as Arrow RecordBatches.
- bigwig: Scan BigWig files as Arrow RecordBatches.
- BBI zoom: Scan zoom level summary statistics from BigWig/BigBed as Arrow RecordBatches.
§License
Licensed under MIT or Apache-2.0.