Crate bustools

§bustools

This library reads, writes and processes files in the Bus format (see bustools), which is used in scRNAseq data processing.

At this point, the package is fairly mature, but some minor features of the original bustools might still be missing.

§Basics of the library

The basic unit is the io::BusRecord, which represents a single entry in a busfile and consists of the fields CB (cell barcode), UMI (unique molecular identifier), EC (equivalence class), COUNT (number of reads) and FLAG.
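To make the record layout concrete, here is a minimal sketch. `Record` is a hypothetical stand-in, not the crate's io::BusRecord, and `decode_2bit` is a hypothetical helper. It assumes the usual bustools convention that barcodes and UMIs are packed at 2 bits per base (A=0, C=1, G=2, T=3, first base in the most significant bits); the sequence length is not part of the record and has to be supplied separately.

```rust
/// Hypothetical stand-in mirroring the five fields of a bus record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Record {
    cb: u64,    // cell barcode, 2-bit packed
    umi: u64,   // UMI, 2-bit packed
    ec: u32,    // equivalence class id
    count: u32, // number of reads supporting this entry
    flag: u32,  // extra flag bits
}

/// Decode a 2-bit packed sequence of `len` bases (assumed A=0, C=1, G=2, T=3).
fn decode_2bit(mut value: u64, len: usize) -> String {
    let mut bases = vec![b'A'; len];
    // Fill from the last base backwards, consuming the two lowest bits each time.
    for i in (0..len).rev() {
        bases[i] = b"ACGT"[(value & 0b11) as usize];
        value >>= 2;
    }
    String::from_utf8(bases).unwrap()
}
```

For example, the value `0b00_01_10_11` with length 4 decodes to `"ACGT"`.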

io::BusReader and io::BusWriter are the primary means to read and write busfiles. They are polymorphic wrappers around the specialized implementations for uncompressed and compressed files: io::BusReaderPlain and busz::BuszReader (respectively io::BusWriterPlain and busz::BuszWriter). Downstream code should accept only the generic io::BusReader and io::BusWriter so that it stays agnostic of the file format.
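One common way to build such a polymorphic wrapper in Rust is an enum that forwards `next()` to whichever backend it holds. The sketch below illustrates that pattern only; `PlainReader`, `CompressedReader`, `Reader` and `total_reads` are hypothetical stand-ins (they just replay a `Vec`), and the crate may implement its wrappers differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Record {
    cb: u64,
    count: u32,
}

// Hypothetical stand-ins for the two concrete readers.
struct PlainReader(std::vec::IntoIter<Record>);
struct CompressedReader(std::vec::IntoIter<Record>);

impl Iterator for PlainReader {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        self.0.next()
    }
}

impl Iterator for CompressedReader {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        self.0.next()
    }
}

/// Format-agnostic wrapper: callers see one type regardless of the backend.
enum Reader {
    Plain(PlainReader),
    Compressed(CompressedReader),
}

impl Iterator for Reader {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        // Dispatch to the concrete reader held by this variant.
        match self {
            Reader::Plain(r) => r.next(),
            Reader::Compressed(r) => r.next(),
        }
    }
}

/// Downstream code takes the wrapper, so it works for either format.
fn total_reads(reader: Reader) -> u32 {
    reader.map(|r| r.count).sum()
}
```

Because `total_reads` only sees `Reader`, adding a new backend means adding a variant, not touching downstream code.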

§Iterate over a busfile

io contains the code to read and write busfiles. In particular, it defines a simple iterator over io::BusRecords via io::BusReader. BusReader implements io::CUGIterator, a marker trait for anything in this library that iterates over or produces streams of io::BusRecords.

use bustools::io::BusReader;

let breader = BusReader::new("/path/to/some.bus");
for record in breader {
    // e.g. record.CB (cell barcode), record.COUNT (number of reads)
}
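The marker trait is what lets extension traits (such as the grouping iterators described next) add methods to every record stream at once. The sketch below shows that general Rust pattern under stated assumptions: `RecordStream` stands in for io::CUGIterator, `CellCount` is a hypothetical extension trait, and `VecStream` is a hypothetical reader. It is not the crate's code. It also shows why such a trait has to be brought into scope with `use` before its method can be called.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Record {
    cb: u64, // cell barcode
}

/// Marker trait: "this iterator yields bus records" (stand-in for io::CUGIterator).
trait RecordStream: Iterator<Item = Record> {}

/// Hypothetical extension trait; its method only resolves where the trait is in scope.
trait CellCount: RecordStream {
    /// Counts runs of identical consecutive CBs, i.e. cells in a CB-sorted stream.
    fn count_cells(self) -> usize
    where
        Self: Sized,
    {
        let mut n = 0;
        let mut last: Option<u64> = None;
        for r in self {
            if last != Some(r.cb) {
                n += 1;
                last = Some(r.cb);
            }
        }
        n
    }
}

// Blanket impl: every marked stream gets the extension method for free.
impl<T: RecordStream> CellCount for T {}

/// A concrete reader that opts in to the marker trait explicitly.
struct VecStream(std::vec::IntoIter<Record>);

impl Iterator for VecStream {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        self.0.next()
    }
}

impl RecordStream for VecStream {}
```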

§Advanced Iterators over busfiles

While io::BusReader lets you iterate over single io::BusRecords, it is often convenient to group the records by CB (all records from the same cell) or by CB+UMI (all records from the same mRNA). iterators contains the code to enable chaining iterators over BusRecords.

Note that the bus file must be sorted (by CB/UMI) to enable these iterators (they will panic if used on an unsorted busfile).
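The sort order these iterators rely on is lexicographic on (CB, UMI): first by cell barcode, then by UMI within a cell. A minimal sketch of that check, assuming records are already in memory; `Record` and `is_sorted_by_cb_umi` are hypothetical names, not part of the crate. Unsorted files can be sorted beforehand, e.g. with the original bustools CLI's `bustools sort`.

```rust
#[derive(Debug, Clone, Copy)]
struct Record {
    cb: u64,
    umi: u64,
}

/// True if the records are in non-decreasing (CB, UMI) order.
/// Tuples compare lexicographically: first by CB, then by UMI.
fn is_sorted_by_cb_umi(records: &[Record]) -> bool {
    records
        .windows(2)
        .all(|w| (w[0].cb, w[0].umi) <= (w[1].cb, w[1].umi))
}
```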

§Iterate over cells

To iterate over a sorted busfile, grouping all records by CB:

use bustools::io::BusReader;
use bustools::iterators::CellGroupIterator; // the trait must be in scope for .groupby_cb()

let breader = BusReader::new("/path/to/some.bus");
for (cb, vector_of_records) in breader.groupby_cb() {
    // Example: the number of records in that cell
    let n_records: usize = vector_of_records.len();
}
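Conceptually, grouping a sorted stream by CB means collecting consecutive runs of records that share the same barcode. The sketch below shows that idea on an in-memory slice; `Record` and `group_by_cb` are hypothetical, and the crate's groupby_cb works lazily on a stream rather than building a `Vec` like this.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Record {
    cb: u64,
    umi: u64,
}

/// Split a CB-sorted slice into (cb, records) runs.
/// On sorted input each cell appears exactly once in the output.
fn group_by_cb(records: &[Record]) -> Vec<(u64, Vec<Record>)> {
    let mut groups: Vec<(u64, Vec<Record>)> = Vec::new();
    for &r in records {
        // Same barcode as the current run: extend it; otherwise start a new run.
        let same_cell = groups.last().map_or(false, |(cb, _)| *cb == r.cb);
        if same_cell {
            groups.last_mut().unwrap().1.push(r);
        } else {
            groups.push((r.cb, vec![r]));
        }
    }
    groups
}
```

On unsorted input the same barcode could show up in several runs, which is why the crate's iterators refuse unsorted files.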

§Iterate over molecules

To iterate over a sorted busfile, grouping all records by CB+UMI:

use bustools::io::BusReader;
use bustools::iterators::CbUmiGroupIterator; // the trait must be in scope for .groupby_cbumi()

let breader = BusReader::new("/path/to/some.bus");
for ((cb, umi), vector_of_records) in breader.groupby_cbumi() {
    // Example: the number of reads of that molecule (CB/UMI)
    let n_reads: u32 = vector_of_records.iter().map(|r| r.COUNT).sum();
}
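A typical use of molecule-level grouping is a per-cell summary: how many molecules (distinct CB/UMI pairs) a cell has and how many reads they carry in total. A minimal sketch over an in-memory, (CB, UMI)-sorted slice; `Record` and `molecules_and_reads` are hypothetical names, and a `BTreeMap` is used only to get a deterministic cell order.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy)]
struct Record {
    cb: u64,
    umi: u64,
    count: u32,
}

/// For each cell: (number of distinct molecules, total reads).
/// Assumes (CB, UMI)-sorted input, so each molecule is one contiguous run.
fn molecules_and_reads(records: &[Record]) -> BTreeMap<u64, (usize, u32)> {
    let mut per_cell: BTreeMap<u64, (usize, u32)> = BTreeMap::new();
    let mut prev: Option<(u64, u64)> = None;
    for r in records {
        let entry = per_cell.entry(r.cb).or_insert((0, 0));
        if prev != Some((r.cb, r.umi)) {
            entry.0 += 1; // first record of a new CB/UMI molecule
            prev = Some((r.cb, r.umi));
        }
        entry.1 += r.count; // add this record's reads
    }
    per_cell
}
```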

§EC to gene mapping

More convenient features are provided by io::BusFolder, which wraps the .bus file together with the matrix.ec and transcripts.txt files created by the kallisto bus command. Those files tell us what a particular io::BusRecord actually maps to, as specified by its EC (equivalence class, a set of transcripts). Given a transcript-to-gene file, BusFolder constructs a mapper from equivalence classes to genes, consistent_genes::Ec2GeneMapper, which resolves ECs to genes.

use bustools::io::BusFolder;

let bfolder = BusFolder::new("/path/to/busfolder");
let ec_mapper = bfolder.make_mapper("/path/to/transcripts_to_genes.txt");
// gene names consistent with equivalence class 1
let gene_names = ec_mapper.get_genenames(EC(1));
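The mapping itself chains three lookups: EC to transcript indices (matrix.ec), index to transcript name (transcripts.txt), and transcript name to gene (the transcript-to-gene file). The sketch below shows that chain on in-memory strings. It assumes matrix.ec lines of the form `<ec id><TAB><comma-separated transcript indices>`, one transcript name per line in transcripts.txt, and a tab-separated t2g file whose first two columns are transcript and gene. `ec_to_genes` is a hypothetical helper, not the crate's Ec2GeneMapper.

```rust
use std::collections::{BTreeSet, HashMap};

/// Build EC -> set of gene names from the text of matrix.ec, transcripts.txt and t2g.
fn ec_to_genes(matrix_ec: &str, transcripts: &str, t2g: &str) -> HashMap<u32, BTreeSet<String>> {
    // transcripts.txt: the line index is the transcript index used in matrix.ec.
    let tx_names: Vec<&str> = transcripts.lines().collect();
    // t2g: first column transcript, second column gene (assumed layout).
    let tx2gene: HashMap<&str, &str> = t2g
        .lines()
        .filter_map(|l| {
            let mut cols = l.split('\t');
            Some((cols.next()?, cols.next()?))
        })
        .collect();

    let mut out = HashMap::new();
    for line in matrix_ec.lines() {
        let Some((ec, txs)) = line.split_once('\t') else { continue };
        let Ok(ec) = ec.parse::<u32>() else { continue };
        // Union of the genes of all transcripts in this equivalence class.
        let genes: BTreeSet<String> = txs
            .split(',')
            .filter_map(|t| t.parse::<usize>().ok())
            .filter_map(|i| tx_names.get(i))
            .filter_map(|name| tx2gene.get(name))
            .map(|g| g.to_string())
            .collect();
        out.insert(ec, genes);
    }
    out
}
```

An EC whose gene set has exactly one element can be assigned to that gene unambiguously; larger sets indicate multi-gene ambiguity.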

Modules§

bus_multi (deprecated)
A module that allows iteration over multiple busfiles simultaneously
busz
Dealing with the busz compression format
consistent_genes
This module handles the equivalence class to gene mapping
consistent_transcripts
Mapping ECs to transcripts
disjoint
Module for the Intersector struct
io
The io module of bustools deals with reading and writing busfiles.
iterators
Advanced iterators over busrecords, grouping records by cell or molecule.
merger
An iterator that merges multiple sorted iterators by item
utils
Utilities
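Merging several sorted streams, as the merger (and the older bus_multi) module descriptions suggest, is classically done with a k-way merge over a min-heap holding the current head of each input. The sketch below is that textbook technique on vectors of keys, not the crate's implementation; `kway_merge` is a hypothetical name. Each output item is tagged with the index of the input it came from, so equal keys from different inputs end up adjacent and can be grouped downstream.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge ascending-sorted inputs into one ascending sequence of (key, source index).
fn kway_merge(inputs: Vec<Vec<u64>>) -> Vec<(u64, usize)> {
    let mut iters: Vec<std::vec::IntoIter<u64>> = inputs.into_iter().map(|v| v.into_iter()).collect();
    // Reverse turns the max-heap into a min-heap; ties break on source index.
    let mut heap = BinaryHeap::new();
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(x) = it.next() {
            heap.push(Reverse((x, i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((x, i))) = heap.pop() {
        out.push((x, i));
        // Refill the heap from the input that just produced the smallest key.
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}
```

With k inputs and n items in total this costs O(n log k) and only keeps k items in the heap at a time.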