Crate bustools

§bustools

This library provides tools for working with the BUS format (see bustools) used in scRNA-seq data processing.

At this point, it is far from complete or correct; it is primarily a project for learning Rust.

§Basics of the library

The basic unit is the io::BusRecord, which represents a single entry in a busfile, consisting of CB, UMI, EC, COUNT and Flag.
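
To make the five fields concrete, here is a minimal stand-in for such a record. It is a sketch, not the crate's own definition: the type name `Record`, the helper `total_reads`, and the integer widths are assumptions chosen for illustration (the field names follow the ones used in this documentation).

```rust
/// Hypothetical stand-in for a bus record; NOT the definition of io::BusRecord.
/// Field widths are assumptions made for this sketch.
#[allow(non_snake_case)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub CB: u64,    // cell barcode, integer-encoded
    pub UMI: u64,   // unique molecular identifier, integer-encoded
    pub EC: u32,    // equivalence class id
    pub COUNT: u32, // number of reads collapsed into this record
    pub FLAG: u32,  // flag field
}

/// Sums the COUNT field over a slice of records (hypothetical helper).
pub fn total_reads(records: &[Record]) -> u32 {
    records.iter().map(|r| r.COUNT).sum()
}
```

With two records carrying COUNT 2 and 3, `total_reads` returns 5.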

§Iterate over a busfile

The io module contains the code to read and write busfiles. In particular, it defines a simple iterator over io::BusRecords via io::BusReader. BusReader implements the trait io::CUGIterator, a marker trait for anything that produces a stream of io::BusRecords in this library.

use bustools::io::BusReader;

let breader = BusReader::new("/path/to/some.bus");
for record in breader {
    // record.CB == ...
}

§Advanced Iterators over busfiles

While io::BusReader lets you iterate over single io::BusRecords, it is often convenient to group the records by CB (all records from the same cell) or by CB+UMI (all records from the same mRNA). iterators contains the code to enable chaining iterators over BusRecords.

Note that the bus file must be sorted (by CB/UMI) to enable these iterators (they will panic if used on an unsorted busfile).
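
Why sortedness matters can be seen in a small, self-contained sketch of grouping a stream by key. This is not the crate's implementation: `group_sorted` is a hypothetical helper, and it collects all groups eagerly for brevity, whereas a streaming iterator would yield one group at a time. Because only the most recent group is ever compared against, records with the same key must be adjacent, and a decreasing key is treated as an error.

```rust
/// Groups consecutive items that share a key (hypothetical helper).
/// Panics if a key is smaller than the previous one, i.e. if the input
/// is not sorted by key.
pub fn group_sorted<T, K, F>(items: impl IntoIterator<Item = T>, key: F) -> Vec<(K, Vec<T>)>
where
    K: Ord,
    F: Fn(&T) -> K,
{
    let mut groups: Vec<(K, Vec<T>)> = Vec::new();
    for item in items {
        let k = key(&item);
        if let Some((last, members)) = groups.last_mut() {
            // Only the most recent group is inspected, so order matters.
            assert!(*last <= k, "input is not sorted by key");
            if *last == k {
                members.push(item);
                continue;
            }
        }
        // First item overall, or the first item with a new, larger key.
        groups.push((k, vec![item]));
    }
    groups
}
```

For example, grouping the pairs `(1, 'a'), (1, 'b'), (2, 'c')` by their first element gives two groups, the first holding two items.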

§Iterate over cells

To iterate over a sorted busfile, grouping all records by CB:

use bustools::io::BusReader;
use bustools::iterators::CellGroupIterator; // the trait must be in scope to call groupby_cb()

let breader = BusReader::new("/path/to/some.bus");
for (cb, vector_of_records) in breader.groupby_cb() {
    // Example: the number of records in that cell
    let n_records: usize = vector_of_records.len();
}

§Iterate over molecules

To iterate over a sorted busfile, grouping all records by CB+UMI:

use bustools::io::BusReader;
use bustools::iterators::CbUmiGroupIterator; // the trait must be in scope to call groupby_cbumi()

let breader = BusReader::new("/path/to/some.bus");
for ((cb, umi), vector_of_records) in breader.groupby_cbumi() {
    // Example: the number of reads of that molecule (CB/UMI)
    let n_reads: u32 = vector_of_records.iter().map(|r| r.COUNT).sum();
}

§EC to gene mapping

More convenient features are provided by io::BusFolder, which wraps the busfile together with the matrix.ec and transcripts.txt files created by kallisto bus. Those files tell us what a particular bus record (CB, UMI, EC, COUNT, Flag) actually maps to, as specified by its EC (equivalence class, a set of transcripts). BusFolder also constructs a mapper from equivalence class to gene via consistent_genes::Ec2GeneMapper, which resolves ECs to genes.

let bfolder = BusFolder::new("/path/to/busfolder");
let ec_mapper = bfolder.make_mapper("/path/to/transcripts_to_genes.txt");
let gene_names = ec_mapper.get_genenames(EC(1));
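
The idea behind resolving an EC to genes can be sketched without the crate. Since an EC is a set of transcripts, one plausible approach is to look up the gene of every transcript in the EC and collect the distinct genes. The function `ec_to_genes`, its parameter names, and the use of plain strings and `u32` ids are assumptions for illustration; how Ec2GeneMapper actually stores and resolves its data is not shown here.

```rust
use std::collections::{BTreeSet, HashMap};

/// Resolves each equivalence class (a list of transcript ids) to the set of
/// genes its transcripts belong to (hypothetical helper).
/// Transcripts that are missing from `t2g` are skipped.
pub fn ec_to_genes(
    ec_to_transcripts: &HashMap<u32, Vec<String>>,
    t2g: &HashMap<String, String>,
) -> HashMap<u32, BTreeSet<String>> {
    ec_to_transcripts
        .iter()
        .map(|(ec, transcripts)| {
            let genes: BTreeSet<String> = transcripts
                .iter()
                .filter_map(|t| t2g.get(t).cloned())
                .collect();
            (*ec, genes)
        })
        .collect()
}
```

An EC whose transcripts all belong to the same gene resolves to a single-element set; an EC spanning several genes resolves to a larger set and is ambiguous at the gene level.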

Modules§

  • A module that allows iteration over multiple busfiles simultaneously
  • Dealing with the busz compression format
  • This module handles the equivalence class to gene mapping
  • Module for the Intersector struct
  • The io module of bustools
  • Module for more advanced iterators over busrecords. Allows to iterate over
  • An iterator that merges multiple sorted iterators by item
  • Multinomial sampling of large/long probability vectors
  • Utilities, such as barcode <-> int conversion
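
As an illustration of the barcode <-> int conversion mentioned in the last entry, the sketch below packs a nucleotide string into an integer using 2 bits per base (A=0, C=1, G=2, T=3, first base most significant). The function names `seq_to_int` and `int_to_seq` are hypothetical, and whether the crate's utilities use exactly this convention is an assumption.

```rust
/// Encodes a nucleotide string into an integer, 2 bits per base
/// (A=0, C=1, G=2, T=3), first base in the most significant position.
/// Returns None for sequences longer than 32 bases or with other characters.
pub fn seq_to_int(seq: &str) -> Option<u64> {
    if seq.len() > 32 {
        return None; // would not fit into 64 bits
    }
    seq.bytes().try_fold(0u64, |acc, b| {
        let code = match b {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        Some((acc << 2) | code)
    })
}

/// Decodes an integer back into a nucleotide string of the given length.
/// The length is needed because leading 'A's are encoded as zero bits.
pub fn int_to_seq(value: u64, len: usize) -> String {
    assert!(len <= 32, "at most 32 bases fit into a u64");
    (0..len)
        .rev()
        .map(|i| match (value >> (2 * i)) & 3 {
            0 => 'A',
            1 => 'C',
            2 => 'G',
            _ => 'T',
        })
        .collect()
}
```

Under this convention, `seq_to_int("ACGT")` yields `Some(27)`, and `int_to_seq(27, 4)` returns `"ACGT"` again.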