§bustools
This library reads and writes files in the BUS format (see bustools), which is used in scRNA-seq data processing.
The package is fairly mature, although a few minor features of the original bustools may still be missing.
§Basics of the library
The basic unit is io::BusRecord, which represents a single entry in a busfile,
consisting of CB (cell barcode), UMI, EC (equivalence class), COUNT and FLAG.
io::BusReader and io::BusWriter are the primary means to read and write busfiles.
They are polymorphic wrappers around the specialized implementations for uncompressed and compressed files: io::BusReaderPlain and busz::BuszReader (and, for writing, io::BusWriterPlain and busz::BuszWriter).
Downstream code should accept only the generic io::BusReader and io::BusWriter, so that it stays agnostic of the file format.
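The split between one generic reader and several format-specific backends can be illustrated with a small std-only sketch. It does not use the bustools API: `Record`, `PlainSource`, `CompressedSource`, `Reader` and `total_count` are hypothetical names, the integer widths of the record fields are an assumption, and the backends iterate over in-memory data instead of decoding files. Enum dispatch is just one way to build such a wrapper; the crate may implement it differently.

```rust
/// Hypothetical stand-in for io::BusRecord; field widths are an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub cb: u64,
    pub umi: u64,
    pub ec: u32,
    pub count: u32,
    pub flag: u32,
}

/// Hypothetical plain backend: records are read one after another.
pub struct PlainSource {
    records: std::vec::IntoIter<Record>,
}

impl Iterator for PlainSource {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        self.records.next()
    }
}

/// Hypothetical compressed backend: data arrives in blocks that are
/// "decompressed" (here: simply unpacked) into batches of records.
pub struct CompressedSource {
    blocks: std::vec::IntoIter<Vec<Record>>,
    current: std::vec::IntoIter<Record>,
}

impl Iterator for CompressedSource {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        loop {
            if let Some(record) = self.current.next() {
                return Some(record);
            }
            // Current block exhausted: load the next one, or stop if none is left.
            self.current = self.blocks.next()?.into_iter();
        }
    }
}

/// The polymorphic wrapper: callers only ever see `Reader`.
pub enum Reader {
    Plain(PlainSource),
    Compressed(CompressedSource),
}

impl Reader {
    pub fn plain(records: Vec<Record>) -> Self {
        Reader::Plain(PlainSource { records: records.into_iter() })
    }

    pub fn compressed(blocks: Vec<Vec<Record>>) -> Self {
        Reader::Compressed(CompressedSource {
            blocks: blocks.into_iter(),
            current: Vec::new().into_iter(),
        })
    }
}

impl Iterator for Reader {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        // Dispatch to whichever backend this reader wraps.
        match self {
            Reader::Plain(src) => src.next(),
            Reader::Compressed(src) => src.next(),
        }
    }
}

/// Format-agnostic downstream code: total read count over all records.
pub fn total_count(reader: Reader) -> u32 {
    reader.map(|r| r.count).sum()
}
```

Because `total_count` only sees `Reader`, the same function works whether the records come from the plain or the block-wise backend, which is the kind of format-agnosticism recommended above.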
§Iterate over a busfile
The io module contains the code to read and write busfiles.
In particular, it defines a simple iterator over io::BusRecords via io::BusReader.
BusReader implements the trait io::CUGIterator, a marker trait for anything in this library that
iterates over or produces streams of io::BusRecords.
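A marker trait has no methods of its own; it only tags types so that generic code can require them as a bound. The std-only sketch below shows the idea with hypothetical names (`Record`, `RecordStream`, `VecReader`, `count_records_in_cell`); it is not the actual definition of io::CUGIterator, whose exact shape is not given here.

```rust
/// Hypothetical stand-in for io::BusRecord (not the crate's type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub cb: u64,
    pub umi: u64,
    pub ec: u32,
    pub count: u32,
    pub flag: u32,
}

/// Hypothetical marker trait: it adds no methods, it only tags record streams.
/// The supertrait bound lets generic code use ordinary iterator adapters.
pub trait RecordStream: Iterator<Item = Record> {}

/// A hypothetical reader over in-memory records.
pub struct VecReader {
    inner: std::vec::IntoIter<Record>,
}

impl VecReader {
    pub fn new(records: Vec<Record>) -> Self {
        VecReader { inner: records.into_iter() }
    }
}

impl Iterator for VecReader {
    type Item = Record;
    fn next(&mut self) -> Option<Record> {
        self.inner.next()
    }
}

// Opting in: an empty impl is all a marker trait needs.
impl RecordStream for VecReader {}

/// Generic code asks for "any record stream" rather than a concrete reader type.
pub fn count_records_in_cell<S: RecordStream>(stream: S, cb: u64) -> usize {
    stream.filter(|r| r.cb == cb).count()
}
```

Only types that explicitly opt in with `impl RecordStream for ...` satisfy the bound, even if other iterators happen to yield the same item type.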
```rust
use bustools::io::BusReader;

let breader = BusReader::new("/path/to/some.bus");
for record in breader {
    // record.CB == ...
}
```
§Advanced Iterators over busfiles
While io::BusReader lets you iterate over single io::BusRecords,
it is often convenient to group the records by CB (all records from the same cell)
or by CB+UMI (all records from the same mRNA molecule).
The iterators module provides chainable iterators over BusRecords for this purpose.
Note that the busfile must be sorted (by CB, then UMI) for these iterators to work; they panic if used on an unsorted busfile.
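Why sortedness matters: grouping only compares each record with its neighbour, so a group is complete as soon as the key changes. The std-only sketch below (hypothetical `Record`, `GroupByKey`, `group_by_cb`, `group_by_cbumi`; not the crate's implementation) groups adjacent records by CB or by (CB, UMI) and panics when a key goes backwards, mirroring the behaviour described above.

```rust
use std::iter::Peekable;

/// Hypothetical stand-in for io::BusRecord.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Record {
    pub cb: u64,
    pub umi: u64,
    pub ec: u32,
    pub count: u32,
    pub flag: u32,
}

/// Groups adjacent records that share a key (hypothetical, not the crate's type).
pub struct GroupByKey<I: Iterator, K, F> {
    inner: Peekable<I>,
    key_fn: F,
    last_key: Option<K>,
}

impl<I, K, F> Iterator for GroupByKey<I, K, F>
where
    I: Iterator<Item = Record>,
    K: Ord + Copy,
    F: Fn(&Record) -> K,
{
    type Item = (K, Vec<Record>);

    fn next(&mut self) -> Option<Self::Item> {
        let first = self.inner.next()?;
        let key = (self.key_fn)(&first);
        // All records with the previous key were already consumed, so a sorted
        // input must yield a strictly larger key here.
        if let Some(prev) = self.last_key {
            assert!(key > prev, "busfile is not sorted by the grouping key");
        }
        self.last_key = Some(key);

        let mut group = vec![first];
        // Keep pulling records while the next one has the same key.
        while let Some(peeked) = self.inner.peek() {
            if (self.key_fn)(peeked) != key {
                break;
            }
            group.push(self.inner.next().unwrap());
        }
        Some((key, group))
    }
}

/// Group by cell barcode.
pub fn group_by_cb<I: Iterator<Item = Record>>(iter: I) -> GroupByKey<I, u64, fn(&Record) -> u64> {
    let key_fn: fn(&Record) -> u64 = |r: &Record| r.cb;
    GroupByKey { inner: iter.peekable(), key_fn, last_key: None }
}

/// Group by (cell barcode, UMI), i.e. by molecule.
pub fn group_by_cbumi<I: Iterator<Item = Record>>(
    iter: I,
) -> GroupByKey<I, (u64, u64), fn(&Record) -> (u64, u64)> {
    let key_fn: fn(&Record) -> (u64, u64) = |r: &Record| (r.cb, r.umi);
    GroupByKey { inner: iter.peekable(), key_fn, last_key: None }
}
```

Summing `count` over a (CB, UMI) group gives the number of reads of that molecule, the same quantity the molecule example in this section computes with the real API.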
§Iterate over cells
To iterate over a sorted busfile, grouping all records by CB:
```rust
use bustools::io::BusReader;
use bustools::iterators::CellGroupIterator; // need to bring that trait into scope

let breader = BusReader::new("/path/to/some.bus");
for (cb, vector_of_records) in breader.groupby_cb() {
    // Example: the number of records in that cell
    let n_records: usize = vector_of_records.len();
}
```
§Iterate over molecules
To iterate over a sorted busfile, grouping all records by CB+UMI:
```rust
use bustools::io::BusReader;
use bustools::iterators::CbUmiGroupIterator; // need to bring that trait into scope

let breader = BusReader::new("/path/to/some.bus");
for ((cb, umi), vector_of_records) in breader.groupby_cbumi() {
    // Example: the number of reads of that molecule (CB/UMI)
    let n_reads: u32 = vector_of_records.iter().map(|r| r.COUNT).sum();
}
```
§EC to gene mapping
More convenient features are provided by io::BusFolder,
which wraps the .bus file together with the matrix.ec and transcripts.txt files created by the kallisto bus command.
These files tell us what a particular io::BusRecord
actually maps to, as specified by its EC (equivalence class, a set of transcripts).
Given a transcript-to-gene file, io::BusFolder constructs a mapper from equivalence classes to genes (consistent_genes::Ec2GeneMapper),
which resolves ECs to genes.
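Conceptually, resolving an EC to genes means looking up its transcripts and mapping each one to its gene. The std-only sketch below assumes the usual kallisto layouts (matrix.ec lines of the form `<ec>\t<i>,<j>,...` with 0-based indices into transcripts.txt, and a transcripts-to-genes file whose first two columns are transcript and gene). `parse_ec_matrix`, `parse_t2g` and `EcGeneResolver` are hypothetical helpers, not bustools functions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Parses matrix.ec-style text ("<ec>\t<i>,<j>,..." per line, indices 0-based
/// into transcripts.txt). The format is assumed; malformed input panics.
pub fn parse_ec_matrix(matrix_ec: &str, transcripts_txt: &str) -> HashMap<u32, Vec<String>> {
    let names: Vec<&str> = transcripts_txt.lines().collect();
    let mut ec2tx: HashMap<u32, Vec<String>> = HashMap::new();
    for line in matrix_ec.lines() {
        let mut cols = line.split_whitespace();
        let (Some(ec), Some(indices)) = (cols.next(), cols.next()) else {
            continue; // skip blank or incomplete lines
        };
        let ec: u32 = ec.parse().expect("EC id must be an integer");
        let transcripts: Vec<String> = indices
            .split(',')
            .map(|i| names[i.parse::<usize>().expect("transcript index")].to_string())
            .collect();
        ec2tx.insert(ec, transcripts);
    }
    ec2tx
}

/// Parses a transcript-to-gene table: transcript id and gene id in the first two columns (assumed).
pub fn parse_t2g(t2g: &str) -> HashMap<String, String> {
    t2g.lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            Some((cols.next()?.to_string(), cols.next()?.to_string()))
        })
        .collect()
}

/// Hypothetical, simplified EC -> gene resolver.
pub struct EcGeneResolver {
    ec2tx: HashMap<u32, Vec<String>>,
    tx2gene: HashMap<String, String>,
}

impl EcGeneResolver {
    pub fn new(ec2tx: HashMap<u32, Vec<String>>, tx2gene: HashMap<String, String>) -> Self {
        Self { ec2tx, tx2gene }
    }

    /// Genes an EC is compatible with: the union of the genes of its transcripts.
    /// Returns None for an unknown EC or a transcript missing from the table.
    pub fn genes(&self, ec: u32) -> Option<BTreeSet<String>> {
        let transcripts = self.ec2tx.get(&ec)?;
        transcripts.iter().map(|t| self.tx2gene.get(t).cloned()).collect()
    }

    /// The single gene of an EC, or None if its transcripts span several genes.
    pub fn unique_gene(&self, ec: u32) -> Option<String> {
        let genes = self.genes(ec)?;
        if genes.len() == 1 { genes.into_iter().next() } else { None }
    }
}
```

An EC whose transcripts all belong to one gene resolves unambiguously; an EC spanning several genes is ambiguous, and how to treat such records is left to downstream code.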
```rust
use bustools::consistent_genes::EC;
use bustools::io::BusFolder;

let bfolder = BusFolder::new("/path/to/busfolder");
let ec_mapper = bfolder.make_mapper("/path/to/transcripts_to_genes.txt");
let gene_names = ec_mapper.get_genenames(EC(1));
```
§Modules
- bus_multi - Deprecated: a module that allows iterating over multiple busfiles simultaneously
- busz - Dealing with the busz compression format
- consistent_genes - This module handles the equivalence class to gene mapping
- consistent_transcripts - Mapping ECs to transcripts
- disjoint - Module for the Intersector struct
- io - The io module of bustools deals with reading and writing busfiles.
- iterators - Advanced iterators over busrecords, grouping records by cell or molecule.
- merger - An iterator that merges multiple sorted iterators by item
- utils - Utilities
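Merging several sorted streams is commonly done with a k-way merge over a min-heap. The std-only sketch below (hypothetical `KMerge`) illustrates that technique on plain iterators; it is not the merger module's implementation, whose exact output shape is not described here.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical k-way merge of iterators that are each already sorted.
pub struct KMerge<I: Iterator> {
    sources: Vec<I>,
    // Min-heap (via Reverse) holding at most one pending item per source,
    // tagged with the index of the source it came from.
    heap: BinaryHeap<Reverse<(I::Item, usize)>>,
}

impl<I> KMerge<I>
where
    I: Iterator,
    I::Item: Ord,
{
    pub fn new(mut sources: Vec<I>) -> Self {
        let mut heap = BinaryHeap::new();
        for (idx, src) in sources.iter_mut().enumerate() {
            if let Some(item) = src.next() {
                heap.push(Reverse((item, idx)));
            }
        }
        KMerge { sources, heap }
    }
}

impl<I> Iterator for KMerge<I>
where
    I: Iterator,
    I::Item: Ord,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        // The smallest pending item overall is at the top of the min-heap.
        let Reverse((item, idx)) = self.heap.pop()?;
        // Refill from the same source so every non-exhausted source stays represented.
        if let Some(next_item) = self.sources[idx].next() {
            self.heap.push(Reverse((next_item, idx)));
        }
        Some(item)
    }
}
```

Each call to `next` costs O(log k) for k input streams, and only one pending item per stream is held in memory, which is what makes this approach attractive for large sorted inputs.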