//! # bustools
//!
//! This library allows interaction with the Bus format (see [bustools](https://github.com/BUStools/bustools))
//! for scRNAseq data processing.
//!
//! At this point, it's **far from complete and correct**; it is primarily a project for learning Rust.
//!
//! # Basics of the library
//! The basic unit is the [`io::BusRecord`], which represents a single entry in a busfile,
//! consisting of CB, UMI, EC, COUNT and Flag.
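//!
//! To make the five fields concrete, here is a minimal sketch that decodes one raw entry,
//! assuming the 32-byte record layout of the reference C++ `bustools` (`u64` barcode, `u64` UMI,
//! `i32` EC, `u32` count, `u32` flags, 4 bytes of padding) and little-endian byte order.
//! `RawRecord` and `parse_record` are hypothetical names used only for illustration;
//! they are not this crate's API, and [`io::BusRecord`] may differ in field names and types.
//! ```rust
//! use std::convert::TryInto;
//!
//! /// Hypothetical stand-in for a single bus entry (not this crate's `BusRecord`).
//! #[derive(Debug, PartialEq)]
//! struct RawRecord {
//!     cb: u64,    // cell barcode
//!     umi: u64,   // unique molecular identifier
//!     ec: i32,    // equivalence class id
//!     count: u32, // number of reads collapsed into this entry
//!     flag: u32,
//! }
//!
//! /// Decode one 32-byte record, assuming little-endian fields.
//! fn parse_record(bytes: &[u8; 32]) -> RawRecord {
//!     RawRecord {
//!         cb: u64::from_le_bytes(bytes[0..8].try_into().unwrap()),
//!         umi: u64::from_le_bytes(bytes[8..16].try_into().unwrap()),
//!         ec: i32::from_le_bytes(bytes[16..20].try_into().unwrap()),
//!         count: u32::from_le_bytes(bytes[20..24].try_into().unwrap()),
//!         flag: u32::from_le_bytes(bytes[24..28].try_into().unwrap()),
//!         // bytes 28..32 are padding and are ignored
//!     }
//! }
//! ```
//! In practice [`io::BusReader`] does this decoding for you; the sketch only shows what a record carries.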
//!
//! ## Iterate over a busfile
//! [`io`] contains the code to read from and write to busfiles.
//! In particular, it defines a simple iterator over [`io::BusRecord`]s via [`io::BusReader`].
//! `BusReader` implements the trait [`io::CUGIterator`], a marker trait for anything that
//! produces a stream of [`io::BusRecord`]s in this library.
//! ```rust, no_run
//! # use bustools::io::BusReader;
//! let breader = BusReader::new("/path/to/some.bus");
//! for record in breader {
//!     // record.CB == ...
//! }
//! ```
//!
//! ## Advanced Iterators over busfiles
//! While [`io::BusReader`] lets you iterate over single [`io::BusRecord`]s,
//! it is often convenient to group the records by CB (all records from the same cell)
//! or by CB+UMI (all records from the same mRNA).
//! [`iterators`] contains the code to enable `chaining` iterators over BusRecords.
//!
//! Note that the bus file must be **sorted** (by CB/UMI) to enable these iterators (they will panic if used on an unsorted busfile).
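//!
//! The sorting requirement follows from how streaming group-by works: records are only compared
//! with their immediate predecessor, so all records of one cell must be adjacent.
//! The following is a minimal sketch of that idea, not this crate's implementation;
//! `Rec` and `group_by_cb` are hypothetical names, and the sketch collects into a `Vec`
//! instead of yielding groups lazily.
//! ```rust
//! /// Hypothetical minimal record (not this crate's `BusRecord`).
//! #[derive(Debug, Clone, PartialEq)]
//! struct Rec {
//!     cb: u64,
//!     umi: u64,
//!     count: u32,
//! }
//!
//! /// Group adjacent records sharing a CB. Panics if a CB is smaller than its predecessor.
//! fn group_by_cb(records: Vec<Rec>) -> Vec<(u64, Vec<Rec>)> {
//!     let mut groups: Vec<(u64, Vec<Rec>)> = Vec::new();
//!     for r in records {
//!         // Compare only against the CB of the group currently being built.
//!         let start_new = match groups.last() {
//!             Some((cb, _)) => {
//!                 assert!(*cb <= r.cb, "input is not sorted by CB");
//!                 *cb != r.cb
//!             }
//!             None => true,
//!         };
//!         if start_new {
//!             groups.push((r.cb, vec![r]));
//!         } else {
//!             groups.last_mut().unwrap().1.push(r);
//!         }
//!     }
//!     groups
//! }
//! ```
//! Grouping by CB+UMI works the same way with the pair `(cb, umi)` as the key.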
//!
//! ### Iterate over cells
//! To iterate over a *sorted* busfile, grouping all records by CB:
//! ```rust, no_run
//! # use bustools::io::BusReader;
//! use bustools::iterators::CellGroupIterator; // the trait must be in scope
//!
//! let breader = BusReader::new("/path/to/some.bus");
//! for (cb, vector_of_records) in breader.groupby_cb() {
//!     // Example: the number of records in that cell
//!     let n_records: usize = vector_of_records.len();
//! }
//! ```
//!
//! ### Iterate over molecules
//! To iterate over a **sorted** busfile, grouping all records by CB+UMI:
//! ```rust, no_run
//! # use bustools::io::BusReader;
//! use bustools::iterators::CbUmiGroupIterator; // the trait must be in scope
//!
//! let breader = BusReader::new("/path/to/some.bus");
//! for ((cb, umi), vector_of_records) in breader.groupby_cbumi() {
//!     // Example: the number of reads of that molecule (CB/UMI)
//!     let n_reads: u32 = vector_of_records.iter().map(|r| r.COUNT).sum();
//! }
//! ```
//!
//! ## EC to gene mapping
//! More convenient features are provided by [`io::BusFolder`],
//! which wraps the busfile together with the `matrix.ec` and `transcripts.txt` files created by `kallisto bus`.
//! Those files tell us what a particular bus record `(CB,UMI,EC,Count,flag)`
//! actually maps to, as specified by its EC (equivalence class, a set of transcripts).
//! It can also construct a mapper from equivalence class to gene via [`consistent_genes::Ec2GeneMapper`],
//! which resolves ECs to genes.
//!
//! ```rust, no_run
//! # use bustools::io::BusFolder;
//! # use bustools::consistent_genes::EC;
//! let bfolder = BusFolder::new("/path/to/busfolder");
//! let ec_mapper = bfolder.make_mapper("/path/to/transcripts_to_genes.txt");
//! let gene_names = ec_mapper.get_genenames(EC(1));
//! ```
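//!
//! Conceptually, resolving an EC means looking up its transcripts and collecting the genes they belong to.
//! The sketch below illustrates that lookup with plain standard-library maps; it is not how
//! [`consistent_genes::Ec2GeneMapper`] is implemented. The function `genes_of_ec` and its
//! parameters are hypothetical: `ec2tr` plays the role of `matrix.ec`, `transcripts` of
//! `transcripts.txt`, and `t2g` of a transcript-to-gene table.
//! ```rust
//! use std::collections::{BTreeSet, HashMap};
//!
//! /// Collect the gene names of all transcripts in an equivalence class.
//! /// Unknown ECs, transcript indices, or transcript names are skipped.
//! fn genes_of_ec(
//!     ec: u32,
//!     ec2tr: &HashMap<u32, Vec<usize>>, // EC id -> transcript indices
//!     transcripts: &[&str],             // transcript index -> transcript name
//!     t2g: &HashMap<&str, &str>,        // transcript name -> gene name
//! ) -> BTreeSet<String> {
//!     ec2tr
//!         .get(&ec)
//!         .into_iter()
//!         .flatten()
//!         .filter_map(|&i| transcripts.get(i))
//!         .filter_map(|t| t2g.get(*t))
//!         .map(|g| g.to_string())
//!         .collect()
//! }
//! ```
//! An EC whose transcripts all belong to one gene resolves to a single-element set;
//! an EC spanning several genes is ambiguous at the gene level.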
// #![deny(missing_docs)]
pub mod io;
pub mod iterators;
pub mod bus_multi;
pub mod utils;
pub mod multinomial;
pub mod consistent_genes;
pub mod disjoint;
pub mod merger;
pub mod busz;
// mod runlength_codec;
// pub mod channel;
// pub mod buffered_channels;
// pub mod new_channel;