//! # bustools
//!
//! This library allows interaction with the Bus format (see [bustools](https://github.com/BUStools/bustools))
//! for scRNAseq data processing.
//!
//! The package is fairly mature, but some minor features of the original bustools may still be missing.
//!
//! # Basics of the library
//! The basic unit is the [`io::BusRecord`], which represents a single entry in a busfile,
//! consisting of CB, UMI, EC, COUNT and Flag.
//!
//! [`io::BusReader`] and [`io::BusWriter`] are the primary means to read and write busfiles.
//! They are polymorphic wrappers around the specialized implementations for uncompressed and compressed files:
//! [`io::BusReaderPlain`] and [`busz::BuszReader`] for reading, [`io::BusWriterPlain`] and [`busz::BuszWriter`] for writing.
//! Downstream code should only accept the generic [`io::BusReader`] and [`io::BusWriter`], so that it stays agnostic of the file format.
//!
//! ## Iterate over a busfile
//! [`io`] contains the code to read and write from busfiles.
//! In particular, it defines a simple iterator over [`io::BusRecord`]s via [`io::BusReader`].
//! `BusReader` implements the trait [`io::CUGIterator`], a marker trait for anything that
//! produces a stream of [`io::BusRecord`]s in this library.
//! ```rust, no_run
//! # use bustools::io::BusReader;
//! let breader = BusReader::new("/path/to/some.bus");
//! for record in breader {
//!     // record.CB == ...
//! }
//! ```
//!
//! ## Advanced Iterators over busfiles
//! While [`io::BusReader`] lets you iterate over single [`io::BusRecord`]s,
//! it is often convenient to group the records by CB (all records from the same cell)
//! or by CB+UMI (all records from the same mRNA).
//! [`iterators`] contains the code to chain such grouping iterators over [`io::BusRecord`]s.
//!
//! Note that the busfile must be **sorted** by CB/UMI for these iterators to work; they panic when used on an unsorted busfile.
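//!
//! To illustrate why sorting matters, here is a minimal, standalone sketch of grouping a sorted stream by CB.
//! It is not the implementation used in [`iterators`]: the `Record` struct and the `group_by_cb` function are
//! hypothetical stand-ins that rely only on the standard library. Records of the same cell must be adjacent,
//! so a CB that is smaller than its predecessor indicates unsorted input and triggers a panic.
//! ```rust
//! // Hypothetical stand-in for a BUS record; NOT the library's `io::BusRecord`.
//! #[derive(Debug, Clone, PartialEq)]
//! struct Record {
//!     cb: u64,
//!     umi: u64,
//!     count: u32,
//! }
//!
//! // Groups adjacent records that share a cell barcode (assumed helper, for illustration only).
//! // Panics if a barcode is smaller than the one before it, i.e. the input is not sorted by CB.
//! fn group_by_cb(records: &[Record]) -> Vec<(u64, Vec<Record>)> {
//!     let mut groups: Vec<(u64, Vec<Record>)> = Vec::new();
//!     for r in records {
//!         // Decide whether this record continues the current group.
//!         let same_cell = match groups.last() {
//!             Some((cb, _)) => {
//!                 assert!(*cb <= r.cb, "input is not sorted by CB");
//!                 *cb == r.cb
//!             }
//!             None => false,
//!         };
//!         if same_cell {
//!             if let Some(last) = groups.last_mut() {
//!                 last.1.push(r.clone());
//!             }
//!         } else {
//!             groups.push((r.cb, vec![r.clone()]));
//!         }
//!     }
//!     groups
//! }
//! ```
//! Because only the previous group is ever inspected, the grouping runs in a single pass,
//! which is why unsorted input cannot be handled without sorting it first.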
//!
//! ### Iterate over cells
//! To iterate over a **sorted** busfile, grouping all records by CB:
//! ```rust, no_run
//! # use bustools::io::BusReader;
//! use bustools::iterators::CellGroupIterator; // the trait must be in scope to call `groupby_cb()`
//!
//! let breader = BusReader::new("/path/to/some.bus");
//! for (cb, vector_of_records) in breader.groupby_cb() {
//!     // Example: the number of records in that cell
//!     let n_records: usize = vector_of_records.len();
//! }
//! ```
//!
//! ### Iterate over molecules
//! To iterate over a **sorted** busfile, grouping all records by CB+UMI:
//! ```rust, no_run
//! # use bustools::io::BusReader;
//! use bustools::iterators::CbUmiGroupIterator; // the trait must be in scope to call `groupby_cbumi()`
//!
//! let breader = BusReader::new("/path/to/some.bus");
//! for ((cb, umi), vector_of_records) in breader.groupby_cbumi() {
//!     // Example: the number of reads of that molecule (CB/UMI)
//!     let n_reads: u32 = vector_of_records.iter().map(|r| r.COUNT).sum();
//! }
//! ```
//!
//! ## EC to gene mapping
//! More convenient features are provided by [`io::BusFolder`],
//! which wraps the `.bus` file together with the `matrix.ec` and `transcripts.txt` files created by the `kallisto bus` command.
//! Those files tell us what a particular [`io::BusRecord`]
//! actually maps to, as specified by its EC (equivalence class, a set of transcripts).
//! `BusFolder` also constructs a mapper from equivalence class to gene, [`consistent_genes::Ec2GeneMapper`],
//! which resolves ECs to genes.
//!
//! ```rust, no_run
//! # use bustools::io::BusFolder;
//! # use bustools::consistent_genes::EC;
//! let bfolder = BusFolder::new("/path/to/busfolder");
//! let ec_mapper = bfolder.make_mapper("/path/to/transcripts_to_genes.txt");
//! let gene_names = ec_mapper.get_genenames(EC(1));
//! ```
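//!
//! Conceptually, resolving an EC means looking up the gene of every transcript in the class and collecting the distinct genes.
//! The following standalone sketch shows that idea with the standard library only. The function `ec_to_genes`
//! and the transcript and gene names are hypothetical and do not reflect how [`consistent_genes::Ec2GeneMapper`]
//! is implemented; in particular, silently skipping transcripts without a known gene is an assumption of this sketch.
//! ```rust
//! use std::collections::{BTreeSet, HashMap};
//!
//! // Resolves one equivalence class, given as a list of transcript names, to its set of gene names.
//! // Hypothetical helper for illustration; transcripts missing from the table are skipped.
//! fn ec_to_genes(
//!     ec_transcripts: &[&str],
//!     transcript_to_gene: &HashMap<&str, &str>,
//! ) -> BTreeSet<String> {
//!     ec_transcripts
//!         .iter()
//!         .filter_map(|t| transcript_to_gene.get(t))
//!         .map(|g| g.to_string())
//!         .collect()
//! }
//! ```
//! An EC whose transcripts all belong to one gene resolves to a single gene name,
//! while an EC that spans several genes resolves to a set with more than one entry.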
// #![deny(missing_docs)]
// pub mod io_dyn;
// pub mod io_generic;
// mod runlength_codec;
// pub mod channel;
// pub mod buffered_channels;
// pub mod new_channel;