Module bustools::busz


Reading and writing files in the busz compression format.

§Examples

§Reading a compressed bus file

use bustools::busz::BuszReader;
let reader = BuszReader::new("/some/file.busz");
for record in reader {
    // ...
}

§Writing to a compressed bus file

use bustools::busz::BuszWriter;
use bustools::io::{BusRecord, BusParams};
let blocksize = 10000;
let params = BusParams {cb_len: 16, umi_len: 12};
let mut writer = BuszWriter::new("/some/file.busz", params, blocksize);
let records = vec![
    BusRecord { CB: 0, UMI: 1, EC: 0, COUNT: 12, FLAG: 0 },
    BusRecord { CB: 0, UMI: 1, EC: 1, COUNT: 2, FLAG: 0 },
    BusRecord { CB: 0, UMI: 2, EC: 0, COUNT: 12, FLAG: 0 },
    BusRecord { CB: 1, UMI: 1, EC: 1, COUNT: 2, FLAG: 0 },
    BusRecord { CB: 1, UMI: 2, EC: 1, COUNT: 2, FLAG: 0 },
    BusRecord { CB: 1, UMI: 1, EC: 1, COUNT: 2, FLAG: 0 },
];
writer.write_iterator(records.into_iter());

§About Bitvec and Memory layout

This code relies heavily on BitVec. It uses bitvec to encode and decode the bits of the busz records, in particular for Fibonacci encoding and NewPFD encoding.
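To make the Fibonacci part concrete, here is a minimal, standalone sketch of Fibonacci coding on plain `Vec<bool>` bit sequences. It is not this module's implementation: the function names `fib_encode` and `fib_decode` are hypothetical, no `bitvec` types are used, and the byte-order concerns of the on-disk format are ignored.

```rust
/// Fibonacci-encode a positive integer (hypothetical helper, not part of bustools).
/// Bits are emitted from the smallest Fibonacci weight (1, 2, 3, 5, ...) upwards,
/// followed by a terminating `true`, so every codeword ends in two set bits.
fn fib_encode(mut n: u64) -> Vec<bool> {
    assert!(n > 0, "Fibonacci coding is only defined for positive integers");
    // Collect the Fibonacci weights 1, 2, 3, 5, ... that could be needed for n.
    let mut fibs: Vec<u64> = vec![1, 2];
    while *fibs.last().unwrap() <= n {
        let len = fibs.len();
        match fibs[len - 1].checked_add(fibs[len - 2]) {
            Some(next) => fibs.push(next),
            None => break,
        }
    }
    // Drop weights that are larger than n.
    while fibs.len() > 1 && *fibs.last().unwrap() > n {
        fibs.pop();
    }
    // Greedy decomposition from the largest weight down (Zeckendorf representation).
    let mut bits = vec![false; fibs.len()];
    for i in (0..fibs.len()).rev() {
        if fibs[i] <= n {
            bits[i] = true;
            n -= fibs[i];
        }
    }
    bits.push(true); // terminator
    bits
}

/// Decode a concatenation of Fibonacci codewords.
/// Trailing bits that do not form a complete codeword (e.g. zero padding) are ignored.
fn fib_decode(bits: &[bool]) -> Vec<u64> {
    let mut out = Vec::new();
    let (mut prev, mut cur) = (1u64, 1u64); // `cur` is the weight of the current bit
    let mut acc = 0u64;
    let mut last_bit = false;
    for &b in bits {
        if b && last_bit {
            // Two consecutive set bits: the second one is the terminator.
            out.push(acc);
            acc = 0;
            last_bit = false;
            prev = 1;
            cur = 1;
            continue;
        }
        if b {
            acc += cur;
        }
        last_bit = b;
        // Advance to the next Fibonacci weight; saturate so long zero runs cannot overflow.
        let next = prev.saturating_add(cur);
        prev = cur;
        cur = next;
    }
    out
}
```

For example, `fib_encode(4)` yields the bits `1011` (4 = 1 + 3, plus the terminator), and decoding the concatenated codewords of several numbers returns them in order.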

There is one peculiarity, though. To turn bytes (e.g. from a u64, or read from the file) into a bitvec::vec::BitVec we use BitVec::from_bytes(byte_array). This takes the bytes literally, in the order of the array. Yet bustools writes busz in little-endian format, i.e. the byte order is reversed.

In particular, each busz block contains entries for CB, UMI, …, each padded with zeros (to a multiple of 64 bits). On disk it looks like this:

0000000...00000000[CBs in Fibonacci]
0000000...00000000[UMIs in Fibonacci]
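As a small illustration of the padding rule, the sketch below computes how many zero bits are needed to bring an encoded section up to a multiple of 64 bits. The helper name `padding_bits` is hypothetical and not part of this module; where the zeros end up in the byte stream depends on the byte order discussed here.

```rust
/// Number of zero bits needed to pad `n_bits` up to the next multiple of 64.
/// Hypothetical helper for illustration only.
fn padding_bits(n_bits: usize) -> usize {
    (64 - n_bits % 64) % 64
}
```

A section of 100 encoded bits would therefore need 28 padding bits, giving 128 bits (two 64-bit words) in total.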

Moreover, the Fibonacci encoding itself must be done with little-endian byte order. If the data on disk looks like

aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh  //bits

then the correct Fibonacci stream to decode is

ddddddddccccccccbbbbbbbbaaaaaaaahhhhhhhhgggggggg....
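The reordering in this example (bytes `a b c d e f g h` become `d c b a h g f e`) amounts to reversing the bytes within each group of four. A minimal sketch of that step follows; the function name `swap_bytes_in_words` is hypothetical, and the word size of 4 is inferred from the example above rather than taken from the busz specification.

```rust
/// Reverse the byte order within each `word_size`-byte group of `bytes`.
/// Hypothetical helper; panics if the input is not a whole number of words.
fn swap_bytes_in_words(bytes: &[u8], word_size: usize) -> Vec<u8> {
    assert!(word_size > 0, "word_size must be positive");
    assert!(
        bytes.len() % word_size == 0,
        "input length must be a multiple of word_size"
    );
    bytes
        .chunks(word_size)
        .flat_map(|word| word.iter().rev().copied())
        .collect()
}
```

With the bytes from the example, `swap_bytes_in_words(b"abcdefgh", 4)` gives `b"dcbahgfe"`, and the result could then be handed to the bit-level decoder in the order shown.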
