Module busz


Dealing with the busz compression format

§Examples

§Reading a compressed bus file

use std::path::Path;
use bustools_core::busz::BuszReader;
let reader = BuszReader::new(Path::new("/some/file.busz"));
for record in reader {
    // ...
}

§Writing to a compressed bus file

use std::path::Path;
use bustools_core::record;
use bustools_core::busz::BuszWriter;
use bustools_core::io::{BusRecord, BusParams};
let blocksize = 10000;
let params = BusParams {cb_len: 16, umi_len: 12};
let mut writer = BuszWriter::new(Path::new("/some/file.busz"), params, blocksize);
let records = vec![
    record!(0, 1, 0, 12, 0),
    record!(0, 1, 1, 2, 0),
    record!(0, 2, 0, 12, 0),
    record!(1, 1, 1, 2, 0),
    record!(1, 2, 1, 2, 0),
    record!(1, 1, 1, 2, 0),
];
writer.write_iterator(records.into_iter());

§About Bitvec and Memory layout

This code relies heavily on BitVec. It uses bitvec to encode/decode the bits of the busz records, in particular Fibonacci encoding and NewPFD encoding.
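As background, here is a minimal sketch of textbook Fibonacci coding for a single positive integer. The function name `fib_encode` and the `Vec<bool>` return type are assumptions made for illustration; this is not the crate's implementation, and the bit and byte order actually used in busz files may differ from this plain form.

```rust
/// Textbook Fibonacci coding of a positive integer (illustrative helper, not the crate's API).
/// Returns the code word as bits, smallest Fibonacci term first, ending in the "11" terminator.
fn fib_encode(n: u32) -> Vec<bool> {
    assert!(n >= 1, "Fibonacci coding is defined for positive integers only");
    let n = u64::from(n);

    // Fibonacci numbers 1, 2, 3, 5, 8, ... up to the largest one not exceeding n.
    let mut fibs: Vec<u64> = vec![1, 2];
    while fibs[fibs.len() - 1] <= n {
        let next = fibs[fibs.len() - 1] + fibs[fibs.len() - 2];
        fibs.push(next);
    }
    while fibs[fibs.len() - 1] > n {
        fibs.pop();
    }

    // Greedy Zeckendorf decomposition, largest term first.
    let mut bits = vec![false; fibs.len()];
    let mut rest = n;
    for i in (0..fibs.len()).rev() {
        if fibs[i] <= rest {
            bits[i] = true;
            rest -= fibs[i];
        }
    }

    // The extra 1 after the highest term forms the "11" terminator.
    bits.push(true);
    bits
}
```

For example, `fib_encode(4)` yields the bits `1011` (4 = 1 + 3, followed by the terminating 1). Because no code word contains `11` except at its end, code words can be concatenated without separators.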

One peculiarity, though: to turn bytes (e.g. from a u64 or read from the file) into a bitvec::vec::BitVec we use BitVec::from_bytes(byte_Array), which takes the bytes literally in the order of the array. Yet bustools writes busz in little-endian format, i.e. the byte order is reversed. In particular, each busz block contains entries for CB, UMI, …, each padded with zeros afterwards (to a multiple of 64). On disk, this looks as follows:

0000000...00000000[CBs in Fibonacci]
0000000...00000000[UMIs in Fibonacci]

Moreover, the Fibonacci encoding must be done with little-endian byte order. If the bits on disk look like

aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh  //bits

then the correct Fibonacci stream to decode is

ddddddddccccccccbbbbbbbbaaaaaaaahhhhhhhhgggggggg....
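The reordering illustrated above can be sketched as a byte swap within fixed-size groups. The helper name `swap_bytes_per_group` is hypothetical, and the group size of 4 bytes is an assumption read off the illustration (bytes a to d are reversed before bytes e to h); the crate itself may implement this differently.

```rust
/// Reverse the byte order inside each 4-byte group while keeping the groups in place.
/// Hypothetical helper for illustration; the 4-byte group size is an assumption.
fn swap_bytes_per_group(raw: &[u8]) -> Vec<u8> {
    raw.chunks(4)
        // Within each group, emit the bytes last-to-first.
        .flat_map(|group| group.iter().rev().copied())
        .collect()
}
```

Applied to the eight bytes `abcdefgh` (one letter per byte), this returns `dcbahgfe`, matching the stream shown above. A trailing group shorter than 4 bytes would simply be reversed as is.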

Re-exports§

pub use decode_bytes::BuszReader;

Modules§

decode_bytes
This version uses the FastFib byte-based decoder (rather than doing explicit operations on a stream of bits), i.e. the BlockDecoder consumes the bytes directly rather than converting them to bitvec.

Structs§

BuszWriter
Writing BusRecords into compressed .busz format