Dealing with the busz compression format
§Examples
§Reading a compressed bus file
use bustools::busz::BuszReader;
let reader = BuszReader::new("/some/file.busz");
for record in reader {
    // ...
}
§Writing to a compressed bus file
use bustools::busz::BuszWriter;
use bustools::io::{BusRecord, BusParams};
let blocksize = 10000;
let params = BusParams {cb_len: 16, umi_len: 12};
let mut writer = BuszWriter::new("/some/file.busz", params, blocksize);
let records = vec![
    BusRecord { CB: 0, UMI: 1, EC: 0, COUNT: 12, FLAG: 0 },
    BusRecord { CB: 0, UMI: 1, EC: 1, COUNT: 2, FLAG: 0 },
    BusRecord { CB: 0, UMI: 2, EC: 0, COUNT: 12, FLAG: 0 },
    BusRecord { CB: 1, UMI: 1, EC: 1, COUNT: 2, FLAG: 0 },
    BusRecord { CB: 1, UMI: 2, EC: 1, COUNT: 2, FLAG: 0 },
    BusRecord { CB: 1, UMI: 1, EC: 1, COUNT: 2, FLAG: 0 },
];
writer.write_iterator(records.into_iter());
§About Bitvec and Memory layout
This code relies heavily on BitVec. It uses bitvec to encode and decode the bits of the busz records, in particular for Fibonacci encoding and NewPFD encoding.
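For readers unfamiliar with Fibonacci coding, here is a minimal, std-only sketch of the general technique. It is not this crate's implementation: the names `fib_encode` and `fib_decode` are hypothetical, bits are held in a plain `Vec<bool>` instead of a BitVec, and the byte-order issues described in the rest of this section are ignored. Each positive integer is written as a sum of non-consecutive Fibonacci numbers (1, 2, 3, 5, 8, ...), lowest term first, followed by an extra 1 bit, so every codeword ends in `11`.

```rust
/// Fibonacci-encode a positive integer into bits (true = 1), lowest term first.
/// Hypothetical helper for illustration, not part of the crate's API.
fn fib_encode(n: u64) -> Vec<bool> {
    assert!(n > 0, "Fibonacci coding is only defined for positive integers");
    // Collect the Fibonacci numbers 1, 2, 3, 5, 8, ... that do not exceed n
    // (2 is always present; it is simply skipped below when n == 1).
    let mut fibs = vec![1u64, 2];
    loop {
        let k = fibs.len();
        match fibs[k - 1].checked_add(fibs[k - 2]) {
            Some(next) if next <= n => fibs.push(next),
            _ => break,
        }
    }
    // Greedy decomposition, starting from the largest Fibonacci number.
    let mut bits = vec![false; fibs.len()];
    let mut rest = n;
    for i in (0..fibs.len()).rev() {
        if fibs[i] <= rest {
            bits[i] = true;
            rest -= fibs[i];
        }
    }
    // Drop unused high positions, then append the terminating 1 bit.
    while bits.last() == Some(&false) {
        bits.pop();
    }
    bits.push(true);
    bits
}

/// Decode a stream of concatenated Fibonacci codewords.
/// Zero bits after the last codeword (e.g. padding) are ignored.
fn fib_decode(bits: &[bool]) -> Vec<u64> {
    let mut out = Vec::new();
    let (mut value, mut prev_bit) = (0u64, false);
    let (mut f_cur, mut f_next) = (1u64, 2u64);
    for &bit in bits {
        if bit && prev_bit {
            // Two consecutive 1 bits terminate the current codeword.
            out.push(value);
            value = 0;
            prev_bit = false;
            f_cur = 1;
            f_next = 2;
            continue;
        }
        if bit {
            value += f_cur;
        }
        // Advance to the next Fibonacci number.
        let tmp = f_cur.saturating_add(f_next);
        f_cur = f_next;
        f_next = tmp;
        prev_bit = bit;
    }
    out
}
```

With these definitions, `fib_encode(4)` yields the bits `1011` (1 + 3, then the terminator), and decoding a concatenation of codewords returns the original values in order.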
One peculiarity, though: to turn bytes (e.g. from a u64 or read from the file) into a bitvec::vec::BitVec, we use BitVec::from_bytes(byte_Array). This takes the bytes literally, in the order of the array.
Yet bustools writes busz in little-endian format, i.e. the byte order is reversed.
In particular, each busz block contains entries for CB, UMI…, each padded with zeros afterwards (to a multiple of 64).
On disk it looks like this:
0000000...00000000[CBs in Fibonacci]
0000000...00000000[UMIs in Fibonacci]
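The padding rule can be sketched as follows. This is only an illustration under assumptions: it takes "a multiple of 64" to mean 64 bits, it uses a plain `Vec<bool>` as the bit container, and the names `padded_len` and `pad_to_64` are hypothetical, not functions of this crate.

```rust
/// Length in bits after zero-padding `n_bits` up to the next multiple of 64.
/// Assumption: the padding unit is bits, not bytes.
fn padded_len(n_bits: usize) -> usize {
    (n_bits + 63) / 64 * 64
}

/// Append zero bits until the length is a multiple of 64.
fn pad_to_64(mut bits: Vec<bool>) -> Vec<bool> {
    let target = padded_len(bits.len());
    bits.resize(target, false);
    bits
}
```

A section that is already a multiple of 64 bits long is left unchanged, and an empty section stays empty.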
Moreover, the Fibonacci decoding must take the little-endian byte order into account. If the bits on disk look like
aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh //bits
the correct Fibonacci stream to decode is
ddddddddccccccccbbbbbbbbaaaaaaaahhhhhhhhgggggggg....
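The reordering illustrated above reverses the bytes within each group of four. A minimal sketch of that operation is given here; the helper name `swap_endian` and the `word_size` parameter are hypothetical, and the group width of 4 bytes is read off the illustration rather than taken from the crate's code.

```rust
/// Reverse the byte order inside each consecutive group of `word_size` bytes.
/// Hypothetical helper for illustration, not necessarily the crate's API.
fn swap_endian(bytes: &[u8], word_size: usize) -> Vec<u8> {
    assert!(
        word_size > 0 && bytes.len() % word_size == 0,
        "input length must be a multiple of the word size"
    );
    bytes
        .chunks(word_size)
        .flat_map(|word| word.iter().rev().copied())
        .collect()
}
```

Called as `swap_endian(b"abcdefgh", 4)`, this returns the bytes `dcbahgfe`, matching the pattern shown in the illustration. The reordered bytes can then be handed to the bit container for decoding.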
Structs§
- Reading a compressed busfile
- Writing BusRecords into compressed .busz format
Functions§
- Compress input busfile into output busz-file using blocksize
- Decompress the input busz file into a plain busfile, output