This module contains the main encoding functions for turning an input JSONL or BEN file into a BEN or XBEN file.
Any input JSONL file is expected to be in the standard
{"assignment": [...], "sample": #}
format.
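As a concrete illustration of that line format, the sketch below builds one JSONL record from an assignment vector using only the standard library. The helper name `jsonl_line` is hypothetical and not part of this crate, and the choice of `u16` for assignment values is an assumption made for the example.

```rust
/// Hypothetical helper (not part of this crate): format one sample as a
/// JSONL line of the form {"assignment": [...], "sample": #}.
fn jsonl_line(assignment: &[u16], sample: usize) -> String {
    // Render each assignment value as text and join with ", ".
    let values: Vec<String> = assignment.iter().map(|v| v.to_string()).collect();
    format!(
        "{{\"assignment\": [{}], \"sample\": {}}}",
        values.join(", "),
        sample
    )
}

fn main() {
    // One record per line; each line is a complete JSON object.
    println!("{}", jsonl_line(&[1, 1, 2], 1));
    println!("{}", jsonl_line(&[2, 1, 1], 2));
}
```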
The BEN format is a simple bit-packed run-length encoded assignment vector with some special headers that allow the decoder to know how many bytes to read for each sample.
The XBEN format uses LZMA2 dictionary compression on a byte-level decompressed version of the BEN format (known as ben32) to achieve better compression ratios than applying LZMA2 compression directly to the BEN format would.
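To make the idea of a run-length encoded, bit-packed assignment vector more concrete, here is a minimal sketch of the two ingredients: collapsing an assignment vector into (value, run length) pairs, and computing how many bits the largest value and the largest run length need. The function names and the `u16` types are assumptions for illustration; the actual BEN header layout is not specified in this description and is not reproduced here.

```rust
/// Collapse an assignment vector into (value, run_length) pairs.
/// Illustrative only; not this crate's implementation.
fn run_length_encode(assignment: &[u16]) -> Vec<(u16, u16)> {
    let mut runs: Vec<(u16, u16)> = Vec::new();
    for &value in assignment {
        if let Some((last, count)) = runs.last_mut() {
            // Extend the current run if the value repeats and the counter has room.
            if *last == value && *count < u16::MAX {
                *count += 1;
                continue;
            }
        }
        // Otherwise start a new run of length 1.
        runs.push((value, 1));
    }
    runs
}

/// Number of bits needed to represent `x` (at least 1 bit).
fn bits_needed(x: u16) -> u32 {
    (16 - x.leading_zeros()).max(1)
}

fn main() {
    let assignment = [1, 1, 1, 2, 2, 3, 3, 3, 3];
    let runs = run_length_encode(&assignment);
    // runs is [(1, 3), (2, 2), (3, 4)]
    println!("{:?}", runs);

    // A bit-packed encoding only needs enough bits for the largest value
    // and the largest run length that occur in the sample.
    let max_value = runs.iter().map(|&(v, _)| v).max().unwrap_or(0);
    let max_count = runs.iter().map(|&(_, c)| c).max().unwrap_or(0);
    println!(
        "value bits: {}, count bits: {}",
        bits_needed(max_value),
        bits_needed(max_count)
    );
}
```

A per-sample header recording these two bit widths together with the number of runs would be one way for a decoder to know how many bytes to read, though that is a guess at the design rather than a statement of the format.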
Modules§
- This module contains the main functions that are used in the reben binary for relabeling the assignment vectors in a BEN file. The relabeling can be done either so that the values are in ascending order or according to a mapping provided by the user in a map file.
- This module contains the main functions that are used for translating between the ben32 and BEN formats. The ben32 format is a simple run-length encoding of an assignment vector done at the byte level, in which every 32 bits of data encodes one (assignment, count) pair. The BEN format is a bit-packed version of the ben32 format along with some extra headers.
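The ben32 layout described above, with one (assignment, count) pair per 32 bits, can be sketched as follows. The description only fixes the 32-bit size of each pair; the split into two 16-bit halves, the ordering (assignment in the high half), and the big-endian byte order are assumptions made for this example, and `pack_pair` and `unpack_pair` are hypothetical names.

```rust
/// Pack one (assignment, count) pair into 4 bytes.
/// Assumed layout: assignment in the high 16 bits, count in the low 16 bits,
/// written big-endian. The real ben32 layout may differ.
fn pack_pair(assignment: u16, count: u16) -> [u8; 4] {
    let word = ((assignment as u32) << 16) | (count as u32);
    word.to_be_bytes()
}

/// Inverse of `pack_pair` under the same assumed layout.
fn unpack_pair(bytes: [u8; 4]) -> (u16, u16) {
    let word = u32::from_be_bytes(bytes);
    ((word >> 16) as u16, (word & 0xFFFF) as u16)
}

fn main() {
    // Three runs: value 1 repeated 3 times, value 2 twice, value 3 four times.
    let runs = [(1u16, 3u16), (2, 2), (3, 4)];

    // Each run occupies exactly 4 bytes in the byte-level stream.
    let mut stream: Vec<u8> = Vec::new();
    for &(assignment, count) in &runs {
        stream.extend_from_slice(&pack_pair(assignment, count));
    }
    println!("{} bytes for {} runs", stream.len(), runs.len());

    // Decode by reading the stream back 4 bytes at a time.
    for chunk in stream.chunks_exact(4) {
        let pair = unpack_pair([chunk[0], chunk[1], chunk[2], chunk[3]]);
        println!("{:?}", pair);
    }
}
```

Because every pair is byte-aligned and has a fixed width, repeated runs produce repeated byte patterns, which is plausibly why a dictionary compressor such as LZMA2 does better on ben32 than on the bit-packed BEN stream.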
Structs§
- A struct to make the writing of BEN files easier and more ergonomic.
- A struct to make the writing of XBEN files easier and more ergonomic.
Functions§
- This function takes a BEN file and encodes it into an XBEN file using bit-to-byte decompression followed by LZMA2 compression.
- This function takes in a standard assignment vector and encodes it into a bit-packed BEN version.
- This function takes a JSONL file and compresses it into the BEN format.
- This function takes a JSONL file and compresses it to the XBEN format.
- This is a convenience function that applies level 9 LZMA2 compression to a general file.