Module encode


This module contains the main encoding functions for turning an input JSONL or BEN file into a BEN or XBEN file.

Any input JSONL file is expected to be in the standard

{"assignment": [...], "sample": #}

format.
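
As a rough illustration of that line shape, the sketch below formats one sample as a JSONL line. The helper name `to_jsonl_line` and the `u16` element type are assumptions made for this example; they are not taken from this module's API.

```rust
/// Format one sample as a single JSONL line of the form
/// {"assignment": [...], "sample": n}.
/// Hypothetical helper for illustration only; not part of this module.
fn to_jsonl_line(assignment: &[u16], sample: usize) -> String {
    // Render each assignment value as text, then join with ", ".
    let values: Vec<String> = assignment.iter().map(|v| v.to_string()).collect();
    // `{{` and `}}` are escaped braces in Rust format strings.
    format!(
        "{{\"assignment\": [{}], \"sample\": {}}}",
        values.join(", "),
        sample
    )
}
```

Writing one such line per sample, each terminated by a newline, gives a file in the expected layout.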

The BEN format is a simple bit-packed run-length encoded assignment vector with some special headers that allow the decoder to know how many bytes to read for each sample.
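
The run-length step can be sketched as follows. This is a minimal illustration of run-length encoding an assignment vector, assuming `u16` values and run lengths; the function names are hypothetical and the actual BEN bit layout and headers are not reproduced here.

```rust
/// Collapse consecutive equal values into (value, run_length) pairs.
/// Hypothetical helper; runs longer than u16::MAX are split into several pairs.
fn run_length_encode(assignment: &[u16]) -> Vec<(u16, u16)> {
    let mut runs: Vec<(u16, u16)> = Vec::new();
    for &value in assignment {
        match runs.last_mut() {
            // Extend the current run when the value repeats and the counter has room.
            Some((last, count)) if *last == value && *count < u16::MAX => *count += 1,
            // Otherwise start a new run of length 1.
            _ => runs.push((value, 1)),
        }
    }
    runs
}

/// Expand (value, run_length) pairs back into a flat assignment vector.
fn run_length_decode(runs: &[(u16, u16)]) -> Vec<u16> {
    let mut assignment = Vec::new();
    for &(value, count) in runs {
        assignment.extend(std::iter::repeat(value).take(count as usize));
    }
    assignment
}
```

Assignment vectors with long stretches of identical values shrink considerably under this scheme, which is what makes the later bit-packing step worthwhile.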

The XBEN format applies LZMA2 dictionary compression to a byte-level decompressed version of the BEN format (known as ben32). This achieves better compression ratios than applying LZMA2 compression directly to the BEN format.

Modules§

relabel
This module contains the main functions that are used in the reben binary for relabeling the assignment vectors in a BEN file. The relabeling can be done either so that the values are in ascending order or according to a mapping provided by the user in a map file.
translate
This module contains the main functions that are used for translating between the ben32 and BEN formats. The ben32 format is a simple byte-level run-length encoding of an assignment vector in which every 32 bits of data encode one (assignment, count) pair. The BEN format is a bit-packed version of the ben32 format along with some extra headers.
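
To make the "32 bits per (assignment, count) pair" idea concrete, here is a minimal sketch. The even 16/16 split between assignment and count and the big-endian byte order are assumptions for illustration; the description above only states that each pair occupies 32 bits. The function names are hypothetical.

```rust
/// Serialize each (assignment, count) pair into exactly 4 bytes.
/// Assumed layout (not confirmed by the docs): u16 assignment, then u16 count,
/// both big-endian.
fn pack_pairs_32(runs: &[(u16, u16)]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(runs.len() * 4);
    for &(assignment, count) in runs {
        bytes.extend_from_slice(&assignment.to_be_bytes());
        bytes.extend_from_slice(&count.to_be_bytes());
    }
    bytes
}

/// Inverse of `pack_pairs_32`; trailing bytes that do not form a full
/// 4-byte group are ignored.
fn unpack_pairs_32(bytes: &[u8]) -> Vec<(u16, u16)> {
    bytes
        .chunks_exact(4)
        .map(|c| {
            (
                u16::from_be_bytes([c[0], c[1]]),
                u16::from_be_bytes([c[2], c[3]]),
            )
        })
        .collect()
}
```

Because every pair starts on a byte boundary, repeated pairs show up as repeated byte sequences, which a dictionary compressor such as LZMA2 can match more easily than bit-packed data.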

Structs§

BenEncoder
A struct to make the writing of BEN files easier and more ergonomic.
XBenEncoder
A struct to make the writing of XBEN files easier and more ergonomic.

Functions§

encode_ben_to_xben
This function takes a BEN file and encodes it into an XBEN file using bit-to-byte decompression followed by LZMA2 compression.
encode_ben_vec_from_assign
This function takes a standard assignment vector and encodes it into a bit-packed BEN version.
encode_ben_vec_from_rle
This function takes a run-length encoded assignment vector and encodes it into a bit-packed BEN version.
encode_jsonl_to_ben
This function takes a JSONL file and compresses it into the BEN format.
encode_jsonl_to_xben
This function takes a JSONL file and compresses it to the XBEN format.
xz_compress
This is a convenience function that applies level 9 LZMA2 compression to a general file.
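
The bit-packing idea behind the `encode_ben_vec_from_*` functions can be sketched roughly as below. This is not the crate's implementation: the header layout (`value_bits`, `length_bits`, then a big-endian `u32` payload length), the MSB-first bit order, and the names `bits_needed` and `pack_runs` are all assumptions chosen for the example.

```rust
/// Minimum number of bits needed to represent `x` (at least 1).
fn bits_needed(x: u16) -> u32 {
    (16 - x.leading_zeros()).max(1)
}

/// Pack (value, length) runs with fixed bit widths, most significant bit first.
/// Hypothetical layout: [value_bits: u8][length_bits: u8][payload_len: u32 BE][payload].
fn pack_runs(runs: &[(u16, u16)]) -> Vec<u8> {
    // Choose the narrowest widths that fit every value and every run length.
    let value_bits = runs.iter().map(|r| bits_needed(r.0)).max().unwrap_or(1);
    let length_bits = runs.iter().map(|r| bits_needed(r.1)).max().unwrap_or(1);

    let mut payload: Vec<u8> = Vec::new();
    let mut acc: u64 = 0; // bit accumulator; only the low `filled` bits are pending
    let mut filled: u32 = 0; // number of pending bits in `acc`
    for &(value, length) in runs {
        for (field, width) in [(value, value_bits), (length, length_bits)] {
            acc = (acc << width) | u64::from(field);
            filled += width;
            // Flush whole bytes as soon as they are available.
            while filled >= 8 {
                filled -= 8;
                payload.push((acc >> filled) as u8);
            }
        }
    }
    // Pad the final partial byte with zero bits on the right.
    if filled > 0 {
        payload.push((acc << (8 - filled)) as u8);
    }

    // The header tells a decoder the field widths and how many bytes to read.
    let mut out = vec![value_bits as u8, length_bits as u8];
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(&payload);
    out
}
```

For the runs `(1, 3), (2, 2), (3, 1)`, both fields fit in 2 bits, so the six fields occupy 12 bits and the payload is 2 bytes, compared with 12 bytes at 32 bits per pair.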