Module ben::encode

This module contains the main encoding functions for turning an input JSONL or BEN file into a BEN or XBEN file.

Any input JSONL file is expected to be in the standard

{"assignment": [...], "sample": #}

format.

The BEN format is a simple bit-packed run-length encoded assignment vector with some special headers that allow the decoder to know how many bytes to read for each sample.
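
To make the run-length step concrete, here is a minimal sketch of collapsing an assignment vector into (value, run length) pairs. The function name `run_length_encode` and the choice of `u16` for both fields are assumptions made for illustration, not the crate's actual API, and the BEN headers are not modeled here.

```rust
/// Run-length encode an assignment vector into (value, run length) pairs.
/// Hypothetical helper for illustration; not part of the documented API.
fn run_length_encode(assignment: &[u16]) -> Vec<(u16, u16)> {
    let mut runs: Vec<(u16, u16)> = Vec::new();
    for &value in assignment {
        match runs.last_mut() {
            // Extend the current run if the value repeats and the counter has room.
            Some((last, count)) if *last == value && *count < u16::MAX => *count += 1,
            // Otherwise start a new run of length one.
            _ => runs.push((value, 1)),
        }
    }
    runs
}
```

For example, the assignment `[1, 1, 1, 2, 2, 3]` collapses to the pairs `(1, 3)`, `(2, 2)`, `(3, 1)`.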

The XBEN format applies LZMA2 dictionary compression to a byte-level, decompressed version of the BEN format (known as ben32). This achieves better compression ratios than applying LZMA2 compression directly to the BEN format.

Modules§

  • This module contains the main functions that are used in the reben binary for relabeling the assignment vectors in a BEN file. The relabeling can be done either so that the values are in ascending order or according to a mapping provided by the user in a map file.
  • This module contains the main functions that are used for translating between the ben32 and BEN formats. The ben32 format is a simple run-length encoding of an assignment vector done at the byte level, in which every 32 bits of data encodes one (assignment, count) pair. The BEN format is a bit-packed version of the ben32 format along with some extra headers.
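
The byte-level layout described for ben32 can be sketched as follows. The text only states that every 32 bits encode one (assignment, count) pair, so the split into a 16-bit value followed by a 16-bit count, the big-endian byte order, and both function names are assumptions made for this illustration. Any terminators or headers the real format may use are omitted.

```rust
/// Pack (value, count) runs into 4 bytes each.
/// Assumed layout: 2-byte value, then 2-byte count, both big-endian.
fn runs_to_ben32_like(runs: &[(u16, u16)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(runs.len() * 4);
    for &(value, count) in runs {
        out.extend_from_slice(&value.to_be_bytes());
        out.extend_from_slice(&count.to_be_bytes());
    }
    out
}

/// Expand the assumed 4-byte records back into a full assignment vector.
/// Trailing bytes that do not fill a whole record are ignored.
fn ben32_like_to_assignment(bytes: &[u8]) -> Vec<u16> {
    let mut assignment = Vec::new();
    for chunk in bytes.chunks_exact(4) {
        let value = u16::from_be_bytes([chunk[0], chunk[1]]);
        let count = u16::from_be_bytes([chunk[2], chunk[3]]);
        assignment.extend(std::iter::repeat(value).take(count as usize));
    }
    assignment
}
```

Because every record has a fixed width and is byte-aligned, repeated runs produce repeated byte patterns, which is the kind of input a dictionary compressor such as LZMA2 can exploit.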

Structs§

  • A struct to make the writing of BEN files easier and more ergonomic.
  • A struct to make the writing of XBEN files easier and more ergonomic.

Functions§

  • This function takes a BEN file and encodes it into an XBEN file using bit-to-byte decompression followed by LZMA2 compression.
  • This function takes in a standard assignment vector and encodes it into a bit-packed BEN version.
  • This function takes a JSONL file and compresses it into the BEN format.
  • This function takes a JSONL file and compresses it to the XBEN format.
  • This is a convenience function that applies level 9 LZMA2 compression to a general file.
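
As a rough illustration of what bit-packing a run-length encoded assignment vector can look like, the sketch below stores each value and each run length using only as many bits as the largest one needs. The names `bits_needed` and `pack_runs`, the most-significant-bit-first ordering, and the returned tuple are all assumptions for this example. The actual BEN header layout is not reproduced here.

```rust
/// Number of bits needed to represent `x` (at least 1).
fn bits_needed(x: u16) -> u32 {
    (16 - x.leading_zeros()).max(1)
}

/// Bit-pack (value, count) runs, most significant bit first.
/// Returns (bits per value, bits per count, packed bytes).
/// Hypothetical layout: the two widths stand in for the header information
/// a decoder would need in order to read the pairs back.
fn pack_runs(runs: &[(u16, u16)]) -> (u32, u32, Vec<u8>) {
    let val_bits = bits_needed(runs.iter().map(|r| r.0).max().unwrap_or(0));
    let len_bits = bits_needed(runs.iter().map(|r| r.1).max().unwrap_or(0));
    let mut out = Vec::new();
    let mut acc: u64 = 0; // bit accumulator; only the low `filled` bits are meaningful
    let mut filled: u32 = 0; // number of pending bits in `acc`
    for &(value, count) in runs {
        for (field, width) in [(value, val_bits), (count, len_bits)] {
            acc = (acc << width) | field as u64;
            filled += width;
            // Flush every complete byte that has accumulated.
            while filled >= 8 {
                filled -= 8;
                out.push((acc >> filled) as u8);
            }
        }
    }
    // Pad the final partial byte with zero bits on the right.
    if filled > 0 {
        out.push((acc << (8 - filled)) as u8);
    }
    (val_bits, len_bits, out)
}
```

With the runs `(1, 3)`, `(2, 2)`, `(3, 1)`, both fields fit in 2 bits, so the three pairs occupy 12 bits and pack into the two bytes `0x7A 0xD0`, compared with 12 bytes in a 32-bit-per-pair layout.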