q_compress
Usage
To run something right away, see the primary example.
For a lower-level API that allows writing/reading one chunk at a time and extracting all metadata, see the docs.rs documentation.
Library Changelog
See changelog.md
Advanced
Custom Data Types
Small data types can be compressed efficiently by widening them to a larger type: for example, compressing u8 data as a sequence of u16 values. The only cost of using a larger data type is a small increase in chunk metadata size.
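The widening step itself needs nothing beyond the standard library. The sketch below shows only the lossless conversion to u16 and the checked conversion back. The helper names `widen` and `narrow` are illustrative and not part of q_compress, and the actual compress and decompress calls are left out.

```rust
use std::convert::TryFrom;

/// Widen each u8 to a u16. The conversion is lossless, so the result can be
/// handed to a compressor that operates on u16 values.
fn widen(bytes: &[u8]) -> Vec<u16> {
    bytes.iter().map(|&b| u16::from(b)).collect()
}

/// Narrow decompressed u16 values back to u8.
/// Returns None if any value does not fit in a u8.
fn narrow(values: &[u16]) -> Option<Vec<u8>> {
    values.iter().map(|&v| u8::try_from(v).ok()).collect()
}
```

In practice one would call `widen` before compressing and `narrow` after decompressing; a `None` from `narrow` signals that the data was not originally u8.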
When necessary, you can implement your own data type via q_compress::types::NumberLike and (if the existing signed/unsigned implementations are insufficient) q_compress::types::SignedLike and q_compress::types::UnsignedLike.
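The exact items these traits require are listed in the docs.rs documentation and are not reproduced here. As a rough illustration of the kind of work a custom type involves, the sketch below assumes an implementation needs an order-preserving, reversible mapping to an unsigned integer. The trait `OrderedBits` and the type `MilliCelsius` are hypothetical stand-ins, not part of q_compress.

```rust
/// Hypothetical stand-in trait (NOT the crate's NumberLike): a reversible
/// mapping to u32 that preserves the ordering of the original values.
trait OrderedBits: Copy {
    fn to_ordered(self) -> u32;
    fn from_ordered(bits: u32) -> Self;
}

/// Example custom type: a temperature stored as thousandths of a degree.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
struct MilliCelsius(i32);

impl OrderedBits for MilliCelsius {
    fn to_ordered(self) -> u32 {
        // Flipping the sign bit maps i32::MIN..=i32::MAX onto 0..=u32::MAX
        // while keeping the values in the same order.
        (self.0 as u32) ^ 0x8000_0000
    }

    fn from_ordered(bits: u32) -> Self {
        // Flipping the sign bit again undoes the mapping.
        MilliCelsius((bits ^ 0x8000_0000) as i32)
    }
}
```

The sign-bit flip is a common choice for signed integers because it is cheap and exactly invertible; other types (floats, timestamps) would need their own mapping.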
Seeking and Quantile Statistics
Recall that each chunk has a metadata section containing
- the total count of numbers in the chunk,
- the ranges for the chunk and count of numbers in each range,
- and the size in bytes of the compressed body.
Because each chunk's metadata records the size of its compressed body, a reader can skip from one chunk to the next without decompressing anything, collecting every chunk's metadata along the way. Aggregating these gives the total count of numbers in the whole file and even an approximate histogram. This is typically about 100x faster than decompressing all the numbers.
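The aggregation step can be sketched independently of the file format. The structs below are hypothetical simplifications that mirror the three metadata fields listed above; they are not the crate's actual metadata types, and the values are assumed to be i64 for illustration. Ranges from different chunks may overlap, so the quantile result is only an approximation.

```rust
/// Hypothetical summary of one range within a chunk.
#[derive(Clone, Debug)]
struct RangeMeta {
    lower: i64,
    upper: i64,
    count: usize,
}

/// Hypothetical chunk metadata with the three fields described above.
#[derive(Clone, Debug)]
struct ChunkMeta {
    count: usize,
    ranges: Vec<RangeMeta>,
    body_size: usize,
}

/// Total count of numbers across all chunks.
fn total_count(chunks: &[ChunkMeta]) -> usize {
    chunks.iter().map(|c| c.count).sum()
}

/// Total size in bytes of all compressed bodies (the bytes a seek skips over).
fn total_body_bytes(chunks: &[ChunkMeta]) -> usize {
    chunks.iter().map(|c| c.body_size).sum()
}

/// Bounds of the range that contains the approximate q-quantile,
/// or None if the chunks hold no numbers.
fn approx_quantile(chunks: &[ChunkMeta], q: f64) -> Option<(i64, i64)> {
    // Pool the ranges from every chunk into one rough histogram.
    let mut ranges: Vec<&RangeMeta> = chunks.iter().flat_map(|c| c.ranges.iter()).collect();
    ranges.sort_by_key(|r| r.lower);

    let total: usize = ranges.iter().map(|r| r.count).sum();
    if total == 0 {
        return None;
    }

    // 0-based rank of the requested quantile.
    let target = (q.clamp(0.0, 1.0) * (total - 1) as f64).floor() as usize;

    // Walk the histogram until the cumulative count passes the target rank.
    let mut seen = 0;
    for r in ranges {
        seen += r.count;
        if seen > target {
            return Some((r.lower, r.upper));
        }
    }
    None
}
```

Only metadata is touched here, which is why this kind of query avoids the cost of decompressing the chunk bodies.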
See the fast seeking example.