pco 0.1.0

Good compression for numerical sequences

pco

Quick Start

use pco::standalone::{auto_compress, auto_decompress};
use pco::DEFAULT_COMPRESSION_LEVEL;

fn main() {
  // your data
  let my_ints: Vec<i64> = (0..100000).collect();

  // Here we let the library choose a configuration with default compression
  // level. If you know about the data you're compressing, you can compress
  // faster by creating a `CompressorConfig`.
  let compressed: Vec<u8> = auto_compress(&my_ints, DEFAULT_COMPRESSION_LEVEL);
  println!("compressed down to {} bytes", compressed.len());

  // decompress
  let recovered = auto_decompress::<i64>(&compressed).expect("failed to decompress");
  println!("got back {} ints from {} to {}", recovered.len(), recovered[0], recovered.last().unwrap());
}

To run something right away, try the benchmarks.

Standalone vs Wrapped

Pco can be used as a standalone format (as in the above example). Good reasons to do so:

  • a quick proof of concept for the compression ratio or performance
  • all you need to do is transmit a long list of numbers

However, the standalone format is quite limited, so we expect most use cases to wrap pco in a different format (imagine something like Parquet). Wrapping could unlock features such as

  • nullability
  • fast seeking through the file
  • fast filtering
  • a schema or key:value metadata

In either case, docs.rs has more examples and API details.
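
As one illustration of what a wrapping format might add, here is a minimal sketch of nullability layered on top of a dense numeric codec. It assumes a simple layout (a validity mask plus the non-null values); `split_nulls` and `merge_nulls` are hypothetical helpers, not part of pco, and the actual compression call is left out so the sketch needs only the standard library.

```rust
// Sketch of a "wrapped" layout that adds nullability around a dense codec.
// `split_nulls` and `merge_nulls` are hypothetical names, not pco APIs.

/// Splits optional values into a validity mask and the dense non-null values.
/// Only `dense` would be handed to the numeric compressor.
fn split_nulls(values: &[Option<i64>]) -> (Vec<bool>, Vec<i64>) {
    let mask = values.iter().map(|v| v.is_some()).collect();
    let dense = values.iter().flatten().copied().collect();
    (mask, dense)
}

/// Rebuilds the optional values from the mask and the decompressed dense
/// values. Returns `None` if the mask and `dense` disagree on the count.
fn merge_nulls(mask: &[bool], dense: &[i64]) -> Option<Vec<Option<i64>>> {
    let mut remaining = dense.iter().copied();
    let mut out = Vec::with_capacity(mask.len());
    for &is_valid in mask {
        if is_valid {
            out.push(Some(remaining.next()?));
        } else {
            out.push(None);
        }
    }
    // Leftover dense values mean the mask and the data do not match.
    if remaining.next().is_some() {
        return None;
    }
    Some(out)
}
```

In such a layout, the mask would be stored by the wrapping format (for example as a bitmap), while the dense values would go through pco.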

Important API Notes

  • In some places, pco methods accept a destination (either W: Write or &mut [T: NumberLike]). If pco returns an error, both the destination and the struct may already have been modified.
  • Pco will always try to process all numbers, and it will fail if insufficient bytes are available. For instance, during decompression pco will try to fill the entire &mut [T] passed in, returning an insufficient data error if the &[u8] passed in is not long enough.
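
Because a failed call may leave the destination partially written, one defensive pattern is to write into a scratch buffer and commit only on success. The sketch below is an assumption about how a caller might handle this, not something pco prescribes. `fallible_write` is a hypothetical stand-in for any call that writes into a `W: Write` and can fail midway, and `write_atomically` is likewise a made-up helper.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a fallible call that writes into a destination.
/// When `fail` is true it writes the first half of `payload` and then errors,
/// mimicking a destination that was modified before the error was returned.
fn fallible_write<W: Write>(dst: &mut W, payload: &[u8], fail: bool) -> io::Result<()> {
    let half = payload.len() / 2;
    dst.write_all(&payload[..half])?;
    if fail {
        return Err(io::Error::new(io::ErrorKind::Other, "simulated failure"));
    }
    dst.write_all(&payload[half..])
}

/// Writes into a scratch buffer first and appends to `dst` only on success,
/// so `dst` is left untouched when the inner call fails.
fn write_atomically(dst: &mut Vec<u8>, payload: &[u8], fail: bool) -> io::Result<()> {
    let mut scratch = Vec::new();
    fallible_write(&mut scratch, payload, fail)?;
    dst.extend_from_slice(&scratch);
    Ok(())
}
```

The cost is one extra copy of the output. Since the struct may also have been modified on error, a caller that wants to retry would presumably start from a fresh one.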

Advanced

Custom Data Types

Small data types can be compressed efficiently by widening them to a larger supported type: for example, compressing u16 data as a sequence of u32 values. The only cost of using a larger data type is a very small increase in chunk metadata size.
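
The u16-as-u32 approach can be sketched with the standard library alone. `widen` and `narrow` are hypothetical helper names. The widened `Vec<u32>` is what would be handed to the compressor (for instance `auto_compress` from the Quick Start), and `narrow` would run on the decompressed output. The pco calls themselves are omitted here.

```rust
use std::convert::TryFrom;

/// Losslessly widens u16 values to u32 before compression.
fn widen(values: &[u16]) -> Vec<u32> {
    values.iter().map(|&v| u32::from(v)).collect()
}

/// Narrows decompressed u32 values back to u16.
/// Returns `None` if any value does not fit, which would indicate that the
/// data was not produced by `widen`.
fn narrow(values: &[u32]) -> Option<Vec<u16>> {
    values.iter().map(|&v| u16::try_from(v).ok()).collect()
}
```

Using a checked conversion on the way back avoids silently truncating values if the bytes came from a different source than expected.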

When necessary, you can implement your own data type via pco::data_types::NumberLike and (if the existing implementations are insufficient) pco::data_types::UnsignedLike and pco::data_types::FloatLike.

The maximum legal precision of a custom data type is currently 128 bits.