Serialize your data compactly!
This crate provides a serialization framework fundamentally similar to serde or bincode. It enables you to derive a trait Encode and then use this trait to encode and to decode your data, but much more compactly than bincode or other formats.
§How to use
#[derive(compactly::Encode, bincode::Encode)]
struct Point {
    x: f64,
    y: f64,
}
#[derive(compactly::Encode, bincode::Encode)]
struct Shape {
    corners: Vec<Point>,
}
let square = Shape { corners: vec![
    Point { x: 1.0, y: 1.0 },
    Point { x: 2.0, y: 1.0 },
    Point { x: 2.0, y: 0.0 },
    Point { x: 1.0, y: 0.0 },
]};
let encoded: Vec<u8> = compactly::encode(&square);
let encoded_bincode: Vec<u8> = bincode::encode_to_vec(&square, bincode::config::standard()).unwrap();
assert_eq!(encoded.len(), encoded_bincode.len() / 10); // the compactly encoding is less than 10% of the size of bincode's
§Using a stable format
If you are encoding your data for temporary use (e.g. a cache or network transit with the same version of compactly), the above works great. However, if you are looking to encode your data persistently across versions, you will want to use compactly::v1, which will result in a binary-stable format accessible across all future versions of compactly. (Or in the future, perhaps you’ll want a newer and more compact format.)
§Example
#[derive(Default, compactly::v1::Encode)]
struct Human {
    first_name: String,
    last_name: String,
    ssn: Option<u64>,
    year_of_birth: u64,
}
let encoded: Vec<u8> = compactly::v1::encode(&Human::default());
§Enabling improved encoding strategies
In order for compactly to optimally compress your data, you can provide hints (an EncodingStrategy) as to what kind of distribution of values you expect. This will change the format, so you’ll want to get this right before saving your encoded data into long-term storage.
§Example
#[derive(Default, compactly::v1::Encode)]
struct Human {
    #[compactly(LowCardinality)]
    first_name: String,
    #[compactly(LowCardinality)]
    last_name: String,
    ssn: Option<u64>,
    #[compactly(Small)]
    year_of_birth: u64,
}
let encoded: Vec<u8> = compactly::v1::encode(&Human::default());
§Encoding strategies
Strategy | Meaning | Effect |
---|---|---|
Normal | Default strategy | Encode based on data type alone. |
Small | Values are small | Use a var-int encoding, or whatever might be appropriate for “small” data of this type. |
Decimal | Numbers may be decimals | Optimize for floating point numbers encoded with limited decimal precision. Any data may be stored compactly, but this will take extra time to check if values could be more compactly stored as decimals. |
LowCardinality | Low cardinality | There are few values which are frequently repeated, so store each value only once. Be aware that this could double memory use, as it will store a mapping between values and usize. |
Sorted | Values probably sorted | Assume that the values are likely to arrive in sorted order. Typically this will lead to storing differences between successive values. |
Compressible | Expensive compression may be used | Take whatever time is needed to compress this data. For String and Vec<u8> this enables LZ77-style compression which can be very slow, but also can provide very good compression for natural language data. |
Values | Apply strategy to values of a collection | e.g. Values<Small> assumes all values in a Vec or HashSet are small |
Mapping<K,V> | Apply strategies to keys and values of a collection | e.g. Mapping<Sorted,Decimal> is the Normal strategy for a BTreeMap, but you might prefer a Mapping<LowCardinality,Small> if you will be storing a large collection of these maps with a limited number of keys, and the values are small. |
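To make the table a little more concrete, here is a minimal standalone sketch of three generic techniques that the descriptions above allude to: a var-int (Small), successive differences (Sorted), and storing each distinct value once (LowCardinality). It uses only the standard library. The function names are hypothetical and this is not how compactly implements these strategies internally; it only illustrates the ideas.

```rust
use std::collections::HashMap;

/// LEB128-style var-int: 7 payload bits per byte, with the high bit set
/// on every byte except the last. (Illustrative only, not compactly's format.)
fn varint_encode(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Delta transform: the first value is kept, the rest become differences
/// from their predecessor, which are small when the input is sorted.
fn delta_encode(values: &[u64]) -> Vec<u64> {
    let mut previous = 0u64;
    values
        .iter()
        .map(|&v| {
            let delta = v.wrapping_sub(previous);
            previous = v;
            delta
        })
        .collect()
}

/// Dictionary transform: each distinct string is stored once and every
/// occurrence is replaced by its index in the dictionary.
fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<usize>) {
    let mut index_of: HashMap<&str, usize> = HashMap::new();
    let mut dictionary = Vec::new();
    let mut indices = Vec::with_capacity(values.len());
    for &v in values {
        let next = dictionary.len();
        let idx = *index_of.entry(v).or_insert(next);
        if idx == next {
            // First time this value is seen: add it to the dictionary.
            dictionary.push(v.to_string());
        }
        indices.push(idx);
    }
    (dictionary, indices)
}

fn main() {
    let mut bytes = Vec::new();
    varint_encode(1990, &mut bytes);
    println!("1990 as var-int: {} bytes (a raw u64 takes 8)", bytes.len());

    // Differences of a sorted sequence: [1000, 3, 1, 6]
    println!("deltas: {:?}", delta_encode(&[1000, 1003, 1004, 1010]));

    let (dictionary, indices) = dictionary_encode(&["Ann", "Bob", "Ann", "Ann"]);
    println!("dictionary: {:?}, indices: {:?}", dictionary, indices);
}
```

The dictionary sketch also shows where the memory warning for LowCardinality comes from: the encoder has to keep a map from values to indices alongside the data itself.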
§How does compactly work?
This crate encodes data using adaptive range coding. Each type that can be encoded (and really each strategy for each type) has a Context, which is a type that holds the model for the distribution of values. As the data is encoded, this model is updated (this is the essence of adaptive coding).
At its core, the encoding is done in a bit-by-bit manner, i.e. each type has a fundamental bitwise encoding, and the Context stores the probability of each bit being 1 or 0. Most types have a relatively “clever” encoding such that even without adaptive coding (i.e. learning the patterns from your actual data), common values should be encoded in fewer bits.
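The idea of a per-bit probability that adapts as data is seen can be sketched in a few lines. The example below is a toy model under simple assumptions (counts of zeros and ones with one pseudo-count each), and BitContext is a hypothetical name, not compactly's actual Context type. It does not perform range coding; it only sums the ideal cost of each bit, -log2(p), which is the size a range coder would approach.

```rust
/// Toy adaptive model for one bit position: counts of the bits seen so far.
/// (Hypothetical type for illustration; not compactly's Context.)
struct BitContext {
    zeros: u32,
    ones: u32,
}

impl BitContext {
    fn new() -> Self {
        // One pseudo-count each, so neither bit ever has probability zero.
        BitContext { zeros: 1, ones: 1 }
    }

    fn probability_of_one(&self) -> f64 {
        self.ones as f64 / (self.zeros + self.ones) as f64
    }

    /// Returns the ideal cost in bits of coding `bit` under the current
    /// model, then updates the model with what was just seen.
    fn cost_and_update(&mut self, bit: bool) -> f64 {
        let p = if bit {
            self.probability_of_one()
        } else {
            1.0 - self.probability_of_one()
        };
        let cost = -p.log2();
        if bit {
            self.ones += 1;
        } else {
            self.zeros += 1;
        }
        cost
    }
}

/// Total ideal cost of a bit sequence, starting from a fresh model.
fn total_cost(bits: &[bool]) -> f64 {
    let mut context = BitContext::new();
    bits.iter().map(|&b| context.cost_and_update(b)).sum()
}

fn main() {
    let skewed = vec![false; 64];
    let balanced: Vec<bool> = (0..64).map(|i| i % 2 == 0).collect();
    println!("64 identical bits: {:.2} bits", total_cost(&skewed));
    println!("64 alternating bits: {:.2} bits", total_cost(&balanced));
}
```

With this model a run of identical bits costs only a few bits in total, because the model quickly becomes confident, while bits it cannot predict cost about one bit each.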
When you derive Encode for a struct (or enum), compactly will create a new Encode::Context which stores distinct Context values for each field of your struct (or enum), which means that as your data is encoded, compactly will adaptively learn the distinct patterns of values for each field.
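Why separate models per field help can be shown with a toy comparison. The sketch below assumes a simple count-based bit model; BitModel, Flags, and FlagsContext are hypothetical names written by hand for illustration, not what the derive macro generates. It compares the ideal coded size when each field has its own model against the size when both fields share one.

```rust
/// Toy adaptive model for one boolean field (hypothetical, count-based).
struct BitModel {
    zeros: f64,
    ones: f64,
}

impl BitModel {
    fn new() -> Self {
        BitModel { zeros: 1.0, ones: 1.0 }
    }

    /// Ideal cost in bits of `bit` under the current counts, then update.
    fn cost_and_update(&mut self, bit: bool) -> f64 {
        let total = self.zeros + self.ones;
        let p = if bit { self.ones / total } else { self.zeros / total };
        let cost = -p.log2();
        if bit {
            self.ones += 1.0;
        } else {
            self.zeros += 1.0;
        }
        cost
    }
}

struct Flags {
    is_admin: bool,
    is_active: bool,
}

/// One model per field, mirroring the idea of a per-struct context.
struct FlagsContext {
    is_admin: BitModel,
    is_active: BitModel,
}

fn cost_per_field(data: &[Flags]) -> f64 {
    let mut ctx = FlagsContext {
        is_admin: BitModel::new(),
        is_active: BitModel::new(),
    };
    let mut total = 0.0;
    for f in data {
        total += ctx.is_admin.cost_and_update(f.is_admin);
        total += ctx.is_active.cost_and_update(f.is_active);
    }
    total
}

fn cost_shared(data: &[Flags]) -> f64 {
    // A single model sees the bits of both fields interleaved.
    let mut shared = BitModel::new();
    let mut total = 0.0;
    for f in data {
        total += shared.cost_and_update(f.is_admin);
        total += shared.cost_and_update(f.is_active);
    }
    total
}

fn main() {
    // Nobody is an admin, everybody is active.
    let data: Vec<Flags> = (0..100)
        .map(|_| Flags { is_admin: false, is_active: true })
        .collect();
    println!("per-field models: {:.1} bits", cost_per_field(&data));
    println!("one shared model: {:.1} bits", cost_shared(&data));
}
```

Each field is perfectly predictable on its own, so the per-field models need only a handful of bits in total, while the shared model sees an even mix of zeros and ones and pays roughly one bit per value.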
Modules§
- v1
- The v1 format of compactly.
Structs§
- Compressible
- A strategy for encoding values that are particularly compressible.
- Decimal
- A strategy for encoding floating point values that have round decimal values.
- Encoded
- A wrapper around a value causing it to be encoded with a particular strategy.
- LowCardinality
- A strategy for encoding values that are often repeated.
- Mapping
- Apply the respective strategies to keys and values.
- Normal
- The default strategy for encoding data.
- Small
- A strategy for encoding values that are small.
- Sorted
- A strategy for encoding values that have been sorted.
- Values
- Apply this strategy to values held inside.