//! Serialize your data compactly!
//!
//! This crate provides a serialization framework fundamentally similar to
//! [serde](https://docs.rs/serde) or [bincode](https://docs.rs/bincode): you
//! derive the [`Encode`] trait and then use it to [`encode`] and [`decode`]
//! your data, but much more compactly than bincode or other formats.
//!
//! # How to use
//!
//! ```
//! #[derive(compactly::Encode, bincode::Encode)]
//! struct Point {
//!     x: f64,
//!     y: f64,
//! }
//!
//! #[derive(compactly::Encode, bincode::Encode)]
//! struct Shape {
//!     corners: Vec<Point>,
//! }
//!
//! let square = Shape {
//!     corners: vec![
//!         Point { x: 1.0, y: 1.0 },
//!         Point { x: 2.0, y: 1.0 },
//!         Point { x: 2.0, y: 0.0 },
//!         Point { x: 1.0, y: 0.0 },
//!     ],
//! };
//!
//! let encoded: Vec<u8> = compactly::encode(&square);
//! let encoded_bincode: Vec<u8> =
//!     bincode::encode_to_vec(&square, bincode::config::standard()).unwrap();
//! // The compactly encoding is at most 10% of the size of the bincode one.
//! assert!(encoded.len() <= encoded_bincode.len() / 10);
//! ```
//!
//! # Using a stable format
//!
//! If you are encoding your data for temporary use (e.g. a cache, or network
//! transit between programs using the same version of `compactly`), the above
//! works great. However, if you need your encoded data to remain readable
//! across versions, use `compactly::v1`, which produces a binary-stable format
//! that all future versions of `compactly` can read. (In the future, you may
//! also be able to choose a newer and more compact format.)
//!
//! ## Example
//! ```
//! #[derive(Default, compactly::v1::Encode)]
//! struct Human {
//!     first_name: String,
//!     last_name: String,
//!     ssn: Option<u64>,
//!     year_of_birth: u64,
//! }
//! let encoded: Vec<u8> = compactly::v1::encode(&Human::default());
//! ```
//!
//! # Enabling improved encoding strategies
//!
//! To help `compactly` compress your data optimally, you can provide hints
//! (an [`EncodingStrategy`]) about the distribution of values you expect.
//! Because this changes the encoded format, you'll want to get it right
//! *before* saving encoded data into long-term storage.
//!
//! ## Example
//! ```
//! #[derive(Default, compactly::v1::Encode)]
//! struct Human {
//!     #[compactly(LowCardinality)]
//!     first_name: String,
//!     #[compactly(LowCardinality)]
//!     last_name: String,
//!     ssn: Option<u64>,
//!     #[compactly(Small)]
//!     year_of_birth: u64,
//! }
//! let encoded: Vec<u8> = compactly::v1::encode(&Human::default());
//! ```
//!
//! ## Encoding strategies
//!
//! | Strategy | Meaning | Effect |
//! |----------|---------|--------|
//! | [Normal] | Default strategy | Encode based on data type alone. |
//! | [Small] | Values are small | Use a var-int encoding, or whatever might be appropriate for "small" data of this type. |
//! | [Decimal] | Numbers may be decimals | Optimize for floating point numbers encoded with limited decimal precision. Any value can still be stored, but extra time is spent checking whether each value could be stored *more* compactly as a decimal. |
//! | [LowCardinality] | Low cardinality | There are few values which are frequently repeated, so store each value only once. Be aware that this could double memory use, as it will store a mapping between values and `usize`. |
//! | [Sorted] | Values probably sorted | Assume that the values are likely to arrive in sorted order. Typically this will lead to storing differences between successive values. |
//! | [Compressible] | Expensive compression may be used | Take whatever time is needed to compress this data. For `String` and `Vec<u8>` this enables [LZ77-style compression](https://en.wikipedia.org/wiki/LZ77_and_LZ78) which can be very slow, but also can provide very good compression for natural language data. |
//! | [Values<S>] | Apply strategy to values of a collection | e.g. `Values<Small>` assumes all values in a `Vec` or `HashSet` are small |
//! | [Mapping<K,V>] | Apply strategies to keys and values of a collection | e.g. `Mapping<Sorted,Decimal>` is the `Normal` strategy for a `BTreeMap`, but you might prefer a `Mapping<LowCardinality,Small>` if you will be storing a large collection of these maps with a limited number of keys, and the values are small. |
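//!
//! As a rough illustration of the kind of saving the `Sorted` and `Small`
//! hints make possible, here is a minimal sketch of one classic technique:
//! store the differences between successive sorted values, then write each
//! difference as an LEB128-style var-int so that small numbers take fewer
//! bytes. This is not `compactly`'s actual format, which uses adaptive range
//! coding rather than byte-aligned var-ints; `encode_sorted` and
//! `decode_sorted` are hypothetical helpers.
//!
//! ```rust
//! // Append `v` as an LEB128-style var-int: 7 bits per byte, with the high
//! // bit set on every byte except the last.
//! fn write_varint(out: &mut Vec<u8>, mut v: u64) {
//!     while v >= 0x80 {
//!         out.push(((v & 0x7F) as u8) | 0x80);
//!         v >>= 7;
//!     }
//!     out.push(v as u8);
//! }
//!
//! // Read one var-int starting at `*pos`, advancing `*pos` past it.
//! fn read_varint(bytes: &[u8], pos: &mut usize) -> u64 {
//!     let (mut v, mut shift) = (0u64, 0);
//!     loop {
//!         let b = bytes[*pos];
//!         *pos += 1;
//!         v |= u64::from(b & 0x7F) << shift;
//!         if b & 0x80 == 0 {
//!             return v;
//!         }
//!         shift += 7;
//!     }
//! }
//!
//! // Hypothetical helper: a length prefix, then the delta of each value from
//! // the previous one, each written as a var-int.
//! fn encode_sorted(values: &[u64]) -> Vec<u8> {
//!     let mut out = Vec::new();
//!     write_varint(&mut out, values.len() as u64);
//!     let mut prev = 0;
//!     for &v in values {
//!         // Assumes `values` is sorted ascending, so `v - prev` cannot underflow.
//!         write_varint(&mut out, v - prev);
//!         prev = v;
//!     }
//!     out
//! }
//!
//! // Hypothetical helper: undo `encode_sorted` by summing the deltas back up.
//! fn decode_sorted(bytes: &[u8]) -> Vec<u64> {
//!     let mut pos = 0;
//!     let n = read_varint(bytes, &mut pos) as usize;
//!     let mut prev = 0;
//!     (0..n)
//!         .map(|_| {
//!             prev += read_varint(bytes, &mut pos);
//!             prev
//!         })
//!         .collect()
//! }
//! ```
//!
//! Four sorted timestamps near `1_700_000_000`, which take 32 bytes as raw
//! `u64`s, encode to 9 bytes with this sketch: one for the length, five for
//! the first value, and one for each small gap.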
//!
//! # How does compactly work?
//!
//! This crate encodes data using
//! [adaptive](https://en.wikipedia.org/wiki/Adaptive_coding) [range
//! coding](https://en.wikipedia.org/wiki/Range_coding). Each type that can be
//! encoded (and really each strategy for each type) has a
//! [Context][Encode::Context], which is a type that holds the model for the
//! distribution of values. As the data is encoded, this model is updated (this
//! is the essence of [adaptive
//! coding](https://en.wikipedia.org/wiki/Adaptive_coding)).
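//!
//! To give a feel for adaptive binary range coding in general, here is a
//! minimal sketch modeled on the well-known LZMA range coder. It is an
//! illustration, not `compactly`'s own coder, whose details may differ. Each
//! bit is coded against an 11-bit probability that the bit is 0, and that
//! probability is nudged toward each observed bit, so a skewed bit stream
//! costs less than one bit per bit. `BitModel`, `RangeEncoder` and
//! `RangeDecoder` are hypothetical names.
//!
//! ```rust
//! // Probability, out of 2^11, that the next bit is 0.
//! struct BitModel(u32);
//!
//! impl BitModel {
//!     fn new() -> Self {
//!         BitModel(1 << 10) // start at p(0) = 1/2
//!     }
//! }
//!
//! // Renormalize whenever the range drops below 2^24.
//! const TOP: u32 = 1 << 24;
//!
//! struct RangeEncoder { low: u64, range: u32, cache: u8, cache_size: u64, out: Vec<u8> }
//!
//! impl RangeEncoder {
//!     fn new() -> Self {
//!         RangeEncoder { low: 0, range: u32::MAX, cache: 0, cache_size: 1, out: Vec::new() }
//!     }
//!
//!     fn encode(&mut self, m: &mut BitModel, bit: bool) {
//!         // Split the range in proportion to the current estimate of p(0).
//!         let bound = (self.range >> 11) * m.0;
//!         if bit {
//!             self.low += u64::from(bound);
//!             self.range -= bound;
//!             m.0 -= m.0 >> 5; // saw a 1: lower p(0)
//!         } else {
//!             self.range = bound;
//!             m.0 += ((1 << 11) - m.0) >> 5; // saw a 0: raise p(0)
//!         }
//!         while self.range < TOP {
//!             self.range <<= 8;
//!             self.shift_low();
//!         }
//!     }
//!
//!     // Emit the top byte of `low`, propagating a carry into pending 0xFF bytes.
//!     fn shift_low(&mut self) {
//!         if self.low < 0xFF00_0000 || self.low > 0xFFFF_FFFF {
//!             let carry = (self.low >> 32) as u8;
//!             let mut byte = self.cache;
//!             loop {
//!                 self.out.push(byte.wrapping_add(carry));
//!                 byte = 0xFF;
//!                 self.cache_size -= 1;
//!                 if self.cache_size == 0 { break; }
//!             }
//!             self.cache = (self.low >> 24) as u8;
//!         }
//!         self.cache_size += 1;
//!         self.low = (self.low & 0x00FF_FFFF) << 8;
//!     }
//!
//!     fn finish(mut self) -> Vec<u8> {
//!         for _ in 0..5 { self.shift_low(); }
//!         self.out
//!     }
//! }
//!
//! struct RangeDecoder<'a> { code: u32, range: u32, input: &'a [u8], pos: usize }
//!
//! impl<'a> RangeDecoder<'a> {
//!     fn new(input: &'a [u8]) -> Self {
//!         let mut d = RangeDecoder { code: 0, range: u32::MAX, input, pos: 0 };
//!         for _ in 0..5 { d.code = (d.code << 8) | u32::from(d.next_byte()); }
//!         d
//!     }
//!
//!     fn next_byte(&mut self) -> u8 {
//!         let b = self.input.get(self.pos).copied().unwrap_or(0);
//!         self.pos += 1;
//!         b
//!     }
//!
//!     // Mirror the encoder: same split, same model update, same renormalization.
//!     fn decode(&mut self, m: &mut BitModel) -> bool {
//!         let bound = (self.range >> 11) * m.0;
//!         let bit = if self.code < bound {
//!             self.range = bound;
//!             m.0 += ((1 << 11) - m.0) >> 5;
//!             false
//!         } else {
//!             self.code -= bound;
//!             self.range -= bound;
//!             m.0 -= m.0 >> 5;
//!             true
//!         };
//!         while self.range < TOP {
//!             self.range <<= 8;
//!             self.code = (self.code << 8) | u32::from(self.next_byte());
//!         }
//!         bit
//!     }
//! }
//! ```
//!
//! Encoding 1000 bits of which one in ten is set, then decoding them with a
//! fresh `BitModel`, reproduces the input from well under the 125 bytes that
//! one bit per bit would need, because the model learns that 0 is much more
//! likely than 1.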
//!
//! At its core, the encoding is done bit by bit, i.e. each type has
//! a fundamental bitwise encoding, and the `Context` stores the probability of
//! each bit being 1 or 0. Most types have a relatively "clever" encoding such
//! that even without adaptive coding (i.e. learning the patterns from your
//! actual data), common values should be encoded in fewer bits.
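//!
//! One classic fixed code with this property is the Elias gamma code,
//! sketched below purely as an illustration; whether `compactly` uses this
//! particular code is not stated here and should be treated as an assumption.
//! Small values get short codes without any adaptation, and an adaptive coder
//! can then refine the probability of each individual bit. `elias_gamma` and
//! `elias_gamma_decode` are hypothetical names.
//!
//! ```rust
//! // Elias gamma code for n >= 1: (bit length of n - 1) zeros, then n in
//! // binary. We encode `v + 1` so that zero is representable.
//! fn elias_gamma(v: u64) -> Vec<bool> {
//!     let n = v.checked_add(1).expect("u64::MAX is not representable in this sketch");
//!     let bits = 64 - n.leading_zeros(); // number of significant bits in n
//!     let mut out = vec![false; (bits - 1) as usize];
//!     for i in (0..bits).rev() {
//!         out.push(((n >> i) & 1) == 1);
//!     }
//!     out
//! }
//!
//! // Count the leading zeros, read that many bits plus one, and subtract 1.
//! fn elias_gamma_decode(code: &[bool]) -> u64 {
//!     let zeros = code.iter().take_while(|&&b| !b).count();
//!     let n = code[zeros..=2 * zeros]
//!         .iter()
//!         .fold(0u64, |acc, &b| (acc << 1) | u64::from(b));
//!     n - 1
//! }
//! ```
//!
//! With this sketch, `0` costs 1 bit, `1` costs 3 bits and `1000` costs 19
//! bits, compared with a fixed 64 bits for a raw `u64`.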
//!
//! When you derive [`Encode`] for a struct (or enum), compactly will create a
//! new [`Encode::Context`] which stores distinct `Context` values for each
//! field of your struct (or enum), which means that as your data is encoded,
//! compactly will adaptively learn the distinct patterns of values for each
//! field.
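//!
//! The benefit of separate per-field contexts can be sketched by measuring
//! the ideal adaptively coded size of some bits (the sum of `-log2 p`, which
//! a range coder approaches), once with one shared probability model and once
//! with one model per field. The count-based `Adaptive` estimator, the
//! two-field layout and the data are all hypothetical, chosen only to
//! illustrate the effect.
//!
//! ```rust
//! // A simple adaptive estimate of p(bit = 1), using counts with +1 smoothing.
//! struct Adaptive { ones: f64, total: f64 }
//!
//! impl Adaptive {
//!     fn new() -> Self {
//!         Adaptive { ones: 1.0, total: 2.0 }
//!     }
//!
//!     // Ideal cost in bits of coding `bit` under the current estimate,
//!     // followed by an update of the estimate.
//!     fn cost(&mut self, bit: bool) -> f64 {
//!         let p1 = self.ones / self.total;
//!         let p = if bit { p1 } else { 1.0 - p1 };
//!         if bit {
//!             self.ones += 1.0;
//!         }
//!         self.total += 1.0;
//!         -p.log2()
//!     }
//! }
//!
//! // Cost of a stream of (field_a, field_b) bit pairs sharing one model.
//! fn shared_cost(rows: &[(bool, bool)]) -> f64 {
//!     let mut m = Adaptive::new();
//!     rows.iter().map(|&(a, b)| m.cost(a) + m.cost(b)).sum()
//! }
//!
//! // Cost when each field has its own model, as a derived struct context would.
//! fn per_field_cost(rows: &[(bool, bool)]) -> f64 {
//!     let (mut ma, mut mb) = (Adaptive::new(), Adaptive::new());
//!     rows.iter().map(|&(a, b)| ma.cost(a) + mb.cost(b)).sum()
//! }
//! ```
//!
//! For 1000 rows in which the first field's bit is always 1 and the second's
//! is always 0, the shared model sees an alternating stream and pays roughly
//! 2000 bits, while the two per-field models each learn their constant value
//! and pay about 20 bits in total.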
pub use ;
/// A wrapper around a value causing it to be encoded with a particular strategy.
/// The default strategy for encoding data.
///
/// This exists so that code may be written only once that needs to be able to
/// handle any strategy.
;
/// A strategy for encoding values that are small.
///
/// For example, integers are expected to have small magnitudes.
;
/// A strategy for encoding values that are particularly compressible.
///
/// For instance, this will attempt to apply LZ77-like encoding to strings.
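///
/// The general LZ77 idea can be sketched as follows: replace a run of bytes
/// that already appeared earlier with a (distance, length) back-reference.
/// This naive greedy version searches the whole window at every position,
/// which illustrates why such compression can be slow. It is an illustration
/// under those assumptions, not `compactly`'s implementation, and `Token`,
/// `lz77_compress` and `lz77_decompress` are hypothetical names.
///
/// ```rust
/// #[derive(Debug, PartialEq)]
/// enum Token {
///     Literal(u8),
///     Match { distance: usize, length: usize },
/// }
///
/// // Greedy LZ77: at each position take the longest earlier match within
/// // `window` bytes, or emit a literal if it is shorter than `min_match`.
/// fn lz77_compress(input: &[u8], window: usize, min_match: usize) -> Vec<Token> {
///     let mut tokens = Vec::new();
///     let mut i = 0;
///     while i < input.len() {
///         let (mut best_len, mut best_dist) = (0, 0);
///         for j in i.saturating_sub(window)..i {
///             // Matches may overlap the current position (distance < length).
///             let mut len = 0;
///             while i + len < input.len() && input[j + len] == input[i + len] {
///                 len += 1;
///             }
///             if len > best_len {
///                 best_len = len;
///                 best_dist = i - j;
///             }
///         }
///         if best_len >= min_match {
///             tokens.push(Token::Match { distance: best_dist, length: best_len });
///             i += best_len;
///         } else {
///             tokens.push(Token::Literal(input[i]));
///             i += 1;
///         }
///     }
///     tokens
/// }
///
/// // Copy byte by byte so that overlapping matches expand correctly.
/// fn lz77_decompress(tokens: &[Token]) -> Vec<u8> {
///     let mut out = Vec::new();
///     for t in tokens {
///         match *t {
///             Token::Literal(b) => out.push(b),
///             Token::Match { distance, length } => {
///                 let start = out.len() - distance;
///                 for k in 0..length {
///                     let b = out[start + k];
///                     out.push(b);
///                 }
///             }
///         }
///     }
///     out
/// }
/// ```
///
/// For `b"abcabcabcabc"` with a minimum match length of 3, this produces
/// three literals followed by one overlapping match of distance 3 and
/// length 9.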
;
/// A strategy for encoding values that cannot be compressed.
///
/// Examples would be encrypted or random bytes. In this case, `compactly`
/// abandons any attempt at compression (e.g. variable bit-length encoding) and
/// also adopts a faster algorithm where possible.
///
/// Note: there is a small 2-3 byte overhead (on an entire encoded value) for
/// using `Incompressible` at all. So ideally you would want to use it only
/// when the encoded size is significant (so a couple of extra bytes won't
/// matter) and encoding and decoding speed is important.
;
/// A strategy for encoding values that have been sorted.
;
/// A strategy for encoding values that are often repeated.
///
/// This can be shockingly efficient when there are just a few values for e.g. a
/// string field.
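///
/// A minimal sketch of the dictionary idea behind this strategy, assuming a
/// simple scheme in which each distinct value is stored once in a table and
/// every occurrence becomes an index into it. The `HashMap` from values to
/// indices is the kind of extra memory mentioned in the crate-level strategy
/// table. `dictionary_encode` and `dictionary_decode` are hypothetical names,
/// not `compactly`'s API.
///
/// ```rust
/// use std::collections::HashMap;
///
/// // Store each distinct value once, in order of first appearance, and
/// // replace every occurrence with its index into that table.
/// fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<usize>) {
///     let mut table: Vec<String> = Vec::new();
///     let mut index: HashMap<String, usize> = HashMap::new();
///     let ids: Vec<usize> = values
///         .iter()
///         .map(|&v| {
///             *index.entry(v.to_string()).or_insert_with(|| {
///                 table.push(v.to_string());
///                 table.len() - 1
///             })
///         })
///         .collect();
///     (table, ids)
/// }
///
/// // Look each index back up in the table.
/// fn dictionary_decode(table: &[String], ids: &[usize]) -> Vec<String> {
///     ids.iter().map(|&i| table[i].clone()).collect()
/// }
/// ```
///
/// For `["Smith", "Jones", "Smith", "Smith", "Jones"]` the table is
/// `["Smith", "Jones"]` and the indices are `[0, 1, 0, 0, 1]`; small indices
/// like these are then cheap to encode.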
;
/// A strategy for encoding floating point values that have round decimal values.
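///
/// A minimal sketch of how a decimal check might work, assuming the approach
/// of searching for the fewest decimal places that reproduce the float
/// exactly. This is an illustration, not `compactly`'s algorithm; `as_decimal`
/// is a hypothetical helper.
///
/// ```rust
/// // Find the fewest decimal places (up to `max_places`) that represent `x`
/// // exactly, returning (integer mantissa, number of places).
/// fn as_decimal(x: f64, max_places: i32) -> Option<(i64, i32)> {
///     for places in 0..=max_places {
///         let scale = 10f64.powi(places);
///         let scaled = (x * scale).round();
///         // Only trust mantissas that an f64 holds exactly (|m| < 2^53);
///         // this also rejects NaN and infinities.
///         if scaled.abs() < 9_007_199_254_740_992.0 && scaled / scale == x {
///             return Some((scaled as i64, places));
///         }
///     }
///     None
/// }
/// ```
///
/// With this sketch, `1.25` becomes `Some((125, 2))` and `0.1` becomes
/// `Some((1, 1))`, both small integers, while `std::f64::consts::PI` gives
/// `None` and would be stored as a plain float.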
;
/// Apply the respective strategies to keys and values.
/// Apply this strategy to values held inside.
///
/// This applies to any sort of collection such as a `Vec` or a `HashSet`.