dsi_bitstream/codes/
mod.rs

1/*
2 * SPDX-FileCopyrightText: 2023 Tommaso Fontana
3 * SPDX-FileCopyrightText: 2023 Inria
4 * SPDX-FileCopyrightText: 2023 Sebastiano Vigna
5 *
6 * SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
7 */
8
9//! Traits for reading and writing instantaneous codes.
10//!
11//! This modules contains code for reading and writing instantaneous codes.
12//! Codewords are uniformly indexed from 0 for all codes. For example, the first
13//! few words of [unary](crate::traits::BitRead::read_unary), [γ](gamma), and
14//! [δ](delta) codes are:
15//!
16//! | Arg |  unary   |    γ    |     δ    |
17//! |-----|---------:|--------:|---------:|
18//! | 0   |        1 |       1 |        1 |
19//! | 1   |       01 |     010 |     0100 |
20//! | 2   |      001 |     011 |     0101 |
21//! | 3   |     0001 |   00100 |    01100 |
22//! | 4   |    00001 |   00101 |    01101 |
23//! | 5   |   000001 |   00110 |    01110 |
24//! | 6   |  0000001 |   00111 |    01111 |
25//! | 7   | 00000001 | 0001000 | 00100000 |
26//!
27//! If you need to encode signed integers, please use the [`ToInt`] and
28//! [`ToNat`] traits, which provide a bijection between signed integers and
29//! natural numbers.
30//!
31//! Each code is implemented as a pair of traits for reading and writing (e.g.,
32//! [`GammaReadParam`] and [`GammaWriteParam`]). The traits for reading depend
33//! on [`BitRead`](crate::traits::BitRead), whereas the traits for writing
34//! depend on [`BitWrite`](crate::traits::BitWrite). Note that most codes cannot
35//! write the number [`u64::MAX`] because of overflow issues, which could be
36//! avoided with tests, but at the price of a significant performance drop.
37//!
38//! The traits ending with `Param` make it possible to specify parameters—for
39//! example, whether to use decoding tables. Usually, one would instead pull in
40//! scope non-parametric traits such as [`GammaRead`] and [`GammaWrite`], for
41//! which defaults are provided using the mechanism described in the [`params`]
42//! module.
43//!
44//! Note that if you are using decoding tables, you must ensure that the
45//! [`peek_bits`](crate::traits::BitRead::peek_bits) method of your
46//! [`BitRead`](crate::traits::BitRead) implementation returns a sufficient
47//! number of bits: if it does not, an assertion will be triggered in test mode,
48//! but behavior will be unpredictable otherwise. This is unfortunately
49//! difficult to check statically. To stay on the safe side, we recommend to use
50//! a an implementation that is able to peek at least at 16 bits.
51//!
52//! # Big-endian vs. little-endian
53//!
54//! As discussed in the [traits module](crate::traits), in general reversing the
55//! bits of a big-endian bit stream will not yield a little-endian bit stream
56//! containing the same sequence of fixed-width integers. The same is true for
57//! codes, albeit the situation is more complex.
58//!
59//! The only code that can be safely reversed is the unary code. All other codes
60//! contain some value, and that value is written without reversing its bits.
61//! Thus, reversing the bits of a big-endian bit stream containing a sequence of
62//! instantaneous codes will not yield a little-endian bit stream containing the
63//! same sequence of codes (again, with the exception of unary codes).
64//! Technically, the codes written for the little-endian case are different than
65//! those written for the big-endian case.
66//!
67//! For example, the [γ code](gamma) of 4 is `00101` in big-endian order, but it
68//! is `01100` in little-endian order, so that upon reading the unary code for 2
69//! we can read the `01` part without a bit reversal.
70//!
71//! The case of [minimal binary codes](minimal_binary) is even more convoluted:
72//! for example, the code with upper bound 7 has codewords `00`, `010`, `011`,
73//! `100`, `101`, `110`, and `111`. To decode such a code without peeking at
74//! more bits than necessary, one first reads two bits, and then decides, based
75//! on their value, whether to read a further bit and add it on the right. But
76//! this means that we have to encode 2 as `011` in the big-endian case, and as
77//! `101` in the little-endian case, because we need to read the first two bits
78//! to decide whether to read the third one.
79//!
80//! In some cases, we resort to completely *ad hoc* solutions: for example, in
81//! the case of the [ω code](omega), for the little-endian case instead of
82//! reversing the bits written at each recursive call (which in principle would
83//! be necessary), we simply rotate them to the left by one position, exposing
84//! the most significant bit as first bit. This is sufficient to make the
85//! decoding possible, and the rotation is a much faster operation than bit
86//! reversal.
87//!
88//! # Dispatch
89//!
90//! The basic method for accessing code is through traits like [`GammaRead`] and
91//! [`GammaWrite`]. This approach, however, forces a choice of code in the
92//! source. To pass a choice of code dynamically, please have a look at the
93//! [`dispatch`](crate::dispatch) module.
94
95use anyhow::Result;
96use common_traits::{AsBytes, SignedInt, UnsignedInt};
97pub mod params;
98
99pub mod gamma;
100pub use gamma::{
101    len_gamma, len_gamma_param, GammaRead, GammaReadParam, GammaWrite, GammaWriteParam,
102};
103
104pub mod delta;
105pub use delta::{
106    len_delta, len_delta_param, DeltaRead, DeltaReadParam, DeltaWrite, DeltaWriteParam,
107};
108
109pub mod omega;
110pub use omega::{len_omega, OmegaRead, OmegaWrite};
111
112pub mod minimal_binary;
113pub use minimal_binary::{len_minimal_binary, MinimalBinaryRead, MinimalBinaryWrite};
114
115pub mod zeta;
116pub use zeta::{len_zeta, len_zeta_param, ZetaRead, ZetaReadParam, ZetaWrite, ZetaWriteParam};
117
118pub mod pi;
119pub use pi::{len_pi, PiRead, PiWrite};
120
121pub mod golomb;
122pub use golomb::{len_golomb, GolombRead, GolombWrite};
123
124pub mod rice;
125pub use rice::{len_rice, RiceRead, RiceWrite};
126
127pub mod exp_golomb;
128pub use exp_golomb::{len_exp_golomb, ExpGolombRead, ExpGolombWrite};
129
130pub mod vbyte;
131pub use vbyte::*;
132
133pub mod delta_tables;
134pub mod gamma_tables;
135pub mod zeta_tables;
136
137/// Extension trait mapping natural numbers bijectively to integers.
138///
139/// The method [`to_int`](#tymethod.to_int) will map a natural number `x` to `x
140/// / 2` if `x` is even, and to `−(x + 1) / 2` if `x` is odd. The inverse
141/// transformation is provided by the [`ToNat`] trait.
142///
143/// This pair of bijections makes it possible to use instantaneous codes for
144/// signed integers by mapping them to natural numbers and back.
145///
146/// This bijection is best known as the “ZigZag” transformation in Google's
147/// [Protocol Buffers](https://protobuf.dev/), albeit it has been used by
148/// [WebGraph](http://webgraph.di.unimi.it/) since 2003, and much likely in
149/// other software, for the same purpose. Note that the compression standards
150/// H.264/H.265 uses a different transformation for exponential Golomb codes,
151/// mapping a positive integer `x` to `2x − 1` and a zero or negative integer
152/// `x` to `−2x`.
153///
154/// The implementation is just based on the traits [`UnsignedInt`] and
155/// [`AsBytes`]. We provide blanket implementations for all primitive unsigned
156/// integer types, but it can be used with any type implementing those traits.
157pub trait ToInt: UnsignedInt + AsBytes {
158    #[inline]
159    fn to_int(self) -> Self::SignedInt {
160        (self >> Self::ONE).to_signed() ^ (-(self & Self::ONE).to_signed())
161    }
162}
163
164impl ToInt for u128 {}
165impl ToInt for u64 {}
166impl ToInt for u32 {}
167impl ToInt for u16 {}
168impl ToInt for u8 {}
169impl ToInt for usize {}
170
171/// Extension trait mapping signed integers bijectively to natural numbers.
172///
173/// The method [`to_nat`](#tymethod.to_nat) will map an nonnegative integer `x`
174/// to `2x` and a negative integer `x` to `−2x − 1`. The inverse transformation
175/// is provided by the [`ToInt`] trait.
176///
177/// This pair of bijections makes it possible to use instantaneous codes
178/// for signed integers by mapping them to natural numbers and back.
179///
180/// This bijection is best known as the “ZigZag” transformation in Google's
181/// [Protocol Buffers](https://protobuf.dev/), albeit it has been used by
182/// [WebGraph](http://webgraph.di.unimi.it/) since 2003, and much likely in
183/// other software, for the same purpose. Note that the compression standards
184/// H.264/H.265 uses a different transformation for exponential Golomb codes,
185/// mapping a positive integer `x` to `2x − 1` and a zero or negative integer
186/// `x` to `−2x`.
187///
188/// The implementation is just based on the traits [`SignedInt`] and
189/// [`AsBytes`]. We provide blanket implementations for all primitive signed
190/// integer types, but it can be used with any type implementing those traits.
191pub trait ToNat: SignedInt + AsBytes {
192    #[inline]
193    fn to_nat(self) -> Self::UnsignedInt {
194        (self << Self::ONE).to_unsigned() ^ (self >> (Self::BITS - 1)).to_unsigned()
195    }
196}
197
198impl ToNat for i128 {}
199impl ToNat for i64 {}
200impl ToNat for i32 {}
201impl ToNat for i16 {}
202impl ToNat for i8 {}
203impl ToNat for isize {}