Expand description
Traits for reading and writing instantaneous codes.
This modules contains code for reading and writing instantaneous codes. Codewords are uniformly indexed from 0 for all codes. For example, the first few words of unary, γ, and δ codes are:
Arg | unary | γ | δ |
---|---|---|---|
0 | 1 | 1 | 1 |
1 | 01 | 010 | 0100 |
2 | 001 | 011 | 0101 |
3 | 0001 | 00100 | 01100 |
4 | 00001 | 00101 | 01101 |
5 | 000001 | 00110 | 01110 |
6 | 0000001 | 00111 | 01111 |
7 | 00000001 | 0001000 | 00100000 |
If you need to encode signed integers, please use the ToInt
and
ToNat
traits, which provide a bijection between signed integers and
natural numbers.
Each code is implemented as a pair of traits for reading and writing (e.g.,
GammaReadParam
and GammaWriteParam
). The traits for reading depend
on BitRead
, whereas the traits for writing
depend on BitWrite
. Note that most codes cannot
write the number u64::MAX
because of overflow issues, which could be
avoided with tests, but at the price of a significant performance drop.
The traits ending with Param
make it possible to specify parameters—for
example, whether to use decoding tables. Usually, one would instead pull in
scope non-parametric traits such as GammaRead
and GammaWrite
, for
which defaults are provided using the mechanism described in the params
module.
Note that if you are using decoding tables, you must ensure that the
peek_bits
method of your
BitRead
implementation returns a sufficient
number of bits: if it does not, an assertion will be triggered in test mode,
but behavior will be unpredictable otherwise. This is unfortunately
difficult to check statically. To stay on the safe side, we recommend to use
a an implementation that is able to peek at least at 16 bits.
§Big-endian vs. little-endian
As discussed in the traits module, in general reversing the bits of a big-endian bit stream will not yield a little-endian bit stream containing the same sequence of fixed-width integers. The same is true for codes, albeit the situation is more complex.
The only code that can be safely reversed is the unary code. All other codes contain some value, and that value is written without reversing its bits. Thus, reversing the bits of a big-endian bit stream containing a sequence of instantaneous codes will not yield a little-endian bit stream containing the same sequence of codes (again, with the exception of unary codes). Technically, the codes written for the little-endian case are different than those written for the big-endian case.
For example, the γ code of 4 is 00101
in big-endian order, but it
is 01100
in little-endian order, so that upon reading the unary code for 2
we can read the 01
part without a bit reversal.
The case of minimal binary codes is even more convoluted:
for example, the code with upper bound 7 has codewords 00
, 010
, 011
,
100
, 101
, 110
, and 111
. To decode such a code without peeking at
more bits than necessary, one first reads two bits, and then decides, based
on their value, whether to read a further bit and add it on the right. But
this means that we have to encode 2 as 011
in the big-endian case, and as
101
in the little-endian case, because we need to read the first two bits
to decide whether to read the third one.
In some cases, we resort to completely ad hoc solutions: for example, in the case of the ω code, for the little-endian case instead of reversing the bits written at each recursive call (which in principle would be necessary), we simply rotate them to the left by one position, exposing the most significant bit as first bit. This is sufficient to make the decoding possible, and the rotation is a much faster operation than bit reversal.
§Dispatch
The basic method for accessing code is through traits like GammaRead
and
GammaWrite
. This approach, however, forces a choice of code in the
source. To pass a choice of code dynamically, please have a look at the
dispatch
module.
Re-exports§
pub use gamma::len_gamma;
pub use gamma::len_gamma_param;
pub use gamma::GammaRead;
pub use gamma::GammaReadParam;
pub use gamma::GammaWrite;
pub use gamma::GammaWriteParam;
pub use delta::len_delta;
pub use delta::len_delta_param;
pub use delta::DeltaRead;
pub use delta::DeltaReadParam;
pub use delta::DeltaWrite;
pub use delta::DeltaWriteParam;
pub use omega::len_omega;
pub use omega::OmegaRead;
pub use omega::OmegaWrite;
pub use minimal_binary::len_minimal_binary;
pub use minimal_binary::MinimalBinaryRead;
pub use minimal_binary::MinimalBinaryWrite;
pub use zeta::len_zeta;
pub use zeta::len_zeta_param;
pub use zeta::ZetaRead;
pub use zeta::ZetaReadParam;
pub use zeta::ZetaWrite;
pub use zeta::ZetaWriteParam;
pub use pi::len_pi;
pub use pi::PiRead;
pub use pi::PiWrite;
pub use golomb::len_golomb;
pub use golomb::GolombRead;
pub use golomb::GolombWrite;
pub use rice::len_rice;
pub use rice::RiceRead;
pub use rice::RiceWrite;
pub use exp_golomb::len_exp_golomb;
pub use exp_golomb::ExpGolombRead;
pub use exp_golomb::ExpGolombWrite;
pub use vbyte::*;
Modules§
- delta
- Elias δ code.
- exp_
golomb - Exponential Golomb codes.
- gamma
- Elias γ code.
- golomb
- Golomb codes.
- minimal_
binary - Minimal binary codes.
- omega
- Elias ω code.
- params
- Mechanisms for selecting parameters.
- pi
- Streamlined Apostolico−Drovandi π codes
- rice
- Rice codes.
- vbyte
- Variable-length byte codes.
- zeta
- Boldi−Vigna ζ codes.