fast_posit/posit/
mod.rs

1//! This module and its submodules contain a software implementation of a standard-compliant Posit
2//! floating point type, with arbitrary width and exponent width (up to 128).
3//!
4//! This module is **EXTENSIVELY** documented! If you want to learn more about Posits and an
5//! optimised software implementation thereof (or about floating point implementations in
6//! general!), you might profit from reading carefully through the code :) We assume basic
7//! familiarity with both the Posit format and with two's complement integer arithmetic;
8//! everything else we try to explain.
9//!
10//! If you know nothing about Posits and want to learn more, a good place to start is
11//! <https://posithub.org/docs/Posits4.pdf>. The most up to date standard is at
12//! <https://posithub.org/docs/posit_standard-2.pdf>.
13//!
14//! Some notation used in the comments:
15//!
16//!   - **Leftmost bits/msb**: most-significant bits.
17//!   - **Rightmost bits/lsb**: least-significant bits.
18//!   - **Bit 0, bit 1, .. bit N-1**: numbered least significant to most significant.
19//!   - [a; b[ for a range that is closed (inclusive) on `a` and open (exclusive) on `b`.
20//!
21//! Suggested reading order: [Posit] and [Decoded] types, [basics] and [constants](consts),
22//! [decode] implementation (Posit→Decoded), [encode] implementation (Decoded→Posit),
23//! [elementary arithmetic](ops).
24//!
25//! # Testing
26//!
27//! Extensive tests are included. When possible, we check all inputs exhaustively (e.g.: adding two
28//! 8-bit posits, there are only 256² possible inputs). Whenever this is not feasible (e.g: adding
29//! two 64-bit posits), we make use of the `proptest` crate, which allows us to probabilistically
30//! test inputs from a random distribution.
31//!
32//! Another key ingredient in the testing process is in the `rational` submodule. There, we define
33//! some tools to convert to and from *arbitrary precision* rational numbers. Since any non-NaR
34//! posit is a rational number, we can test any function in this crate involving posits by simply
35//! checking that the output matches the output of the equivalent function using
36//! infinite-precision rationals. For instance: to test addition of two posits, we simply check
37//! that `rational(a) + rational(b) == rational(a + b)`, i.e. adding the posits yields the same
38//! result as adding the arbitrary-precision rationals (modulo rounding, of course).
39
40/// A *posit* floating point number with `N` bits and `ES` exponent bits, using `Int` as its
41/// underlying type.
42///
43/// If `Int` = `iX`, then `N` must be in the range `3 ..= X`, and `ES` must be in the range `0 ..
44/// N`. If this is not the case, a **compile-time** error will be raised.
45///
46/// Type aliases are provided at the [crate root](crate#types) for the posit types defined in
47/// [the standard](https://posithub.org/docs/posit_standard-2.pdf#subsection.3.1).
48///
49/// # Example
50///
51/// ```
52/// # use fast_posit::Posit;
53/// // A 32-bit posit with 2-bit exponent field, represented in a 32-bit machine type.
54/// type Foo = Posit<32, 2, i32>;
55/// // A 6-bit posit with 1-bit exponent field, represented in an 8-bit machine type.
56/// type Bar = Posit<6, 1, i8>;
57/// ```
58///
59/// The standard [`p8`](crate::p8), [`p16`](crate::p16), [`p32`](crate::p32), and
60/// [`p64`](crate::p64) types are simply type aliases for `Posit<X, 2, iX>`.
61///
62/// Note that `Posit` will have the same size (and alignment) as its `Int` parameter, so it's
63/// currently not possible to create e.g. a 4-bit posit that only takes 4 bits in memory.
64///
65/// If the combination of parameters is invalid, the code will not compile.
66///
67/// ```compile_fail
68/// # use fast_posit::Posit;
69/// type Foo = Posit<40, 2, i32>;   // N=40 too large for i32
70/// # let _ = Foo::ONE;
71/// ```
72/// ```compile_fail
73/// # use fast_posit::Posit;
74/// type Bar = Posit<1, 0, i8>;     // N=1 can't be ≤ 2
75/// # let _ = Bar::ONE;
76/// ```
77/// ```compile_fail
78/// # use fast_posit::Posit;
79/// type Baz = Posit<32, 33, i32>;  // ES=33 too big for N=32
80/// # let _ = Baz::ONE + Baz::ONE;
81/// ```
82pub struct Posit<
83  const N: u32,
84  const ES: u32,
85  Int: crate::Int,
86> (Int);
87
88/// In order to perform most nontrivial operations, a `Posit<N, ES, Int>` needs to be *decoded*
89/// into the form `f × 2^e` (with rational fraction `f` and integer exponent `e`), a form that is
90/// amenable for further manipulation.
91///
92/// This is represented as a `Decoded<N, ES, Int>`, a struct that contains two integer fields,
93/// `frac` and `exp`, such that it represents the value
94///
95/// ```text
96/// (frac / FRAC_DENOM) × (2 ^ exp)
97/// ```
98///
99/// where `FRAC_DENOM` is a fixed power of two, equal to `2 ^ (B-2)`, where `B` = `Int::BITS`.
100///
101/// That is to say: this encodes the `f × 2^e` referred above using two integers:
102/// - The integer [`exp`](Self::exp) is the integer `e`, and
103/// - The integer [`frac`](Self::frac) is the rational `f` *with an implicit denominator* of
104///   `1 << (B-2)`.
105///
106/// Another way to think of it is that `frac` is a fixed-point rational number, where the dot is
107/// two places from the left. For instance (for an 8-bit `frac`):
108///
109///   - `0b01_000000` = +1.00
110///   - `0b01_100000` = +1.50
111///   - `0b10_000000` = -2.00
112///   - `0b11_100000` = -0.50
113///
114/// and so on.
115///
116/// The docstrings for [both](Decoded::frac) [fields](Decoded::exp) contain more detail, more
117/// examples, and further considerations about their values.
118///
119/// Extracting these fields from a posit, and converting back to a posit with correct rounding, can
120/// be done **very** efficiently, and indeed those two algorithms lie at the heart of many
121/// operations: [`Posit::decode_regular`] and [`Decoded::encode_regular_round`].
122#[derive(Clone, Copy)]
123#[derive(Eq, PartialEq, Hash)]
124pub struct Decoded<
125  const N: u32,
126  const ES: u32,
127  Int: crate::Int,
128> {
129  /// The `frac`tion is the `frac / FRAC_DENOM` part of the posit value. Since the constant
130  /// `FRAC_DENOM` = `1 << (Int::BITS - 2)` is fixed, one can simply look at the values of `frac`
131  /// as fixed-point numbers where the dot is two places from the left.
132  ///
133  /// Examples (8-bit posit):
134  ///
135  ///   - `0b01_000000`  = +1.0
136  ///   - `0b01_100000`  = +1.5
137  ///   - `0b01_110000`  = +1.75
138  ///   - `0b01_010000`  = +1.25
139  ///   - `0b01_111111`  = +1.984375
140  ///
141  /// and negative numbers
142  ///
143  ///   - `0b10_000000`  = -2.0
144  ///   - `0b10_100000`  = -1.5
145  ///   - `0b10_110000`  = -1.25
146  ///   - `0b10_010000`  = -1.75
147  ///   - `0b10_000001`  = -1.984375
148  ///   - `0b10_111111`  = -1.015625
149  ///
150  /// # Valid ranges
151  ///
152  /// Now, the result of [`Posit::decode_regular`] always has a `frac` lying within the following
153  /// ranges:
154  ///
155  ///   - [+1.0, +2.0[ for positive numbers
156  ///   - [-2.0, -1.0[ for negative numbers
157  ///
158  /// Note that these are not symmetric! In particular, a positive `frac` may be +1.0
159  /// (`0b01_000…`) but not +2.0, and a negative `frac` may be -2.0 (`0b10_000…`) but not -1.0.
160  /// This is an artefact of how the posit format works, and it enables very efficient
161  /// implementations at key points in many algorithms.
162  ///
163  /// In terms of bit patterns, this corresponds to requiring that the `frac` starts with either
164  /// `0b01` (positive) or `0b10` (negative), and never with `0b00` or `0b11`.
165  ///
166  /// Likewise, for the input to [`Decoded::encode_regular`] we **also** require that
167  /// `frac` **must** always be in such a valid range. Whenever this is not the case, we say that
168  /// the `frac` is "*underflowing*". This is checked, together with the bounds for
169  /// [`exp`](Self::exp), by the function [`Self::is_normalised`].
170  ///
171  /// This is important. Often, when we feed a [`Decoded`] to [`Decoded::encode_regular`], such as
172  /// when implementing arithmetic operations, we will need to for instance adjust the `frac` so
173  /// that it is in the correct range, and possibly compensate by adjusting the `exp`onent in the
174  /// opposite direction.
175  pub frac: Int,
176  /// The `exp`onent is the `2 ^ exp` part of the posit value.
177  ///
178  /// The `exp` field is made up from both the "regime" and "exponent" fields of a posit: the
179  /// lowest `ES` bits are the exponent field verbatim, while the highest come from the regime's
180  /// length and sign. The structure is apparent when looking at the `exp` in binary.
181  ///
182  /// Examples (8-bit posit, 2-bit exponent):
183  ///
184  ///   - `0b00001_01` (exp = +5, regime = +1, exponent = +1)
185  ///   - `0b11110_11` (exp = -5, regime = -2, exponent = +3)
186  ///
187  /// # Valid ranges
188  ///
189  /// For reasons that become apparent when implementing [`Decoded::encode_regular`], we will also
190  /// require that `exp >> ES` lies in a certain range, namely between `Int::MIN / 2` and
191  /// `Int::MAX / 2`, inclusive.
192  ///
193  /// In terms of bit patterns, this corresponds to requiring that `exp >> ES` starts with `0b00`
194  /// (positive) or `0b11` (negative), and never with `0b01` or `0b10`. Plainly, this is only an
195  /// issue if `ES` is 0, or the bounds automatically hold.
196  ///
197  /// It may happen that, for posits of very high dynamic range, the maximum exponent would
198  /// overflow an `Int`, and thus not be able to be stored in the `exp` field. Although his is not
199  /// a concern unless `ES` takes absurdly big values, in that case compile-time checks will
200  /// trigger an error, and suggest the user lower the dynamic range or pick a bigger `Int`.
201  pub exp: Int,
202}
203
204/// Some basic constants and functions, such as a check that `N` and `ES` make sense for `Int`, or
205/// functions to convert to/from raw bits.
206mod basics;
207
208/// Manual implementation of `derive`able traits for [`Posit`]. This is because the derive macro
209/// isn't smart enough to figure some transitive bounds in `Int` → `Sealed`, so we would be left
210/// with superfluous bounds on those traits.
211mod traits;
212
213/// Numeric constants: zero, NaR, min, min_positive, etc.
214mod consts;
215
216/// [`Posit`] → [`Decoded`].
217mod decode;
218
219/// [`Decoded`] → [`Posit`], including correct rounding.
220mod encode;
221
222/// Small fns of one posit argument: neg, prior, next, is_positive, etc.
223mod unary;
224
225/// Rounding a posit to integer-valued posit: round_int, floor, ceil.
226mod round_int;
227
228/// The four basic arithmetic operations: +, -, ×, ÷.
229mod ops;
230
231/// Quire (the fixed-point accumulator for error-free sums and dot products)
232pub mod quire;
233
234/// Conversions to and from integers, to and from floats, and between different posit types.
235///
236/// Two sorts of conversions are implemented:
237///   - The conversions prescribed in the posit standard (using `round_from`).
238///   - The "Rusty" conversions (using `from` for unfallible conversions and `try_from` for
239///     fallible ones).
240///
241/// The [convert::RoundFrom] and [convert::RoundInto] traits are defined here.
242pub mod convert;
243
244/// Traits for [`core::fmt::Display`] and [`core::fmt::Debug`].
245mod fmt;
246
247/// Conversions to and from an arbitrary-precision [malachite::rational::Rational], for testing
248/// purposes. This enables us to verify our algorithms by checking that the exact rationals match.
249/// For example:
250///
251///   - rational(p1 + p2) = rational(p1) + rational(p2)
252///   - rational(p1::ONE) = rational(1)
253///   - rational(p1) = rational(decoded(p1))
254#[cfg(test)]
255mod rational;
256
257/// Miscellaneous helpers for testing.
258#[cfg(test)]
259mod test;