fast_posit/posit/
mod.rs

1//! This module and its submodules contain a software implementation of a standard-compliant Posit
2//! floating point type, with arbitrary width and exponent width (up to 128).
3//!
4//! This module is **EXTENSIVELY** documented! If you want to learn more about Posits and an
5//! optimised software implementation thereof (or about floating point implementations in
6//! general!), you might profit from reading carefully through the code :) We assume basic
7//! familiarity with both the Posit format and with two's complement integer arithmetic;
8//! everything else we try to explain.
9//!
10//! If you know nothing about Posits and want to learn more, a good place to start is
11//! <https://posithub.org/docs/Posits4.pdf>. The most up to date standard is at
12//! <https://posithub.org/docs/posit_standard-2.pdf>.
13//!
14//! Some notation used in the comments:
15//!
16//!   - **Leftmost bits/msb**: most-significant bits.
17//!   - **Rightmost bits/lsb**: least-significant bits.
18//!   - **Bit 0, bit 1, .. bit N-1**: numbered least significant to most significant.
19//!   - [a; b[ for a range that is closed (inclusive) on `a` and open (exclusive) on `b`.
20//!
21//! Suggested reading order: [Posit] and [Decoded] types, [basics] and [constants](consts),
22//! [decode] implementation (Posit→Decoded), [encode] implementation (Decoded→Posit),
23//! [elementary arithmetic](ops).
24//!
25//! # Testing
26//!
27//! Extensive tests are included. When possible, we check all inputs exhaustively (e.g.: adding two
28//! 8-bit posits, there are only 256² possible inputs). Whenever this is not feasible (e.g: adding
29//! two 64-bit posits), we make use of the `proptest` crate, which allows us to probabilistically
30//! test inputs from a random distribution.
31//!
32//! Another key ingredient in the testing process is in the `rational` submodule. There, we define
33//! some tools to convert to and from *arbitrary precision* rational numbers. Since any non-NaR
34//! posit is a rational number, we can test any function in this crate involving posits by simply
35//! checking that the output matches the output of the equivalent function using
36//! infinite-precision rationals. For instance: to test addition of two posits, we simply check
37//! that `rational(a) + rational(b) == rational(a + b)`, i.e. adding the posits yields the same
38//! result as adding the arbitrary-precision rationals (modulo rounding, of course).
39
40/// A *posit* floating point number with `N` bits and `ES` exponent bits, optionally with a
41/// maximum regime size `RS` (making it a *b-posit*).
42///
43// A posit is an alternative floating point format first proposed in 2017
44//
45/// This is a generic type which can be instantiated with any valid combination of parameters
46/// (see below). It implements the functions described in the [2022 standard] (and more), either
47/// via the usual arithmetic operations (addition, negation, comparison, etc) or via named
48/// functions. Rounding conversions to and from other types (integers, IEEE floats, etc) are done
49/// via the [`RoundFrom`](crate::RoundFrom) and [`RoundInto`](crate::RoundInto) traits, which need
50/// to be in scope.
51///
52/// Refer to the documentation for specific items for more detail, examples, and links to the
53/// definitions in the [2022 standard], if applicable.
54///
55/// If you are looking for [the types prescribed in the 2022 standard], see [`p8`](crate::p8),
56/// [`p16`](crate::p16), [`p32`](crate::p32), and [`p64`](crate::p64), which are simply type
57/// aliases for [`Posit<X, 2, iX>`].
58///
59/// [the types prescribed in the 2022 standard]: https://posithub.org/docs/posit_standard-2.pdf#subsection.3.1
60///
61/// # Parameters
62///
63/// The `N` and `ES` parameters are mandatory, and the `RS` parameter is optional. The parameter
64/// `Int` is the underlying physical storage used for the posit, which must be one of the
65/// primitive types `i8` to `i128`.
66///
67/// If `Int` = `iX`, then
68///
69///   - `N`, the number of bits, must be in the range `3 ..= X`,
70///   - `ES`, the number of exponent bits, must be in the range `0 .. N`.
71///   - `RS`, the maximum number of regime bits, if supplied, must be in the range `1 ..= N`,
72///     otherwise it will default to `N`(unbounded).
73///
74/// If any parameter is invalid, a **compile-time** error will be raised.
75///
76/// # Example
77///
78/// ```
79/// # use fast_posit::Posit;
80/// // A 32-bit posit with 2-bit exponent field, represented in a 32-bit machine type.
81/// type Foo = Posit<32, 2, i32>;
82/// // A 6-bit posit with 1-bit exponent field, represented in an 8-bit machine type.
83/// type Bar = Posit<6, 1, i8>;
84/// // A 32-bit b-posit with 3-bit exponent field and at most a 6-bit regime field, represented in a 32-bit machine type.
85/// type Baz = Posit<32, 3, i32, 6>;
86/// ```
87///
88/// Note that `Posit` will have the same size (and alignment) as its `Int` parameter, so it's
89/// currently not possible to create e.g. a 4-bit posit that only actually takes 4 bits in memory.
90///
91/// These are examples of invalid parameters which will not compile.
92///
93/// ```compile_fail
94/// # use fast_posit::Posit;
95/// type Foo = Posit<40, 2, i32>;   // N=40 too big for i32
96/// # let _ = Foo::ONE;
97/// ```
98/// ```compile_fail
99/// # use fast_posit::Posit;
100/// type Bar = Posit<1, 0, i8>;     // N=1 can't be ≤ 2
101/// # let _ = Bar::ONE;
102/// ```
103/// ```compile_fail
104/// # use fast_posit::Posit;
105/// type Baz = Posit<32, 33, i32>;  // ES=33 too big for N=32
106/// # let _ = Baz::ONE + Baz::ONE;
107/// ```
108/// ```compile_fail
109/// # use fast_posit::Posit;
110/// type Qux = Posit<32, 2, 33, i32>;  // RS=33 too big for N=32
111/// # let _ = Qux::ONE + Qux::ONE;
112/// ```
113///
114/// [2022 standard]: https://posithub.org/docs/posit_standard-2.pdf
115pub struct Posit<
116  const N: u32,
117  const ES: u32,
118  Int: crate::Int,
119  const RS: u32 = N,
120> (Int);
121
122/// In order to perform most nontrivial operations, a `Posit<N, ES, Int>` needs to be *decoded*
123/// into the form `f × 2^e` (with rational fraction `f` and integer exponent `e`), a form that is
124/// amenable for further manipulation.
125///
126/// This is represented as a `Decoded<N, ES, Int>`, a struct that contains two integer fields,
127/// `frac` and `exp`, such that it represents the value
128///
129/// ```text
130/// (frac / FRAC_DENOM) × (2 ^ exp)
131/// ```
132///
133/// where `FRAC_DENOM` is a fixed power of two, equal to `2 ^ (B-2)`, where `B` = `Int::BITS`.
134///
135/// That is to say: this encodes the `f × 2^e` referred above using two integers:
136/// - The integer [`exp`](Self::exp) is the integer `e`, and
137/// - The integer [`frac`](Self::frac) is the rational `f` *with an implicit denominator* of
138///   `1 << (B-2)`.
139///
140/// Another way to think of it is that `frac` is a fixed-point rational number, where the dot is
141/// two places from the left. For instance (for an 8-bit `frac`):
142///
143///   - `0b01_000000` = +1.00
144///   - `0b01_100000` = +1.50
145///   - `0b10_000000` = -2.00
146///   - `0b11_100000` = -0.50
147///
148/// and so on.
149///
150/// The docstrings for [both](Decoded::frac) [fields](Decoded::exp) contain more detail, more
151/// examples, and further considerations about their values.
152///
153/// Extracting these fields from a posit, and converting back to a posit with correct rounding, can
154/// be done **very** efficiently, and indeed those two algorithms lie at the heart of many
155/// operations: [`Posit::decode_regular`] and [`Decoded::encode_regular_round`].
156#[derive(Clone, Copy)]
157#[derive(Eq, PartialEq, Hash)]
158pub struct Decoded<
159  const N: u32,
160  const ES: u32,
161  const RS: u32,
162  Int: crate::Int,
163> {
164  /// The `frac`tion is the `frac / FRAC_DENOM` part of the posit value. Since the constant
165  /// `FRAC_DENOM` = `1 << (Int::BITS - 2)` is fixed, one can simply look at the values of `frac`
166  /// as fixed-point numbers where the dot is two places from the left.
167  ///
168  /// Examples (8-bit posit):
169  ///
170  ///   - `0b01_000000`  = +1.0
171  ///   - `0b01_100000`  = +1.5
172  ///   - `0b01_110000`  = +1.75
173  ///   - `0b01_010000`  = +1.25
174  ///   - `0b01_111111`  = +1.984375
175  ///
176  /// and negative numbers
177  ///
178  ///   - `0b10_000000`  = -2.0
179  ///   - `0b10_100000`  = -1.5
180  ///   - `0b10_110000`  = -1.25
181  ///   - `0b10_010000`  = -1.75
182  ///   - `0b10_000001`  = -1.984375
183  ///   - `0b10_111111`  = -1.015625
184  ///
185  /// # Valid ranges
186  ///
187  /// Now, the result of [`Posit::decode_regular`] always has a `frac` lying within the following
188  /// ranges:
189  ///
190  ///   - [+1.0, +2.0[ for positive numbers
191  ///   - [-2.0, -1.0[ for negative numbers
192  ///
193  /// Note that these are not symmetric! In particular, a positive `frac` may be +1.0
194  /// (`0b01_000…`) but not +2.0, and a negative `frac` may be -2.0 (`0b10_000…`) but not -1.0.
195  /// This is an artefact of how the posit format works, and it enables very efficient
196  /// implementations at key points in many algorithms.
197  ///
198  /// In terms of bit patterns, this corresponds to requiring that the `frac` starts with either
199  /// `0b01` (positive) or `0b10` (negative), and never with `0b00` or `0b11`.
200  ///
201  /// Likewise, for the input to [`Decoded::encode_regular`] we **also** require that
202  /// `frac` **must** always be in such a valid range. Whenever this is not the case, we say that
203  /// the `frac` is "*underflowing*". This is checked, together with the bounds for
204  /// [`exp`](Self::exp), by the function [`Self::is_normalised`].
205  ///
206  /// This is important. Often, when we feed a [`Decoded`] to [`Decoded::encode_regular`], such as
207  /// when implementing arithmetic operations, we will need to for instance adjust the `frac` so
208  /// that it is in the correct range, and possibly compensate by adjusting the `exp`onent in the
209  /// opposite direction.
210  pub frac: Int,
211  /// The `exp`onent is the `2 ^ exp` part of the posit value.
212  ///
213  /// The `exp` field is made up from both the "regime" and "exponent" fields of a posit: the
214  /// lowest `ES` bits are the exponent field verbatim, while the highest come from the regime's
215  /// length and sign. The structure is apparent when looking at the `exp` in binary.
216  ///
217  /// Examples (8-bit posit, 2-bit exponent):
218  ///
219  ///   - `0b00001_01` (exp = +5, regime = +1, exponent = +1)
220  ///   - `0b11110_11` (exp = -5, regime = -2, exponent = +3)
221  ///
222  /// # Valid ranges
223  ///
224  /// For reasons that become apparent when implementing [`Decoded::encode_regular`], we will also
225  /// require that `exp >> ES` lies in a certain range, namely between `Int::MIN / 2` and
226  /// `Int::MAX / 2`, inclusive.
227  ///
228  /// In terms of bit patterns, this corresponds to requiring that `exp >> ES` starts with `0b00`
229  /// (positive) or `0b11` (negative), and never with `0b01` or `0b10`. Plainly, this is only an
230  /// issue if `ES` is 0, or the bounds automatically hold.
231  ///
232  /// It may happen that, for posits of very high dynamic range, the maximum exponent would
233  /// overflow an `Int`, and thus not be able to be stored in the `exp` field. Although his is not
234  /// a concern unless `ES` takes absurdly big values, in that case compile-time checks will
235  /// trigger an error, and suggest the user lower the dynamic range or pick a bigger `Int`.
236  pub exp: Int,
237}
238
239/// Some basic constants and functions, such as a check that `N` and `ES` make sense for `Int`, or
240/// functions to convert to/from raw bits.
241mod basics;
242
243/// Manual implementation of `derive`able traits for [`Posit`]. This is because the derive macro
244/// isn't smart enough to figure some transitive bounds in `Int` → `Sealed`, so we would be left
245/// with superfluous bounds on those traits.
246mod traits;
247
248/// Numeric constants: zero, NaR, min, min_positive, etc.
249mod consts;
250
251/// [`Posit`] → [`Decoded`].
252mod decode;
253
254/// [`Decoded`] → [`Posit`], including correct rounding.
255mod encode;
256
257/// Small fns of one posit argument: neg, prior, next, is_positive, etc.
258mod unary;
259
260/// Rounding a posit to integer-valued posit: round_int, floor, ceil.
261mod round_int;
262
263/// The four basic arithmetic operations: +, -, ×, ÷.
264mod ops;
265
266/// Other mathematical operations: sqrt, log, exp, sin, etc.
267mod math;
268
269/// Quire (the fixed-point accumulator for error-free sums and dot products)
270pub mod quire;
271
272/// Conversions to and from integers, to and from floats, and between different posit types.
273///
274/// Two sorts of conversions are implemented:
275///   - The conversions prescribed in the posit standard (using `round_from`).
276///   - The "Rusty" conversions (using `from` for unfallible conversions and `try_from` for
277///     fallible ones).
278///
279/// The [convert::RoundFrom] and [convert::RoundInto] traits are defined here.
280pub mod convert;
281
282/// Traits for [`core::fmt::Display`] and [`core::fmt::Debug`].
283mod fmt;
284
285/// Conversions to and from an arbitrary-precision [malachite::rational::Rational], for testing
286/// purposes. This enables us to verify our algorithms by checking that the exact rationals match.
287/// For example:
288///
289///   - rational(p1 + p2) = rational(p1) + rational(p2)
290///   - rational(p1::ONE) = rational(1)
291///   - rational(p1) = rational(decoded(p1))
292#[cfg(test)]
293mod rational;
294
295/// Miscellaneous helpers for testing.
296#[cfg(test)]
297mod test;
fast_posit/posit/mod.rs

fast_posit/posit/
mod.rs