fast-posit 0.1.5

Software implementation of the Posit floating point format

fast-posit

Software implementation of Posit arithmetic in Rust. Correct, clean, and 🚀 fast.

Introduction

Posits are an alternative floating point format proposed by John Gustafson in 2017, with the first published standard in 2022. They have several interesting features that make them an excellent replacement for traditional IEEE 754 floats in domains such as neural networks or HPC.

Some highlights of the Posit format:

  • Generally higher accuracy and/or dynamic range for the same number of bits, compared to IEEE floats. Posits have a smaller decimal error for the majority of operations (+, ×, sqrt, etc.) than an IEEE float of the same size.
  • Simple and deterministic rounding, with bounded errors and no infinite loss of precision through underflow or overflow, in any circumstance. Unlike IEEE floats, all operations are deterministic, portable, and fully reproducible across systems.
  • Tapered accuracy, elegantly allocating more bits to the "mantissa" for values close to ±1, and gradually decreasing the precision as the absolute value of the exponent increases.
  • The ability to calculate sums and dot products very fast and with NO intermediate rounding whatsoever, up to at least 1 billion terms, even with parallelisation!
  • Flexibility to choose any bit width ≥ 2 and any exponent width ≤ bit width, tailored to the parameters of your application: accuracy, dynamic range, memory constraints, etc.
  • No signed zero, no quadrillions of NaNs, no subnormals, no redundant bit patterns, no exceptions. Just one 0, one NaN, and regular numbers. This is not only simpler to reason about and debug, but also unlocks faster software implementations and less power-hungry hardware implementations.
  • Many other niceties: standard-mandated elementary functions with correct rounding, first-class support for mixed-precision, a blazing fast sigmoid for ML, etc.
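
To make the tapered accuracy and the "one 0, one NaN" points above concrete, here is a minimal sketch of how an 8-bit posit with 2 exponent bits can be decoded by hand. It uses only the Rust standard library and is not this crate's implementation; `decode_p8` is a hypothetical helper written for illustration, following the usual sign / regime / exponent / fraction layout.

```rust
/// Decode an 8-bit posit with 2 exponent bits into an f64 (`None` for NaR).
/// Illustrative sketch only; `decode_p8` is a hypothetical helper.
fn decode_p8(bits: u8) -> Option<f64> {
    const ES: u32 = 2;
    match bits {
        0x00 => return Some(0.0), // the single zero
        0x80 => return None,      // the single NaR ("not a real")
        _ => {}
    }
    // Negative posits are the two's complement of their absolute value.
    let negative = bits & 0x80 != 0;
    let abs = if negative { bits.wrapping_neg() } else { bits };
    // Left-align the 7 bits that follow the sign bit in a 32-bit word.
    let body = u32::from(abs) << 25;
    // Regime: a run of identical bits, ended by the opposite bit.
    let (run, k) = if body >> 31 == 1 {
        let run = body.leading_ones();
        (run, run as i32 - 1)
    } else {
        let run = body.leading_zeros();
        (run, -(run as i32))
    };
    // Skip the run and its terminating bit; missing bits read as zeros.
    let rest = body << (run + 1);
    let exponent = (rest >> (32 - ES)) as i32;
    // Whatever is left is the fraction, read as 0.f in binary.
    let fraction = f64::from(rest << ES) / 4294967296.0; // divide by 2^32
    let magnitude = 2f64.powi(k * (1 << ES) + exponent) * (1.0 + fraction);
    Some(if negative { -magnitude } else { magnitude })
}
```

With this sketch, `0x40` decodes to 1 and its neighbour `0x41` to 1.125 (three fraction bits near 1), whereas the two largest patterns `0x7E` and `0x7F` decode to 2^20 and 2^24: the long regime leaves no fraction bits at the extremes.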

Posits are pretty cool, you should read about them here or here or here.

This crate has the following objectives, in order of importance:

  • Correctness: all functions are correct as defined in the standard (i.e. they give the correct results, for all inputs, with the correct rounding). This is verified by extensive built-in tests that check all operations against an oracle (which uses arbitrary precision arithmetic to calculate the correct unrounded result), exhaustively where possible, and probabilistically where we cannot enumerate all inputs.
  • Performance: to the best of our knowledge, this library is faster than, or at least as fast as, any freely available software implementation of posits. We include benchmarks that check our implementation against various external ones; see below for how to run them.
  • Readability: although a fast implementation is inevitably somewhat byzantine and difficult to understand at first glance, the code is well structured and extensively documented. If you are interested in learning more about posits, or about software implementation of floating point formats in general, you may benefit from reading through this code!

This crate aims to implement the entire posit standard and beyond, including features such as arbitrary posit and quire sizes beyond those prescribed by the standard. Versions prior to 1.0.0, however, may be incomplete; see the feature list below.

Usage

The following is a tour of the main functionality of the crate, in the style of "learn X in Y minutes". For more information, refer to the documentation of specific types and functions. Whenever a function corresponds to a function in the standard, this is noted in the item's documentation.

Use standard posit types, or define your own.

use fast_posit::{p8, p16, p32, p64};  // Standard: n bits, 2 exponent bits
type MyPosit = Posit<24, 3, i32>;  // Non-standard: 24 bits, 3 exponent bits

Create posits from ints, IEEE floats, strings, constants, or a raw bit representation.

let a = p32::round_from(2.71_f64);
let b = p32::round_from(42_i32);
let c = p32::from_bits(0x7f001337);
let d = p32::MIN_POSITIVE;

Perform basic arithmetic and comparisons using the usual operators.

assert!(p16::round_from(2.14) + p16::ONE == p16::round_from(3.14));
assert!(p16::MIN_POSITIVE < 1e-15.round_into());
assert!(p16::round_from(-1.1).floor() == p16::round_from(-2));

Convert posits back to ints, IEEE floats, strings, or a raw bit representation.

assert_eq!(p8::ONE.to_bits(), 0b01000000);
assert_eq!(p16::round_from(3.5).round_into(), 4_i32);
assert_eq!(p16::MIN.round_into(), -f32::exp2(56.));
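
The value 2^56 in the last assertion follows from the posit layout: the largest finite posit has a regime that fills every bit after the sign, which gives an exponent of (n − 2) · 2^es. The helper below is a hypothetical, standard-library-only sketch of that formula, not a function of this crate.

```rust
/// Base-2 exponent of the largest finite posit with `n` bits and `es`
/// exponent bits: (n - 2) * 2^es. Hypothetical helper; assumes n >= 2.
fn maxpos_log2(n: u32, es: u32) -> u32 {
    (n - 2) << es
}
```

For the standard types (es = 2) this evaluates to 24, 56, 120 and 248 for 8, 16, 32 and 64 bits respectively, so the most negative 16-bit posit is −2^56.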

Use a quire to calculate sums and dot products without loss of precision!

use fast_posit::{q8, q16, q32, q64};
let mut quire = q16::ZERO;
quire += p16::MAX;
quire += p16::round_from(0.1);
quire -= p16::MAX;
let result: p16 = quire.round_into();
// Correct result with the quire, no issues with rounding errors.
assert_eq!(result, p16::round_from(0.1));
// The same sum without the quire gives a wrong result, because every intermediate operation is rounded.
let posit = p16::MAX + p16::round_from(0.1) - p16::MAX;
assert_eq!(posit, p16::ZERO);
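
The idea behind the quire can be sketched without this crate: accumulate in a wide fixed-point integer and round only once, at the end. The following is a rough standard-library analogy using `f32` inputs. `FixedAcc`, `naive_sum` and `exact_sum` are hypothetical names, and the 40 fractional bits are an arbitrary choice for the example, whereas a real quire is sized to hold every product of two posits exactly.

```rust
/// Hypothetical stand-in for a quire: a 128-bit fixed-point accumulator
/// with 40 fractional bits. Not this crate's implementation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FixedAcc(i128);

impl FixedAcc {
    const FRAC_BITS: i32 = 40;
    const ZERO: FixedAcc = FixedAcc(0);

    /// Add `x` without rounding. Assumes `x` is a multiple of 2^-40 and
    /// small enough that the scaled value fits in an i128.
    fn add(&mut self, x: f32) {
        self.0 += (f64::from(x) * 2f64.powi(Self::FRAC_BITS)) as i128;
    }

    /// Convert back to f32; this is the only place where rounding happens.
    fn to_f32(self) -> f32 {
        (self.0 as f64 / 2f64.powi(Self::FRAC_BITS)) as f32
    }
}

/// Plain f32 summation: every addition rounds.
fn naive_sum(terms: &[f32]) -> f32 {
    terms.iter().fold(0.0, |sum, &x| sum + x)
}

/// Summation through the exact accumulator: a single rounding at the end.
fn exact_sum(terms: &[f32]) -> f32 {
    let mut acc = FixedAcc::ZERO;
    for &x in terms {
        acc.add(x);
    }
    acc.to_f32()
}
```

For the terms `[1.0e12, 0.1, -1.0e12]`, `naive_sum` absorbs the small term and returns 0, while `exact_sum` returns `0.1_f32`.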

Dot products in the quire can give correct results with 32 bits even where IEEE floats fail with 64 bits.

let a = [3.2e7, 1., -1.,  8.0e7];
let b = [4.0e8, 1., -1., -1.6e8];
// Calculating the dot product with 64-bit IEEE floats yields an incorrect result.
let float: f64 = a.iter().zip(b.iter())
  .map(|(x, y)| x * y)
  .sum();
assert_eq!(float, 0.);
// Calculating the dot product with 32-bit posits and a quire yields the correct result.
let posit: p32 = a.iter().zip(b.iter())
  .map(|(x, y)| (p32::round_from(*x), p32::round_from(*y)))
  .fold(q32::ZERO, |mut q, (x, y)| { q.add_prod(x, y); q })
  .round_into();
assert_eq!(posit, 2.round_into());
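
The f64 failure above can be reproduced with the standard library alone. Since every entry of the two vectors is an integer, exact integer arithmetic gives a reference value to compare against. This is only a sketch; `dot_f64` and `dot_exact` are hypothetical helper names.

```rust
/// Dot product in f64: each product and each partial sum is rounded.
fn dot_f64(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Dot product in exact integer arithmetic, accumulated in an i128.
fn dot_exact(a: &[i64], b: &[i64]) -> i128 {
    a.iter()
        .zip(b)
        .map(|(&x, &y)| i128::from(x) * i128::from(y))
        .sum()
}
```

The first product is 1.28e16, where neighbouring f64 values are 2 apart, so both additions of 1 are rounded away before the last product cancels the first: `dot_f64` returns 0, while `dot_exact` on the same integers returns 2.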

Use a quire per thread to ensure the result is the same regardless of parallelisation!

let mut quires = [q16::ZERO; 8];
for thread in 0..8 {
  let local_quire = &mut quires[thread];
  *local_quire += p16::round_from(123);
  *local_quire += p16::round_from(456);
  // ...
}
// Assemble the final result by summing the thread-local quires first, then converting to posit.
let [mut first, rest @ ..] = quires;
for i in rest {
  first += &i
}
let result: p16 = first.round_into();
assert_eq!(result, p16::round_from(8 * (123 + 456)));
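
Why this matters: with rounded arithmetic, the result of a parallel sum depends on how the work is split, because floating point addition is not associative. The sketch below shows the effect with `f32` and contrasts it with one exact accumulator per worker thread. It uses only the standard library; an `i128` stands in for the quire, and the function names are hypothetical.

```rust
use std::thread;

/// Sum in f32: one partial sum per chunk, then add up the partial sums.
/// `chunk` must be non-zero.
fn chunked_sum_f32(data: &[i64], chunk: usize) -> f32 {
    data.chunks(chunk)
        .map(|c| c.iter().map(|&x| x as f32).sum::<f32>())
        .sum()
}

/// Same partitioning, but every worker thread owns an exact accumulator
/// (an i128 standing in for a quire). `chunk` must be non-zero.
fn chunked_sum_exact(data: &[i64], chunk: usize) -> i128 {
    thread::scope(|scope| {
        let workers: Vec<_> = data
            .chunks(chunk)
            .map(|c| scope.spawn(move || c.iter().map(|&x| i128::from(x)).sum::<i128>()))
            .collect();
        // Merging exact partial sums is associative, so the split is irrelevant.
        workers.into_iter().map(|w| w.join().unwrap()).sum::<i128>()
    })
}
```

For `[100_000_000, 1, -100_000_000, 1]`, the f32 version returns 1 with a single chunk of four and 0 with two chunks of two, while the exact version returns 2 for any chunk size.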

Use mixed precision with no hassle; it's especially cheap when the exponent size (ES) is the same, as it is among the standard types.

let terms = [3, 7, 15, 1].map(p8::round_from);  // https://oeis.org/A001203
let pi = {
  let mut partial = p64::ZERO;
  for i in terms[1..].iter().rev() {
    partial = p64::ONE / (i.convert() + partial)  // `i` upcast p8→p64 essentially for free
  }
  terms[0].convert() + partial
};
assert!((3.141592.round_into() .. 3.141593.round_into()).contains(&pi));
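
For comparison, the same continued fraction can be evaluated with IEEE types from the standard library, storing the terms in `f32` and accumulating in `f64`. This is only an analogy for the mixed-precision pattern above; `eval_continued_fraction` is a hypothetical helper, and `f64::from` plays the role of the lossless widening conversion.

```rust
/// Evaluate the continued fraction [t0; t1, t2, ...] from the innermost
/// term outwards, widening each f32 term to f64 without loss.
fn eval_continued_fraction(terms: &[f32]) -> f64 {
    let (first, rest) = terms.split_first().expect("at least one term");
    let mut partial = 0.0_f64;
    for &t in rest.iter().rev() {
        partial = 1.0 / (f64::from(t) + partial);
    }
    f64::from(*first) + partial
}
```

With the terms `[3, 7, 15, 1]` this evaluates to 355/113 ≈ 3.1415929, the same approximation of π that the posit example checks.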

Performance

As a very rough estimate, you can expect 80 to 120 Mops/s for basic arithmetic operations on an 11th gen Intel x86 core at 3.80 GHz. This corresponds to a slowdown of about 8–15× relative to native hardware FPU operations.

Benchmark results for addition, multiplication, and division, comparing this library, cerlane-softposit, and berkeley-softfloat.

"fast-posit" is this crate, "cerlane-softposit" is a mature library for posit arithmetic written in C, and "berkeley-softfloat" is a very mature library for IEEE float arithmetic written in C, considered the gold standard for that number system.

Needless to say, both absolute performance and relative performance vs the FPU will vary depending on your system. See below for how to run benchmarks.

Testing

Run tests with cargo test.

The test suite is very comprehensive. Testing is exhaustive where feasible, and probabilistic where not. Emphasis is put on the standard types, but also on various other combinations of parameters. Since the implementations are generic, this gives a high degree of confidence that all combinations of parameters are bug-free.

Benchmarks

Run benchmarks with cargo bench -F bench.

Benchmarks which test against external implementations need to be enabled with feature flags, e.g. cargo bench -F bench,cerlane-softposit,berkeley-softfloat.

Mind that the relevant C libraries need to be available to the linker; if they are not in the standard paths, you may need to set RUSTFLAGS="-L /path/to/lib".

Features

  • Posits with arbitrary size and arbitrary exponent size
  • Basics
    • Arithmetic (+, -, ×, ÷)
    • Comparisons (>, ==, …)
    • Rounding to integer (floor, ceil, …)
  • Elementary functions
    • Exponentials and logarithms (exp, log, exp2, logPlus1, …)
    • Trigonometry (sin, cos, asin, …)
    • Hyperbolics (sinh, cosh, asinh, …)
    • Exponentiation (sqrt, pow, hypot, …)
  • Conversions
    • To integers
    • From integers
    • To floats
    • From floats
    • Between posits
  • Parsing and printing
  • Quire
    • Loading/storing
    • Adding posits
    • Adding products of posits
    • Adding quires

Dependencies

This crate has no (non-dev) dependencies, and can be used in no_std contexts.