
Type Alias Df64 

pub type Df64 = Compensated<f64, f64>;

Compensated f64 (emulated quad precision) type.

Emulates quadruple precision with a pair of doubles. This roughly doubles the number of mantissa bits (and thus squares the relative precision of double). The range is almost the same as that of double, although the region of denormalized numbers is larger. This is also called double-double arithmetic, compensated arithmetic, or Dekker arithmetic.
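The pair-of-doubles representation rests on error-free transformations: for two f64 values, the rounded sum or product plus one correction term reproduces the exact result. The sketch below shows the three standard building blocks (Knuth's TwoSum, Dekker's Fast2Sum, and an FMA-based TwoProd), whose flop counts match the f64 entries of the cost table (6, 3 and 2 flops). The function names are illustrative and are not this crate's API.

```rust
/// TwoSum (Knuth): returns (s, e) with s = fl(a + b) and a + b = s + e exactly.
/// 6 flops, no precondition on the magnitudes of a and b.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    let e = (a - (s - bb)) + (b - bb);
    (s, e)
}

/// Fast2Sum (Dekker): same result as two_sum in 3 flops,
/// but only valid when |a| >= |b| (or a == 0).
fn fast_two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let e = b - (s - a);
    (s, e)
}

/// TwoProd: returns (p, e) with p = fl(a * b) and a * b = p + e exactly
/// (barring overflow/underflow). The fused multiply-add evaluates
/// a * b - p with a single rounding, so this costs 2 flops.
fn two_prod(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;
    let e = a.mul_add(b, -p);
    (p, e)
}

fn main() {
    // 1e-20 is far below half an ulp of 1.0, so the rounded sum drops it,
    // but the error term keeps it exactly.
    let (s, e) = two_sum(1.0, 1e-20);
    assert_eq!((s, e), (1.0, 1e-20));

    // Fast2Sum gives the same pair here because |1.0| >= |1e-20|.
    assert_eq!(fast_two_sum(1.0, 1e-20), (1.0, 1e-20));

    // (1 + 2^-30)^2 = 1 + 2^-29 + 2^-60 needs 61 significant bits:
    // the rounded product keeps 1 + 2^-29 and the error term holds 2^-60.
    let t = 1.0 / (1u64 << 30) as f64; // 2^-30, exact
    let (p, e) = two_prod(1.0 + t, 1.0 + t);
    assert_eq!((p, e), (1.0 + 2.0 * t, t * t));
    println!("two_prod: p = {p:e}, e = {e:e}");
}
```

A double-double value is simply such an unevaluated (s, e) pair kept around as a number in its own right.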

The rough cost in floating-point operations (fl) and the relative error, expressed as multiples of u² = 2⁻¹⁰⁶ ≈ 1.23e-32 (u = 2⁻⁵³ being the round-off error, i.e. half the machine epsilon of double), are as follows:

op           f64 ∘ f64        Df64 ∘ f64       Df64 ∘ Df64
             cost    error    cost    error    cost    error
add_fast     3 fl    0u²      7 fl    2u²      17 fl   3u²
+ -          6 fl    0u²      10 fl   2u²      20 fl   3u²
*            2 fl    0u²      6 fl    2u²      9 fl    4u²
/            3* fl   1u²      7* fl   3u²      28* fl  6u²
reciprocal   3* fl   1u²      n/a     n/a      19* fl  2.3u²
sqrt         4* fl   2u²      n/a     n/a      8* fl   4u²

The error bounds are mostly tight analytical bounds (divisions being the exception).[1] An asterisk marks operations that need one or two double divisions, which are about an order of magnitude more expensive than regular flops on a modern CPU.
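The Df64 entries come from double-word algorithms of the kind analyzed by Joldes et al. As a hedged sketch (not this crate's code), the block below implements three of them on a hypothetical `Dd { hi, lo }` pair standing in for `Compensated<f64, f64>`: accurate Df64 + Df64 addition (20 flops), FMA-based Df64 × Df64 multiplication (9 flops), and Df64 / f64 division, which needs two f64 divisions and therefore carries an asterisk. The crate may use different variants of these algorithms.

```rust
/// Hypothetical double-double: value = hi + lo with |lo| <= ulp(hi) / 2.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dd {
    hi: f64,
    lo: f64,
}

/// Exact sum: a + b = s + e (6 flops).
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    let bb = s - a;
    (s, (a - (s - bb)) + (b - bb))
}

/// Exact sum assuming |a| >= |b| (3 flops).
fn fast_two_sum(a: f64, b: f64) -> (f64, f64) {
    let s = a + b;
    (s, b - (s - a))
}

/// Exact product via FMA: a * b = p + e (2 flops).
fn two_prod(a: f64, b: f64) -> (f64, f64) {
    let p = a * b;
    (p, a.mul_add(b, -p))
}

/// Accurate double-double addition: 6 + 6 + 1 + 3 + 1 + 3 = 20 flops.
fn add(x: Dd, y: Dd) -> Dd {
    let (sh, sl) = two_sum(x.hi, y.hi);
    let (th, tl) = two_sum(x.lo, y.lo);
    let c = sl + th;
    let (vh, vl) = fast_two_sum(sh, c);
    let w = tl + vl;
    let (hi, lo) = fast_two_sum(vh, w);
    Dd { hi, lo }
}

/// Double-double multiplication with FMA: 2 + 1 + 1 + 1 + 1 + 3 = 9 flops.
fn mul(x: Dd, y: Dd) -> Dd {
    let (ch, cl1) = two_prod(x.hi, y.hi);
    let tl0 = x.lo * y.lo;
    let tl1 = x.hi.mul_add(y.lo, tl0);
    let cl2 = x.lo.mul_add(y.hi, tl1);
    let cl3 = cl1 + cl2;
    let (hi, lo) = fast_two_sum(ch, cl3);
    Dd { hi, lo }
}

/// Double-double divided by f64: uses two f64 divisions (the "*" in the table).
fn div_f64(x: Dd, y: f64) -> Dd {
    let th = x.hi / y; // first division: leading quotient
    let (ph, pl) = two_prod(th, y);
    let dh = x.hi - ph; // remainder x - th * y, computed in pieces
    let dt = dh - pl;
    let d = dt + x.lo;
    let tl = d / y; // second division: correction term
    let (hi, lo) = fast_two_sum(th, tl);
    Dd { hi, lo }
}

fn main() {
    let third = div_f64(Dd { hi: 1.0, lo: 0.0 }, 3.0);
    let back = mul(third, Dd { hi: 3.0, lo: 0.0 });
    let two_thirds = add(third, third);
    println!("1/3     = {:?}", third);
    println!("(1/3)*3 = {:?}", back);
    println!("2/3     = {:?}", two_thirds);
}
```

The `lo` part of `third` carries the roughly 1.85e-17 that `1.0 / 3.0` loses, which is why multiplying back by 3 recovers exactly 1.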

The table can be distilled into two rules of thumb: double-double arithmetic roughly doubles the number of significant digits at the cost of a roughly 15x slowdown compared to double arithmetic.
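The "doubles the digits" rule follows from the error units: u = 2⁻⁵³ ≈ 1.11e-16 for double and u² = 2⁻¹⁰⁶ ≈ 1.23e-32 for double-double, which correspond to about 16 and 32 significant decimal digits. A short check with plain f64 arithmetic (the helper names are illustrative):

```rust
/// Unit round-off of f64: u = 2^-53, i.e. half of f64::EPSILON (= 2^-52).
fn unit_roundoff() -> f64 {
    f64::EPSILON / 2.0
}

/// Approximate number of significant decimal digits for a relative error `rel`.
fn digits(rel: f64) -> f64 {
    -rel.log10()
}

fn main() {
    let u = unit_roundoff();
    // Squaring a power of two is exact, so this is exactly 2^-106.
    let u2 = u * u;
    println!("u  = {:e} -> about {:.1} digits (f64)", u, digits(u));
    println!("u² = {:e} -> about {:.1} digits (Df64)", u2, digits(u2));
}
```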


  [1] M. Joldes et al., ACM Trans. Math. Softw. 44, 1-27 (2018); J.-M. Muller and L. Rideau, ACM Trans. Math. Softw. 48, 1, 9 (2022). The flop count for divisions and reciprocals has been reduced by 3. For double-double division, the analytical bound is 10u², but the largest observed error is 6u²; for double-by-double division, we expect u². The table reports the largest observed error.

Aliased Type

pub struct Df64 { /* private fields */ }

Trait Implementations

impl CustomNumeric for Df64

Df64 implementation of CustomNumeric

fn from_f64_unchecked(x: f64) -> Self

Convert from f64 to Self (direct conversion, no Option)

fn convert_from<U: CustomNumeric + 'static>(value: U) -> Self

Convert from any CustomNumeric type to Self (generic conversion)

fn to_f64(self) -> f64

Convert to f64

fn epsilon() -> Self

Get machine epsilon

fn pi() -> Self

Get high-precision PI constant

fn max(self, other: Self) -> Self

Maximum of two values (not provided by ComplexField)

fn min(self, other: Self) -> Self

Minimum of two values (not provided by ComplexField)

fn is_valid(&self) -> bool

Check if the value is valid (not NaN/infinite)

fn abs_as_same_type(self) -> Self

Get absolute value as the same type (convenience method)

fn exp_m1(self) -> Self

Compute exp(self) - 1 with higher precision for small values
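The reason for a dedicated exp_m1 can be seen with the standard library's f64::exp_m1: for small x, exp(x) rounds to a double right next to 1.0, so subtracting 1 cancels most of the significant digits. The sketch below uses only std f64; it is an assumption that the Df64 method addresses the same cancellation at double-double precision.

```rust
/// Relative error of `approx` against `reference`.
fn rel_err(approx: f64, reference: f64) -> f64 {
    ((approx - reference) / reference).abs()
}

fn main() {
    let x = 1e-10_f64;
    // Reference from the Taylor series: exp(x) - 1 = x + x^2/2 + O(x^3);
    // the x^3 term is ~1.7e-31 and far below f64 resolution here.
    let reference = x + x * x / 2.0;

    // Naive: exp(x) is rounded near 1.0 (ulp 2.2e-16), losing most digits of x.
    let naive = x.exp() - 1.0;
    // exp_m1 evaluates exp(x) - 1 directly and keeps nearly full precision.
    let accurate = x.exp_m1();

    println!("naive  rel. error: {:e}", rel_err(naive, reference));
    println!("exp_m1 rel. error: {:e}", rel_err(accurate, reference));
}
```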