Type Alias Df64

pub type Df64 = Compensated<f64, f64>;

Compensated f64 (emulated quad precision) type.

Emulates quadruple precision with a pair of doubles. This roughly doubles the mantissa bits (and thus squares the precision of double). The range is almost the same as double, with a larger area of denormalized numbers. This is also called double-double arithmetic, compensated arithmetic, or Dekker arithmetic.
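As a minimal sketch of the pair-of-doubles idea, the following uses Knuth's TwoSum on plain `f64` values to split a sum into a high word and a low word that together hold the exact result. The helper name `two_sum` is chosen for illustration and is not taken from this crate; whether `Df64` uses this exact routine internally is an assumption.

```rust
/// Error-free addition (Knuth's TwoSum), 6 floating point operations.
/// Returns (hi, lo) with hi = fl(a + b) and hi + lo == a + b exactly,
/// barring overflow. Illustrative helper, not part of this crate's API.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    let hi = a + b;
    let b_virtual = hi - a;
    let a_virtual = hi - b_virtual;
    let lo = (a - a_virtual) + (b - b_virtual);
    (hi, lo)
}

fn main() {
    let tiny = 2f64.powi(-60);

    // In plain f64 the small term is rounded away entirely.
    assert_eq!(1.0 + tiny, 1.0);

    // The (hi, lo) pair keeps it in the low word.
    let (hi, lo) = two_sum(1.0, tiny);
    assert_eq!((hi, lo), (1.0, tiny));
    println!("hi = {hi:e}, lo = {lo:e}");
}
```

The low word carries the bits that do not fit into the high word, which is where the roughly doubled mantissa comes from.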

The rough cost in floating point operations (fl) and the relative error, expressed as multiples of u² ≈ 1.23e-32, are as follows. Here u = 2⁻⁵³ ≈ 1.11e-16 is the round-off error of double (half the machine epsilon):

Operation	cost (f64, f64)	error	cost (Df64, f64)	error	cost (Df64, Df64)	error
add_fast	3 fl	0u²	7 fl	2u²	17 fl	3u²
+ -	6 fl	0u²	10 fl	2u²	20 fl	3u²
*	2 fl	0u²	6 fl	2u²	9 fl	4u²
/	3* fl	1u²	7* fl	3u²	28* fl	6u²
reciprocal	3* fl	1u²			19* fl	2.3u²
sqrt	4* fl	2u²			8* fl	4u²

The error bounds are mostly tight analytical bounds (except for divisions).¹ An asterisk indicates the need for one or two double divisions, which are about an order of magnitude more expensive than regular flops on a modern CPU.

The table can be distilled into two rules of thumb: double-double arithmetic roughly doubles the number of significant digits at the cost of a roughly 15x slowdown compared to double arithmetic.
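To make the flop counts more concrete, here is a hedged sketch of two classic error-free transformations on plain `f64`: Dekker's Fast TwoSum (3 operations, valid only when |a| >= |b|) and a TwoProd based on fused multiply-add (2 operations). These counts agree with the `add_fast` and `*` rows for two `f64` operands, but the function names are illustrative and it is an assumption that the crate implements these rows this way.

```rust
/// Fast TwoSum (Dekker): 3 floating point operations.
/// Only valid when |a| >= |b|. Illustrative helper, not this crate's API.
fn fast_two_sum(a: f64, b: f64) -> (f64, f64) {
    let hi = a + b;
    let lo = b - (hi - a);
    (hi, lo)
}

/// TwoProd via fused multiply-add: 2 floating point operations.
/// `mul_add` rounds only once, so `lo` is the exact rounding error of `hi`
/// (barring overflow and underflow). Without hardware FMA it is still
/// correct but may be slow.
fn two_prod(a: f64, b: f64) -> (f64, f64) {
    let hi = a * b;
    let lo = a.mul_add(b, -hi);
    (hi, lo)
}

fn main() {
    let eps = f64::EPSILON; // 2^-52
    let a = 1.0 + eps;

    // (1 + eps)^2 = 1 + 2*eps + eps^2: the high word keeps 1 + 2*eps,
    // the low word recovers the eps^2 term that plain f64 drops.
    let (hi, lo) = two_prod(a, a);
    assert_eq!(hi, 1.0 + 2.0 * eps);
    assert_eq!(lo, eps * eps);

    // eps / 4 is below half an ulp of 1.0, so it ends up in the low word.
    let (s, t) = fast_two_sum(1.0, eps / 4.0);
    assert_eq!((s, t), (1.0, eps / 4.0));
}
```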

¹ M. Joldes, et al., ACM Trans. Math. Softw. 44, 1-27 (2018) and J.-M. Muller and L. Rideau, ACM Trans. Math. Softw. 48, 1, 9 (2022). The flop count has been reduced by 3 for divisions/reciprocals. For double-double division, the analytical bound is 10u², but the largest observed error is 6u². For double-by-double division, we expect u². We report the largest observed error.

Aliased Type

pub struct Df64 { /* private fields */ }

Trait Implementations

impl CustomNumeric for Df64

Df64 implementation of CustomNumeric

fn from_f64_unchecked(x: f64) -> Self

Convert from f64 to Self (direct conversion, no Option)

fn convert_from<U: CustomNumeric + 'static>(value: U) -> Self

Convert from any CustomNumeric type to Self (generic conversion)

fn to_f64(self) -> f64

Convert to f64

fn epsilon() -> Self

Get machine epsilon

fn pi() -> Self

Get high-precision PI constant

fn max(self, other: Self) -> Self

Maximum of two values (not provided by ComplexField)

fn min(self, other: Self) -> Self

Minimum of two values (not provided by ComplexField)

fn is_valid(&self) -> bool

Check if the value is valid (not NaN/infinite)

fn abs_as_same_type(self) -> Self

Get absolute value as the same type (convenience method)

fn exp_m1(self) -> Self

Compute exp(self) - 1 with higher precision for small values
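The motivation for a dedicated `exp_m1` can be illustrated with plain `f64` from the standard library; the same cancellation argument applies to `Df64`, though this sketch does not call the crate and says nothing about how `Df64::exp_m1` is implemented. The helpers `naive_exp_m1` and `rel_err` are hypothetical names used only here.

```rust
/// Naive formulation: exp(x) is rounded near 1.0 first, so the
/// subtraction cancels most of the significant digits for small x.
fn naive_exp_m1(x: f64) -> f64 {
    x.exp() - 1.0
}

/// Relative error of `value` with respect to `reference`.
fn rel_err(value: f64, reference: f64) -> f64 {
    ((value - reference) / reference).abs()
}

fn main() {
    let x = 1e-10_f64;

    // For such a small x, the series x + x^2/2 is accurate to well
    // beyond f64 precision and serves as a reference value.
    let reference = x + x * x / 2.0;

    let naive = naive_exp_m1(x);
    let accurate = x.exp_m1(); // standard library f64::exp_m1

    println!("naive    = {naive:e} (rel. error {:e})", rel_err(naive, reference));
    println!("accurate = {accurate:e} (rel. error {:e})", rel_err(accurate, reference));

    assert!(rel_err(accurate, reference) < 1e-15);
    assert!(rel_err(naive, reference) > 1e-9);
}
```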
