Expand description
This crate implements summation algorithms that significantly reduce the numerical error in the total obtained by adding a sequence of finite-precision floating-point numbers, compared to the obvious approach.
§Algorithms
§Exact addition
two_sum()
and fast_two_sum()
allow to compute the rounded addition of two floating-point numbers and the associated numerical error.
Both functions return a tuple (s, t)
where s
is the floating-point sum rounded to nearest and t
is the floating-point error.
§Compensated summation
KahanBabuska
and KahanBabuskaNeumaier
allow to compute compensated sums using the Kahan-Babuška and Kahan-Babuška-Neumaier algorithms respectively.
Both types are generic over a parameter T: num_traits::float::Float
, which is usually f32
or f64
and can typically be inferred.
They support addition and subtraction (also with assignment) of T
and &T
.
The estimated total sum (of type T
) can be retrieved with a method called total()
.
Both types also implement std::iter::Sum
, which means that iterators of floating-point numbers can be conveniently summed.
§Examples
An empty accumulator for the Kahan-Babuška-Neumaier algorithm can be created with KahanBabuskaNeumaier::new()
;
floating-point numbers can be added to it and also subtracted from it;
finally, the estimated total sum can be retrieved with the KahanBabuskaNeumaier::total()
method.
use compensated_summation::KahanBabuskaNeumaier;
let mut sum = KahanBabuskaNeumaier::new();
sum += 0.1;
sum += 0.2;
sum -= 0.3;
assert_eq!(sum.total(), f64::EPSILON / 8.0);
An iterator of floating-point numbers can be conveniently summed via its Iterator::sum()
method, specifying the desired algorithm.
use compensated_summation::KahanBabuska;
let iter = [0.1; 10].iter();
assert_eq!(iter.sum::<KahanBabuska<_>>().total(), 1.0);
§Caution!
A word of caution: floating-point arithmetic is subtler than one may naively think!
For example, the exact sum of the double precision (f64
) floating-point numbers
0.1
, 0.2
and -0.3
is not zero, not because of some numerical errors in the computation of
arithmetic operations (I’m speaking of their exact mathematical sum, not 0.1 + 0.2 - 0.3
), but
because 0.1
, 0.2
and -0.3
do not represent what you may think. Mathematically, they are not
$1/10$, $2/10=1/5$ and $-3/10$ respectively, because these are not
dyadic rational numbers
hence cannot be represented exactly as f64
floating-point numbers. Instead, 0.1
, 0.2
and -0.3
are
$3602879701896397/36028797018963968$, $3602879701896397/18014398509481984$ and
$-5404319552844595/18014398509481984$ respectively, whose sum is $1/36028797018963968=2^{-55}$, which is equal to f64::EPSILON / 8.0
.
With this in mind, one can realize that in this case Kahan-Babuška-Neumaier computes the correct sum
use compensated_summation::*;
assert_eq!([0.1, 0.2, -0.3].iter().sum::<KahanBabuskaNeumaier<f64>>().total(), f64::EPSILON / 8.0);
whereas both the naive summation and Kahan-Babuška do not
use compensated_summation::*;
assert_eq!([0.1, 0.2, -0.3].iter().sum::<f64>(), f64::EPSILON / 4.0);
assert_eq!([0.1, 0.2, -0.3].iter().sum::<KahanBabuska<f64>>().total(), 0.0);
Don’t be fooled by the Kahan-Babuška result being 0.0
: from the f64
perspective it is incorrect.
Modules§
- dev
dev
- This module is for development purposes only!
Structs§
- Kahan
Babuska - This type is an accumulator for computing a sum with Kahan-Babuška algorithm.
- Kahan
Babuska Neumaier - This type is an accumulator for computing a sum with Kahan-Babuška-Neumaier algorithm.
Functions§
- fast_
two_ sum Fast2Sum
algorithm, see https://en.wikipedia.org/wiki/2Sum.- two_sum
2Sum
algorithm, see https://en.wikipedia.org/wiki/2Sum.
Type Aliases§
- Kahan
Babuška - Same as
KahanBabuska
, but with correct spelling of the second surname. - Kahan
Babuška Neumaier - Same as
KahanBabuskaNeumaier
, but with correct spelling of the second surname.