Expand description
This crate implements summation algorithms that significantly reduce the numerical error in the total obtained by adding a sequence of finite-precision floating-point numbers, compared to the obvious approach.
§Algorithms
§Exact addition
two_sum() and fast_two_sum() allow to compute the rounded addition of two floating-point numbers and the associated numerical error.
Both functions return a tuple (s, t) where s is the floating-point sum rounded to nearest and t is the floating-point error.
§Compensated summation
KahanBabuska and KahanBabuskaNeumaier allow to compute compensated sums using the Kahan-Babuška and Kahan-Babuška-Neumaier algorithms respectively.
Both types are generic over a parameter T: num_traits::float::Float, which is usually f32 or f64 and can typically be inferred.
They support addition and subtraction (also with assignment) of T and &T.
The estimated total sum (of type T) can be retrieved with a method called total().
Both types also implement std::iter::Sum, which means that iterators of floating-point numbers can be conveniently summed.
§Examples
An empty accumulator for the Kahan-Babuška-Neumaier algorithm can be created with KahanBabuskaNeumaier::new();
floating-point numbers can be added to it and also subtracted from it;
finally, the estimated total sum can be retrieved with the KahanBabuskaNeumaier::total() method.
use compensated_summation::KahanBabuskaNeumaier;
let mut sum = KahanBabuskaNeumaier::new();
sum += 0.1;
sum += 0.2;
sum -= 0.3;
assert_eq!(sum.total(), f64::EPSILON / 8.0);An iterator of floating-point numbers can be conveniently summed via its Iterator::sum() method, specifying the desired algorithm.
use compensated_summation::KahanBabuska;
let iter = [0.1; 10].iter();
assert_eq!(iter.sum::<KahanBabuska<_>>().total(), 1.0);§Caution!
A word of caution: floating-point arithmetic is subtler than one may naively think!
For example, the exact sum of the double precision (f64) floating-point numbers
0.1, 0.2 and -0.3 is not zero, not because of some numerical errors in the computation of
arithmetic operations (I’m speaking of their exact mathematical sum, not 0.1 + 0.2 - 0.3), but
because 0.1, 0.2 and -0.3 do not represent what you may think. Mathematically, they are not
$1/10$, $2/10=1/5$ and $-3/10$ respectively, because these are not
dyadic rational numbers
hence cannot be represented exactly as f64 floating-point numbers. Instead, 0.1, 0.2 and -0.3 are
$3602879701896397/36028797018963968$, $3602879701896397/18014398509481984$ and
$-5404319552844595/18014398509481984$ respectively, whose sum is $1/36028797018963968=2^{-55}$, which is equal to f64::EPSILON / 8.0.
With this in mind, one can realize that in this case Kahan-Babuška-Neumaier computes the correct sum
use compensated_summation::*;
assert_eq!([0.1, 0.2, -0.3].iter().sum::<KahanBabuskaNeumaier<f64>>().total(), f64::EPSILON / 8.0);whereas both the naive summation and Kahan-Babuška do not
use compensated_summation::*;
assert_eq!([0.1, 0.2, -0.3].iter().sum::<f64>(), f64::EPSILON / 4.0);
assert_eq!([0.1, 0.2, -0.3].iter().sum::<KahanBabuska<f64>>().total(), 0.0);Don’t be fooled by the Kahan-Babuška result being 0.0: from the f64 perspective it is incorrect.
Modules§
- dev
dev - This module is for development purposes only!
Structs§
- Kahan
Babuska - This type is an accumulator for computing a sum with Kahan-Babuška algorithm.
- Kahan
Babuska Neumaier - This type is an accumulator for computing a sum with Kahan-Babuška-Neumaier algorithm.
Functions§
- fast_
two_ sum Fast2Sumalgorithm, see https://en.wikipedia.org/wiki/2Sum.- two_sum
2Sumalgorithm, see https://en.wikipedia.org/wiki/2Sum.
Type Aliases§
- Kahan
Babuška - Same as
KahanBabuska, but with correct spelling of the second surname. - Kahan
Babuška Neumaier - Same as
KahanBabuskaNeumaier, but with correct spelling of the second surname.