Crate compensated_summation

Source
Expand description

This crate implements summation algorithms that significantly reduce the numerical error in the total obtained by adding a sequence of finite-precision floating-point numbers, compared to the obvious approach.

§Algorithms

§Exact addition

two_sum() and fast_two_sum() allow to compute the rounded addition of two floating-point numbers and the associated numerical error.

Both functions return a tuple (s, t) where s is the floating-point sum rounded to nearest and t is the floating-point error.

§Compensated summation

KahanBabuska and KahanBabuskaNeumaier allow to compute compensated sums using the Kahan-Babuška and Kahan-Babuška-Neumaier algorithms respectively.

Both types are generic over a parameter T: num_traits::float::Float, which is usually f32 or f64 and can typically be inferred.

They support addition and subtraction (also with assignment) of T and &T. The estimated total sum (of type T) can be retrieved with a method called total().

Both types also implement std::iter::Sum, which means that iterators of floating-point numbers can be conveniently summed.

§Examples

An empty accumulator for the Kahan-Babuška-Neumaier algorithm can be created with KahanBabuskaNeumaier::new(); floating-point numbers can be added to it and also subtracted from it; finally, the estimated total sum can be retrieved with the KahanBabuskaNeumaier::total() method.

use compensated_summation::KahanBabuskaNeumaier;
let mut sum = KahanBabuskaNeumaier::new();
sum += 0.1;
sum += 0.2;
sum -= 0.3;
assert_eq!(sum.total(), f64::EPSILON / 8.0);

An iterator of floating-point numbers can be conveniently summed via its Iterator::sum() method, specifying the desired algorithm.

use compensated_summation::KahanBabuska;
let iter = [0.1; 10].iter();
assert_eq!(iter.sum::<KahanBabuska<_>>().total(), 1.0);

§Caution!

A word of caution: floating-point arithmetic is subtler than one may naively think!

For example, the exact sum of the double precision (f64) floating-point numbers 0.1, 0.2 and -0.3 is not zero, not because of some numerical errors in the computation of arithmetic operations (I’m speaking of their exact mathematical sum, not 0.1 + 0.2 - 0.3), but because 0.1, 0.2 and -0.3 do not represent what you may think. Mathematically, they are not $1/10$, $2/10=1/5$ and $-3/10$ respectively, because these are not dyadic rational numbers hence cannot be represented exactly as f64 floating-point numbers. Instead, 0.1, 0.2 and -0.3 are $3602879701896397/36028797018963968$, $3602879701896397/18014398509481984$ and $-5404319552844595/18014398509481984$ respectively, whose sum is $1/36028797018963968=2^{-55}$, which is equal to f64::EPSILON / 8.0.

With this in mind, one can realize that in this case Kahan-Babuška-Neumaier computes the correct sum

use compensated_summation::*;
assert_eq!([0.1, 0.2, -0.3].iter().sum::<KahanBabuskaNeumaier<f64>>().total(), f64::EPSILON / 8.0);

whereas both the naive summation and Kahan-Babuška do not

use compensated_summation::*;
assert_eq!([0.1, 0.2, -0.3].iter().sum::<f64>(), f64::EPSILON / 4.0);
assert_eq!([0.1, 0.2, -0.3].iter().sum::<KahanBabuska<f64>>().total(), 0.0);

Don’t be fooled by the Kahan-Babuška result being 0.0: from the f64 perspective it is incorrect.

Modules§

devdev
This module is for development purposes only!

Structs§

KahanBabuska
This type is an accumulator for computing a sum with Kahan-Babuška algorithm.
KahanBabuskaNeumaier
This type is an accumulator for computing a sum with Kahan-Babuška-Neumaier algorithm.

Functions§

fast_two_sum
Fast2Sum algorithm, see https://en.wikipedia.org/wiki/2Sum.
two_sum
2Sum algorithm, see https://en.wikipedia.org/wiki/2Sum.

Type Aliases§

KahanBabuška
Same as KahanBabuska, but with correct spelling of the second surname.
KahanBabuškaNeumaier
Same as KahanBabuskaNeumaier, but with correct spelling of the second surname.