# CDFLIB
A pure-Rust port of [CDFLIB], the cumulative distribution function library by
Barry Brown, James Lovato, and Kathy Russell.
The minimum supported Rust version is 1.71.
## What is CDFLIB?
CDFLIB is a small, venerable numerical library dating to the early 1990s that
computes _cumulative distribution functions_ (CDFs) and their inverses for the
standard distributions of frequentist statistics. It is distributed in [Fortran
90] and in [C]/[C++] (machine-translated from the original [Fortran 77]
sources). It covers eleven distributions:
| Β | Binomial |
| χ², noncentral χ² | Negative binomial |
| F (Fisher–Snedecor), noncentral F | Poisson |
| Γ | |
| Normal | |
| Student's _t_ | |
## Goals
The goal of this crate is to provide CDFLIB in pure Rust. The underlying special
functions ([`gamma_inc`], [`beta_inc`], [`error_f`], [`cumnor`], etc.) are
exposed publicly in a [`cdflib::special`] module for users who want the routines
without the distribution wrappers.
The API is designed to be ergonomic and idiomatic for Rust users, with [traits]
representing the common functionality of continuous and discrete distributions,
and comprehensive error handling via the [`thiserror`] crate.
## Non-goals
Expanding or altering the API beyond what CDFLIB offers is explicitly out of
scope. This is a machine-translated port of the Fortran 90 code. Other
libraries, such as [`statrs`], can use the high-precision functions provided by
CDFLIB to build more ergonomic APIs. The only exception are convenience
textbook one-liners for mean, variance, and so on.
## Notation conventions
The crate, like CDFLIB itself, uses several interchangeable names for the
lower- and upper-tail probabilities of a distribution. The synonyms are:
| Pr[*X* ≤ *x*] | [`cdf`] | `cum` | _P_ | lower-tail probability |
| Pr[*X* > *x*] | [`ccdf`] | `ccum` | _Q_ | upper tail (complementary CDF) |
The two are mathematically complementary (_P_ + _Q_ = 1), but the crate computes
them independently rather than deriving one from the other by subtraction. This
is what lets the small tail keep its precision deep into the tails, where `1.0 -
cdf(x)` would lose digits to cancellation.
The incomplete-Γ and incomplete-Β routines follow the same convention:
[`gamma_inc`] returns the pair (_P_, _Q_), [`beta_inc`] returns
(_Iₓ_(_a_, _b_), 1 − _Iₓ_(_a_, _b_)).
## Why CDFLIB?
Many libraries compute CDFs. CDFLIB is distinguished by two design choices:
### 1. Stays accurate in the tails and at large parameter values
The numerical heart of CDFLIB is the pair of regularized incomplete-function
routines [`gamma_inc`] (≈ [ACM Algorithm 654]) and [`beta_inc`] (≈ [ACM Algorithm
708]). Both dispatch across five computational regimes depending on the location
in parameter space (power series, continued fraction, Tricomi–Temme-style asymptotic
expansion, near-integer specialization, and ratio-extreme handling) and they
return both the lower and upper tail probabilities directly, without computing
one from the other.
This is the same algorithm family that underlies SciPy's [incomplete-Γ/Β
routines]. It delivers near-machine precision (13–15 digits) deep into the tails
and at large parameter values, where continued-fraction implementations lose
digits to subtractive cancellation or stall on convergence.
The Rust statistical ecosystem already has [`statrs`], which covers most of
CDFLIB's distributions. However, at the time of this writing [`statrs`] does
not offer parameter searches, [noncentral χ²], or [noncentral _F_], and its
special functions are not as precise as CDFLIB's:
| _P_(10¹³, 10¹³ + 1) | ≈ 0.5 | 0.5000 | 0.4926 |
| _P_(10¹⁵, 10¹⁵ − 1) | ≈ 0.5 | 0.5000 | 0.00645 |
| _Iₓ_(10⁸, 4·10⁸) at _x_ = 0.2 | ≈ 0.5 | 0.500009 | 2.262 |
| _Iₓ_(10¹², 3·10¹²) at _x_ = 0.25 | ≈ 0.5 | 0.500000 | 217.7 |
| Pr[Poisson(10¹⁵) > 10¹⁵ + 2·√10¹⁵] | ≈ 0.0228 | 0.02275 | 3.97·10⁻⁵ |
Note that these calls are not realistic for a statistician: for everyday
usage, [`statrs`] and this crate will give the same results. The last example,
however, was the author's motivation for this port—large-scale collision tests
of pseudorandom number generators are starting to land in that area due
to more core memory being available, and to improved techniques.
[`rmathlib`], a Rust port of R's special-function library, is another option. It
is accurate in the body of each distribution, but its asymptotic regime stops
working for large _a_ in the regularized incomplete Γ—exactly where χ² tests
with many degrees of freedom land. CDFLIB's Tricomi–Temme asymptotic regime (one
of five branches in [`gamma_inc`]) covers this range cleanly:
| (_P_, _Q_)(500, 500) | (0.5059, 0.4941) | (NaN, NaN) |
| (_P_, _Q_)(5000, 5000) | (0.5019, 0.4981) | (NaN, NaN) |
| (_P_, _Q_)(10⁶, 10⁶) | (0.5001, 0.4999) | (NaN, NaN) |
| (_P_, _Q_)(10⁹, 10⁹) | (0.5000, 0.5000) | (NaN, NaN) |
These correspond to χ²(1000), χ²(10⁴), χ²(2·10⁶), and χ²(2·10⁹) at their
respective medians, which arise in goodness-of-fit and likelihood-ratio tests on
large samples.
### 2. Solves for any parameter, not just _x_ and _p_
Given a CDF identity _p_ = _F_(_x_; *θ*₁, *θ*₂, …), most libraries can give you
_p_ from _x_ (the CDF) or _x_ from _p_ (the inverse CDF, also called the
quantile function). CDFLIB can additionally compute for any _θᵢ_ when you know
_p_, _x_, and the other parameters. For example:
- “What standard deviation places probability 0.975 below _x_ = 1.96, given a
mean of 0?”
- “What number of trials puts Pr[*X* ≤ 10] at 0.95 in a Binomial with success
rate 0.3?”
- “What degrees of freedom for a χ² distribution put 95% of the mass below _x_ =
3.84?”
## Examples
### CDFs, complementary CDFs, and inverses
```rust
use cdflib::Normal;
use cdflib::traits::{Continuous, ContinuousCdf, Mean};
let n = Normal::try_new(0.0, 1.0)?;
let p = n.cdf(1.96); // 0.9750021048517796
let sf = n.ccdf(5.0); // 2.866516e-7, computed directly (not 1 - cdf)
let x = n.inverse_cdf(0.975)?; // 1.9599639845400538
let xs = n.inverse_ccdf(1e-12)?; // 7.034484 (accurate deep into the right tail)
let d = n.pdf(0.0); // 0.3989422804014327
let mu = n.mean(); // 0.0
# Ok::<(), cdflib::NormalError>(())
```
### Parameter searches
Given _p_ = _F_(_x_; *θ*₁, *θ*₂, …), you can compute any parameter when the
others are known. Two practical uses:
```rust
use cdflib::{ChiSquared, Poisson};
// Upper 95% confidence bound on λ after observing 3 Poisson events
// (the Garwood / exact-Poisson interval).
let lambda_hi = Poisson::search_lambda(0.05, 0.95, 3)?;
// 7.7537
// Degrees of freedom that put 95% of a χ² distribution below x = 3.84
// (recovers df = 1, the classic likelihood-ratio test critical value).
let df = ChiSquared::search_df(0.95, 0.05, 3.84)?;
// 0.9994
# Ok::<(), Box<dyn std::error::Error>>(())
```
### Power of a noncentral χ² test
```rust
use cdflib::{ChiSquared, ChiSquaredNoncentral};
use cdflib::traits::ContinuousCdf;
// Critical value of a χ²(5) test at α = 0.05.
let crit = ChiSquared::try_new(5.0)?.inverse_cdf(0.95)?;
// 11.0705
// Power against a noncentral alternative with ncp = 10.
let power = ChiSquaredNoncentral::try_new(5.0, 10.0)?.ccdf(crit);
// 0.6774
# Ok::<(), Box<dyn std::error::Error>>(())
```
### Special functions directly
These routines are public for users who want the numerics without a distribution wrapper:
```rust
use cdflib::special::{cumnor, error_f, gamma_inc};
let (p, q) = gamma_inc(2.5, 1.7); // (0.3614, 0.6386) = (P(2.5,1.7), Q(2.5,1.7))
let e = error_f(0.8); // 0.7421
let (phi, sphi) = cumnor(1.96); // (0.9750, 0.0250) = (Φ(1.96), 1 - Φ(1.96))
# let _ = (p, q, e, phi, sphi);
```
Every special function with possible failure modes also has a `try_*` form
that returns a typed error instead of panicking:
```rust
use cdflib::special::{try_gamma_inc, GammaIncError};
let (p, q) = try_gamma_inc(2.5, 1.7)?;
assert!(matches!(try_gamma_inc(-1.0, 1.0), Err(GammaIncError::ANegative(_))));
# let _ = (p, q);
# Ok::<(), GammaIncError>(())
```
### Beautiful names
Rust allows you to rename the functions:
```rust
use cdflib::special::{beta as Β, cumnor as Φ, gamma as Γ, psi as ψ};
let x = Γ(2.5); // Γ(5/2) = (3/2)·√π / 2 ≈ 1.3293
let y = Β(2.0, 3.0); // Β(2, 3) = 1/12
let (p, _) = Φ(1.96); // Φ(1.96) ≈ 0.975
let γ = -ψ(1.0); // ψ(1) = −γ (Euler–Mascheroni)
# let _ = (x, y, p, γ);
```
## Fidelity to CDFLIB
The port is semantically faithful: every algorithmic decision, polynomial
coefficient, branch threshold, and truncation depth matches `cdflib.f90` to the
digit. The intentional structural divergences are:
- There is no silent error returned as a special value, or errors returned as an
integer index. All functions returning errors have a `try_` prefix and return
a `Result` with a documented error type. The error types are designed to be as
specific as possible about the nature of the error.
- All functions with a `try_` prefix have an infallible variant that panics on
errors, and is documented as such.
- The Fortran routine `gamma_user` is exposed under the Rust name [`gamma`]. The
Fortran name encodes a Fortran-2008 workaround (the language added a `gamma`
intrinsic, so the routine had to be renamed to avoid the collision). Rust has
no such conflict, so the routine takes the bare family name, mirroring
[`beta`].
- `error_fc(ind, x)` (which multiplexes plain and exponentially-scaled output
via an integer flag) is split into two Rust functions, [`error_fc`] and
[`error_fc_scaled`]. Same numerics, no flag argument.
- The Fortran `cum*` and `cdf*` method families are folded into the
corresponding distribution module's [`cdf`] / [`ccdf`] / [`inverse_cdf`] /
[`inverse_ccdf`] / `search_*` methods rather than exposed as bare functions.
- `dinvr` and `dzror` (the reverse-communication root finders) live as internal
state machines in `crate::search`. They are not part of the public surface.
- The search setup constants (`abs_step`, `rel_step`, `stp_mul`, `abs_tol`,
`rel_tol`) that the Fortran `cdf*` routines declare locally are centralized in
`src/search/mod.rs`; the one routine that needs a different absolute tolerance
(`cdfchn`) uses an explicit `search_monotone_with_atol` call.
The lower-level CDFLIB-style helpers ([`algdiv`], [`bcorr`], [`gam1`], [`rlog`],
etc.) live in [`cdflib::special::internal`] so the user-facing
[`cdflib::special`] surface stays focused on the routines a statistical user is
likely to call. Each CDFLIB algorithmic routine can be found under its original
name, modulo the renames and splits enumerated above. The machine-constant
utilities (`ipmpar`, `dpmpar`, `exparg`) and the `ftnstop` fatal-error sink are
not ported: the constants live as Rust module-level values, and error reporting
goes through the `try_*`/`Result` pairs described above.
## Testing
Reference values for the test suite are pre-generated from the bundled Fortran 90
sources (`tests/regenerate/`) and committed as CSV fixtures under `tests/data/`.
`cargo test` reads the CSVs directly; CSV fixtures can be regenerated using the
shell scripts in `tests/regenerate/` if desired; you will need a Fortran 90
compiler.
The code has been extensively tested against the original Fortran 90 and C
sources. In the process, we found [serious bugs in `rmathlib`] and a bug in the
[Fortran 90 version of the library] that has remained undetected for 25 years: a
coefficient for the computation of the error function had been transcribed from
the [original Fortran 77 code] with a wrong exponent (the [C]/[C++] version are
unaffected).
[CDFLIB]: https://people.sc.fsu.edu/~jburkardt/cpp_src/cdflib/cdflib.html
[ACM Algorithm 654]: https://dl.acm.org/doi/10.1145/29380.214348
[ACM Algorithm 708]: https://dl.acm.org/doi/10.1145/131766.131776
[`gamma_inc`]: https://docs.rs/cdflib/latest/cdflib/special/fn.gamma_inc.html
[`beta_inc`]: https://docs.rs/cdflib/latest/cdflib/special/fn.beta_inc.html
[`error_f`]: https://docs.rs/cdflib/latest/cdflib/special/fn.error_f.html
[`cumnor`]: https://docs.rs/cdflib/latest/cdflib/special/fn.cumnor.html
[`cdflib::special`]: https://docs.rs/cdflib/latest/cdflib/special/index.html
[`statrs`]: https://crates.io/crates/statrs
[`rmathlib`]: https://crates.io/crates/rmathlib
[`thiserror`]: https://crates.io/crates/thiserror
[incomplete-Γ/Β routines]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gammainc.html
[serious bugs in `rmathlib`]: https://github.com/tla-org/rmathlib/issues/38
[original Fortran 77 code]: https://dl.acm.org/doi/10.1145/131766.131776#supplementary-materials
[traits]: https://docs.rs/cdflib/latest/cdflib/traits/index.html
[`cdflib::special::internal`]: https://docs.rs/cdflib/latest/cdflib/special/internal/index.html
[`gamma`]: https://docs.rs/cdflib/latest/cdflib/special/fn.gamma.html
[`beta`]: https://docs.rs/cdflib/latest/cdflib/special/fn.beta.html
[`error_fc`]: https://docs.rs/cdflib/latest/cdflib/special/fn.error_fc.html
[`error_fc_scaled`]: https://docs.rs/cdflib/latest/cdflib/special/fn.error_fc_scaled.html
[`algdiv`]: https://docs.rs/cdflib/latest/cdflib/special/internal/fn.algdiv.html
[`bcorr`]: https://docs.rs/cdflib/latest/cdflib/special/internal/fn.bcorr.html
[`gam1`]: https://docs.rs/cdflib/latest/cdflib/special/internal/fn.gam1.html
[`rlog`]: https://docs.rs/cdflib/latest/cdflib/special/internal/fn.rlog.html
[noncentral χ²]: https://docs.rs/cdflib/latest/cdflib/struct.ChiSquaredNoncentral.html
[noncentral _F_]: https://docs.rs/cdflib/latest/cdflib/struct.FisherSnedecorNoncentral.html
[Fortran 90 version of the library]: https://people.sc.fsu.edu/~jburkardt/f_src/cdflib/cdflib.html
[original Fortran 77 code]: https://people.sc.fsu.edu/~jburkardt/f77_src/cdflib/cdflib.html
[`cdf`]: https://docs.rs/cdflib/latest/cdflib/traits/trait.ContinuousCdf.html#tymethod.cdf
[`ccdf`]: https://docs.rs/cdflib/latest/cdflib/traits/trait.ContinuousCdf.html#tymethod.ccdf
[`inverse_cdf`]: https://docs.rs/cdflib/latest/cdflib/traits/trait.ContinuousCdf.html#tymethod.inverse_cdf
[`inverse_ccdf`]: https://docs.rs/cdflib/latest/cdflib/traits/trait.ContinuousCdf.html#tymethod.inverse_ccdf
[Fortran 90]: https://people.sc.fsu.edu/~jburkardt/f_src/cdflib/cdflib.html
[Fortran 77]: https://people.sc.fsu.edu/~jburkardt/f77_src/cdflib/cdflib.html
[C]: https://people.sc.fsu.edu/~jburkardt/c_src/cdflib/cdflib.html
[C++]: https://people.sc.fsu.edu/~jburkardt/cpp_src/cdflib/cdflib.html