CDFLIB
A pure-Rust port of CDFLIB, the cumulative distribution function library by Barry Brown, James Lovato, and Kathy Russell.
The minimum supported Rust version is 1.71.
What is CDFLIB?
CDFLIB is a small, venerable numerical library dating to the early 1990s that computes cumulative distribution functions (CDFs) and their inverses for the standard distributions of frequentist statistics. It is distributed in Fortran 90 and in C/C++ (machine-translated from the original Fortran 77 sources). It covers eleven distributions:
| Continuous | Discrete |
|---|---|
| Β | Binomial |
| χ², noncentral χ² | Negative binomial |
| F (Fisher–Snedecor), noncentral F | Poisson |
| Γ | |
| Normal | |
| Student's t |
Goals
The goal of this crate is to provide CDFLIB in pure Rust. The underlying special
functions (gamma_inc, beta_inc, error_f, cumnor, etc.) are
exposed publicly in a cdflib::special module for users who want the routines
without the distribution wrappers.
The API is designed to be ergonomic and idiomatic for Rust users, with traits
representing the common functionality of continuous and discrete distributions,
and comprehensive error handling via the thiserror crate.
Non-goals
Expanding or altering the API beyond what CDFLIB offers is explicitly out of
scope. This is a machine-translated port of the Fortran 90 code. Other
libraries, such as statrs, can use the high-precision functions provided by
CDFLIB to build more ergonomic APIs. The only exception are convenience
textbook one-liners for mean, variance, and so on.
Notation conventions
The crate, like CDFLIB itself, uses several interchangeable names for the lower- and upper-tail probabilities of a distribution. The synonyms are:
| Concept | Method | F90 name | CDFLIB code | Other names |
|---|---|---|---|---|
| Pr[X ≤ x] | cdf |
cum |
P | lower-tail probability |
| Pr[X > x] | ccdf |
ccum |
Q | upper tail (complementary CDF) |
The two are mathematically complementary (P + Q = 1), but the crate computes
them independently rather than deriving one from the other by subtraction. This
is what lets the small tail keep its precision deep into the tails, where 1.0 - cdf(x) would lose digits to cancellation.
The incomplete-Γ and incomplete-Β routines follow the same convention:
gamma_inc returns the pair (P, Q), beta_inc returns
(Iₓ(a, b), 1 − Iₓ(a, b)).
Why CDFLIB?
Many libraries compute CDFs. CDFLIB is distinguished by two design choices:
1. Stays accurate in the tails and at large parameter values
The numerical heart of CDFLIB is the pair of regularized incomplete-function
routines gamma_inc (≈ ACM Algorithm 654) and beta_inc (≈ ACM Algorithm
708). Both dispatch across five computational regimes depending on the location
in parameter space (power series, continued fraction, Tricomi–Temme-style asymptotic
expansion, near-integer specialization, and ratio-extreme handling) and they
return both the lower and upper tail probabilities directly, without computing
one from the other.
This is the same algorithm family that underlies SciPy's incomplete-Γ/Β routines. It delivers near-machine precision (13–15 digits) deep into the tails and at large parameter values, where continued-fraction implementations lose digits to subtractive cancellation or stall on convergence.
The Rust statistical ecosystem already has statrs, which covers most of
CDFLIB's distributions. However, at the time of this writing statrs does
not offer parameter searches, noncentral χ², or noncentral F, and its
special functions are not as precise as CDFLIB's:
| True value | CDFLIB | statrs |
|
|---|---|---|---|
| P(10¹³, 10¹³ + 1) | ≈ 0.5 | 0.5000 | 0.4926 |
| P(10¹⁵, 10¹⁵ − 1) | ≈ 0.5 | 0.5000 | 0.00645 |
| Iₓ(10⁸, 4·10⁸) at x = 0.2 | ≈ 0.5 | 0.500009 | 2.262 |
| Iₓ(10¹², 3·10¹²) at x = 0.25 | ≈ 0.5 | 0.500000 | 217.7 |
| Pr[Poisson(10¹⁵) > 10¹⁵ + 2·√10¹⁵] | ≈ 0.0228 | 0.02275 | 3.97·10⁻⁵ |
Note that these calls are not realistic for a statistician: for everyday
usage, statrs and this crate will give the same results. The last example,
however, was the author's motivation for this port—large-scale collision tests
of pseudorandom number generators are starting to land in that area due
to more core memory being available, and to improved techniques.
rmathlib, a Rust port of R's special-function library, is another option. It
is accurate in the body of each distribution, but its asymptotic regime stops
working for large a in the regularized incomplete Γ—exactly where χ² tests
with many degrees of freedom land. CDFLIB's Tricomi–Temme asymptotic regime (one
of five branches in gamma_inc) covers this range cleanly:
| CDFLIB | rmathlib |
|
|---|---|---|
| (P, Q)(500, 500) | (0.5059, 0.4941) | (NaN, NaN) |
| (P, Q)(5000, 5000) | (0.5019, 0.4981) | (NaN, NaN) |
| (P, Q)(10⁶, 10⁶) | (0.5001, 0.4999) | (NaN, NaN) |
| (P, Q)(10⁹, 10⁹) | (0.5000, 0.5000) | (NaN, NaN) |
These correspond to χ²(1000), χ²(10⁴), χ²(2·10⁶), and χ²(2·10⁹) at their respective medians, which arise in goodness-of-fit and likelihood-ratio tests on large samples.
2. Solves for any parameter, not just x and p
Given a CDF identity p = F(x; θ₁, θ₂, …), most libraries can give you p from x (the CDF) or x from p (the inverse CDF, also called the quantile function). CDFLIB can additionally compute for any θᵢ when you know p, x, and the other parameters. For example:
-
“What standard deviation places probability 0.975 below x = 1.96, given a mean of 0?”
-
“What number of trials puts Pr[X ≤ 10] at 0.95 in a Binomial with success rate 0.3?”
-
“What degrees of freedom for a χ² distribution put 95% of the mass below x = 3.84?”
Examples
CDFs, complementary CDFs, and inverses
use Normal;
use ;
let n = try_new?;
let p = n.cdf; // 0.9750021048517796
let sf = n.ccdf; // 2.866516e-7, computed directly (not 1 - cdf)
let x = n.inverse_cdf?; // 1.9599639845400538
let xs = n.inverse_ccdf?; // 7.034484 (accurate deep into the right tail)
let d = n.pdf; // 0.3989422804014327
let mu = n.mean; // 0.0
# Ok::
Parameter searches
Given p = F(x; θ₁, θ₂, …), you can compute any parameter when the others are known. Two practical uses:
use ;
// Upper 95% confidence bound on λ after observing 3 Poisson events
// (the Garwood / exact-Poisson interval).
let lambda_hi = search_lambda?;
// 7.7537
// Degrees of freedom that put 95% of a χ² distribution below x = 3.84
// (recovers df = 1, the classic likelihood-ratio test critical value).
let df = search_df?;
// 0.9994
# Ok::
Power of a noncentral χ² test
use ;
use ContinuousCdf;
// Critical value of a χ²(5) test at α = 0.05.
let crit = try_new?.inverse_cdf?;
// 11.0705
// Power against a noncentral alternative with ncp = 10.
let power = try_new?.ccdf;
// 0.6774
# Ok::
Special functions directly
These routines are public for users who want the numerics without a distribution wrapper:
use ;
let = gamma_inc; // (0.3614, 0.6386) = (P(2.5,1.7), Q(2.5,1.7))
let e = error_f; // 0.7421
let = cumnor; // (0.9750, 0.0250) = (Φ(1.96), 1 - Φ(1.96))
# let _ = ;
Every special function with possible failure modes also has a try_* form
that returns a typed error instead of panicking:
use ;
let = try_gamma_inc?;
assert!;
# let _ = ;
# Ok::
Beautiful names
Rust allows you to rename the functions:
use ;
let x = Γ; // Γ(5/2) = (3/2)·√π / 2 ≈ 1.3293
let y = Β; // Β(2, 3) = 1/12
let = Φ; // Φ(1.96) ≈ 0.975
let γ = -ψ; // ψ(1) = −γ (Euler–Mascheroni)
# let _ = ;
Fidelity to CDFLIB
The port is semantically faithful: every algorithmic decision, polynomial
coefficient, branch threshold, and truncation depth matches cdflib.f90 to the
digit. The intentional structural divergences are:
- There is no silent error returned as a special value, or errors returned as an
integer index. All functions returning errors have a
try_prefix and return aResultwith a documented error type. The error types are designed to be as specific as possible about the nature of the error. - All functions with a
try_prefix have an infallible variant that panics on errors, and is documented as such. - The Fortran routine
gamma_useris exposed under the Rust namegamma. The Fortran name encodes a Fortran-2008 workaround (the language added agammaintrinsic, so the routine had to be renamed to avoid the collision). Rust has no such conflict, so the routine takes the bare family name, mirroringbeta. error_fc(ind, x)(which multiplexes plain and exponentially-scaled output via an integer flag) is split into two Rust functions,error_fcanderror_fc_scaled. Same numerics, no flag argument.- The Fortran
cum*andcdf*method families are folded into the corresponding distribution module'scdf/ccdf/inverse_cdf/inverse_ccdf/search_*methods rather than exposed as bare functions. dinvranddzror(the reverse-communication root finders) live as internal state machines incrate::search. They are not part of the public surface.- The search setup constants (
abs_step,rel_step,stp_mul,abs_tol,rel_tol) that the Fortrancdf*routines declare locally are centralized insrc/search/mod.rs; the one routine that needs a different absolute tolerance (cdfchn) uses an explicitsearch_monotone_with_atolcall.
The lower-level CDFLIB-style helpers (algdiv, bcorr, gam1, rlog,
etc.) live in cdflib::special::internal so the user-facing
cdflib::special surface stays focused on the routines a statistical user is
likely to call. Each CDFLIB algorithmic routine can be found under its original
name, modulo the renames and splits enumerated above. The machine-constant
utilities (ipmpar, dpmpar, exparg) and the ftnstop fatal-error sink are
not ported: the constants live as Rust module-level values, and error reporting
goes through the try_*/Result pairs described above.
Testing
Reference values for the test suite are pre-generated from the bundled Fortran 90
sources (tests/regenerate/) and committed as CSV fixtures under tests/data/.
cargo test reads the CSVs directly; CSV fixtures can be regenerated using the
shell scripts in tests/regenerate/ if desired; you will need a Fortran 90
compiler.
The code has been extensively tested against the original Fortran 90 and C
sources. In the process, we found serious bugs in rmathlib and a bug in the
Fortran 90 version of the library that has remained undetected for 25 years: a
coefficient for the computation of the error function had been transcribed from
the original Fortran 77 code with a wrong exponent (the C/C++ version are
unaffected).