cdflib 0.4.0

Pure-Rust port of CDFLIB: cumulative distribution functions (CDF) associated with common probability distributions
Documentation
  • Coverage
  • 88.26%
    233 out of 264 items documented37 out of 57 items with examples
  • Size
  • Source code size: 4.3 MB This is the summed size of all the files inside the crates.io package for this release.
  • Documentation size: 4.39 MB This is the summed size of all files generated by rustdoc for all configured targets
  • Ø build duration
  • this release: 7s Average build duration of successful builds.
  • all releases: 8s Average build duration of successful builds in releases after 2024-10-23.
  • Links
  • vigna/CDFLIB-rs
    1 0 0
  • crates.io
  • Dependencies
  • Versions
  • Owners
  • vigna

CDFLIB

A pure-Rust port of CDFLIB, the cumulative distribution function library by Barry Brown, James Lovato, and Kathy Russell.

The minimum supported Rust version is 1.71.

What is CDFLIB?

CDFLIB is a small, venerable numerical library dating to the early 1990s that computes cumulative distribution functions (CDFs) and their inverses for the standard distributions of frequentist statistics. It is distributed in the original Fortran 90, and in machine-translated C and C++. It covers eleven distributions:

Continuous Discrete
Β Binomial
χ², noncentral χ² Negative binomial
F (Fisher–Snedecor), noncentral F Poisson
Γ
Normal
Student's t

Goals

The goal of this crate is to provide CDFLIB in pure Rust. The underlying special functions (gamma_inc, beta_inc, error_f, cumnor, etc.) are exposed publicly in a cdflib::special module for users who want the routines without the distribution wrappers.

The API is designed to be ergonomic and idiomatic for Rust users, with traits representing the common functionality of continuous and discrete distributions, and comprehensive error handling via the thiserror crate.

Non-goals

Expanding or altering the API beyond what CDFLIB offers is explicitly out of scope. This is a machine-translated port of the Fortran 90 code. Other libraries, such as statrs, can use the high-precision functions provided by CDFLIB to build more ergonomic APIs. The only exception are convenience textbook one-liners for mean, variance, and so on.

Notation conventions

The crate, like CDFLIB itself, uses several interchangeable names for the lower- and upper-tail probabilities of a distribution. The synonyms are:

Concept Method F90 name CDFLIB code Other names
Pr[Xx] cdf cum P lower-tail probability
Pr[X > x] ccdf ccum Q upper tail (complementary CDF)

The two are mathematically complementary (P + Q = 1), but the crate computes them independently rather than deriving one from the other by subtraction. This is what lets the small tail keep its precision deep into the tails, where 1.0 - cdf(x) would lose digits to cancellation.

The incomplete-Γ and incomplete-Β routines follow the same convention: gamma_inc returns the pair (P, Q), beta_inc returns (Iₓ(a, b), 1 − Iₓ(a, b)).

Why CDFLIB?

Many libraries compute CDFs. CDFLIB is distinguished by two design choices:

1. Stays accurate in the tails and at large parameter values

The numerical heart of CDFLIB is the pair of regularized incomplete-function routines gamma_inc (≈ ACM Algorithm 654) and beta_inc (≈ ACM Algorithm 708). Both dispatch across five computational regimes depending on the location in parameter space (power series, continued fraction, Tricomi–Temme-style asymptotic expansion, near-integer specialization, and ratio-extreme handling) and they return both the lower and upper tail probabilities directly, without computing one from the other.

This is the same algorithm family that underlies SciPy's incomplete-Γ/Β routines. It delivers near-machine precision (13–15 digits) deep into the tails and at large parameter values, where continued-fraction implementations lose digits to subtractive cancellation or stall on convergence.

The Rust statistical ecosystem already has statrs, which covers most of CDFLIB's distributions. However, at the time of this writing statrs does not offer parameter searches, noncentral χ², or noncentral F, and its special functions are not as precise as CDFLIB's:

True value CDFLIB statrs
P(10¹³, 10¹³ + 1) ≈ 0.5 0.5000 0.4926
P(10¹⁵, 10¹⁵ − 1) ≈ 0.5 0.5000 0.00645
Iₓ(10⁸, 4·10⁸) at x = 0.2 ≈ 0.5 0.500009 2.262
Iₓ(10¹², 3·10¹²) at x = 0.25 ≈ 0.5 0.500000 217.7
Pr[Poisson(10¹⁵) > 10¹⁵ + 2·√10¹⁵] ≈ 0.0228 0.02275 3.97·10⁻⁵

Note that these calls are not realistic for a statistician: for everyday usage, statrs and this crate will give the same results. The last example, however, was the author's motivation for this port—large-scale collision tests of pseudorandom number generators are starting to land in that area due to more core memory being available, and to improved techniques.

rmathlib, a Rust port of R's special-function library, is another option. It is accurate in the body of each distribution, but its asymptotic regime stops working for large a in the regularized incomplete Γ—exactly where χ² tests with many degrees of freedom land. CDFLIB's Tricomi–Temme asymptotic regime (one of five branches in gamma_inc) covers this range cleanly:

CDFLIB rmathlib
(P, Q)(500, 500) (0.5059, 0.4941) (NaN, NaN)
(P, Q)(5000, 5000) (0.5019, 0.4981) (NaN, NaN)
(P, Q)(10⁶, 10⁶) (0.5001, 0.4999) (NaN, NaN)
(P, Q)(10⁹, 10⁹) (0.5000, 0.5000) (NaN, NaN)

These correspond to χ²(1000), χ²(10⁴), χ²(2·10⁶), and χ²(2·10⁹) at their respective medians, which arise in goodness-of-fit and likelihood-ratio tests on large samples.

2. Solves for any parameter, not just x and p

Given a CDF identity p = F(x; θ₁, θ₂, …), most libraries can give you p from x (the CDF) or x from p (the inverse CDF, also called the quantile function). CDFLIB can additionally compute for any θᵢ when you know p, x, and the other parameters. For example:

  • “What standard deviation places probability 0.975 below x = 1.96, given a mean of 0?”
  • “What number of trials puts Pr[X ≤ 10] at 0.95 in a Binomial with success rate 0.3?”
  • “What degrees of freedom for a χ² distribution put 95% of the mass below x = 3.84?”

Examples

CDFs, complementary CDFs, and inverses

use cdflib::Normal;
use cdflib::traits::{Continuous, ContinuousCdf, Mean};

let n = Normal::try_new(0.0, 1.0)?;
let p   = n.cdf(1.96);              // 0.9750021048517796
let sf  = n.ccdf(5.0);                // 2.866516e-7, computed directly (not 1 - cdf)
let x   = n.inverse_cdf(0.975)?;    // 1.9599639845400538
let xs  = n.inverse_ccdf(1e-12)?;     // 7.034484 (accurate deep into the right tail)
let d   = n.pdf(0.0);               // 0.3989422804014327
let mu  = n.mean();                 // 0.0
# Ok::<(), cdflib::NormalError>(())

Parameter searches

Given p = F(x; θ₁, θ₂, …), you can compute any parameter when the others are known. Two practical uses:

use cdflib::{ChiSquared, Poisson};

// Upper 95% confidence bound on λ after observing 3 Poisson events
// (the Garwood / exact-Poisson interval).
let lambda_hi = Poisson::search_lambda(0.05, 0.95, 3)?;
// 7.7537

// Degrees of freedom that put 95% of a χ² distribution below x = 3.84
// (recovers df = 1, the classic likelihood-ratio test critical value).
let df = ChiSquared::search_df(0.95, 0.05, 3.84)?;
// 0.9994
# Ok::<(), Box<dyn std::error::Error>>(())

Power of a noncentral χ² test

use cdflib::{ChiSquared, ChiSquaredNoncentral};
use cdflib::traits::ContinuousCdf;

// Critical value of a χ²(5) test at α = 0.05.
let crit = ChiSquared::try_new(5.0)?.inverse_cdf(0.95)?;
// 11.0705

// Power against a noncentral alternative with ncp = 10.
let power = ChiSquaredNoncentral::try_new(5.0, 10.0)?.ccdf(crit);
// 0.6774
# Ok::<(), Box<dyn std::error::Error>>(())

Special functions directly

These routines are public for users who want the numerics without a distribution wrapper:

use cdflib::special::{cumnor, error_f, gamma_inc};

let (p, q)      = gamma_inc(2.5, 1.7);   // (0.3614, 0.6386) = (P(2.5,1.7), Q(2.5,1.7))
let e           = error_f(0.8);          // 0.7421
let (phi, sphi) = cumnor(1.96);          // (0.9750, 0.0250) = (Φ(1.96), 1 - Φ(1.96))
# let _ = (p, q, e, phi, sphi);

Every special function with possible failure modes also has a try_* form that returns a typed error instead of panicking:

use cdflib::special::{try_gamma_inc, GammaIncError};

let (p, q) = try_gamma_inc(2.5, 1.7)?;
assert!(matches!(try_gamma_inc(-1.0, 1.0), Err(GammaIncError::ANegative(_))));
# let _ = (p, q);
# Ok::<(), GammaIncError>(())

Beautiful names

Rust allows you to rename the functions:

use cdflib::special::{beta as Β, cumnor as Φ, gamma as Γ, psi as ψ};

let x = Γ(2.5);            // Γ(5/2) = (3/2)·√π / 2 ≈ 1.3293
let y = Β(2.0, 3.0);       // Β(2, 3) = 1/12
let (p, _) = Φ(1.96);      // Φ(1.96) ≈ 0.975
let γ = -ψ(1.0);           // ψ(1) = −γ (Euler–Mascheroni)
# let _ = (x, y, p, γ);

Fidelity to CDFLIB

The port is semantically faithful: every algorithmic decision, polynomial coefficient, branch threshold, and truncation depth matches cdflib.f90 to the digit. The intentional structural divergences are:

  • There is no silent error returned as a special value, or errors returned as an integer index. All functions returning errors have a try_ prefix and return a Result with a documented error type. The error types are designed to be as specific as possible about the nature of the error.
  • All functions with a try_ prefix have an infallible variant that panics on errors, and is documented as such.
  • The Fortran routine gamma_user is exposed under the Rust name gamma. The Fortran name encodes a Fortran-2008 workaround (the language added a gamma intrinsic, so the bundled CDFLIB routine had to be renamed to avoid the collision). Rust has no such conflict, so the routine takes the bare family name, mirroring how beta is the bare-name principal function of the Β family.
  • error_fc(ind, x) (which multiplexes plain and exponentially-scaled output via an integer flag) is split into two Rust functions, error_fc and error_fc_scaled. Same numerics, no flag argument.
  • The Fortran cum* and cdf* dispatcher families are folded into the corresponding distribution module's cdf / ccdf / inverse_cdf / inverse_ccdf / search_* methods rather than exposed as bare functions.
  • dinvr and dzror (the reverse-communication root finders) live as internal state machines in crate::search. They are not part of the public surface.
  • The search setup constants (abs_step, rel_step, stp_mul, abs_tol, rel_tol) that the Fortran cdf* routines declare locally are centralized in src/search/mod.rs; the one routine that needs a different absolute tolerance (cdfchn) uses an explicit search_monotone_with_atol call.

The lower-level CDFLIB-style helpers (algdiv, bcorr, gam1, rlog, etc.) live in cdflib::special::internal so the user-facing cdflib::special surface stays focused on the routines a statistical user is likely to call. Both surfaces are public and documented; a port from C/Fortran can find each CDFLIB algorithmic routine under its original name in one or the other, modulo the renames and splits enumerated above. The machine-constant utilities (ipmpar, dpmpar, exparg) and the ftnstop fatal-error sink are not ported: the constants live as Rust module-level values, and error reporting goes through the try_*/Result pairs described above.

Testing

Reference values for the test suite are pre-generated from the bundled Fortran 90 sources (tests/regenerate/) and committed as CSV fixtures under tests/data/. cargo test reads the CSVs directly; CSV fixtures can be regenerated using the shell scripts in tests/regenerate/ if desired; you will need a Fortran 90 compiler.

The code has been extensively tested against the original Fortran 90 and C sources. In the process, we found serious bugs in rmathlib and a major mistake in the Fortran 90 version of the library that has remained undetected for 25 years: a coefficient for the computation of the error function had been transcribed from the original Fortran 77 code with a wrong exponent.