[][src]Function stirling_numbers::p_at_most_m_distinct_in_sample_of_x_from_n

pub fn p_at_most_m_distinct_in_sample_of_x_from_n(
    m: usize,
    x: usize,
    n: usize,
    sr: &[Vec<f64>]
) -> f64

Compute the probability of selecting at most m distinct elements in x random draws with replacement from a set of size n.
 

Method. The probability of selecting exactly m distinct elements is
Z(m,x,n) = S(x,m) * ( n * ... * (n-m+1) ) / n^x
where S(x,m) is the Stirling number of the second kind. Reference: stack exchange question 32800 on the "probability distribution of coverage of a set after x independently randomly selected members of the set".

In terms of Stirling ratios SR, Z(m,x,n) = SR(x,m) * (m/n)^x * choose(n,m).

Thus the probability of selecting at most m distinct elements in x random draws with replacement from a set of size n is:
sum( SR(x,u) * (u/n)^x * choose(n,u), u = 0..=m )
= 1 - sum( SR(x,u) * (u/n)^x * choose(n,u), u = m+1..=x ).

which is computed below, using a precomputed Stirling ratio table.

Complexity. O( (x-m) * x ). If one wants to speed this up, probably one can do it by truncating the sum, without significantly affecting accuracy.

Testing and accuracy. For T = f64, we test one value for this by simulation. For m = 27, x = 30, n = 2500, the function computes 0.0005953 (rounded), versus 0.0005936 (rounded) for simulation using a sample of size 100,000,000.