[−][src]Function stirling_numbers::p_at_most_m_distinct_in_sample_of_x_from_n
pub fn p_at_most_m_distinct_in_sample_of_x_from_n(
m: usize,
x: usize,
n: usize,
sr: &[Vec<f64>]
) -> f64
Compute the probability of selecting at most m
distinct elements in
x
random draws with
replacement from a set of size n
.
Method. The probability of selecting exactly m
distinct elements is
Z(m,x,n) = S(x,m) * ( n * ... * (n-m+1) ) / n^x
where S(x,m)
is the Stirling number of the second kind.
Reference:
stack exchange question 32800
on the "probability distribution of coverage of a set after x independently randomly selected
members of the set".
In terms of Stirling ratios SR
,
Z(m,x,n) = SR(x,m) * (m/n)^x * choose(n,m)
.
Thus the probability of selecting at most m
distinct elements in x
random draws with replacement from a set of size n
is:
sum( SR(x,u) * (u/n)^x * choose(n,u), u = 0..=m )
= 1 - sum( SR(x,u) * (u/n)^x * choose(n,u), u = m+1..=x ).
which is computed below, using a precomputed Stirling ratio table.
Complexity. O( (x-m) * x )
. If one wants to speed this up, probably one
can do it by truncating the sum, without significantly affecting accuracy.
Testing and accuracy. For T = f64
, we test one value for this by
simulation. For
m = 27
, x = 30
, n = 2500
, the function computes
0.0005953
(rounded), versus 0.0005936
(rounded) for simulation
using a sample of size 100,000,000
.