//! Code generation evaluation metrics
//!
//! Provides pass@k — the unbiased estimator for functional correctness
//! of code generation models (Chen et al., 2021 "Evaluating Large Language
//! Models Trained on Code").

/// Compute pass@k: unbiased estimator of functional correctness.
///
/// Formula: `1 - C(n-c, k) / C(n, k)`
///
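/// where `n` is the total number of generated samples, `c` is the number of
/// those samples that are correct, and `k` is the number of samples drawn.
///
/// Returns a value in [0, 1]: the estimated probability that at least one of
/// `k` samples drawn without replacement from the `n` generated samples passes.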
///
/// # Arguments
/// * `n` - Total number of generated code samples
/// * `c` - Number of correct (passing) samples
/// * `k` - Number of samples to consider (typically 1, 10, or 100)
///
/// # Edge Cases
/// * If `k > n`, returns 1.0 when `c > 0` and 0.0 otherwise
/// * If `c >= n`, returns 1.0
/// * If `c == 0`, returns 0.0
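///
/// # Worked Example
///
/// A hand-checkable instance of the formula above, using illustrative values
/// (`n = 5`, `c = 2`, `k = 2`) that are assumptions chosen for the arithmetic,
/// not taken from any benchmark:
///
/// ```text
/// C(n-c, k) = C(3, 2) = 3
/// C(n, k)   = C(5, 2) = 10
/// pass@2    = 1 - 3/10 = 0.7
/// ```
///
/// The equivalent product form from Chen et al. (2021),
/// `1 - prod_{i = n-c+1}^{n} (1 - k/i)`, gives the same value without forming
/// large binomial coefficients: `1 - (1 - 2/4) * (1 - 2/5) = 0.7`.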