aprender 0.29.3

Next-generation ML framework in pure Rust — `cargo install aprender` for the `apr` CLI
Documentation
# Case Study: Beta-Binomial Bayesian Inference

This case study demonstrates Bayesian inference for binary outcomes using conjugate priors. We cover four practical scenarios: coin flip inference, A/B testing, sequential learning, and prior comparison.

## Overview

The Beta-Binomial conjugate family is the foundation of Bayesian inference for binary data:
- **Prior**: Beta(α, β) distribution over probability parameter θ ∈ [0, 1]
- **Likelihood**: Binomial(n, θ) for k successes in n trials
- **Posterior**: Beta(α + k, β + n - k) with closed-form update

This enables exact Bayesian inference without numerical integration.

## Running the Example

```bash
cargo run --example beta_binomial_inference
```

Expected output: Four demonstrations showing prior specification, posterior updating, credible intervals, and sequential learning.

## Example 1: Coin Flip Inference

### Problem

You flip a coin 10 times and observe 7 heads. What is the probability that this coin is fair (θ = 0.5)?

### Solution

```rust
use aprender::bayesian::BetaBinomial;

// Start with uniform prior Beta(1, 1) = complete ignorance
let mut model = BetaBinomial::uniform();
println!("Prior: Beta({}, {})", model.alpha(), model.beta());
println!("  Prior mean: {:.4}", model.posterior_mean());  // 0.5

// Observe 7 heads in 10 flips
model.update(7, 10);

// Posterior is Beta(1+7, 1+3) = Beta(8, 4)
println!("Posterior: Beta({}, {})", model.alpha(), model.beta());
println!("  Posterior mean: {:.4}", model.posterior_mean());  // 0.6667
```

### Posterior Statistics

```rust
// Point estimates
let mean = model.posterior_mean();  // E[θ|D] = 8/12 = 0.6667
let mode = model.posterior_mode().unwrap();  // (8-1)/(12-2) = 0.7
let variance = model.posterior_variance();  // ≈ 0.017

// 95% credible interval
let (lower, upper) = model.credible_interval(0.95).unwrap();
// ≈ [0.41, 0.92] - wide interval due to small sample size

// Posterior predictive
let prob_heads = model.posterior_predictive();  // 0.6667
```

### Interpretation

**Posterior mean (0.667)**: Our best estimate is that the coin has a 66.7% chance of heads.

**Credible interval [0.41, 0.92]**: We are 95% confident that the true probability is between 41% and 92%. This wide interval reflects uncertainty from small sample size.

**Posterior predictive (0.667)**: The probability of heads on the next flip is 66.7%, integrating over all possible values of θ weighted by the posterior.

### Is the coin fair?

The credible interval includes 0.5, so we **cannot rule out** that the coin is fair. With only 10 flips, the data is consistent with a fair coin that happened to land heads 7 times by chance.

## Example 2: A/B Testing

### Problem

You run an A/B test comparing two website variants:
- **Variant A**: 120 conversions out of 1,000 visitors (12% conversion rate)
- **Variant B**: 145 conversions out of 1,000 visitors (14.5% conversion rate)

Is Variant B significantly better, or could the difference be due to chance?

### Solution

```rust
// Variant A: 120 conversions / 1000 visitors
let mut variant_a = BetaBinomial::uniform();
variant_a.update(120, 1000);
let mean_a = variant_a.posterior_mean();  // 0.1208
let (lower_a, upper_a) = variant_a.credible_interval(0.95).unwrap();
// 95% CI: [0.1006, 0.1409]

// Variant B: 145 conversions / 1000 visitors
let mut variant_b = BetaBinomial::uniform();
variant_b.update(145, 1000);
let mean_b = variant_b.posterior_mean();  // 0.1457
let (lower_b, upper_b) = variant_b.credible_interval(0.95).unwrap();
// 95% CI: [0.1239, 0.1675]
```

### Decision Rule

Check if credible intervals overlap:

```rust
if lower_b > upper_a {
    println!("✓ Variant B is significantly better (95% confidence)");
} else if lower_a > upper_b {
    println!("✓ Variant A is significantly better (95% confidence)");
} else {
    println!("⚠ No clear winner yet - credible intervals overlap");
    println!("  Consider collecting more data");
}
```

### Interpretation

**Output**: "No clear winner yet - credible intervals overlap"

The credible intervals overlap: [10.06%, 14.09%] for A and [12.39%, 16.75%] for B. While B appears better (14.57% vs 12.08%), the uncertainty intervals overlap, meaning we cannot conclusively say B is superior.

**Recommendation**: Collect more data to reduce uncertainty and determine if the 2.5 percentage point difference is real or due to sampling variability.

### Bayesian vs Frequentist

**Frequentist approach**: Run a z-test for proportions, get p-value ≈ 0.02. Conclude "significant at α = 0.05 level."

**Bayesian advantage**:
- Direct probability statements: "95% confident B's conversion rate is between 12.4% and 16.8%"
- Can incorporate prior knowledge (e.g., historical conversion rates)
- Natural stopping rules: collect data until credible intervals separate
- No p-value misinterpretation ("p = 0.02" does NOT mean "2% chance hypothesis is true")

## Example 3: Sequential Learning

### Problem

Demonstrate how uncertainty decreases as we collect more data, even with a consistent underlying success rate.

### Solution

Run 5 sequential experiments with true success rate ≈ 77%:

```rust
let mut model = BetaBinomial::uniform();

let experiments = vec![
    (7, 10),   // 70% success
    (15, 20),  // 75% success
    (23, 30),  // 76.7% success
    (31, 40),  // 77.5% success
    (77, 100), // 77% success
];

for (successes, trials) in experiments {
    model.update(successes, trials);

    let mean = model.posterior_mean();
    let variance = model.posterior_variance();
    let (lower, upper) = model.credible_interval(0.95).unwrap();
    let width = upper - lower;

    println!("Trials: {}, Mean: {:.3}, Variance: {:.7}, CI Width: {:.4}",
             total_trials, mean, variance, width);
}
```

### Results

| Trials | Successes | Mean  | Variance  | 95% CI Width |
|--------|-----------|-------|-----------|--------------|
| 10     | 7         | 0.667 | 0.0170940 | 0.5125       |
| 30     | 22        | 0.719 | 0.0061257 | 0.3068       |
| 60     | 45        | 0.742 | 0.0030392 | 0.2161       |
| 100    | 76        | 0.755 | 0.0017964 | 0.1661       |
| 200    | 153       | 0.762 | 0.0008924 | 0.1171       |

### Interpretation

**Observation 1**: Posterior mean converges to true value (0.762 → 0.77)

**Observation 2**: Variance decreases inversely with sample size

For Beta(α, β): Var[θ] = αβ / [(α+β)²(α+β+1)]

As α + β (total count) increases, variance decreases approximately as 1/(α+β).

**Observation 3**: Credible interval width shrinks with √n

The 95% CI width drops from 51% (n=10) to 12% (n=200), reflecting increased certainty.

### Practical Application

**Early Stopping**: If credible intervals separate in A/B test, you can stop early and deploy the winner. No need for fixed sample size planning as in frequentist statistics.

**Sample Size Planning**: Want 95% CI width < 5%? Solve for α + β ≈ 400 (200 trials).

## Example 4: Prior Comparison

### Problem

Demonstrate how different priors affect the posterior with limited data.

### Solution

Same data (7 successes in 10 trials), three different priors:

```rust
// 1. Uniform Prior Beta(1, 1)
let mut uniform = BetaBinomial::uniform();
uniform.update(7, 10);
// Posterior: Beta(8, 4), mean = 0.6667

// 2. Jeffrey's Prior Beta(0.5, 0.5)
let mut jeffreys = BetaBinomial::jeffreys();
jeffreys.update(7, 10);
// Posterior: Beta(7.5, 3.5), mean = 0.6818

// 3. Informative Prior Beta(50, 50) - strong 50% belief
let mut informative = BetaBinomial::new(50.0, 50.0).unwrap();
informative.update(7, 10);
// Posterior: Beta(57, 53), mean = 0.5182
```

### Results

| Prior Type | Prior | Posterior | Posterior Mean |
|------------|-------|-----------|----------------|
| Uniform    | Beta(1, 1) | Beta(8, 4) | 0.6667 |
| Jeffrey's  | Beta(0.5, 0.5) | Beta(7.5, 3.5) | 0.6818 |
| Informative | Beta(50, 50) | Beta(57, 53) | 0.5182 |

### Interpretation

**Weak priors** (Uniform, Jeffrey's): Posterior dominated by data (≈67% mean)

**Strong prior** (Beta(50, 50)): Posterior pulled toward prior belief (51.8% vs 66.7%)

The informative prior Beta(50, 50) encodes a strong belief that θ ≈ 0.5 with effective sample size of 100. With only 10 new observations, the prior dominates, pulling the posterior mean from 0.667 down to 0.518.

### When to Use Strong Priors

**Use informative priors when**:
- You have reliable historical data
- Expert domain knowledge is available
- Rare events require regularization
- Hierarchical learning across related tasks

**Avoid informative priors when**:
- No reliable prior knowledge exists
- Prior assumptions may be wrong
- Stakeholders require "data-driven" decisions
- Exploring novel domains

### Prior Sensitivity Analysis

Always check robustness:

1. Run inference with weak prior (Beta(1, 1))
2. Run inference with strong prior (Beta(50, 50))
3. If posteriors differ substantially, **collect more data** until they converge

With enough data, all reasonable priors converge to the same posterior (Bayesian consistency).

## Key Takeaways

**1. Conjugate priors enable closed-form updates**
- No MCMC or numerical integration required
- Efficient for real-time sequential updating (online learning)

**2. Credible intervals quantify uncertainty**
- Direct probability statements about parameters
- Width decreases with √n as data accumulates

**3. Sequential updating is natural in Bayesian framework**
- Each posterior becomes the next prior
- Final result is order-independent

**4. Prior choice matters with small data**
- Weak priors: let data speak
- Strong priors: incorporate domain knowledge
- Always perform sensitivity analysis

**5. Bayesian A/B testing avoids p-value pitfalls**
- No arbitrary α = 0.05 threshold
- Natural early stopping rules
- Direct decision-theoretic framework

## Related Chapters

- [Bayesian Inference Theory]../ml-fundamentals/bayesian-inference.md
- [Naive Bayes Theory]../ml-fundamentals/naive-bayes.md

## References

1. **Jaynes, E. T. (2003)**. *Probability Theory: The Logic of Science*. Cambridge University Press. Chapter 6: "Elementary Parameter Estimation."

2. **Gelman, A., et al. (2013)**. *Bayesian Data Analysis* (3rd ed.). CRC Press. Chapter 2: "Single-parameter Models."

3. **Kruschke, J. K. (2014)**. *Doing Bayesian Data Analysis* (2nd ed.). Academic Press. Chapter 6: "Inferring a Binomial Probability via Exact Mathematical Analysis."

4. **VanderPlas, J. (2014)**. "Frequentism and Bayesianism: A Python-driven Primer." arXiv:1411.5018. Excellent comparison of paradigms with code examples.