plato-lab-guard

Unfakeable Constraint Lab

Achievement Loss scoring prevents cherry-picked experimental results.

What?

plato-lab-guard prevents p-hacking and cherry-picking in experimental results. It introduces Achievement Loss scoring: a running penalty that grows with the number of hypotheses an experimenter tested before finding a "significant" result. The more hypotheses you test, the higher the bar each result must clear.
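
To make "the more you test, the higher the bar" concrete, here is a minimal sketch using a Bonferroni-style correction, where the significance threshold shrinks as the test count grows. This is a stand-in technique chosen for illustration; the README does not state the crate's actual formula, and `adjusted_alpha` and `clears_bar` are hypothetical names, not part of plato-lab-guard.

```rust
/// Bonferroni-style threshold: the base alpha divided by the number of
/// hypotheses tested so far (a count of 0 is treated as 1).
/// ASSUMPTION: illustrative only, not plato-lab-guard's actual scoring.
fn adjusted_alpha(base_alpha: f64, hypotheses_tested: u32) -> f64 {
    base_alpha / f64::from(hypotheses_tested.max(1))
}

/// A result clears the bar only if its p-value is below the adjusted threshold.
fn clears_bar(p_value: f64, base_alpha: f64, hypotheses_tested: u32) -> bool {
    p_value < adjusted_alpha(base_alpha, hypotheses_tested)
}
```

Under these assumptions, `clears_bar(0.003, 0.05, 1)` is true, while `clears_bar(0.003, 0.05, 50)` is false because the threshold has dropped to roughly 0.001.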

This is the scientific integrity layer for the PLATO knowledge graph: before a tile can claim a factual result, it must pass through the lab guard's verification gate.

Quick Start

[dependencies]
plato-lab-guard = "0.1"

use plato_lab_guard::*;

fn main() {
    let mut guard = LabGuard::new();

    // Register the hypothesis BEFORE running the experiment,
    // so the claim is time-stamped ahead of the data.
    let hyp = guard.register(Hypothesis::new(
        "constraint_snap_reduces_drift",
        "Snapping to manifold reduces float drift by >90%"
    ));

    // Run the experiment, then report what was measured.
    let result = ExperimentResult {
        hypothesis_id: hyp.id,
        observed_effect: 0.95,  // 95% reduction
        p_value: 0.003,
        sample_size: 10_000,
    };

    // The gate checks effect size, p-value, AND Achievement Loss.
    // If 50 hypotheses were tested this session and this is the first
    // "significant" one, Achievement Loss penalizes the result.
    let verdict = guard.evaluate(result);
}

Core Concepts

| Type | Description |
|------|-------------|
| `Hypothesis` | A registered, time-stamped experimental claim |
| `HypothesisStatus` | `Pending`, `Passed`, `Failed`, `CherryPicked` |
| `ExperimentResult` | Measured outcome with effect size, p-value, sample size |
| `Verdict` | Final judgment incorporating achievement loss |
| `GateResult` | Binary pass/fail with detailed scoring breakdown |
| `LabGuard` | The full engine tracking all hypotheses and scoring |
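
As a rough illustration of how registration and status could fit together, the sketch below implements a toy registry. Only the four `HypothesisStatus` variants come from this README. `ToyGuard`, its fields, its method signatures, and the rules it applies (unregistered results are flagged `CherryPicked`, settled hypotheses cannot be re-evaluated) are assumptions made for the example and should not be read as the crate's `LabGuard` API.

```rust
use std::collections::HashMap;

/// Status values as listed in the Core Concepts table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HypothesisStatus {
    Pending,
    Passed,
    Failed,
    CherryPicked,
}

/// HYPOTHETICAL minimal registry, not the crate's LabGuard.
#[derive(Default)]
struct ToyGuard {
    next_id: u64,
    status: HashMap<u64, HypothesisStatus>,
}

impl ToyGuard {
    /// Register a claim before the experiment runs and return its id.
    fn register(&mut self, _name: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.status.insert(id, HypothesisStatus::Pending);
        id
    }

    /// Judge a reported p-value against `alpha`.
    /// ASSUMED rules: an unknown id counts as cherry-picked, and a
    /// hypothesis that already has a verdict keeps it on re-evaluation.
    fn evaluate(&mut self, id: u64, p_value: f64, alpha: f64) -> HypothesisStatus {
        match self.status.get(&id).copied() {
            None => HypothesisStatus::CherryPicked,
            Some(HypothesisStatus::Pending) => {
                let verdict = if p_value < alpha {
                    HypothesisStatus::Passed
                } else {
                    HypothesisStatus::Failed
                };
                self.status.insert(id, verdict);
                verdict
            }
            Some(settled) => settled,
        }
    }
}
```

The design point is that a verdict attaches to a claim made in advance: rerunning a failed hypothesis with a better p-value does not flip it to `Passed`, and a result with no registration behind it never reaches `Passed` at all.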

Achievement Loss

Tested 1 hypothesis → significant → Achievement Loss: low (credible)
Tested 50 hypotheses → first significant result → Achievement Loss: high (suspicious)
Tested 50 hypotheses → all significant → Achievement Loss: extreme (implausible)

The Achievement Loss formula penalizes a result based on three factors:

Number of prior tests: more tests means a higher bar
Prior failure rate: many failures before a success is suspicious
Effect size consistency: wildly varying effects across tests are unreliable
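
The three factors above can be combined in many ways, and this README does not give the exact formula. The sketch below is one toy combination, offered only to show the direction each factor pushes the score. The function names, the log term, the unweighted sum, and the use of standard deviation as the consistency measure are all assumptions, and the sketch does not attempt to model the "all significant" case.

```rust
/// Population standard deviation of observed effects
/// (0.0 when there are fewer than two values).
fn effect_spread(effects: &[f64]) -> f64 {
    if effects.len() < 2 {
        return 0.0;
    }
    let n = effects.len() as f64;
    let mean = effects.iter().sum::<f64>() / n;
    let variance = effects.iter().map(|e| (e - mean).powi(2)).sum::<f64>() / n;
    variance.sqrt()
}

/// TOY Achievement Loss: higher means less credible.
/// ASSUMPTION: every term here is illustrative, not the crate's formula.
fn toy_achievement_loss(prior_tests: u32, prior_failures: u32, effects: &[f64]) -> f64 {
    // More prior tests raise the bar, with diminishing increments.
    let test_term = (1.0 + f64::from(prior_tests)).ln();
    // Fraction of prior tests that failed (0.0 when nothing was tested yet).
    let failure_term = if prior_tests == 0 {
        0.0
    } else {
        f64::from(prior_failures.min(prior_tests)) / f64::from(prior_tests)
    };
    // Inconsistent effect sizes add to the penalty.
    test_term + failure_term + effect_spread(effects)
}
```

With this toy scoring, a first and only test (`toy_achievement_loss(0, 0, &[0.95])`) has zero loss, while a success after 49 failed tests (`toy_achievement_loss(49, 49, &[0.95])`) scores well above it.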

Part of PLATO

Part of the PLATO ecosystem — scientific integrity for AI agent knowledge production.

License

MIT — Cocapn Fleet