plato-lab-guard
Unfakeable Constraint Lab
Achievement Loss scoring prevents cherry-picked experimental results.
What?
plato-lab-guard guards against p-hacking and cherry-picking in experimental results. It introduces Achievement Loss scoring: a running penalty that grows with the number of hypotheses an experimenter tested before finding a "significant" result. The more you test, the higher your bar.
This is the scientific integrity layer for the PLATO knowledge graph: before a tile can claim a factual result, it must pass through the lab guard's verification gate.
Quick Start
[dependencies]
plato-lab-guard = "0.1"

use plato_lab_guard::*;
let mut guard = LabGuard::new();
// Register a hypothesis BEFORE running the experiment
let hyp = guard.register(/* ... */);
// Run experiment and report results
let result = ExperimentResult { /* ... */ };
// The gate checks: effect size, p-value, AND achievement loss
let verdict = guard.evaluate(/* ... */);
// Achievement Loss penalizes if you've tested 50 hypotheses this session
// and this is the first "significant" one
Core Concepts
| Type | Description |
|---|---|
| Hypothesis | A registered, time-stamped experimental claim |
| HypothesisStatus | Pending, Passed, Failed, CherryPicked |
| ExperimentResult | Measured outcome with effect size, p-value, sample size |
| Verdict | Final judgment incorporating achievement loss |
| GateResult | Binary pass/fail with detailed scoring breakdown |
| LabGuard | The full engine tracking all hypotheses and scoring |
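To make the table concrete, here is a minimal sketch of how three of these types might be shaped. The type and variant names come from the table; every field name, the `id` type, and the `Hypothesis::new` constructor are assumptions for illustration and may not match the crate's actual definitions.

```rust
use std::time::SystemTime;

/// Lifecycle of a registered hypothesis (variant names taken from the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HypothesisStatus {
    Pending,
    Passed,
    Failed,
    CherryPicked,
}

/// A registered, time-stamped experimental claim.
/// All fields here are assumed, not taken from the crate.
#[derive(Debug, Clone)]
pub struct Hypothesis {
    pub id: u64,                   // assumed identifier
    pub claim: String,             // assumed free-text statement of the claim
    pub registered_at: SystemTime, // timestamp taken at registration
    pub status: HypothesisStatus,
}

/// Measured outcome with effect size, p-value, and sample size.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ExperimentResult {
    pub effect_size: f64,
    pub p_value: f64,
    pub sample_size: usize,
}

impl Hypothesis {
    /// Creates a hypothesis in the `Pending` state, stamped with the current time.
    pub fn new(id: u64, claim: &str) -> Self {
        Hypothesis {
            id,
            claim: claim.to_string(),
            registered_at: SystemTime::now(),
            status: HypothesisStatus::Pending,
        }
    }
}
```

Stamping the hypothesis at construction is what lets a guard later check that the claim existed before any result was reported against it.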
Achievement Loss
Tested 1 hypothesis → significant → Achievement Loss: low (credible)
Tested 50 hypotheses → 1st significant → Achievement Loss: high (suspicious)
Tested 50 hypotheses → all significant → Achievement Loss: extreme (impossible)
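The intuition behind these three cases is ordinary multiple-comparisons arithmetic. The sketch below is not the crate's formula; it only shows the standard calculation for independent tests, with hypothetical function names, to illustrate why one significant result out of 50 attempts deserves less trust than one out of one.

```rust
/// Probability of at least one false positive across `tests` independent
/// tests, each run at significance level `alpha`: 1 - (1 - alpha)^tests.
pub fn familywise_error_rate(alpha: f64, tests: u32) -> f64 {
    1.0 - (1.0 - alpha).powi(tests as i32)
}

/// Bonferroni-adjusted per-test threshold: alpha divided by the number of
/// tests (treated as at least 1 to avoid dividing by zero).
pub fn bonferroni_threshold(alpha: f64, tests: u32) -> f64 {
    alpha / tests.max(1) as f64
}
```

At `alpha = 0.05`, a single test has a 5% chance of a spurious hit, while 50 independent tests have roughly a 92% chance of producing at least one. A Bonferroni correction would lower the per-test threshold to 0.001 in that case, which is one conventional way of raising the bar as testing continues.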
The Achievement Loss formula penalizes based on:
- Number of prior tests — more tests = higher bar
- Prior failure rate — many failures before success = suspicious
- Effect size consistency — wildly varying effects across tests = unreliable
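The README does not give the formula itself, so the following is only an illustrative sketch of how the three factors could be combined into one score. The `PriorTest` struct, the `achievement_loss` function, the log scaling, and the way the terms are combined are all assumptions, not the crate's implementation.

```rust
/// Hypothetical record of one earlier test in the current session.
pub struct PriorTest {
    pub significant: bool,
    pub effect_size: f64,
}

/// Illustrative achievement loss over the tests run so far.
/// Returns 0.0 when there is no prior testing history.
pub fn achievement_loss(prior: &[PriorTest]) -> f64 {
    if prior.is_empty() {
        return 0.0;
    }
    let n = prior.len() as f64;

    // Factor 1: number of prior tests, log-scaled so the penalty keeps
    // growing with more tests but at a slowing rate.
    let count_term = (1.0 + n).ln();

    // Factor 2: fraction of prior tests that failed to reach significance.
    let failures = prior.iter().filter(|t| !t.significant).count() as f64;
    let failure_rate = failures / n;

    // Factor 3: spread of effect sizes (population standard deviation).
    let mean = prior.iter().map(|t| t.effect_size).sum::<f64>() / n;
    let variance = prior
        .iter()
        .map(|t| (t.effect_size - mean).powi(2))
        .sum::<f64>()
        / n;
    let spread = variance.sqrt();

    // Assumed combination: failures amplify the count penalty, and
    // inconsistent effect sizes add to it.
    count_term * (1.0 + failure_rate) + spread
}
```

This sketch covers only the three listed factors. It does not attempt to reproduce the "all significant" case from the examples above, which would need an additional term for implausibly high success rates.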
Part of PLATO
Part of the PLATO ecosystem — scientific integrity for AI agent knowledge production.
License
MIT — Cocapn Fleet