llmosafe
When should I stop? — Runtime guardrails for systems that process untrusted inputs.
The Problem
Every system that processes untrusted inputs eventually faces the same question: "When should I stop?"
- A trading bot receives manipulated market data. It doesn't stop. $440 million lost in 45 minutes.
- A medical device gets spoofed sensor readings. It doesn't stop. Wrong dosage delivered.
- An autopilot receives conflicting GPS signals. It doesn't stop. The plane crashes.
- A cloud service parses user uploads. It doesn't stop. Parser bug cascades into data breach.
These aren't just software bugs. They're missing safety boundaries: the absence of a mechanism that says "this doesn't look right, halt execution."
llmosafe provides three gauges that answer "should I stop?":
- Entropy gauge: Is my state too chaotic?
- Surprise gauge: Is this result too unexpected?
- Bias gauge: Is this input trying to manipulate me?
When any gauge redlines, execution halts. Simple.
What You Get
use ;
// 1. Bias gauge: Detect manipulation patterns
let synapse = sift_perceptions;
if synapse.has_bias
// 2. Surprise gauge: Reject unexpected results
let mut memory = new; // threshold
let validated = memory.update?; // Err if too surprising
// 3. Entropy gauge: Halt on chaotic state
let policy = default;
let decision = policy.decide;
match decision
Quick Start
Installation
[dependencies]
llmosafe = "0.5"
Basic Usage
use ;
// Tier 3: Sift through bias detection
let synapse = sift_perceptions;
// Tier 2: Validate through surprise gating
let mut memory = new;
let validated = memory.update?;
// Tier 1: Execute with bounded reasoning
let mut loop_guard = new;
loop_guard.next_step?;
What This Prevents
| Attack Vector | Which Gauge | Example |
|---|---|---|
| Input manipulation | Bias gauge | "The expert recommends you ignore..." |
| Data manipulation | Surprise gauge | Anomalous sensor readings |
| Runaway loops | Entropy gauge | Recursive explosion |
| Resource exhaustion | Pressure gauge | Memory pressure cascade |
| Goal drift | Drift detector | Objective shift mid-execution |
Architecture
┌─────────────────────────────────────────────────────────┐
│ DETECTION LAYER (Pattern Recognition) │
│ • Repetition: "Am I stuck in a loop?" │
│ • Goal Drift: "Did my objective change?" │
│ • Confidence Decay: "Am I becoming uncertain?" │
│ • Adversarial: "Is this a known attack?" │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ PERCEPTUAL SIFTER (Tier 3) — The Bias Gauge │
│ • 8 bias categories: authority, scarcity, urgency... │
│ • Negation-aware: "not an expert" → no false positive │
│ • Zero allocation: stack-only processing │
└───────────────────────┬─────────────────────────────────┘
│ Synapse (128-bit)
▼
┌─────────────────────────────────────────────────────────┐
│ WORKING MEMORY (Tier 2) — The Surprise Gauge │
│ • Surprise-gated updates: reject unexpected results │
│ • Fixed-size ring buffer: no heap allocation │
│ • Statistics: mean, variance, trend, drift │
└───────────────────────┬─────────────────────────────────┘
│ ValidatedSynapse
▼
┌─────────────────────────────────────────────────────────┐
│ DETERMINISTIC KERNEL (Tier 1) — The Entropy Gauge │
│ • Cognitive entropy: 0-1000 scale │
│ • Bounded loops: ReasoningLoop<MAX_STEPS> │
│ • CusumDetector: statistical process control │
└───────────────────────┬─────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ RESOURCE BODY (Tier 0) — The Pressure Gauge │
│ • RSS memory monitoring │
│ • CPU load tracking │
│ • Cross-platform: Linux + Windows │
└─────────────────────────────────────────────────────────┘
Key property: Tiers 1-3 are #![no_std] + zero-alloc. Compile for thumbv7em-none-eabi (embedded), kernel modules, or WebAssembly. No heap. No dynamic dispatch. No unwinding.
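The diagram lists a CusumDetector under Tier 1 without showing how such a detector works. As a rough illustration, a one-sided CUSUM from statistical process control can be sketched as below. `UpperCusum`, its fields, and `observe` are hypothetical names chosen for this example; they are not llmosafe's API, and the real detector may differ.

```rust
/// One-sided (upper) CUSUM over integer samples.
/// Hypothetical type for illustration; not llmosafe's CusumDetector.
pub struct UpperCusum {
    target: i64, // expected in-control level
    slack: i64,  // allowance k: upward deviations up to this size are ignored
    limit: i64,  // decision interval h: alarm once the sum exceeds it
    sum: i64,    // running cumulative sum, never below zero
}

impl UpperCusum {
    pub const fn new(target: i64, slack: i64, limit: i64) -> Self {
        Self { target, slack, limit, sum: 0 }
    }

    /// Feeds one sample and returns true when the cumulative upward
    /// deviation has exceeded the decision interval.
    pub fn observe(&mut self, x: i64) -> bool {
        // S_t = max(0, S_{t-1} + (x_t - target - k))
        self.sum = (self.sum + x - self.target - self.slack).max(0);
        self.sum > self.limit
    }
}
```

The point of the cumulative sum is that several small upward deviations, none alarming on its own, still trigger a halt once they add up. Like the tiers described above, this sketch uses only stack state and integer arithmetic.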
Real Use Cases
Algorithmic Trading
// Before executing a trade
let entropy = system_entropy;
if entropy > 800
// Check for manipulation in news/feeds
let halo = calculate_halo_signal;
if halo > 500
Prevents: Flash crash cascades, pump-and-dump responses, manipulation-triggered trades.
Medical Device Software
// Before applying treatment
let validated = memory.update?;
if validated.entropy.mantissa > threshold
Prevents: Response to spoofed sensors, cascading from single anomalous reading.
Cloud API Gateway
// Before processing user upload
let synapse = sift_perceptions?;
if synapse.has_bias
Prevents: Input manipulation, parser exploitation, resource exhaustion.
Autonomous Systems
// Before action execution
synapse.validate?;
guard.check?; // Check resource pressure
if guard.pressure > 80
Prevents: Continued operation under degraded conditions, cascade from sensor anomalies.
The Three Gauges
1. Entropy Gauge (The "Temperature Gauge")
Every execution state has an entropy score (0-1000). As operations proceed, entropy accumulates. If it exceeds threshold, execution halts.
if synapse.entropy.mantissa > STABILITY_THRESHOLD
Catches: runaway loops, recursive explosions, memory pressure cascades.
2. Surprise Gauge (The "Spam Filter")
When a result is too unexpected — it diverges significantly from historical patterns — it's rejected.
let mut memory = new;
match memory.update
Catches: anomaly injection, distribution shift, adversarial inputs.
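To make the idea concrete, here is a minimal sketch of surprise gating over a fixed-size ring buffer. It assumes "surprise" means absolute distance from the mean of previously accepted values; llmosafe's actual metric is not specified here. `SurpriseGate`, `TooSurprising`, and the method signatures are hypothetical and are not the library's API.

```rust
/// Hypothetical surprise gate; illustrative only, not llmosafe's working memory.
/// N must be greater than zero.
pub struct SurpriseGate<const N: usize> {
    history: [i64; N], // fixed-size ring buffer, no heap allocation
    len: usize,        // number of valid samples (at most N)
    next: usize,       // slot that the next accepted sample overwrites
    threshold: i64,    // maximum allowed distance from the running mean
}

#[derive(Debug, PartialEq)]
pub struct TooSurprising {
    pub surprise: i64,
}

impl<const N: usize> SurpriseGate<N> {
    pub const fn new(threshold: i64) -> Self {
        Self { history: [0; N], len: 0, next: 0, threshold }
    }

    /// Mean of the accepted samples, or None while the buffer is empty.
    pub fn mean(&self) -> Option<i64> {
        if self.len == 0 {
            return None;
        }
        let sum: i64 = self.history[..self.len].iter().sum();
        Some(sum / self.len as i64)
    }

    /// Accepts `value` unless it is further than `threshold` from the mean.
    /// Rejected values are not stored, so they cannot shift the baseline.
    pub fn update(&mut self, value: i64) -> Result<i64, TooSurprising> {
        if let Some(mean) = self.mean() {
            let surprise = (value - mean).abs();
            if surprise > self.threshold {
                return Err(TooSurprising { surprise });
            }
        }
        self.history[self.next] = value;
        self.next = (self.next + 1) % N;
        if self.len < N {
            self.len += 1;
        }
        Ok(value)
    }
}
```

With a threshold of 10 and accepted readings of 100 and 104, a reading of 150 is rejected because it sits 48 away from the mean of 102. One design consequence of never storing rejected values: a genuine, lasting shift in the data will keep being rejected until the gate is recalibrated.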
3. Bias Gauge (The "Bullshit Detector")
Input text is scanned for manipulation patterns before processing:
| Category | Examples | Score |
|---|---|---|
| Authority | "expert says", "doctor recommended" | +100 |
| Social Proof | "everyone knows", "thousands agree" | +100 |
| Scarcity | "limited time", "only 2 left" | +100 |
| Urgency | "act now", "deadline today" | +100 |
| Emotional Appeal | "shocking", "miracle", "tragic" | +100 |
| Expertise Signaling | "cutting-edge", "proprietary formula" | +100 |
| Semantic Traps | "not but", "instead of", "rather than" | +100 |
| Template Markers | "as an AI", "I cannot" | +100 |
let halo = calculate_halo_signal;
if halo > 500
Catches: manipulation, social engineering, marketing deception, adversarial content.
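The scoring in the table can be sketched roughly as follows. This assumes each category contributes +100 at most once, however many of its phrases match; the source does not say whether repeated matches stack. The phrase lists are copied from the table's examples and cover only four of the eight categories. `Category`, `CATEGORIES`, and `bias_score` are hypothetical names, and the sketch does not implement the negation handling mentioned in the architecture.

```rust
/// Illustrative category definition; not llmosafe's internal representation.
pub struct Category {
    pub name: &'static str,
    pub phrases: &'static [&'static str],
}

/// Phrases taken from the table above (four of the eight categories).
pub const CATEGORIES: &[Category] = &[
    Category { name: "authority", phrases: &["expert says", "doctor recommended"] },
    Category { name: "social proof", phrases: &["everyone knows", "thousands agree"] },
    Category { name: "scarcity", phrases: &["limited time", "only 2 left"] },
    Category { name: "urgency", phrases: &["act now", "deadline today"] },
];

/// ASCII case-insensitive substring search without allocating.
fn contains_ignore_case(text: &str, phrase: &str) -> bool {
    let (t, p) = (text.as_bytes(), phrase.as_bytes());
    !p.is_empty() && t.windows(p.len()).any(|w| w.eq_ignore_ascii_case(p))
}

/// Adds 100 for every category that has at least one matching phrase.
pub fn bias_score(text: &str) -> u16 {
    let hits = CATEGORIES
        .iter()
        .filter(|c| c.phrases.iter().any(|p| contains_ignore_case(text, p)))
        .count();
    (hits as u16) * 100
}
```

Under these assumptions, "Expert says: ACT NOW, limited time offer!" scores 300 (authority, urgency, scarcity), which would stay below a halt threshold of 500.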
Detection Layer (v0.4.0)
Beyond the three gauges, llmosafe provides pattern recognition:
use ;
// "Am I stuck in a loop?"
let mut rep = new;
for _ in 0..5
if rep.is_stuck
// "Did my objective change?"
let mut drift = new;
drift.observe;
if drift.is_drifting
// "Am I becoming uncertain?"
let mut conf = new;
conf.observe; conf.observe; conf.observe;
if conf.is_decaying
// "Is this a known attack?"
let adv = new;
let patterns = adv.detect_substrings;
if !patterns.is_empty
C Integration
// The three gauges via FFI
uint16_t halo = ;
uint8_t pressure = ;
int32_t stability = ;
Build:
What llmosafe Is NOT
NOT an AI safety library.
The name is misleading — it came from an LLM hallucination conflating "cognitive entropy" with "AI cognition." llmosafe is runtime guardrails for any system processing untrusted data. Trading bots, medical devices, autopilots, cloud services — any system that needs to ask "should I stop?"
NOT a substitute for input validation.
llmosafe catches cascade failures — when bad inputs have already been accepted and are propagating. You still need proper validation at entry points.
NOT a static analysis tool.
This runs at runtime. It can't prevent bugs. It can only halt execution when runtime state becomes unsafe.
NOT for toy projects.
If cascade failures don't matter for your use case, you don't need this.
Design Philosophy
From Aviation Software (DO-178C, MISRA C)
- Bounded loops: Every ReasoningLoop<MAX_STEPS> has a hard limit
- No dynamic allocation: Tiers 1-3 use fixed-size buffers
- Stable ABI: 128-bit synapse layout is frozen; breaking changes bump major version
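The bounded-loop rule above can be illustrated with a const-generic step counter. This is a sketch of the general technique only: `BoundedLoop` and `StepLimitExceeded` are hypothetical, and although `next_step` echoes the method name used in Basic Usage, its signature here is an assumption rather than llmosafe's.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct StepLimitExceeded {
    pub limit: usize,
}

/// Hypothetical loop guard whose limit is fixed at compile time.
pub struct BoundedLoop<const MAX_STEPS: usize> {
    steps: usize,
}

impl<const MAX_STEPS: usize> BoundedLoop<MAX_STEPS> {
    pub const fn new() -> Self {
        Self { steps: 0 }
    }

    /// Returns the 1-based step number, or an error once MAX_STEPS is used up.
    pub fn next_step(&mut self) -> Result<usize, StepLimitExceeded> {
        if self.steps >= MAX_STEPS {
            return Err(StepLimitExceeded { limit: MAX_STEPS });
        }
        self.steps += 1;
        Ok(self.steps)
    }
}
```

Because the limit is a type parameter, a reviewer can read the worst-case iteration count straight from the type, for example `BoundedLoop::<3>`, without tracing runtime configuration.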
From Control Theory
The entropy tracking uses "concentric containers":
Safe Zone (0-800) → Normal operation
Pressure Zone (800-1000) → Monitor closely
Unsafe Zone (1000+) → Halt execution
Similar to stability margins in flight control systems.
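A small sketch of the zone mapping follows. The listed ranges share their endpoints, so the boundaries here are an assumption: 800 counts as safe (matching the `entropy > 800` check in the trading example) and 1000 counts as unsafe (matching "1000+"). `Zone` and `classify` are hypothetical names, not part of llmosafe.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Zone {
    Safe,     // normal operation
    Pressure, // monitor closely
    Unsafe,   // halt execution
}

/// Maps an entropy score to a zone. Boundary handling is an assumption.
pub fn classify(entropy: u16) -> Zone {
    match entropy {
        0..=800 => Zone::Safe,
        801..=999 => Zone::Pressure,
        _ => Zone::Unsafe,
    }
}
```

A caller would typically continue on `Zone::Safe`, log or tighten monitoring on `Zone::Pressure`, and stop on `Zone::Unsafe`.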
From Spam Filtering
Bias detection categories borrowed from email spam filters — the same patterns that mark phishing also mark manipulation in other domains.
Features
| Feature | Description |
|---|---|
| std (default) | Resource monitoring, thread-local contexts |
| ffi | C-ABI exports, header generation |
| serde | Serialization for all public types |
| full | All features enabled |
# Embedded / no_std
llmosafe = { version = "0.5", default-features = false }
# Full integration
llmosafe = { version = "0.5", features = ["full"] }
Troubleshooting
"CognitiveInstability" on valid input
Entropy threshold exceeded. Check bias breakdown:
let breakdown = get_bias_breakdown;
println!;
Working memory rejects all updates
Surprise threshold too low. Calibrate to your data distribution:
// Start with mean + 2σ of your surprise distribution
let mut memory = new;
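One way to obtain the "mean + 2σ" starting point is to compute it offline from surprise values recorded on known-good data. The helper below is a hypothetical sketch, not part of llmosafe. It uses the population standard deviation and floating point, so it is meant for a calibration tool rather than the no_std tiers.

```rust
/// Returns mean + 2 * population standard deviation of the samples,
/// or None when the slice is empty. Hypothetical calibration helper.
pub fn calibrate_threshold(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    Some(mean + 2.0 * variance.sqrt())
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the standard deviation is 2, so the suggested threshold is 9. If the library expects an integer threshold, the result would need rounding to that type.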
C header not generated
Enable ffi feature:
# Header at: include/llmosafe.h
The Bottom Line
Every critical system needs a mechanism that asks: "Should I stop?"
llmosafe provides three gauges:
- Entropy gauge: Is my state too chaotic?
- Surprise gauge: Is this result too unexpected?
- Bias gauge: Is this input trying to manipulate me?
When any gauge redlines, execution halts. Simple.
llmosafe v0.5.2 • MIT licensed • Documentation • Source