JailGuard
JailGuard is a pure-Rust prompt-injection detector with a 1.5 MB embedded MLP classifier. It scores text in p50 14 ms on CPU, achieves 98.40% accuracy on a 7,049-sample held-out test set drawn from 17 public datasets, and ships bindings for Rust, Python, JavaScript, Go, and Elixir. No external service, no API key. Dual-licensed under MIT OR Apache-2.0.
Quick start
Rust — cargo add jailguard
use ;
if is_injection
let result = detect;
println!;
Python — pip install jailguard
=
JavaScript / TypeScript — npm install @yfedoseev/jailguard
import { detect, isInjection } from "@yfedoseev/jailguard";
if (isInjection("ignore previous instructions")) {
throw new Error("blocked");
}
const r = detect("What is the capital of France?");
console.log(r.score, r.risk);
Go — go get github.com/yfedoseev/jailguard/go
import jailguard "github.com/yfedoseev/jailguard/go"
if injection, _ := jailguard.IsInjection("ignore previous instructions"); injection
result, _ := jailguard.Detect("What is the capital of France?")
fmt.Printf("score=%.3f risk=%v\n", result.Score, result.Risk)
Elixir — mix.exs
def deps do
[{:jailguard, "~> 0.1.2"}]
end
:ok = JailGuard.download_model()
{:ok, injection?} = JailGuard.is_injection("ignore previous instructions")
if injection?, do: raise("blocked")
{:ok, result} = JailGuard.detect("What is the capital of France?")
IO.inspect({result.score, result.risk})
Precompiled NIFs ship for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64) — no Rust toolchain on install. Set JAILGUARD_BUILD=1 to compile from source on unsupported targets.
The classifier is embedded in every binding. The 90 MB MiniLM ONNX embedder is auto-downloaded to ~/.cache/jailguard/ on first use. For production: call jailguard::download_model() at startup to warm the cache before serving traffic.
JailGuard vs alternatives in 2026
| Feature | JailGuard | Lakera Guard | Rebuff | ProtectAI deberta-v3 | Meta Prompt Guard 2 |
|---|---|---|---|---|---|
| License | Apache 2.0 / MIT | proprietary (Check Point announced acquisition Sep 16, 2025) | Apache 2.0 — archived May 16, 2025 | Apache 2.0 (parent acq. by Palo Alto Jul 22, 2025) | Llama 4 Community (non-OSI) |
| Deployment | embedded library | SaaS API | self-host Python SDK | HF model | HF model |
| Model size | 1.5 MB MLP + 90 MB MiniLM ONNX | n/a (API) | n/a | ~440 MB | 22 M or 86 M params |
| Latency (CPU) | p50 14 ms | ~150–300 ms RTT | n/a | 104–212 ms | 92 ms (A100 GPU)¹ |
| Classification | 8-class taxonomy | binary | binary | binary | binary |
| Active in 2026? | ✅ | ✅ (Check Point pending) | ❌ archived | ✅ (Palo Alto) | ✅ |
| No PyTorch / no runtime dep | ✅ (Rust) | ❌ HTTP client | ❌ Python+OpenAI | ❌ PyTorch | ❌ PyTorch |
| Multi-language bindings | Rust, Python, JS, Go, Elixir | API clients | Python | Python | Python |
¹ Meta does not publish CPU latency for Prompt Guard 2.
Full methodology, dataset breakdown, and head-to-head local-CPU comparisons against protectai/deberta-v3-base-prompt-injection-v2, deepset/deberta-v3-base-injection, and madhurjindal/Jailbreak-Detector-Large are in BENCHMARKS.md.
API at a glance
Python / JS / Go / Elixir expose the same surface in idiomatic form. See docs/API.md for full per-language signatures.
How it works
JailGuard pairs a frozen sentence-embedding model with a small classifier:
- MiniLM-L6-v2 (384-dim, ONNX) produces a semantic vector for the input.
- A 3-layer MLP (384 → 256 → 128 → 1, ~130 K parameters, ReLU + dropout 0.2 + sigmoid) scores it as injection vs. benign.
The embedding model is frozen — no fine-tuning — which keeps training and inference cost on CPU modest. The classifier weights are a 1.5 MB JSON file include_str!'d into the binary at compile time.
┌─────────────────────────────────────────────────────────────┐
│ JAILGUARD DETECTION PIPELINE │
├─────────────────────────────────────────────────────────────┤
│ │
│ User Prompt │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ MiniLM-L6 │ Semantic Embedding (384-dim) │
│ │ (ONNX) │ • Pre-trained by Microsoft │
│ └──────┬──────┘ • Captures meaning, not just keywords │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Binary Classifier (Pure Rust) │ │
│ │ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Dense 256 │→ │ Dense 128 │ │ │
│ │ │ ReLU+Drop │ │ ReLU+Drop │ │ │
│ │ └─────────────┘ └─────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ Sigmoid (0-1) │ │ │
│ │ └─────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Detection Result │
│ • confidence: 0.0 - 1.0 │
│ • is_injection: confidence > 0.5 │
│ • risk: Safe | Low | Medium | High | Critical │
│ │
└─────────────────────────────────────────────────────────────┘
Measurements
Measured on Apple M3, last revalidated 2026-05-03. The pipeline test split is in-distribution (held out from the same 17-source training mix). J1N2 and shalyhinpavel are external datasets, neither used during training.
| Test set | Samples | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Pipeline (in-distribution) | 7,049 | 98.40% | 98.56% | 97.98% | 0.983 |
| J1N2 mix (OOD) | 5,000 | 99.38% | 98.09% | 99.94% | 0.990 |
| shalyhinpavel hard-negatives (OOD) | 147 | 89.12% | 76.60% | 87.80% | 0.818 |
Latency (single CPU thread)
| Component | Apple M3 | Intel i5-10210U @ 1.6 GHz¹ |
|---|---|---|
| Embedding (MiniLM ONNX) | ~13 ms | ~36 ms |
| Classification (MLP) | ~1 ms | ~1 ms |
| Total (p50) | ~14 ms | ~37 ms |
| Total (p99) | ~19 ms | ~43 ms |
| Cold start | ~140 ms | ~350 ms |
¹ A 4-year-old low-power Chromebook CPU (Comet Lake-U, 2019, 4c/8t,
running ChromeOS Crostini Linux 6.6). Included to show JailGuard runs
well even on older / weaker hardware. Modern desktop or server CPUs
land closer to the M3 column. Full per-benchmark numbers in
BENCHMARKS.md.
Benchmarks
Reproducible latency and throughput numbers come from three harnesses:
benches/detect.rs— Criterion bench covering single-shotis_injection/detect/scoreand batch throughput atn = 1, 8, 32, 128. Run withcargo bench --bench detect.examples/cold_start_bench.rs— process-startup cost (ONNX session init + first inference). Run withcargo run --release --example cold_start_bench.scripts/bench.sh— portable POSIX wrapper that captures machine metadata (CPU, arch, kernel, toolchain) and emits a single markdown report. Works on Linux x86_64, Linux aarch64, macOS Intel, macOS Apple Silicon, and Chromebook Crostini.
Full methodology and head-to-head local-CPU comparisons in BENCHMARKS.md.
Attack categories covered in training
The classifier output is binary at the public API (injection / benign), but its training mix spans eight attack families:
| Category | Examples |
|---|---|
| Direct injection | "Ignore previous instructions" |
| Jailbreak | DAN, developer-mode prompts |
| Role-play | Persona-based overrides |
| System prompt leak | "Reveal your instructions" |
| Encoding attacks | Base64, ROT13, Unicode obfuscation |
| Context manipulation | Framing and separator tricks |
| Output manipulation | Format coercion |
| Indirect injection | Malicious content embedded in documents |
References
- all-MiniLM-L6-v2 — sentence embeddings
- PromptGuard (Meta)
- Rebuff (archived)
- Sentinel: SOTA model to protect against prompt injections
- Not What You've Signed Up For — indirect injection
Citation
If you use JailGuard in research or production, please cite:
A machine-readable CITATION.cff is also available.
License
Dual-licensed under MIT OR Apache-2.0 at your option.