# JailGuard
> **JailGuard is a pure-Rust prompt-injection detector with a 1.5 MB embedded MLP classifier.** It scores text in **p50 14 ms on CPU**, achieves **98.40% accuracy** on a 7,049-sample held-out test set drawn from 17 public datasets, and ships bindings for **Rust, Python, JavaScript, Go, and Elixir**. No external service, no API key. Dual-licensed under MIT OR Apache-2.0.
## Quick start
**Rust** — `cargo add jailguard`
```rust
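use jailguard::{detect, is_injection};

fn main() -> Result<(), &'static str> {
    // Reject input that the classifier flags as an injection attempt.
    if is_injection("ignore previous instructions") {
        return Err("blocked");
    }
    // `detect` returns the full result, including score and risk level.
    let result = detect("What is the capital of France?");
    println!("score={:.3} risk={:?}", result.score, result.risk);
    Ok(())
}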
```
**Python** — `pip install jailguard`
```python
import jailguard
if jailguard.is_injection("ignore previous instructions"):
    raise RuntimeError("blocked")
result = jailguard.detect("What is the capital of France?")
print(result.score, result.risk)
```
**JavaScript / TypeScript** — `npm install @yfedoseev/jailguard`
```typescript
import { detect, isInjection } from "@yfedoseev/jailguard";
if (isInjection("ignore previous instructions")) {
  throw new Error("blocked");
}
const r = detect("What is the capital of France?");
console.log(r.score, r.risk);
```
**Go** — `go get github.com/yfedoseev/jailguard/go`
```go
import jailguard "github.com/yfedoseev/jailguard/go"
if injection, _ := jailguard.IsInjection("ignore previous instructions"); injection {
	log.Fatal("blocked")
}
result, _ := jailguard.Detect("What is the capital of France?")
fmt.Printf("score=%.3f risk=%v\n", result.Score, result.Risk)
```
**Elixir** — `mix.exs`
```elixir
def deps do
  [{:jailguard, "~> 0.1.2"}]
end
```
```elixir
:ok = JailGuard.download_model()
{:ok, injection?} = JailGuard.is_injection("ignore previous instructions")
if injection?, do: raise("blocked")
{:ok, result} = JailGuard.detect("What is the capital of France?")
IO.inspect({result.score, result.risk})
```
Precompiled NIFs ship for Linux (x86_64, aarch64), macOS (x86_64, aarch64), and Windows (x86_64) — no Rust toolchain on install. Set `JAILGUARD_BUILD=1` to compile from source on unsupported targets.
The classifier is embedded in every binding. The 90 MB MiniLM ONNX embedder is downloaded automatically to `~/.cache/jailguard/` on first use. In production, call `jailguard::download_model()` at startup so the cache is warm before the first request is served.
## JailGuard vs alternatives in 2026
| | JailGuard | Lakera Guard | Rebuff | Protect AI DeBERTa-v3 | Meta Prompt Guard 2 |
|---|---|---|---|---|---|
| **License** | Apache 2.0 / MIT | proprietary (Check Point announced acquisition Sep 16, 2025) | Apache 2.0 (**archived May 16, 2025**) | Apache 2.0 (parent acquired by Palo Alto Jul 22, 2025) | Llama 4 Community (non-OSI) |
| **Deployment** | embedded library | SaaS API | self-host Python SDK | HF model | HF model |
| **Model size** | 1.5 MB MLP + 90 MB MiniLM ONNX | n/a (API) | n/a | ~440 MB | 22 M or 86 M params |
| **Latency (CPU)** | **p50 14 ms** | ~150–300 ms RTT | n/a | 104–212 ms | 92 ms (A100 GPU)¹ |
| **Classification** | binary (trained on **8 attack families**) | binary | binary | binary | binary |
| **Active in 2026?** | ✅ | ✅ (Check Point pending) | **❌ archived** | ✅ (Palo Alto) | ✅ |
| **No PyTorch / no runtime dep** | ✅ (Rust) | ❌ HTTP client | ❌ Python+OpenAI | ❌ PyTorch | ❌ PyTorch |
| **Multi-language bindings** | Rust, Python, JS, Go, Elixir | API clients | Python | Python | Python |
¹ Meta does not publish CPU latency for Prompt Guard 2.
Full methodology, dataset breakdown, and head-to-head local-CPU comparisons against `protectai/deberta-v3-base-prompt-injection-v2`, `deepset/deberta-v3-base-injection`, and `madhurjindal/Jailbreak-Detector-Large` are in [`BENCHMARKS.md`](./BENCHMARKS.md).
## API at a glance
```rust
pub fn detect(text: &str) -> DetectionOutput
pub fn is_injection(text: &str) -> bool
pub fn score(text: &str) -> f32
pub fn detect_batch(texts: &[&str]) -> Vec<DetectionOutput>
pub fn download_model() -> Result<PathBuf, Error>
pub struct DetectionOutput {
    pub is_injection: bool,
    pub score: f32,
    pub confidence: f32,
    pub risk: RiskLevel,
}
pub enum RiskLevel { Safe, Low, Medium, High, Critical }
```
Python / JS / Go / Elixir expose the same surface in idiomatic form. See [`docs/API.md`](docs/API.md) for full per-language signatures.
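As an illustration of how a caller might act on `RiskLevel`, here is a minimal policy sketch. It redeclares `RiskLevel` locally so that it compiles without the crate. The `Action` enum and `action_for` function are hypothetical names, and where the allow / review / block cut-offs sit is an illustrative choice, not a JailGuard recommendation.

```rust
// Local mirror of the crate's `RiskLevel`, redeclared so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RiskLevel { Safe, Low, Medium, High, Critical }

// Hypothetical application-side decision; not part of JailGuard.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action { Allow, Review, Block }

/// Map a risk level to an application decision.
/// The exhaustive `match` makes the compiler flag any level left unhandled.
pub fn action_for(risk: RiskLevel) -> Action {
    match risk {
        RiskLevel::Safe | RiskLevel::Low => Action::Allow,
        RiskLevel::Medium => Action::Review,
        RiskLevel::High | RiskLevel::Critical => Action::Block,
    }
}
```

In a real service the argument would come from `detect(text).risk`; stricter deployments might route `Low` to review as well.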
## How it works
JailGuard pairs a frozen sentence-embedding model with a small classifier:
1. **MiniLM-L6-v2** (384-dim, ONNX) produces a semantic vector for the input.
2. A 3-layer MLP (384 → 256 → 128 → 1, ~130 K parameters, ReLU + dropout 0.2 + sigmoid) scores it as injection vs. benign.
The embedding model is frozen — no fine-tuning — which keeps training and inference cost on CPU modest. The classifier weights are a 1.5 MB JSON file `include_str!`'d into the binary at compile time.
```
┌─────────────────────────────────────────────────────────────┐
│ JAILGUARD DETECTION PIPELINE │
├─────────────────────────────────────────────────────────────┤
│ │
│ User Prompt │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ MiniLM-L6 │ Semantic Embedding (384-dim) │
│ │ (ONNX) │ • Pre-trained by Microsoft │
│ └──────┬──────┘ • Captures meaning, not just keywords │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Binary Classifier (Pure Rust) │ │
│ │ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Dense 256 │→ │ Dense 128 │ │ │
│ │ │ ReLU+Drop │ │ ReLU+Drop │ │ │
│ │ └─────────────┘ └─────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ │ │ Sigmoid (0-1) │ │ │
│ │ └─────────────────┘ │ │
│ └─────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Detection Result │
│ • confidence: 0.0 - 1.0 │
│ • is_injection: confidence > 0.5 │
└─────────────────────────────────────────────────────────────┘
```
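The classifier stage can be sketched in plain Rust. This is a simplified illustration of a 384 → 256 → 128 → 1 forward pass under stated assumptions, not JailGuard's actual source: the `Dense` type, its row-major weight layout, and the function names are hypothetical, and the weights here are placeholders rather than the shipped ones. Dropout is left out because it is only active during training.

```rust
/// One dense layer computing `W * x + b`; `weights` is row-major (out_dim x in_dim).
/// Hypothetical type for illustration only.
struct Dense {
    weights: Vec<f32>,
    bias: Vec<f32>,
    in_dim: usize,
    out_dim: usize,
}

impl Dense {
    /// Build a layer with every weight set to `fill` and a zero bias (placeholder values).
    fn new(in_dim: usize, out_dim: usize, fill: f32) -> Self {
        Dense { weights: vec![fill; in_dim * out_dim], bias: vec![0.0; out_dim], in_dim, out_dim }
    }

    fn param_count(&self) -> usize {
        self.weights.len() + self.bias.len()
    }

    fn forward(&self, x: &[f32]) -> Vec<f32> {
        assert_eq!(x.len(), self.in_dim);
        (0..self.out_dim)
            .map(|o| {
                let row = &self.weights[o * self.in_dim..(o + 1) * self.in_dim];
                row.iter().zip(x).map(|(w, v)| w * v).sum::<f32>() + self.bias[o]
            })
            .collect()
    }
}

fn relu(v: Vec<f32>) -> Vec<f32> {
    v.into_iter().map(|a| a.max(0.0)).collect()
}

fn sigmoid(z: f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

/// Score a 384-dim embedding: two ReLU hidden layers, then a sigmoid output in (0, 1).
fn score(layers: &[Dense; 3], embedding: &[f32]) -> f32 {
    let h1 = relu(layers[0].forward(embedding));
    let h2 = relu(layers[1].forward(&h1));
    sigmoid(layers[2].forward(&h2)[0])
}

/// Decision rule from the diagram: flag the input when the output exceeds 0.5.
fn is_injection(score: f32) -> bool {
    score > 0.5
}
```

Summing `param_count()` over the three layers gives 98,560 + 32,896 + 129 = 131,585, which matches the "~130 K parameters" figure.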
## Measurements
Measured on Apple M3, last revalidated 2026-05-03. The pipeline test split is in-distribution (held out from the same 17-source training mix). J1N2 and shalyhinpavel are external datasets, neither used during training.
| Dataset | Samples | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| Pipeline (in-distribution) | 7,049 | **98.40%** | 98.56% | 97.98% | 0.983 |
| J1N2 mix (OOD) | 5,000 | **99.38%** | 98.09% | 99.94% | 0.990 |
| shalyhinpavel hard-negatives (OOD) | 147 | **89.12%** | 76.60% | 87.80% | 0.818 |
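Assuming the last three columns are precision, recall, and F1, the F1 values can be cross-checked from the other two, since F1 is their harmonic mean. The helper names below are illustrative.

```rust
/// F1 is the harmonic mean of precision and recall (both as fractions in [0, 1]).
fn f1(precision: f64, recall: f64) -> f64 {
    2.0 * precision * recall / (precision + recall)
}

/// Round to three decimals, matching how the table presents F1.
fn round3(x: f64) -> f64 {
    (x * 1000.0).round() / 1000.0
}
```

For the three rows, `round3(f1(0.9856, 0.9798))`, `round3(f1(0.9809, 0.9994))`, and `round3(f1(0.7660, 0.8780))` come out at 0.983, 0.990, and 0.818, which agrees with the table.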
### Latency (single CPU thread)
| Stage | Apple M3 | Comet Lake-U Chromebook¹ |
|---|---|---|
| Embedding (MiniLM ONNX) | ~13 ms | ~36 ms |
| Classification (MLP) | ~1 ms | ~1 ms |
| **Total (p50)** | **~14 ms** | **~37 ms** |
| **Total (p99)** | **~19 ms** | **~43 ms** |
| Cold start | ~140 ms | ~350 ms |
¹ A low-power Chromebook CPU from 2019 (Comet Lake-U, 4c/8t),
running ChromeOS Crostini Linux 6.6. Included to show that JailGuard
stays usable on older, weaker hardware; modern desktop or server CPUs
land closer to the M3 column. Full per-benchmark numbers are in
[`BENCHMARKS.md`](./BENCHMARKS.md).
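For capacity planning, the p50 totals translate into a rough single-thread throughput ceiling. The sketch below treats p50 as the typical per-request cost, which is an approximation (a median is not a mean) and ignores batching and queueing; the function name is illustrative.

```rust
/// Rough upper bound on requests per second for one thread,
/// treating the p50 latency (in milliseconds) as the per-request cost.
fn requests_per_second(p50_ms: f64) -> f64 {
    1000.0 / p50_ms
}
```

With the figures above this gives roughly 71 requests per second per thread at 14 ms and roughly 27 at 37 ms.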
## Benchmarks
Reproducible latency and throughput numbers come from three harnesses:
- `benches/detect.rs` — Criterion bench covering single-shot `is_injection` / `detect` / `score` and batch throughput at `n = 1, 8, 32, 128`. Run with `cargo bench --bench detect`.
- `examples/cold_start_bench.rs` — process-startup cost (ONNX session init + first inference). Run with `cargo run --release --example cold_start_bench`.
- `scripts/bench.sh` — portable POSIX wrapper that captures machine metadata (CPU, arch, kernel, toolchain) and emits a single markdown report. Works on Linux x86_64, Linux aarch64, macOS Intel, macOS Apple Silicon, and Chromebook Crostini.
Full methodology and head-to-head local-CPU comparisons are in [`BENCHMARKS.md`](./BENCHMARKS.md).
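To sanity-check p50 / p99 figures in your own service without the full Criterion setup, a small timing loop is often enough. The sketch below is a simplified stand-in for the harnesses listed above, not a copy of them: `percentile` and `measure` are hypothetical helpers, and the nearest-rank percentile definition is an assumption about how you want to summarize the samples.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile over an already sorted slice; `p` is a whole percent in 1..=100.
fn percentile(sorted: &[Duration], p: usize) -> Duration {
    assert!(!sorted.is_empty());
    // Integer ceiling of p * n / 100, kept within 1..=n.
    let rank = (p * sorted.len() + 99) / 100;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Time `iterations` calls of `f` and return (p50, p99).
fn measure<F: FnMut()>(iterations: usize, mut f: F) -> (Duration, Duration) {
    let mut samples: Vec<Duration> = (0..iterations)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    (percentile(&samples, 50), percentile(&samples, 99))
}
```

A call might look like `measure(1_000, || { let _ = jailguard::detect("hello"); })`, ideally after a few warm-up calls so that cold-start cost is not mixed into the percentiles.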
## Attack categories covered in training
The classifier output is binary at the public API (injection / benign), but its training mix spans eight attack families:
| Category | Example |
|---|---|
| Direct injection | "Ignore previous instructions" |
| Jailbreak | DAN, developer-mode prompts |
| Role-play | Persona-based overrides |
| System prompt leak | "Reveal your instructions" |
| Encoding attacks | Base64, ROT13, Unicode obfuscation |
| Context manipulation | Framing and separator tricks |
| Output manipulation | Format coercion |
| Indirect injection | Malicious content embedded in documents |
## References
- [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) — sentence embeddings
- [PromptGuard (Meta)](https://github.com/meta-llama/PurpleLlama)
- [Rebuff](https://github.com/protectai/rebuff) (archived)
- [Sentinel: SOTA model to protect against prompt injections](https://arxiv.org/abs/2506.05446)
- [Not What You've Signed Up For — indirect injection](https://arxiv.org/abs/2302.12173)
## Citation
If you use JailGuard in research or production, please cite:
```bibtex
@software{jailguard,
title = {JailGuard: Efficient Prompt Injection Detection via Pre-trained Embeddings},
author = {Yury Fedoseev},
year = {2026},
url = {https://github.com/yfedoseev/jailguard}
}
```
A machine-readable [`CITATION.cff`](CITATION.cff) is also available.
## License
Dual-licensed under [MIT](LICENSE-MIT) OR [Apache-2.0](LICENSE-APACHE) at your option.