zeph-sanitizer
Content sanitization, exfiltration guard, PII filtering, and quarantine for Zeph — untrusted input isolation before LLM context injection.
Overview
Implements a multi-stage security pipeline that processes all external data before it enters the LLM context window. The pipeline detects prompt injection patterns, wraps content in spotlighting XML delimiters, optionally routes high-risk sources through an isolated quarantine LLM call, and guards outbound paths against data exfiltration. Memory retrieval sources are classified via MemorySourceHint so that recalled user conversations and LLM-generated summaries do not raise false-positive injection flags.
[!NOTE] This crate is marked publish = false. It is an internal workspace crate and is not published to crates.io.
Key types
| Type | Description |
|---|---|
| ContentSanitizer | 4-step pipeline: truncate → strip control chars → detect injections → spotlighting XML wrap |
| TrustLevel | Trusted / LocalUntrusted / ExternalUntrusted |
| ContentSourceKind | Source category (tool output, web scrape, document, etc.) |
| SanitizedContent | Output with injection flag list and wrapped content |
| InjectionFlag | Detected injection pattern with matched text |
| QuarantinedSummarizer | Dual LLM pattern: routes high-risk content through an isolated, tool-less LLM call |
| ExfiltrationGuard | Three outbound guards: markdown image tracking, tool URL cross-validation, memory write suppression |
| ContentSource | Source metadata with ContentSourceKind and optional MemorySourceHint for memory retrieval classification |
| MemorySourceHint | ConversationHistory / LlmSummary / ExternalDocument; classifies memory retrieval sources to suppress false-positive injection flags on recalled user text and LLM-generated summaries |
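The effect of MemorySourceHint on injection detection can be illustrated with a small sketch. This is not the crate's actual code: the enum below only mirrors the variant names from the table, and should_scan_for_injections is a hypothetical helper. It assumes that content without a hint is scanned as usual, which the table does not state.

```rust
/// Mirrors the variant names listed in the table above; the real type may carry more data.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemorySourceHint {
    ConversationHistory,
    LlmSummary,
    ExternalDocument,
}

/// Hypothetical helper: decides whether injection detection should run
/// for a piece of recalled memory.
///
/// Assumption: a missing hint falls back to normal scanning.
pub fn should_scan_for_injections(hint: Option<MemorySourceHint>) -> bool {
    match hint {
        // Recalled user text and LLM-generated summaries are the sources
        // the README says would otherwise raise false positives.
        Some(MemorySourceHint::ConversationHistory) | Some(MemorySourceHint::LlmSummary) => false,
        // External documents remain untrusted, as does anything unclassified.
        Some(MemorySourceHint::ExternalDocument) | None => true,
    }
}
```

Under these assumptions, only ExternalDocument and unclassified content would still pass through pattern detection.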
Sanitization pipeline
External data
↓ 1. Truncate to max_content_size
↓ 2. Strip null bytes and control characters
↓ 3. Detect 17 injection patterns (OWASP variants + encoding)
↓ 4. Wrap in spotlighting XML delimiters
<tool-output>…</tool-output> (local sources)
<external-data>…</external-data> (external sources)
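The four steps above can be sketched in plain Rust using only the standard library. This is a simplified illustration, not the ContentSanitizer implementation: sanitize_sketch, Sanitized, and the two entries in PATTERNS are hypothetical, and the crate's 17 real patterns are not reproduced. The sketch also assumes that newlines and tabs survive step 2, that trusted content is left unwrapped, and that detection only flags matches rather than removing them.

```rust
/// Trust classification, using the variant names from "Key types".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TrustLevel {
    Trusted,
    LocalUntrusted,
    ExternalUntrusted,
}

/// Hypothetical result shape: wrapped text plus the patterns that matched.
#[derive(Debug, PartialEq, Eq)]
pub struct Sanitized {
    pub content: String,
    pub flags: Vec<&'static str>,
}

/// Illustrative patterns only; these are assumptions, not the crate's list.
const PATTERNS: [&str; 2] = ["ignore previous instructions", "you are now"];

/// Truncates to at most `max` bytes without splitting a UTF-8 character.
fn truncate_utf8(input: &str, max: usize) -> &str {
    if input.len() <= max {
        return input;
    }
    let mut end = max;
    while !input.is_char_boundary(end) {
        end -= 1;
    }
    &input[..end]
}

pub fn sanitize_sketch(input: &str, trust: TrustLevel, max: usize) -> Sanitized {
    // 1. Truncate before any further processing.
    let truncated = truncate_utf8(input, max);

    // 2. Strip null bytes and control characters (keeping '\n' and '\t' is an assumption).
    let cleaned: String = truncated
        .chars()
        .filter(|c| !c.is_control() || *c == '\n' || *c == '\t')
        .collect();

    // 3. Detect patterns with a case-insensitive substring match.
    let lowered = cleaned.to_lowercase();
    let flags: Vec<&'static str> = PATTERNS
        .iter()
        .copied()
        .filter(|p| lowered.contains(*p))
        .collect();

    // 4. Wrap in the delimiter that matches the source's trust level.
    let content = match trust {
        TrustLevel::Trusted => cleaned,
        TrustLevel::LocalUntrusted => format!("<tool-output>{cleaned}</tool-output>"),
        TrustLevel::ExternalUntrusted => format!("<external-data>{cleaned}</external-data>"),
    };

    Sanitized { content, flags }
}
```

Truncating first bounds the work done by the later steps, which matches the ordering shown in the diagram.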
Usage
use ;
let sanitizer = from_config;
let result = sanitizer.sanitize?;
// result.content contains the wrapped, injection-cleaned text
// result.injection_flags contains any detected patterns
for flag in &result.injection_flags
Configuration
[]
= true
= 65536 # bytes; content truncated before injection detection
[]
= true
= ["web_scrape", "document"] # source kinds routed through quarantine
= "claude-haiku-4-5-20251001" # optional; defaults to primary provider
= 2048
[]
= true
= true
= true
= true
Features
| Feature | Description |
|---|---|
| guardrail | Activates advanced guardrail checks in the sanitization pipeline |
Security metrics
ContentSanitizer exposes metrics via the shared MetricsSnapshot:
| Metric | Description |
|---|---|
| sanitizer_runs | Total sanitization invocations |
| sanitizer_injection_flags | Cumulative injection pattern detections |
| sanitizer_truncations | Content truncations applied |
| quarantine_invocations | Quarantine LLM calls triggered |
| quarantine_failures | Quarantine LLM call failures (falls back to direct sanitization) |
| exfiltration_images_blocked | Markdown image pixel-tracking attempts blocked |
| exfiltration_tool_urls_flagged | Tool URLs cross-validated against untrusted sources |
| exfiltration_memory_guards | Memory write suppression events |
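To make the exfiltration_images_blocked metric more concrete, the following sketch shows one way a guard could find remote markdown images in LLM output. It is an assumption-laden illustration, not ExfiltrationGuard's actual logic: find_remote_markdown_images is a hypothetical function, it uses a naive scan rather than a full markdown parser, and it does not handle image titles or nested brackets.

```rust
/// Hypothetical helper: returns the URLs of markdown images `![alt](url)`
/// that point at an http or https host. Each hit is a candidate
/// tracking-pixel request that a guard might block and count.
pub fn find_remote_markdown_images(text: &str) -> Vec<&str> {
    let mut urls = Vec::new();
    let mut rest = text;

    while let Some(start) = rest.find("![") {
        let after_bang = &rest[start + 2..];
        // Locate the "](" that separates the alt text from the URL.
        let Some(mid) = after_bang.find("](") else { break };
        let after_mid = &after_bang[mid + 2..];
        // The URL runs up to the next closing parenthesis.
        let Some(end) = after_mid.find(')') else { break };

        let url = after_mid[..end].trim();
        if url.starts_with("http://") || url.starts_with("https://") {
            urls.push(url);
        }
        // Continue scanning after this image.
        rest = &after_mid[end + 1..];
    }

    urls
}
```

In this sketch, relative paths such as ./logo.png are ignored because rendering them does not trigger a request to an attacker-controlled host.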
Installation
This crate is a workspace-internal dependency. Reference it from another workspace crate:
[]
= { = true }
Documentation
Full documentation: https://bug-ops.github.io/zeph/
License
MIT