Module security

Secure Ingestion: defensive layer for content fetched from the open web before it is passed to an LLM agent.

Threat model

Public web pages can carry instructions, semantic data, and intercom addresses aimed at AI agents in layers that humans never read: HTML comments addressed to “machine intelligence”, machine-only attribute payloads (data-dim, data-ai-*, …), class="m"-style “machine” spans, display:none text, and aria-hidden="true" content. Whether the publisher is honest (semantic web research) or hostile (supply-chain attack), the agent reading the page cannot tell the difference from the markup alone.
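To make the channel list above concrete, here is a minimal sketch of a presence check for four of those channels. It is an illustration only: the `Channel` enum and `channels_present` function are hypothetical names, not part of this crate, and the real `DirectiveKind` variants are not shown on this page. The scan is a naive substring match, so it can report false positives (for example, `display:none` appearing in visible text); a real detector would presumably parse the HTML.

```rust
/// Hypothetical channel labels for illustration; not the crate's `DirectiveKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Channel {
    HtmlComment,
    DataAttribute,
    HiddenStyle,
    AriaHidden,
}

/// Naive substring scan over the raw markup (assumption: no HTML parsing).
/// Returns the channels that appear to be present, in a fixed order.
fn channels_present(html: &str) -> Vec<Channel> {
    let lower = html.to_ascii_lowercase();
    // Drop whitespace so `display: none` and `display:none` both match.
    let compact: String = lower.chars().filter(|c| !c.is_whitespace()).collect();

    let mut found = Vec::new();
    if lower.contains("<!--") {
        found.push(Channel::HtmlComment);
    }
    if lower.contains("data-ai-") || lower.contains("data-dim") {
        found.push(Channel::DataAttribute);
    }
    if compact.contains("display:none") {
        found.push(Channel::HiddenStyle);
    }
    if compact.contains("aria-hidden=\"true\"") {
        found.push(Channel::AriaHidden);
    }
    found
}
```

Note that the check says nothing about intent: an honest semantic-web annotation and a hostile injected instruction trigger the same channel label, which is the point made above.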

Defensive treatment is the same regardless of intent: detect the channel, surface its provenance, and strip it by default before the agent reads the page. Operators can opt back in when they want the machine-readable layer.
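The detect, surface, strip-by-default flow can be sketched for a single channel (HTML comments). This is a self-contained illustration under stated assumptions: `Policy`, `Finding`, and `guard_comments` are hypothetical names, and the signatures of the crate's own `detect` and `sanitize` are not shown on this page, so nothing here should be read as their actual behaviour.

```rust
/// Hypothetical policy switch; not the crate's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Policy {
    /// Default: remove the machine-targeted layer.
    Strip,
    /// Operator opt-in: keep the layer, but still report it.
    Keep,
}

/// One detected comment plus its byte offset, so provenance can be surfaced.
#[derive(Debug, PartialEq, Eq)]
struct Finding {
    offset: usize,
    text: String,
}

/// Finds HTML comments, records each one, and strips it unless the policy keeps it.
fn guard_comments(html: &str, policy: Policy) -> (String, Vec<Finding>) {
    let mut out = String::with_capacity(html.len());
    let mut findings = Vec::new();
    let mut pos = 0;

    while let Some(rel) = html[pos..].find("<!--") {
        let start = pos + rel;
        // An unterminated comment is treated as running to the end of the input.
        let end = match html[start + 4..].find("-->") {
            Some(e) => start + 4 + e + 3,
            None => html.len(),
        };
        // Copy the visible text that precedes the comment.
        out.push_str(&html[pos..start]);
        findings.push(Finding {
            offset: start,
            text: html[start..end].to_string(),
        });
        if policy == Policy::Keep {
            out.push_str(&html[start..end]);
        }
        pos = end;
    }
    out.push_str(&html[pos..]);
    (out, findings)
}
```

Findings are returned under both policies, so an operator who opts back in to the machine-readable layer still sees what was detected and where.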

See ingestion_guard for the v0 detector and sanitiser. Future work: WebMCP advertisement detection sits on the same trust path (manifest discovery is in crate::webmcp), and the promotion of the WebMCP discovery output into a sanctioned/strict policy gate will land here.

Re-exports

pub use ingestion_guard::DetectionReport;
pub use ingestion_guard::DirectiveKind;
pub use ingestion_guard::Sample;
pub use ingestion_guard::Severity;
pub use ingestion_guard::detect;
pub use ingestion_guard::sanitize;

Modules

ingestion_guard
Detect and strip machine-targeted markup from HTML.