

Function looks_corrupted 

pub fn looks_corrupted(text: &str) -> Option<&'static str>

Detect output that almost certainly came from a corrupted provider stream (binary bytes decoded as UTF-8, mojibake from a wrong encoding, KV-cache poisoning after a timeout/retry, etc.) and should NOT be committed to conversation history.

Returns Some(reason) when the text is corrupted; None when it looks like real model output. Conservative by design: only fires on unambiguously non-textual signals. False positives here would silently drop legitimate responses, which is far worse than letting one garbage turn through.
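The return contract above suggests a simple gate at the point where a turn is committed. The following is a minimal sketch of that calling pattern, not the crate's actual code: `commit_turn` is a hypothetical caller, and the body of `looks_corrupted` is an illustrative stand-in (it only flags U+FFFD and non-whitespace control characters), since the real detection rules are not shown on this page.

```rust
/// Illustrative stand-in with the documented signature. The real
/// detection rules are not shown here; these two checks are assumptions.
fn looks_corrupted(text: &str) -> Option<&'static str> {
    if text.contains('\u{FFFD}') {
        return Some("contains U+FFFD replacement character");
    }
    if text
        .chars()
        .any(|c| c.is_control() && !matches!(c, '\n' | '\r' | '\t'))
    {
        return Some("contains non-whitespace control character");
    }
    None
}

/// Hypothetical caller: append a turn to history only if it passes the check.
/// On rejection the reason is returned so the caller can log it or retry.
fn commit_turn(history: &mut Vec<String>, output: &str) -> Result<(), &'static str> {
    match looks_corrupted(output) {
        Some(reason) => Err(reason),
        None => {
            history.push(output.to_owned());
            Ok(())
        }
    }
}

fn main() {
    let mut history = Vec::new();

    // Ordinary model output is committed.
    assert!(commit_turn(&mut history, "Wrote the file successfully.").is_ok());

    // Output containing raw control bytes is rejected and never stored.
    assert!(commit_turn(&mut history, "partial \u{0}\u{1} garbage").is_err());

    assert_eq!(history.len(), 1);
}
```

Returning the reason as `Err` rather than silently dropping the turn keeps the rejection visible to the caller, which matters given how costly a false positive is described to be.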

Trigger context (2026-05-02 datalog evidence): deepseek-v4-flash, at ~28K context, completed a file write successfully and then hung for 155 s on the next turn. The framework’s stream-timeout retry then returned P<ďĎĎĎĎ (UTF-8 bytes 0x50 0x3C 0xC4 0x8F, followed by 0xC4 0x8E ×4: Latin Extended-A mojibake of what was almost certainly raw binary in the provider’s response buffer). Once that string lands in conversation history, the next turn sees its own garbage as prior context and the session is unrecoverable.
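The byte sequence quoted above can be checked directly. This short sketch (the names `SAMPLE` and `hex_bytes` are introduced here for illustration) builds the string from its code points, U+010F for ď and U+010E for Ď, and prints its UTF-8 encoding.

```rust
/// The garbage string from the trigger context, written with escapes so the
/// code points are explicit: 'P', '<', U+010F (ď), then U+010E (Ď) four times.
const SAMPLE: &str = "P<\u{010F}\u{010E}\u{010E}\u{010E}\u{010E}";

/// Render a string's UTF-8 bytes as space-separated uppercase hex.
fn hex_bytes(text: &str) -> String {
    text.bytes()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // prints: 50 3C C4 8F C4 8E C4 8E C4 8E C4 8E
    println!("{}", hex_bytes(SAMPLE));

    // Every character after "P<" falls in the Latin Extended-A block
    // (U+0100..=U+017F), which is why the text decodes as valid UTF-8.
    let all_latin_ext_a = SAMPLE
        .chars()
        .skip(2)
        .all(|c| ('\u{0100}'..='\u{017F}').contains(&c));
    println!("Latin Extended-A tail: {}", all_latin_ext_a);
}
```

Because the string is valid UTF-8 made of ordinary letters, a check for invalid sequences or replacement characters alone would not flag it.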