Module secret_gate

Write-time secret detection gate (issue #76).

Scans caller-supplied content strings before any storage write. A match fails the write with a hard RuntimeError::SecretDetected error that names the detector and carries a masked excerpt; the full candidate is never echoed back.
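A minimal sketch of what "names the detector and carries a masked excerpt" could look like. `MaskedMatch`, `mask_excerpt`, and `report` are hypothetical names introduced for illustration; the actual fields of `SecretMatch` and the masking rule used by this module are not documented here, so the "keep at most four leading characters" policy is an assumption.

```rust
/// Hypothetical stand-in for the information the error carries:
/// a detector name plus a masked excerpt (not the real `SecretMatch`).
#[derive(Debug, PartialEq)]
struct MaskedMatch {
    detector: &'static str,
    excerpt: String,
}

/// Keep at most `keep` leading characters (and never more than a quarter
/// of the candidate), then append a fixed-width mask so neither the rest
/// of the value nor its length is revealed.
fn mask_excerpt(candidate: &str, keep: usize) -> String {
    let total = candidate.chars().count();
    let shown = keep.min(total / 4);
    let visible: String = candidate.chars().take(shown).collect();
    format!("{visible}****")
}

/// Build the report for a hit; the full candidate never leaves this function.
fn report(detector: &'static str, candidate: &str) -> MaskedMatch {
    MaskedMatch {
        detector,
        excerpt: mask_excerpt(candidate, 4),
    }
}
```

Capping the visible part at a quarter of the input means very short candidates are masked entirely rather than echoed.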

Scope: credentials only — API keys, tokens, private keys, passwords, and connection strings with embedded credentials. General PII such as email addresses, phone numbers, and company names is intentionally NOT blocked; those are normal knowledge-graph content.

Detection is layered, cheap-first:

1. Known-prefix / known-shape patterns: AWS AKIA/ASIA, GitHub tokens, OpenAI sk-proj-, Anthropic sk-ant-, Stripe live keys, Fly.io tokens, Vercel secrets, Slack xox*, JWT triples, PEM private-key headers, Age secret keys, URL userinfo (scheme://user:pass@). Bare sk- is also checked, but only when it is NOT followed by a known safe word (e.g. sk-learn, sk-image).
2. High-entropy token heuristic: base64/hex/base64url runs ≥ 24 chars near a trigger word (key, secret, password, credential, bearer, auth, apikey, api_key, access_key, private_key). The word token alone is NOT a trigger, which avoids blocking identifiers such as tokenizer_* and token_count.
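The cheap-first layering can be sketched roughly as follows. This is an illustration under stated assumptions, not the module's implementation: the prefix table is a small subset, the detector labels are invented, "near a trigger word" is simplified to "a trigger word appears anywhere in the same string", only the base64url alphabet is treated as run characters, and the 3.5 bits/char entropy threshold is an assumed value.

```rust
// Illustrative subset of the prefixes listed above; labels are made up.
const KNOWN_PREFIXES: &[(&str, &str)] = &[
    ("AKIA", "aws_access_key"),
    ("ASIA", "aws_session_key"),
    ("sk-proj-", "openai_key"),
    ("sk-ant-", "anthropic_key"),
];
const SAFE_SK_WORDS: &[&str] = &["learn", "image"];
const TRIGGERS: &[&str] = &[
    "key", "secret", "password", "credential", "bearer", "auth",
    "apikey", "api_key", "access_key", "private_key",
];
const MIN_RUN: usize = 24;
const MIN_ENTROPY_BITS: f64 = 3.5; // assumed threshold, not from the docs

/// Split into maximal runs of base64url characters (A-Z, a-z, 0-9, '_', '-').
fn tokens(content: &str) -> Vec<&str> {
    content
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_' || c == '-'))
        .filter(|t| !t.is_empty())
        .collect()
}

/// Shannon entropy in bits per byte.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts = [0usize; 256];
    for b in s.bytes() {
        counts[b as usize] += 1;
    }
    let len = s.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Returns the name of the first detector that fires, if any.
fn detect(content: &str) -> Option<&'static str> {
    let toks = tokens(content);

    // Layer 1: literal prefixes, checked first because they are cheap.
    for tok in &toks {
        for &(prefix, name) in KNOWN_PREFIXES {
            if tok.starts_with(prefix) {
                return Some(name);
            }
        }
        // Bare `sk-` counts only when not followed by a known safe word.
        if let Some(rest) = tok.strip_prefix("sk-") {
            if !rest.is_empty() && !SAFE_SK_WORDS.iter().any(|w| rest.starts_with(w)) {
                return Some("bare_sk");
            }
        }
    }

    // Layer 2: a long, high-entropy run, but only alongside a trigger word.
    // Triggers are matched as whole tokens, so `token_count` never fires.
    let has_trigger = toks.iter().any(|t| {
        let lower = t.to_ascii_lowercase();
        TRIGGERS.iter().any(|w| lower == *w)
    });
    if has_trigger
        && toks
            .iter()
            .any(|t| t.len() >= MIN_RUN && shannon_entropy(t) >= MIN_ENTROPY_BITS)
    {
        return Some("high_entropy_near_trigger");
    }
    None
}
```

Running the prefix layer first means most inputs are classified without computing any entropy at all.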

Allowlist (false-positive suppression):

- Pure hex strings (sha256, git SHA): passed unconditionally.
- UUID canonical form (xxxxxxxx-xxxx-…): passed.
- Base64/base64url content hashes with an explicit sha<N>- prefix (SRI hashes, npm lockfile integrity): passed when not preceded by a known-vendor prefix. Bare base64 tokens without the sha<N>- prefix are NOT passed.
- Strings that are entirely ASCII punctuation/whitespace (e.g. code): not subject to the entropy heuristic; only the literal-prefix checks apply.
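The first three allowlist rules can be approximated with plain string checks. The function names below are hypothetical, and the sketch deliberately leaves out the "not preceded by a known-vendor prefix" condition and the punctuation/whitespace rule, since their exact definitions are not given here.

```rust
/// Non-empty and made only of hex digits (e.g. a sha256 digest or git SHA).
fn is_pure_hex(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Canonical 8-4-4-4-12 UUID layout with hex digits in every group.
fn is_canonical_uuid(s: &str) -> bool {
    let groups: Vec<&str> = s.split('-').collect();
    let lens = [8usize, 4, 4, 4, 12];
    groups.len() == lens.len()
        && groups
            .iter()
            .zip(lens.iter())
            .all(|(g, n)| g.len() == *n && g.bytes().all(|b| b.is_ascii_hexdigit()))
}

/// `sha<N>-` followed by a non-empty base64 or base64url body.
fn is_sha_prefixed_hash(s: &str) -> bool {
    let rest = match s.strip_prefix("sha") {
        Some(r) => r,
        None => return false,
    };
    // <N> must be at least one ASCII digit (sha256, sha384, sha512, ...).
    let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return false;
    }
    match rest[digits..].strip_prefix('-') {
        Some(body) => {
            !body.is_empty()
                && body.bytes().all(|b| {
                    b.is_ascii_alphanumeric() || matches!(b, b'+' | b'/' | b'=' | b'-' | b'_')
                })
        }
        None => false,
    }
}

/// True when the string matches any of the three shapes above.
fn is_allowlisted(s: &str) -> bool {
    is_pure_hex(s) || is_canonical_uuid(s) || is_sha_prefixed_hash(s)
}
```

A bare base64 token fails all three checks, which matches the rule that only explicitly sha<N>- prefixed hashes are passed.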

Structs§

SecretMatch: Returned when a write would store credential-looking content.

Functions§

check: Hard-block content from being written.
check_json: Recursively scan a JSON value for credential-shaped strings.
check_tags: Scan a string-tagged slice (entity/note tags).
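The signatures of `check`, `check_json`, and `check_tags` are not shown on this page, so the following is only a sketch of the recursive idea behind scanning a JSON value. It uses a hand-rolled `Json` enum instead of a real JSON crate to stay self-contained; `first_secret_path` and `looks_like_aws_key` are hypothetical names, and scanning string values but not object keys is an assumption.

```rust
/// Minimal JSON model for illustration; a real implementation would
/// presumably operate on a JSON library's value type.
#[allow(dead_code)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Toy detector standing in for the real string-level check.
fn looks_like_aws_key(s: &str) -> bool {
    s.starts_with("AKIA")
}

/// Depth-first walk that returns the path of the first string value the
/// detector flags, or `None` if every string passes.
fn first_secret_path(
    value: &Json,
    path: &str,
    is_secret: &dyn Fn(&str) -> bool,
) -> Option<String> {
    match value {
        Json::Str(s) if is_secret(s) => Some(path.to_string()),
        Json::Arr(items) => items
            .iter()
            .enumerate()
            .find_map(|(i, v)| first_secret_path(v, &format!("{path}[{i}]"), is_secret)),
        Json::Obj(fields) => fields
            .iter()
            .find_map(|(k, v)| first_secret_path(v, &format!("{path}.{k}"), is_secret)),
        // Null, booleans, numbers, and clean strings cannot carry a secret here.
        _ => None,
    }
}
```

Returning a path rather than the offending value keeps the report consistent with the rule that the full candidate is never echoed back.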