Write-time secret detection gate (issue #76).
Scans caller-supplied content strings before any storage write. A match
causes a hard `RuntimeError::SecretDetected` that names the detector and
carries a masked excerpt; it never echoes the full candidate back.
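A minimal sketch of the gate-then-write order and of excerpt masking. Everything here is assumed for illustration: the `RuntimeError` enum is a local stand-in whose field names (`detector`, `excerpt`) are guesses, `toy_detect`, `check_content`, and `write_note` are hypothetical, and the masking rule (at most 4 visible characters, never more than a quarter of the candidate, then a fixed `****`) is one reasonable choice rather than the crate's.

```rust
use std::fmt;

/// Hypothetical stand-in for `RuntimeError::SecretDetected`; the real
/// variant's fields are not documented on this page.
#[derive(Debug, Clone, PartialEq)]
pub enum RuntimeError {
    SecretDetected { detector: &'static str, excerpt: String },
}

impl fmt::Display for RuntimeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RuntimeError::SecretDetected { detector, excerpt } => {
                write!(f, "secret detected by `{detector}`: {excerpt}")
            }
        }
    }
}

/// Reveal at most 4 leading chars (never more than a quarter of the
/// candidate) and replace the rest with a fixed-width mask.
pub fn mask_excerpt(candidate: &str) -> String {
    let total = candidate.chars().count();
    let visible = (total / 4).min(4);
    let prefix: String = candidate.chars().take(visible).collect();
    format!("{prefix}****")
}

/// Toy detector standing in for the real layered checks:
/// flags 20-char AWS-style access key ids only.
fn toy_detect(content: &str) -> Option<(&'static str, &str)> {
    content
        .split_whitespace()
        .find(|t| t.len() == 20 && t.starts_with("AKIA"))
        .map(|t| ("aws_access_key", t))
}

/// Hypothetical gate: returns an error instead of letting the write proceed.
pub fn check_content(content: &str) -> Result<(), RuntimeError> {
    match toy_detect(content) {
        Some((detector, candidate)) => Err(RuntimeError::SecretDetected {
            detector,
            excerpt: mask_excerpt(candidate),
        }),
        None => Ok(()),
    }
}

/// The gate runs before the (stand-in) storage write, so nothing is
/// stored when a match is found.
pub fn write_note(store: &mut Vec<String>, content: &str) -> Result<(), RuntimeError> {
    check_content(content)?;
    store.push(content.to_owned());
    Ok(())
}
```

A fixed-width mask, rather than one `*` per hidden character, also avoids leaking the candidate's length.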
Scope: credentials only — API keys, tokens, private keys, passwords, and connection strings with embedded credentials. General PII such as email addresses, phone numbers, and company names is intentionally NOT blocked; those are normal knowledge-graph content.
Detection is layered, cheap-first:
- Known-prefix / known-shape patterns: AWS `AKIA`/`ASIA`, GitHub tokens, OpenAI `sk-proj-`, Anthropic `sk-ant-`, Stripe live keys, Fly.io tokens, Vercel secrets, Slack `xox*`, JWT triples, PEM private-key headers, Age secret keys, and URL userinfo (`scheme://user:pass@`). Bare `sk-` is also checked, but only when it is NOT followed by a known safe word boundary (e.g. `sk-learn`, `sk-image`).
- High-entropy token heuristic: base64/hex/base64url runs of ≥ 24 chars near a trigger word (`key`, `secret`, `password`, `credential`, `bearer`, `auth`, `apikey`, `api_key`, `access_key`, `private_key`). The word `token` alone is NOT a trigger, to avoid blocking `tokenizer_*`, `token_count`, etc.
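The two layers can be sketched as one pass over whitespace/punctuation-delimited tokens, cheap prefix checks first. This is a hypothetical, dependency-free illustration, not the crate's detector: the prefix table is a small subset, and the detector names, the 16-byte minimum for prefix matches, the 3.5 bits-per-byte entropy threshold, and the 40-byte "near" window are all assumed values. URL userinfo, JWT, PEM, and Age checks are left out.

```rust
/// Illustrative subset of literal-prefix detectors: (prefix, detector name).
/// Detector names are hypothetical.
const PREFIXES: &[(&str, &str)] = &[
    ("AKIA", "aws_access_key"),
    ("ASIA", "aws_temp_access_key"),
    ("ghp_", "github_token"),
    ("sk-proj-", "openai_project_key"),
    ("sk-ant-", "anthropic_key"),
    ("sk_live_", "stripe_live_key"),
    ("xoxb-", "slack_bot_token"),
    ("xoxp-", "slack_user_token"),
];
/// Words after a bare `sk-` that mark it as harmless (assumed list).
const SAFE_SK_WORDS: &[&str] = &["learn", "image"];
/// Trigger words for the entropy heuristic; `token` is deliberately absent.
const TRIGGERS: &[&str] = &[
    "key", "secret", "password", "credential", "bearer", "auth",
    "apikey", "api_key", "access_key", "private_key",
];
const MIN_PREFIXED_LEN: usize = 16; // assumed: ignore short words like a lone "AKIA"
const MIN_RUN_LEN: usize = 24; // from the description above
const MIN_ENTROPY_BITS: f64 = 3.5; // assumed threshold, bits per byte
const TRIGGER_WINDOW: usize = 40; // assumed: bytes scanned before the run

fn is_sep(c: char) -> bool {
    c.is_whitespace() || "\"'`,;:()[]{}<>=".contains(c)
}

fn is_b64ish(c: char) -> bool {
    c.is_ascii_alphanumeric() || "+/-_".contains(c)
}

/// Split `text` into (byte offset, token) pairs.
fn tokens(text: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        match (is_sep(c), start) {
            (true, Some(s)) => {
                out.push((s, &text[s..i]));
                start = None;
            }
            (false, None) => start = Some(i),
            _ => {}
        }
    }
    if let Some(s) = start {
        out.push((s, &text[s..]));
    }
    out
}

/// Shannon entropy of the byte distribution, in bits per byte.
fn shannon_entropy(s: &str) -> f64 {
    let mut counts = [0usize; 256];
    for b in s.bytes() {
        counts[b as usize] += 1;
    }
    let n = s.len() as f64;
    counts.iter().filter(|&&c| c > 0).map(|&c| {
        let p = c as f64 / n;
        -p * p.log2()
    }).sum()
}

/// True when `rest` starts with a safe word followed by a non-alphanumeric or end.
fn is_safe_sk(rest: &str) -> bool {
    SAFE_SK_WORDS.iter().any(|&w| {
        rest.strip_prefix(w).map_or(false, |after| {
            after.chars().next().map_or(true, |c| !c.is_ascii_alphanumeric())
        })
    })
}

/// Look for a trigger word in the bytes just before `start`.
fn near_trigger(text: &str, start: usize) -> bool {
    let mut lo = start.saturating_sub(TRIGGER_WINDOW);
    while !text.is_char_boundary(lo) {
        lo -= 1;
    }
    let window = text[lo..start].to_ascii_lowercase();
    TRIGGERS.iter().any(|&t| window.contains(t))
}

/// Cheap literal-prefix layer first, then the entropy heuristic.
pub fn detect(text: &str) -> Option<&'static str> {
    for (start, tok) in tokens(text) {
        if tok.len() >= MIN_PREFIXED_LEN {
            for &(prefix, name) in PREFIXES {
                if tok.starts_with(prefix) {
                    return Some(name);
                }
            }
            if let Some(rest) = tok.strip_prefix("sk-") {
                if !is_safe_sk(rest) {
                    return Some("bare_sk_key");
                }
            }
        }
        if tok.len() >= MIN_RUN_LEN
            && tok.chars().all(is_b64ish)
            && shannon_entropy(tok) >= MIN_ENTROPY_BITS
            && near_trigger(text, start)
        {
            return Some("high_entropy_near_trigger");
        }
    }
    None
}
```

Because no allowlist is applied in this sketch, a long hex digest sitting next to the word `key` could still trip the entropy layer; in the described design the allowlist suppresses that case.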
Allowlist (false-positive suppression):
- Pure hex strings (sha256, git SHA): passed unconditionally.
- UUID canonical form (`xxxxxxxx-xxxx-…`): passed.
- Base64/base64url content hashes with an explicit `sha<N>-` prefix (SRI hashes, npm lockfile integrity): passed when not preceded by a known-vendor prefix. Bare base64 tokens without the `sha<N>-` prefix are NOT passed.
- Strings that consist entirely of ASCII punctuation/whitespace (e.g. code): not subject to the entropy heuristic; only the literal-prefix checks apply.
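The allowlist rules translate into simple shape predicates. A sketch under stated assumptions: the function names are hypothetical, any all-digit `<N>` after `sha` is accepted (real SRI metadata uses `sha256`, `sha384`, or `sha512`), and the "not preceded by a known-vendor prefix" condition is not modeled.

```rust
/// Pure hex (sha256 digests, git SHAs).
pub fn is_pure_hex(tok: &str) -> bool {
    !tok.is_empty() && tok.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Canonical UUID: 8-4-4-4-12 hex groups separated by '-'.
pub fn is_canonical_uuid(tok: &str) -> bool {
    let groups: Vec<&str> = tok.split('-').collect();
    groups.len() == 5
        && groups
            .iter()
            .zip([8, 4, 4, 4, 12])
            .all(|(g, len)| g.len() == len && is_pure_hex(g))
}

/// SRI-style content hash: `sha<N>-` followed by base64/base64url.
pub fn is_sri_hash(tok: &str) -> bool {
    let Some(rest) = tok.strip_prefix("sha") else { return false };
    let Some((bits, digest)) = rest.split_once('-') else { return false };
    !bits.is_empty()
        && bits.bytes().all(|b| b.is_ascii_digit())
        && !digest.is_empty()
        && digest.bytes().all(|b| b.is_ascii_alphanumeric() || b"+/=-_".contains(&b))
}

/// A token passes the allowlist if it matches any of the known-safe shapes.
pub fn is_allowlisted(tok: &str) -> bool {
    is_pure_hex(tok) || is_canonical_uuid(tok) || is_sri_hash(tok)
}

/// Entirely ASCII punctuation/whitespace: exempt from the entropy heuristic
/// (literal-prefix checks would still run on it).
pub fn skips_entropy(s: &str) -> bool {
    s.chars().all(|c| c.is_ascii_punctuation() || c.is_ascii_whitespace())
}
```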
Structs
- `SecretMatch`: Returned when a write would store credential-looking content.
Functions
- `check`: Hard-block content from being written.
- `check_json`: Recursively scan a JSON value for credential-shaped strings.
- `check_tags`: Scan a string-tagged slice (entity/note tags).
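A sketch of how a recursive JSON scan and a tag scan might report a match. It assumes a tiny local `Json` enum shaped like `serde_json::Value` (to stay dependency-free), a local `SecretMatch` stand-in with hypothetical `detector`/`path` fields, and a pluggable `detect` closure in place of the real detector stack. `scan_json` and `scan_tags` are hypothetical names; the real signatures of `check_json` and `check_tags` are not documented on this page. Only string values are scanned here, not object keys.

```rust
/// Minimal stand-in for a JSON value, shaped like `serde_json::Value`
/// (object fields kept in a Vec to stay dependency-free).
pub enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

/// Local stand-in; the real `SecretMatch` fields are not documented here.
#[derive(Debug, PartialEq)]
pub struct SecretMatch {
    pub detector: &'static str,
    pub path: String,
}

/// Recursively scan every string value; stop at the first match.
pub fn scan_json<F>(value: &Json, detect: &F) -> Result<(), SecretMatch>
where
    F: Fn(&str) -> Option<&'static str>,
{
    walk(value, "$".to_string(), detect)
}

fn walk<F>(value: &Json, path: String, detect: &F) -> Result<(), SecretMatch>
where
    F: Fn(&str) -> Option<&'static str>,
{
    match value {
        Json::String(s) => match detect(s.as_str()) {
            Some(detector) => Err(SecretMatch { detector, path }),
            None => Ok(()),
        },
        Json::Array(items) => {
            for (i, item) in items.iter().enumerate() {
                walk(item, format!("{path}[{i}]"), detect)?;
            }
            Ok(())
        }
        Json::Object(fields) => {
            for (key, v) in fields {
                walk(v, format!("{path}.{key}"), detect)?;
            }
            Ok(())
        }
        Json::Null | Json::Bool(_) | Json::Number(_) => Ok(()),
    }
}

/// Scan each tag; the error path records the index, not the tag text.
pub fn scan_tags<T, F>(tags: &[T], detect: &F) -> Result<(), SecretMatch>
where
    T: AsRef<str>,
    F: Fn(&str) -> Option<&'static str>,
{
    for (i, tag) in tags.iter().enumerate() {
        let tag: &str = tag.as_ref();
        if let Some(detector) = detect(tag) {
            return Err(SecretMatch { detector, path: format!("tags[{i}]") });
        }
    }
    Ok(())
}
```

The reported path (for example `$.creds[1]` or `tags[1]`) locates the match without repeating the matched text.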