Module classify

Content classification for clipboard input.

Returns Text, Url, or Code. The caller must have already ruled out image bytes via magic-byte sniffing — this module never returns Image.

Input is raw bytes: non-UTF-8 input short-circuits to Text, and input over MAX_CLASSIFY_BYTES short-circuits to Text without any UTF-8 validation. This keeps cinch push <huge-text> (20 MB stdin) cheap — no O(n) UTF-8 walk before bailing.
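The ordering of the two short-circuits can be sketched as follows. This is a minimal illustration, not the module's actual code: `Kind` and `precheck` are hypothetical names, and the value of `MAX_CLASSIFY_BYTES` is assumed from the 64 KB figure in the decision order.

```rust
/// Hypothetical stand-in for the module's result type.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Kind {
    Text,
    Url,
    Code,
}

/// Assumed value, based on the 64 KB figure in the decision order.
const MAX_CLASSIFY_BYTES: usize = 64 * 1024;

/// Hypothetical helper: returns the validated string for further
/// classification, or the short-circuit result.
fn precheck(input: &[u8]) -> Result<&str, Kind> {
    // The length check is O(1), so oversized input never pays for a UTF-8 walk.
    if input.len() > MAX_CLASSIFY_BYTES {
        return Err(Kind::Text);
    }
    // Only input within the limit is validated, which is O(n) in its length.
    match std::str::from_utf8(input) {
        Ok(s) => Ok(s),
        Err(_) => Err(Kind::Text),
    }
}
```

Checking the length first is what keeps the oversized case cheap: `std::str::from_utf8` has to scan every byte, while `len()` on a slice does not.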

Decision order (first match wins):

  1. input over 64 KB → Text (no UTF-8 scan)
  2. invalid UTF-8 → Text
  3. trim; empty → Text
  4. shebang #!/... → Code
  5. whole-string URL parse with scheme allow-list → Url
  6. {...} / [...] shape + valid JSON → Code
  7. any line starts with a code-opener keyword → Code
  8. symbol-to-alphanumeric ratio > 0.20 with at least one code bigram → Code
  9. ≥ 2 distinct code bigrams → Code
  10. indented line(s) with a code bigram → Code
  11. otherwise → Text
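As a rough illustration of how a few of these rules might compose, the sketch below covers only steps 3, 4, 8, 9 and 11. Everything in it is an assumption made for the example: `Kind`, `classify_heuristics`, the helper names, the contents of `CODE_BIGRAMS`, and the choice of ASCII punctuation as the "symbol" class. The module's real bigram list and symbol definition are not shown in these docs.

```rust
/// Hypothetical stand-in for the module's result type (Url omitted here).
#[derive(Debug, PartialEq)]
enum Kind {
    Text,
    Code,
}

/// Illustrative bigram list; the module's real list is not documented here.
const CODE_BIGRAMS: [&str; 8] = ["::", "->", "=>", "==", "!=", "&&", "||", "()"];

/// Count how many distinct bigrams from the list occur in `s`.
fn distinct_bigrams(s: &str) -> usize {
    CODE_BIGRAMS.iter().filter(|b| s.contains(**b)).count()
}

/// Ratio of ASCII punctuation characters to alphanumeric characters.
fn symbol_ratio(s: &str) -> f64 {
    let symbols = s.chars().filter(|c| c.is_ascii_punctuation()).count();
    let alnum = s.chars().filter(|c| c.is_alphanumeric()).count();
    if alnum == 0 {
        // Avoid dividing by zero; treated as "no signal" in this sketch.
        0.0
    } else {
        symbols as f64 / alnum as f64
    }
}

/// First match wins, mirroring the order of the list above.
fn classify_heuristics(s: &str) -> Kind {
    let s = s.trim();
    if s.is_empty() {
        return Kind::Text; // step 3
    }
    if s.starts_with("#!/") {
        return Kind::Code; // step 4
    }
    let bigrams = distinct_bigrams(s);
    if symbol_ratio(s) > 0.20 && bigrams >= 1 {
        return Kind::Code; // step 8
    }
    if bigrams >= 2 {
        return Kind::Code; // step 9
    }
    Kind::Text // step 11
}
```

Under these assumptions, `"wait... what?!"` has a high symbol ratio but no code bigram, so it stays `Text`. That shows why step 8 requires a bigram in addition to the ratio: punctuation-heavy prose alone does not trip it.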

Functions

detect
Classify a clip from raw bytes. Never returns Image.