Content classification for clipboard input.
Returns Text, Url, or Code. The caller must have already ruled out
image bytes via magic-byte sniffing — this module never returns Image.
Input is raw bytes: non-UTF-8 input short-circuits to Text, and input
over MAX_CLASSIFY_BYTES short-circuits to Text without any UTF-8
validation. This keeps cinch push <huge-text> (20 MB stdin) cheap —
no O(n) UTF-8 walk before bailing.
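A minimal sketch of the size-then-UTF-8 guard described above. The `Kind` enum and the `precheck` helper are hypothetical names introduced for illustration, and the value given to `MAX_CLASSIFY_BYTES` is assumed from the 64 KB figure in the decision order on this page; the module's real types and signatures may differ.

```rust
/// Hypothetical stand-in for the module's result type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Kind {
    Text,
    Url,
    Code,
}

/// Assumed value, taken from the 64 KB figure in the decision order.
pub const MAX_CLASSIFY_BYTES: usize = 64 * 1024;

/// Hypothetical helper: returns `Err(Kind::Text)` when classification should
/// stop early, or `Ok(text)` when the remaining heuristics may run.
pub fn precheck(bytes: &[u8]) -> Result<&str, Kind> {
    // O(1) length check first, so a 20 MB input never reaches the UTF-8 walk.
    if bytes.len() > MAX_CLASSIFY_BYTES {
        return Err(Kind::Text);
    }
    // O(n) validation, but n is now bounded by MAX_CLASSIFY_BYTES.
    std::str::from_utf8(bytes).map_err(|_| Kind::Text)
}
```

The ordering is the point of the sketch: checking the length before validating means oversized input costs a single comparison, whatever its contents.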
Decision order (first match wins):
- over 64 KB → Text (no UTF-8 scan)
- invalid UTF-8 → Text
- trim; empty → Text
- shebang #!/... → Code
- whole-string URL parse with scheme allow-list → Url
- {...}/[...] shape + valid JSON → Code
- any line starts with a code-opener keyword → Code
- symbol-to-alphanumeric ratio > 0.20 with at least one code bigram → Code
- ≥ 2 distinct code bigrams → Code
- indented line(s) with a code bigram → Code
- otherwise → Text
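The first-match-wins order above can be sketched as a chain of early returns. This is an illustration under stated assumptions, not the module's implementation: `Kind`, `classify`, the helper functions, and the `SCHEMES`, `OPENERS`, and `BIGRAMS` tables are all hypothetical, the URL check is a simplified stand-in for a real parser, and the JSON step is omitted because it would need a JSON parser.

```rust
/// Hypothetical stand-in for the module's result type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Kind {
    Text,
    Url,
    Code,
}

pub const MAX_CLASSIFY_BYTES: usize = 64 * 1024; // assumed from the 64 KB rule
const SCHEMES: [&str; 3] = ["http", "https", "ftp"]; // hypothetical allow-list
const OPENERS: [&str; 5] = ["fn ", "def ", "class ", "import ", "#include "]; // hypothetical
const BIGRAMS: [&str; 6] = ["=>", "->", "::", "==", "!=", "&&"]; // hypothetical

/// Simplified stand-in for a whole-string URL parse: no whitespace,
/// an allow-listed scheme, and something after "://".
fn looks_like_url(s: &str) -> bool {
    if s.chars().any(char::is_whitespace) {
        return false;
    }
    match s.split_once("://") {
        Some((scheme, rest)) => {
            !rest.is_empty() && SCHEMES.iter().any(|a| scheme.eq_ignore_ascii_case(a))
        }
        None => false,
    }
}

/// Number of distinct code bigrams that occur anywhere in `s`.
fn distinct_bigrams(s: &str) -> usize {
    BIGRAMS.iter().filter(|b| s.contains(**b)).count()
}

/// True when symbols / alphanumerics > 0.20, written without a division
/// so that input with no alphanumerics is handled safely.
fn symbol_heavy(s: &str) -> bool {
    let symbols = s.chars().filter(|c| c.is_ascii_punctuation()).count();
    let alnum = s.chars().filter(|c| c.is_alphanumeric()).count();
    symbols as f64 > 0.20 * alnum as f64
}

pub fn classify(bytes: &[u8]) -> Kind {
    if bytes.len() > MAX_CLASSIFY_BYTES {
        return Kind::Text; // size guard runs before any UTF-8 scan
    }
    let text = match std::str::from_utf8(bytes) {
        Ok(t) => t,
        Err(_) => return Kind::Text,
    };
    let s = text.trim();
    if s.is_empty() {
        return Kind::Text;
    }
    if s.starts_with("#!/") {
        return Kind::Code;
    }
    if looks_like_url(s) {
        return Kind::Url;
    }
    // The {...}/[...] + valid JSON step would go here; omitted in this sketch.
    // Assumption: leading whitespace is ignored when matching opener keywords.
    if s.lines().any(|l| OPENERS.iter().any(|k| l.trim_start().starts_with(*k))) {
        return Kind::Code;
    }
    let bigrams = distinct_bigrams(s);
    if symbol_heavy(s) && bigrams >= 1 {
        return Kind::Code;
    }
    if bigrams >= 2 {
        return Kind::Code;
    }
    // Assumption: the indented line itself must contain a bigram.
    let indented_code = s
        .lines()
        .any(|l| (l.starts_with("    ") || l.starts_with('\t')) && distinct_bigrams(l) >= 1);
    if indented_code {
        return Kind::Code;
    }
    Kind::Text
}
```

Cheap, high-precision checks (shebang, URL) come before the fuzzier ratio and bigram heuristics, so an unambiguous input is settled before any scoring runs.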
Functions
- detect: Classify a clip from raw bytes. Never returns Image.