Structs
Enums
Functions
- analyze_entropy
- compressibility_class: Classify how compressible content is based on gzip ratio.
- entropy_compress
- entropy_compress_adaptive
- jaccard_similarity
- kolmogorov_proxy: Kolmogorov complexity proxy: K(x) ≈ len(gzip(x)) / len(x). Lower values mean more compressible, and therefore more redundant, content.
- minhash_signature: MinHash signature for approximate Jaccard via LSH. Uses k independent hash functions (polynomial hashing with different seeds).
- minhash_similarity: Approximate Jaccard from two MinHash signatures.
- ngram_jaccard: N-gram Jaccard similarity, which preserves word order (unlike word-set Jaccard).
- shannon_entropy
- token_entropy: Shannon entropy over BPE token IDs (o200k_base). More LLM-relevant than character entropy, since LLMs process BPE tokens.
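
To illustrate the ideas behind ngram_jaccard, minhash_signature and minhash_similarity, here is a minimal std-only sketch. It is not this module's implementation: the function names (suffixed `_sketch`), the signatures, the word-level n-grams, the seed-derived hash base and the extra bit-mixing step are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Collect word-level n-grams of `text` into a set (words joined by a space).
fn word_ngrams(text: &str, n: usize) -> HashSet<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if n == 0 || words.len() < n {
        return HashSet::new(); // `windows(0)` would panic, so guard n == 0
    }
    words.windows(n).map(|w| w.join(" ")).collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B| over word n-grams.
fn ngram_jaccard_sketch(a: &str, b: &str, n: usize) -> f64 {
    let (sa, sb) = (word_ngrams(a, n), word_ngrams(b, n));
    let union = sa.union(&sb).count();
    if union == 0 {
        return 0.0; // convention chosen here when both sets are empty
    }
    sa.intersection(&sb).count() as f64 / union as f64
}

/// Seeded polynomial hash (hypothetical; not the hash this module uses).
fn seeded_poly_hash(s: &str, seed: u64) -> u64 {
    let base = seed.wrapping_mul(2).wrapping_add(31); // odd base derived from the seed
    let mut h = s
        .bytes()
        .fold(seed, |h, b| h.wrapping_mul(base).wrapping_add(b as u64));
    // Assumed extra step: MurmurHash3-style mixing to decorrelate the k hashes.
    h ^= h >> 33;
    h = h.wrapping_mul(0xff51_afd7_ed55_8ccd);
    h ^= h >> 33;
    h
}

/// MinHash signature: for each of `k` seeds, the minimum hash over all items.
fn minhash_signature_sketch(items: &HashSet<String>, k: usize) -> Vec<u64> {
    (0..k as u64)
        .map(|seed| {
            items
                .iter()
                .map(|item| seeded_poly_hash(item, seed))
                .min()
                .unwrap_or(u64::MAX) // empty set: sentinel value
        })
        .collect()
}

/// Estimated Jaccard: fraction of signature positions that agree.
fn minhash_similarity_sketch(a: &[u64], b: &[u64]) -> f64 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0; // signatures must have the same non-zero length
    }
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / a.len() as f64
}
```

With these definitions, `ngram_jaccard_sketch("a b c", "c b a", 2)` is 0.0 while the same call with `n = 1` is 1.0, which is the word-order sensitivity the ngram_jaccard entry describes. The MinHash estimate converges toward the exact Jaccard value as `k` grows, at the cost of a longer signature.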