Module entropy

Structs

EntropyAnalysis
EntropyResult

Enums

CompressibilityClass

Functions

analyze_entropy
compressibility_class
Classify how compressible content is based on gzip ratio.
entropy_compress
entropy_compress_adaptive
jaccard_similarity
kolmogorov_proxy
Kolmogorov complexity proxy: K(x) ≈ len(gzip(x)) / len(x). A lower ratio means the content is more compressible, i.e. more redundant.
minhash_signature
Minhash signature for approximate Jaccard via LSH. Uses k independent hash functions (polynomial hashing with different seeds).
minhash_similarity
Approximate Jaccard from two minhash signatures.
ngram_jaccard
N-gram Jaccard similarity. Unlike word-set Jaccard, it preserves word order.
shannon_entropy
token_entropy
Shannon entropy over BPE token IDs (o200k_base). More LLM-relevant than character entropy since LLMs process BPE tokens.
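
The entropy functions above share one idea: count how often each symbol occurs and sum -p · log2(p) over the observed frequencies. The following is a minimal sketch of that computation, not this module's implementation. The name `entropy_bits` is hypothetical, and the symbols can be characters, bytes, or token IDs produced by any tokenizer (the o200k_base encoding step itself is not shown).

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Shannon entropy in bits per symbol: H = -sum(p_i * log2(p_i)).
/// `entropy_bits` is a hypothetical name used for illustration only.
fn entropy_bits<T: Eq + Hash>(symbols: &[T]) -> f64 {
    // Assumption: an empty input is defined to have zero entropy.
    if symbols.is_empty() {
        return 0.0;
    }

    // Count occurrences of each distinct symbol.
    let mut counts: HashMap<&T, usize> = HashMap::new();
    for s in symbols {
        *counts.entry(s).or_insert(0) += 1;
    }

    // Turn counts into probabilities and accumulate -p * log2(p).
    let n = symbols.len() as f64;
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}
```

Under these assumptions, the characters of "aabb" give 1 bit per symbol (two equally likely symbols), and four distinct token IDs give 2 bits per symbol. The same function serves character-level and token-level entropy; only the input sequence changes.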
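
To illustrate how the similarity functions above relate, here is a hedged sketch of word n-gram Jaccard and a MinHash estimate of it. All function names are hypothetical and the signatures are assumptions, not this module's API. The module's docs mention polynomial hashing with different seeds; this sketch swaps in the standard library's `DefaultHasher` mixed with a seed, purely to stay dependency-free.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Set of word n-grams (shingles) in `text`. Hypothetical helper.
fn word_ngrams(text: &str, n: usize) -> HashSet<Vec<&str>> {
    let words: Vec<&str> = text.split_whitespace().collect();
    if n == 0 || words.len() < n {
        return HashSet::new();
    }
    words.windows(n).map(|w| w.to_vec()).collect()
}

/// Exact Jaccard similarity |A ∩ B| / |A ∪ B|.
/// Assumption: two empty sets are given similarity 0.0.
fn jaccard<T: Eq + Hash>(a: &HashSet<T>, b: &HashSet<T>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// One member of a family of hash functions, selected by `seed`.
/// Stand-in for the seeded polynomial hash the docs describe.
fn seeded_hash<T: Hash>(item: &T, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// MinHash signature: for each of `k` hash functions, keep the
/// minimum hash value over the set. Empty sets yield u64::MAX.
fn minhash<T: Eq + Hash>(set: &HashSet<T>, k: usize) -> Vec<u64> {
    (0..k as u64)
        .map(|seed| {
            set.iter()
                .map(|x| seeded_hash(x, seed))
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Estimated Jaccard: fraction of signature positions that agree.
fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let k = a.len().min(b.len());
    if k == 0 {
        return 0.0;
    }
    let matches = a.iter().zip(b).filter(|(x, y)| x == y).count();
    matches as f64 / k as f64
}
```

With bigrams, "the quick brown fox" and "the quick brown dog" share two of four distinct shingles, so the exact Jaccard is 0.5, while "a b c" and "c b a" share none even though their word sets are identical. The MinHash estimate approaches the exact value as `k` grows; identical sets always produce identical signatures.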