Structs
- Codebook
- CodebookEntry - Cross-file semantic deduplication via TF-IDF codebook.
Functions
- find_semantic_duplicates - Identify semantically duplicate blocks across files. Returns pairs of (file_a, file_b, similarity) where similarity > threshold.
- tfidf_cosine_similarity - Cosine similarity between two documents using TF-IDF vectors. Used as an approximation of embedding-space deduplication.
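
The two functions above can be illustrated with a minimal sketch. The signatures of find_semantic_duplicates and tfidf_cosine_similarity are not shown on this page, so every name below (tokenize, idf_table, tfidf_vector, cosine, find_duplicates_sketch) is hypothetical, and the whitespace-style tokenizer and smoothed IDF formula are assumptions rather than the crate's actual choices. The sketch builds TF-IDF vectors per file, compares each pair by cosine similarity, and keeps the pairs whose similarity is strictly above the threshold.

```rust
use std::collections::{HashMap, HashSet};

/// Assumed tokenizer: lowercase, split on non-alphanumeric characters.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
fn idf_table(docs: &[Vec<String>]) -> HashMap<String, f64> {
    let n = docs.len() as f64;
    let mut df: HashMap<String, f64> = HashMap::new();
    for doc in docs {
        // Count each term once per document.
        let unique: HashSet<&String> = doc.iter().collect();
        for term in unique {
            *df.entry(term.clone()).or_insert(0.0) += 1.0;
        }
    }
    df.into_iter()
        .map(|(t, d)| (t, ((1.0 + n) / (1.0 + d)).ln() + 1.0))
        .collect()
}

/// Sparse TF-IDF vector: raw term count times IDF weight.
fn tfidf_vector(tokens: &[String], idf: &HashMap<String, f64>) -> HashMap<String, f64> {
    let mut v: HashMap<String, f64> = HashMap::new();
    for t in tokens {
        *v.entry(t.clone()).or_insert(0.0) += 1.0;
    }
    for (t, w) in v.iter_mut() {
        *w *= idf.get(t).copied().unwrap_or(1.0);
    }
    v
}

/// Cosine similarity of two sparse vectors; 0.0 if either is empty.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a
        .iter()
        .filter_map(|(t, wa)| b.get(t).map(|wb| wa * wb))
        .sum();
    let na = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let nb = b.values().map(|w| w * w).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Hypothetical stand-in for find_semantic_duplicates.
/// Input is (file name, contents); output is (file_a, file_b, similarity)
/// for every pair with similarity strictly greater than `threshold`.
fn find_duplicates_sketch(files: &[(&str, &str)], threshold: f64) -> Vec<(String, String, f64)> {
    let tokens: Vec<Vec<String>> = files.iter().map(|(_, body)| tokenize(body)).collect();
    let idf = idf_table(&tokens);
    let vectors: Vec<HashMap<String, f64>> =
        tokens.iter().map(|t| tfidf_vector(t, &idf)).collect();

    let mut pairs = Vec::new();
    for i in 0..files.len() {
        for j in (i + 1)..files.len() {
            let sim = cosine(&vectors[i], &vectors[j]);
            if sim > threshold {
                pairs.push((files[i].0.to_string(), files[j].0.to_string(), sim));
            }
        }
    }
    pairs
}
```

Calling find_duplicates_sketch with a slice of (name, contents) tuples and a threshold such as 0.8 returns only the near-identical pairs. The all-pairs loop is quadratic in the number of files, which is acceptable for a sketch; a real implementation may prune candidates first.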