
Module codebook

Structs

Codebook
CodebookEntry
Cross-file semantic deduplication via TF-IDF codebook.
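This page does not show the fields of `Codebook` or `CodebookEntry`. To make "TF-IDF codebook" concrete, here is a minimal sketch that assumes a codebook maps each term to its document frequency and an IDF weight. `SketchCodebook`, `SketchEntry`, the `tokenize` helper and the smoothed IDF formula are all hypothetical illustrations, not this module's actual types or API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical entry: one vocabulary term, its document frequency, and its IDF weight.
#[derive(Debug, Clone)]
pub struct SketchEntry {
    pub term: String,
    pub doc_freq: usize,
    pub idf: f64,
}

/// Hypothetical codebook: term -> entry, built once from a corpus of documents.
#[derive(Debug, Default)]
pub struct SketchCodebook {
    pub entries: HashMap<String, SketchEntry>,
    pub num_docs: usize,
}

/// Naive tokenizer (assumption): split on non-alphanumeric characters, lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

impl SketchCodebook {
    /// Build the codebook by counting, for each term, how many documents contain it.
    pub fn build(docs: &[&str]) -> Self {
        let mut doc_freq: HashMap<String, usize> = HashMap::new();
        for doc in docs {
            // Deduplicate tokens so each document counts a term at most once.
            let unique: HashSet<String> = tokenize(doc).into_iter().collect();
            for term in unique {
                *doc_freq.entry(term).or_insert(0) += 1;
            }
        }
        let n = docs.len();
        let entries: HashMap<String, SketchEntry> = doc_freq
            .into_iter()
            .map(|(term, df)| {
                // Smoothed IDF (assumption): ln((1 + N) / (1 + df)) + 1, always positive.
                let idf = ((1.0 + n as f64) / (1.0 + df as f64)).ln() + 1.0;
                (term.clone(), SketchEntry { term, doc_freq: df, idf })
            })
            .collect();
        SketchCodebook { entries, num_docs: n }
    }

    /// IDF weight of a term, or None if the term never appeared in the corpus.
    pub fn idf(&self, term: &str) -> Option<f64> {
        self.entries.get(term).map(|e| e.idf)
    }
}
```

Terms shared by many files get lower weights, so rare identifiers dominate similarity scores computed from the codebook.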

Functions

find_semantic_duplicates
Identify semantically duplicate blocks across files. Returns a (file_a, file_b, similarity) tuple for each pair whose similarity exceeds threshold.
tfidf_cosine_similarity
Cosine similarity between the TF-IDF vectors of two documents. Used as a cheap approximation of embedding-space deduplication.
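The two functions above can be sketched together as follows, assuming raw term counts weighted by a smoothed IDF computed over the compared documents, and a brute-force comparison of every file pair. The names `tfidf_cosine_sketch` and `find_duplicates_sketch`, their signatures, the tokenizer and the weighting scheme are hypothetical; the module's real `tfidf_cosine_similarity` and `find_semantic_duplicates` may differ in all of these.

```rust
use std::collections::{HashMap, HashSet};

/// Naive tokenizer (assumption): split on non-alphanumeric characters, lowercase.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Smoothed IDF for every term in the corpus: ln((1 + N) / (1 + df)) + 1.
fn idf_table(docs: &[&str]) -> HashMap<String, f64> {
    let mut df: HashMap<String, usize> = HashMap::new();
    for doc in docs {
        let unique: HashSet<String> = tokenize(doc).into_iter().collect();
        for t in unique {
            *df.entry(t).or_insert(0) += 1;
        }
    }
    let n = docs.len() as f64;
    df.into_iter()
        .map(|(t, d)| (t, ((1.0 + n) / (1.0 + d as f64)).ln() + 1.0))
        .collect()
}

/// Sparse TF-IDF vector: raw term count multiplied by the term's IDF.
fn tfidf_vector(doc: &str, idf: &HashMap<String, f64>) -> HashMap<String, f64> {
    let mut tf: HashMap<String, f64> = HashMap::new();
    for t in tokenize(doc) {
        *tf.entry(t).or_insert(0.0) += 1.0;
    }
    tf.into_iter()
        .map(|(t, c)| {
            let w = c * idf.get(&t).copied().unwrap_or(0.0);
            (t, w)
        })
        .collect()
}

/// Cosine of the angle between two sparse vectors; 0.0 if either is empty.
fn cosine(a: &HashMap<String, f64>, b: &HashMap<String, f64>) -> f64 {
    let dot: f64 = a.iter().filter_map(|(t, wa)| b.get(t).map(|wb| wa * wb)).sum();
    let na = a.values().map(|w| w * w).sum::<f64>().sqrt();
    let nb = b.values().map(|w| w * w).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical: cosine similarity of two documents, with IDF taken over just the pair.
pub fn tfidf_cosine_sketch(a: &str, b: &str) -> f64 {
    let idf = idf_table(&[a, b]);
    cosine(&tfidf_vector(a, &idf), &tfidf_vector(b, &idf))
}

/// Hypothetical: compare every pair of (name, text) files, keep pairs above threshold.
pub fn find_duplicates_sketch(files: &[(&str, &str)], threshold: f64) -> Vec<(String, String, f64)> {
    let texts: Vec<&str> = files.iter().map(|(_, text)| *text).collect();
    let idf = idf_table(&texts);
    let vecs: Vec<HashMap<String, f64>> = texts.iter().map(|t| tfidf_vector(t, &idf)).collect();
    let mut out = Vec::new();
    for i in 0..files.len() {
        for j in (i + 1)..files.len() {
            let sim = cosine(&vecs[i], &vecs[j]);
            if sim > threshold {
                out.push((files[i].0.to_string(), files[j].0.to_string(), sim));
            }
        }
    }
    out
}
```

The pairwise loop is O(n²) in the number of files, which is acceptable for small repositories; larger corpora would typically need an index or blocking step before scoring.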