Crate ferret

Ferret is a copy-detection tool that locates duplicated text or code across multiple text documents or source files. It is designed to detect copying (collusion) within a given set of files.

As a library, Ferret can be used to analyse program code or natural language texts into trigrams, and compare pairs of documents for similarity.
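The trigram comparison described above can be illustrated with a small standalone sketch. It does not use this crate's API: the functions `tokenise`, `trigrams` and `resemblance` are hypothetical names introduced here, the tokeniser is a simple whitespace splitter, and the similarity score is assumed to be a Jaccard-style ratio (shared trigrams divided by all distinct trigrams in either document).

```rust
use std::collections::HashSet;

/// Hypothetical tokeniser: lowercase, whitespace-separated words with
/// surrounding punctuation trimmed. The crate's own tokenisers may differ.
fn tokenise(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Collect the set of distinct trigrams (runs of three consecutive tokens).
fn trigrams(text: &str) -> HashSet<(String, String, String)> {
    let tokens = tokenise(text);
    tokens
        .windows(3)
        .map(|w| (w[0].clone(), w[1].clone(), w[2].clone()))
        .collect()
}

/// Assumed similarity measure: |shared trigrams| / |all distinct trigrams|.
/// Returns 0.0 when neither text contains a trigram.
fn resemblance(a: &str, b: &str) -> f64 {
    let ta = trigrams(a);
    let tb = trigrams(b);
    let all = ta.union(&tb).count();
    if all == 0 {
        return 0.0;
    }
    let shared = ta.intersection(&tb).count();
    shared as f64 / all as f64
}
```

For example, `resemblance("The cat sat on the mat.", "The cat sat on the rug.")` gives 0.6 under these assumptions: each sentence yields four trigrams, three of which are shared, out of five distinct trigrams overall.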

Modules

chardrip
Provides access to a stream of characters, one-by-one.
documents
Manages a collection of files and their analysis.
tokenisers
Defines token readers for different kinds of text or code.
trigram_reader
Splits a given text file into a sequence of trigrams.