Ferret is a copy-detection tool that locates duplicate text or code across multiple text documents or source files. It is designed to detect copying (collusion) within a given set of files.
As a library, Ferret can be used to split program code or natural-language text into trigrams and to compare pairs of documents for similarity.
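The idea of trigram-based comparison can be illustrated with a minimal sketch. This is not Ferret's own API: the function names `tokens`, `trigrams` and `similarity` are hypothetical, tokens are assumed to be lowercased whitespace-separated words, and similarity is assumed to be a Jaccard-style ratio of shared trigrams to all distinct trigrams. Ferret's actual tokenisers and similarity measure may differ.

```rust
use std::collections::HashSet;

/// Splits text into lowercase word tokens on whitespace.
/// Assumption: a deliberately simple stand-in for a real tokeniser.
fn tokens(text: &str) -> Vec<String> {
    text.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Collects the set of distinct trigrams, i.e. every run of
/// three consecutive tokens in the text.
fn trigrams(text: &str) -> HashSet<(String, String, String)> {
    let toks = tokens(text);
    toks.windows(3)
        .map(|w| (w[0].clone(), w[1].clone(), w[2].clone()))
        .collect()
}

/// Jaccard-style similarity: shared trigrams divided by all
/// distinct trigrams in either text. Returns 0.0 when neither
/// text is long enough to contain a trigram.
fn similarity(a: &str, b: &str) -> f64 {
    let ta = trigrams(a);
    let tb = trigrams(b);
    let shared = ta.intersection(&tb).count();
    let total = ta.union(&tb).count();
    if total == 0 {
        0.0
    } else {
        shared as f64 / total as f64
    }
}
```

Under these assumptions, two texts that differ only in their final word share every trigram except the last one, so `similarity` falls between 0 and 1, while identical texts with at least three tokens score 1.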
Modules
- chardrip
- Provides access to a stream of characters, one-by-one.
- documents
- Manages a collection of files and their analysis.
- tokenisers
- Defines token readers for different kinds of text or code.
- trigram_reader
- Splits a given text file into a sequence of trigrams.