A collection of structs mediating between words and tries and representing
intermediate states in the discovery of anagrams.
CharCount | The fundamental representation of the undigested part of a phrase in
anagram calculation. A CharCount keeps track of the characters still
looking for a foster word. To accelerate processing, it also caches
the first character offset with a non-zero count, the last such offset,
the sum of all counts, and a checksum sufficient for hashing and
identification.
|
CharSet | A set-ish representation of the characters in a CharCount. A CharSet
is a record of the types of characters present without regard to their
count.
|
ToDo | The representation of a partially processed phrase working its way through
anagram discovery. ToDos form a linked list keeping track of the words already
found, plus a CharCount keeping track of the characters yet to be
processed.
|
Translator | A Translator converts between alphabetic and numeric representations of
words. For anagram calculation, words are treated as pure numeric sequences.
The translator converts back and forth and also keeps track of character
frequencies in order to produce a dense trie representation of a word list.
|
normalize | A function that strips away characters of no interest -- spaces and
punctuation characters, generally -- and removes unimportant distinctions
like case. If one wishes to adapt this code to a new alphabet, this is
likely the only thing that needs fixing.
|
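
The descriptions above can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration, not this crate's actual API: the name `CharCountSketch`, its fields and methods, the fixed 26-letter ASCII alphabet, and the body of `normalize` are all hypothetical. The checksum cache is left out. The sketch shows normalization, the cached first offset, last offset, and total, a CharSet-like bitmask view, and the subtraction of a word from the characters still to be placed.

```rust
/// Hypothetical stand-in for `normalize`: keeps ASCII letters only and lowercases them.
/// Supporting another alphabet would mean changing this function.
fn normalize(phrase: &str) -> String {
    phrase
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Hypothetical stand-in for `CharCount`: per-character counts plus cached summaries.
#[derive(Debug, Clone, PartialEq)]
struct CharCountSketch {
    counts: [u32; 26],
    first: usize, // lowest offset with a non-zero count (0 when empty)
    last: usize,  // highest offset with a non-zero count (0 when empty)
    total: u32,   // sum of all counts
}

impl CharCountSketch {
    /// Build the cached summaries once, so later steps need not rescan the counts.
    fn from_counts(counts: [u32; 26]) -> Self {
        let first = counts.iter().position(|&n| n > 0).unwrap_or(0);
        let last = counts.iter().rposition(|&n| n > 0).unwrap_or(0);
        let total = counts.iter().sum();
        CharCountSketch { counts, first, last, total }
    }

    /// Count the characters of a phrase after normalizing it.
    fn from_phrase(phrase: &str) -> Self {
        let mut counts = [0u32; 26];
        for b in normalize(phrase).bytes() {
            counts[(b - b'a') as usize] += 1;
        }
        Self::from_counts(counts)
    }

    /// A CharSet-like view: one bit per character present, ignoring counts.
    fn char_set(&self) -> u32 {
        let mut mask = 0u32;
        for (i, &n) in self.counts.iter().enumerate() {
            if n > 0 {
                mask |= 1 << i;
            }
        }
        mask
    }

    /// Remove a word's characters; returns `None` if the word does not fit.
    fn subtract(&self, word: &str) -> Option<Self> {
        let mut counts = self.counts;
        for b in normalize(word).bytes() {
            let slot = &mut counts[(b - b'a') as usize];
            *slot = slot.checked_sub(1)?;
        }
        Some(Self::from_counts(counts))
    }
}
```

With this sketch, `CharCountSketch::from_phrase("dirty room")` holds nine characters, and subtracting "dormitory" leaves a remainder whose total is zero, which is the condition under which a candidate list of words forms a complete anagram.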