Struct toipe::textgen::RawWordSelector
source · pub struct RawWordSelector<T> { /* private fields */ }
Expand description
Efficient selector of words from a word list.
The word list is given by a BufReader.
§Assumptions
The word list is assumed to:
- Have a list of words separated by newline.
- Use only English alphabet and ASCII.
- Be sorted alphabetically.
- In case-insensitive manner.
- For example, both “Apple” and “apple” must appear before words started with “b”.
- Be a file that is not modified while the object is alive.
- Have no empty lines except at the end of the file.
Note: only words between length 2 and 8, inclusive, are considered. Having no words matching the criteria may lead to an infinite loop.
§Algorithm
During initialization, the RawWordSelector
iterates through all
the words in the list and builds an index mapping each letter (of
the alphabet) to its byte position in the file and the cumulative
number of words present starting with it.
To select a (pesudo-)random word, a random number between 0
(inclusive) and number of lines (exclusive) is generated. Using
binary search, the index of where this number lies in the cumulative
no. of words list is found. Using this index, the byte offset of the
corresponding letter is found. The file is then read starting from
this byte offset and read line-by-line until the correct word (at
line number - cumulative num. words
from the starting of this
letter).
§Time complexity
Initialization: O(n)
Selecting a word: O(1)
(best case) or O(n)
(worst case)
§Space complexity
O(1)
(only needs fixed length arrays).
Implementations§
source§impl<T: Seek + Read> RawWordSelector<T>
impl<T: Seek + Read> RawWordSelector<T>
source§impl RawWordSelector<File>
impl RawWordSelector<File>
source§impl RawWordSelector<Cursor<String>>
impl RawWordSelector<Cursor<String>>
sourcepub fn from_string(word_list: String) -> Result<Self, Error>
pub fn from_string(word_list: String) -> Result<Self, Error>
Create from a String representing the word list file.
Please ensure that assumptions defined at
RawWordSelector
are valid for the contents.