Classify the token boundaries detected by find_boundaries
as either weak, normal, or hard. This classification determines
how eager the system is to split on a given boundary.
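The classification step can be sketched as follows. This is a minimal illustration, not the library's actual rules: the specific criteria (newlines and sentence-final punctuation as hard, plain whitespace as normal, everything else as weak) are assumptions chosen for the example.

```python
from enum import Enum

class BoundaryStrength(Enum):
    WEAK = 1
    NORMAL = 2
    HARD = 3

def classify_boundary(boundary_text: str) -> BoundaryStrength:
    """Classify a boundary by the text it covers.

    Hypothetical rules: newlines and sentence-final punctuation are
    hard boundaries, plain whitespace is a normal boundary, and
    anything else (e.g. a hyphen) is a weak boundary.
    """
    if "\n" in boundary_text or any(c in ".!?" for c in boundary_text):
        return BoundaryStrength.HARD
    if boundary_text != "" and boundary_text.strip() == "":
        return BoundaryStrength.NORMAL
    return BoundaryStrength.WEAK
```

A hard boundary would then always be split on, while a weak boundary (such as a hyphen inside a compound) could be kept intact or split depending on how eager the matcher is configured to be.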
Given a text string, identify the points at which token boundaries
occur, for instance between alphabetic characters and punctuation.
The text string always ends with a boundary (though it may be a dummy boundary covering no length).
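A simplified sketch of such a boundary finder, representing each boundary as an (offset, length) pair into the original text. Treating every maximal run of non-word characters as a boundary is an assumption made for illustration; the trailing zero-length dummy boundary follows the invariant described above.

```python
import re

def find_boundaries(text: str) -> list[tuple[int, int]]:
    """Return token boundaries as (offset, length) pairs.

    Assumption for this sketch: each maximal run of non-word
    characters (punctuation, whitespace) is one boundary. If the
    text does not already end in a boundary, a zero-length dummy
    boundary is appended at the end.
    """
    boundaries = [(m.start(), m.end() - m.start())
                  for m in re.finditer(r"\W+", text)]
    if not boundaries or boundaries[-1][0] + boundaries[-1][1] != len(text):
        boundaries.append((len(text), 0))  # dummy boundary covering no length
    return boundaries
```

For example, `find_boundaries("hello, world")` yields one real boundary for `", "` plus the dummy end boundary, whereas `"end."` already terminates in a boundary, so no dummy is added.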
Find all ngrams of the specified order in the text, respecting the boundaries.
This returns a vector of Match instances that refer to the precise (untokenised) text.
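The ngram extraction can be sketched as below. Plain (start, end) offset pairs stand in for the library's Match type here, and the boundary representation matches the (offset, length) convention assumed in the earlier sketch; both are simplifications for illustration.

```python
def find_ngrams(text: str,
                boundaries: list[tuple[int, int]],
                order: int) -> list[tuple[int, int]]:
    """Return (start, end) offsets into the original text for every
    ngram of `order` consecutive tokens, where tokens are the spans
    between boundaries. The offsets refer to the precise, untokenised
    text, including any boundary text between the tokens."""
    # Derive token spans from the boundary list.
    tokens = []
    prev_end = 0
    for offset, length in boundaries:
        if offset > prev_end:
            tokens.append((prev_end, offset))
        prev_end = offset + length
    if prev_end < len(text):
        tokens.append((prev_end, len(text)))
    # Slide a window of `order` tokens; each match covers the raw text
    # from the first token's start to the last token's end.
    return [(tokens[i][0], tokens[i + order - 1][1])
            for i in range(len(tokens) - order + 1)]
```

For `"a b c"` with boundaries at the two spaces and a dummy end boundary, order 2 yields the spans covering `"a b"` and `"b c"` in the raw text.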
A redundant match is a higher-order match that already achieves a perfect distance score when its unigram
components are considered separately.
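The redundancy test reduces to a simple check over the component scores. Representing a perfect distance score as 1.0 is an assumption for this sketch; the actual scoring scale is not specified here.

```python
def is_redundant(unigram_scores: list[float], perfect: float = 1.0) -> bool:
    """A higher-order match is redundant when every one of its unigram
    components, considered separately, already achieves a perfect
    distance score: the ngram then adds no information over its parts.
    The value of `perfect` (1.0) is an assumption for this sketch."""
    return all(score >= perfect for score in unigram_scores)
```

Such redundant higher-order matches can be pruned, since keeping them would only duplicate what the unigram matches already express.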