flashtext2-rs
Flashtext implementation in Rust
Flashtext2
This crate lets you extract and replace strings very efficiently, with better performance than regular expressions.
It's especially performant when you have a very large list of keywords to extract from your text, or many values to replace.
How it works
The flashtext algorithm stores all the keywords the user wants to extract in a trie.
A keyword is defined as a sequence of tokens;
for example, "Hello world!"
becomes: ["Hello", " ", "world", "!"]
(the text is split into tokens according to Unicode Standard Annex #29).
In this implementation, each node in the trie contains one token (not one character!).
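The token-per-node layout can be illustrated with a minimal sketch. This is not the crate's actual code: `tokenize`, `Node`, and `TokenTrie` are hypothetical names, and the tokenizer is a deliberate simplification (runs of alphanumeric characters versus single other characters) standing in for real UAX #29 word segmentation.

```rust
use std::collections::HashMap;

/// Simplified tokenizer (NOT UAX #29): a run of alphanumeric characters is one
/// token, and every other character is a token of its own.
fn tokenize(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut word_start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() {
            if word_start.is_none() {
                word_start = Some(i);
            }
        } else {
            // Close the word in progress, then emit the separator itself.
            if let Some(start) = word_start.take() {
                tokens.push(&text[start..i]);
            }
            tokens.push(&text[i..i + c.len_utf8()]);
        }
    }
    if let Some(start) = word_start {
        tokens.push(&text[start..]);
    }
    tokens
}

/// One trie node: each edge is labelled with a whole token, not a character.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    is_keyword_end: bool,
}

#[derive(Default)]
struct TokenTrie {
    root: Node,
}

impl TokenTrie {
    /// Walks (and creates) one node per token, then marks the last node.
    fn insert(&mut self, keyword: &str) {
        let mut node = &mut self.root;
        for token in tokenize(keyword) {
            node = node.children.entry(token.to_string()).or_default();
        }
        node.is_keyword_end = true;
    }

    /// Returns true only if the exact token sequence was inserted as a keyword.
    fn contains(&self, keyword: &str) -> bool {
        let mut node = &self.root;
        for token in tokenize(keyword) {
            match node.children.get(token) {
                Some(child) => node = child,
                None => return false,
            }
        }
        node.is_keyword_end
    }
}
```

With this sketch, inserting "Hello world!" creates a chain of four nodes, and a partial word such as "Hello wor" does not match, because "wor" and "world" are different tokens.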
Time complexity
The time complexity of this algorithm does not depend on the number of keywords in the trie, only on the length of the document!
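To see why the keyword count drops out, here is a rough sketch of a longest-match scan over a token trie. The names `Node`, `insert`, and `extract` are hypothetical and the input is assumed to be already tokenized; the crate's real implementation may differ. Each step is a single hash lookup whose cost does not grow with the number of stored keywords. In this simplified version a failed partial match restarts at the next token, so the work is bounded by the document length times the longest keyword length (in tokens).

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    is_keyword_end: bool,
}

/// Inserts a keyword that is already split into tokens.
fn insert(root: &mut Node, keyword: &[&str]) {
    let mut node = root;
    for token in keyword {
        node = node.children.entry(token.to_string()).or_default();
    }
    node.is_keyword_end = true;
}

/// Scans left to right and returns the longest keyword found at each
/// position, without overlapping matches.
fn extract(root: &Node, tokens: &[&str]) -> Vec<String> {
    let mut found = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let mut node = root;
        let mut longest_end = None;
        // Follow the trie as far as the document tokens allow.
        for (offset, token) in tokens[start..].iter().enumerate() {
            match node.children.get(*token) {
                Some(child) => {
                    node = child;
                    if node.is_keyword_end {
                        longest_end = Some(start + offset + 1);
                    }
                }
                None => break,
            }
        }
        match longest_end {
            // Emit the match and continue right after it.
            Some(end) => {
                found.push(tokens[start..end].concat());
                start = end;
            }
            // No keyword starts here; move on by one token.
            None => start += 1,
        }
    }
    found
}
```

Adding a million unrelated keywords to `root` would make the trie larger, but the scan above would still perform the same number of lookups for a given document.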
Quick start
use KeywordProcessor;
Case insensitive
At the moment this crate doesn't support case-insensitive search, although it's something I want
to add in the future.
As a workaround, you can normalize the text by calling str::to_lowercase()
both on the keywords when inserting them
and on the text you want to search, e.g.:
use KeywordProcessor;
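The idea behind the workaround can also be shown without the crate, using only the standard library. In the sketch below, `LowercaseKeywords` is a hypothetical stand-in for a keyword store (it is not the crate's `KeywordProcessor`), and matching is done on whitespace-separated words purely to keep the example short.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a keyword store; not the crate's API.
struct LowercaseKeywords {
    keywords: HashSet<String>,
}

impl LowercaseKeywords {
    fn new() -> Self {
        Self { keywords: HashSet::new() }
    }

    /// Normalizes the keyword once, at insertion time.
    fn add(&mut self, keyword: &str) {
        self.keywords.insert(keyword.to_lowercase());
    }

    /// Applies the same normalization to each word of the text before the
    /// lookup, and returns the matching words in their original casing.
    fn find<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace()
            .filter(|word| self.keywords.contains(&word.to_lowercase()))
            .collect()
    }
}
```

The important part is that both sides go through the same normalization. Note that `str::to_lowercase()` can change the byte length of a string (for example, "İ" lowercases to two characters), so byte offsets found in a lowercased copy of the text do not always map back to the original.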