blitztext
High-performance keyword extraction and replacement in strings.
BlitzText is a Rust library for efficient keyword processing, based on the FlashText and Aho-Corasick algorithms. It is designed for high-speed operations on large volumes of text.
§Overview
The main component of BlitzText is the KeywordProcessor, which manages a trie-based data structure for storing and matching keywords. It supports keyword addition, extraction, and replacement, with options for case sensitivity, fuzzy matching, and parallel processing.
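To make the trie idea concrete, here is a minimal, self-contained sketch of whole-word keyword matching over a character trie. It is not BlitzText's implementation: `TrieNode`, `TinyMatcher`, and `extract` are hypothetical names, the word-character rule (alphanumeric or `_`) is an assumption, and offsets are counted in characters rather than bytes.

```rust
use std::collections::HashMap;

/// One node of a character trie. `clean_name` is set only on nodes that end a keyword.
#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    clean_name: Option<String>,
}

/// Hypothetical, simplified matcher used for illustration only.
#[derive(Default)]
struct TinyMatcher {
    root: TrieNode,
}

impl TinyMatcher {
    /// Inserts `keyword`; matches report `clean_name`, or the keyword itself if `None`.
    fn add_keyword(&mut self, keyword: &str, clean_name: Option<&str>) {
        let mut node = &mut self.root;
        for ch in keyword.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.clean_name = Some(clean_name.unwrap_or(keyword).to_string());
    }

    /// Returns (clean_name, start, end) for each whole-word match, in char offsets.
    fn extract(&self, text: &str) -> Vec<(String, usize, usize)> {
        let chars: Vec<char> = text.chars().collect();
        // Assumed word-character rule: alphanumeric or underscore.
        let is_word = |c: char| c.is_alphanumeric() || c == '_';
        let mut found = Vec::new();
        let mut i = 0;
        while i < chars.len() {
            // Only start walking the trie at the beginning of a word.
            let at_word_start = i == 0 || !is_word(chars[i - 1]);
            let mut longest: Option<(String, usize)> = None;
            if at_word_start {
                let mut node = &self.root;
                let mut j = i;
                while j < chars.len() {
                    match node.children.get(&chars[j]) {
                        Some(next) => node = next,
                        None => break,
                    }
                    j += 1;
                    // Accept a keyword only if it ends at a word boundary.
                    let at_word_end = j == chars.len() || !is_word(chars[j]);
                    if at_word_end {
                        if let Some(name) = &node.clean_name {
                            longest = Some((name.clone(), j));
                        }
                    }
                }
            }
            match longest {
                Some((name, end)) => {
                    found.push((name, i, end));
                    i = end; // continue after the longest match
                }
                None => i += 1,
            }
        }
        found
    }
}
```

Because every keyword shares the same trie, the text is scanned once regardless of how many keywords are registered, which is the property that makes this family of algorithms attractive for large keyword sets.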
§Structs
§KeywordProcessor
The core struct for keyword operations.
§Methods
- new: Creates a new KeywordProcessor with default settings.
- with_options: Creates a new KeywordProcessor with specified options.
- add_keyword: Adds a keyword to the processor.
- remove_keyword: Removes a keyword from the processor.
- extract_keywords: Extracts keywords from the given text.
- replace_keywords: Replaces keywords in the given text.
- parallel_extract_keywords_from_texts: Extracts keywords from multiple texts in parallel.
§KeywordMatch
Represents a matched keyword in the text.
§Fields
- keyword: &str - The matched keyword.
- similarity: f32 - The similarity score of the match (1.0 for exact matches).
- start: usize - The start index of the match in the text.
- end: usize - The end index of the match in the text.
§Examples
§Basic Usage
use blitztext::KeywordProcessor;

fn main() {
    let mut processor = KeywordProcessor::new();
    processor.add_keyword("rust", Some("Rust Lang"));
    processor.add_keyword("programming", Some("Coding"));

    let text = "I love rust programming";
    let matches = processor.extract_keywords(text, None);
    for m in matches {
        println!("Found '{}' at [{}, {}]", m.keyword, m.start, m.end);
    }

    let replaced = processor.replace_keywords(text, None);
    println!("Replaced text: {}", replaced);
    // Output: "I love Rust Lang Coding"
}
§Fuzzy Matching
let matches = processor.extract_keywords(text, Some(0.8));
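
The source does not say how the similarity score is computed. As an illustration only, the sketch below uses one common definition, normalized Levenshtein similarity (1 minus edit distance divided by the longer length), to show what a value such as 0.8 could mean. The functions `levenshtein` and `similarity` are hypothetical helpers, not part of the BlitzText API, and the crate may use a different measure.

```rust
/// Levenshtein edit distance over chars (insertions, deletions, substitutions).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Assumed similarity: 1.0 for identical strings, approaching 0.0 as they diverge.
fn similarity(a: &str, b: &str) -> f32 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f32 / longest as f32
}
```

Under this assumed definition, "programing" against "programming" has one edit over eleven characters and would pass a 0.8 threshold, while "rest" against "rust" scores 0.75 and would not.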
§Parallel Processing
let texts = vec!["Text 1", "Text 2", "Text 3"];
let results = processor.parallel_extract_keywords_from_texts(&texts, None);
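
As a rough picture of what per-text parallelism involves, the following standard-library sketch runs an independent search on each text in its own scoped thread and collects the results in input order. It does not use BlitzText: `find_keyword_offsets` is a hypothetical stand-in for a per-text extraction step, and spawning one thread per text is a simplification where a real implementation would more plausibly use a thread pool.

```rust
use std::thread;

/// Hypothetical per-text step: byte offsets of every occurrence of `keyword`.
fn find_keyword_offsets(text: &str, keyword: &str) -> Vec<usize> {
    text.match_indices(keyword).map(|(pos, _)| pos).collect()
}

/// Processes each text on its own scoped thread; results keep the input order.
fn parallel_find(texts: &[&str], keyword: &str) -> Vec<Vec<usize>> {
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = texts
            .iter()
            .map(|&text| scope.spawn(move || find_keyword_offsets(text, keyword)))
            .collect();
        // Join in spawn order so results[i] corresponds to texts[i].
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

The texts are only read, never mutated, so the workers can share borrowed data without locks. This is the same property that makes a read-only keyword structure safe to query from several threads at once.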
§Custom Non-Word Boundaries
use blitztext::KeywordProcessor;
let mut processor = KeywordProcessor::new();
processor.add_keyword("rust", None);
processor.add_keyword("programming", Some("coding"));
let text = "I-love-rust-programming-and-1coding2";
// Default behavior: '-' is a word separator
let matches = processor.extract_keywords(text, None);
assert_eq!(matches.len(), 2);
// Add '-' as a non-word boundary
processor.add_non_word_boundary('-');
// Now '-' is considered part of words
let matches = processor.extract_keywords(text, None);
assert_eq!(matches.len(), 0);
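
The effect of a non-word boundary character can be reproduced with a small standalone sketch. It assumes that word characters are alphanumerics and `_` by default and that a keyword matches only when the characters on both sides are not word characters. `whole_word_offsets` and `extra_word_chars` are hypothetical names, and a plain `HashSet<char>` stands in for the crate's CharSet.

```rust
use std::collections::HashSet;

/// Byte offsets where `keyword` occurs as a whole word.
/// Characters in `extra_word_chars` are treated as part of words.
fn whole_word_offsets(
    text: &str,
    keyword: &str,
    extra_word_chars: &HashSet<char>,
) -> Vec<usize> {
    // Assumed default rule plus the caller-supplied additions.
    let is_word = |c: char| c.is_alphanumeric() || c == '_' || extra_word_chars.contains(&c);
    text.match_indices(keyword)
        .filter(|(start, matched)| {
            // Neighbouring characters, if any, on each side of the candidate match.
            let before = text[..*start].chars().next_back();
            let after = text[start + matched.len()..].chars().next();
            !before.map_or(false, |c| is_word(c)) && !after.map_or(false, |c| is_word(c))
        })
        .map(|(start, _)| start)
        .collect()
}
```

With an empty set, "rust" is found at offset 7 in "I-love-rust-programming-and-1coding2", while "coding" is rejected because it sits between the digits 1 and 2. Once '-' is added to the set, the hyphens glue the neighbouring words together and "rust" no longer matches.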
§Performance
BlitzText is optimized for high-speed operations, making it suitable for processing large volumes of text. It uses a trie-based data structure for efficient matching and supports parallel processing for handling multiple texts simultaneously.
Structs§
- CharSet: A fast set implementation for characters.
- KeywordMatch: A struct representing a keyword match in a text.
- KeywordProcessor: A struct for handling keyword matching, fuzzy matching, and replacement in text.