Crate blitztext

blitztext

High-performance keyword extraction and replacement in strings.

BlitzText is a Rust library for efficient keyword processing, based on the FlashText and Aho-Corasick algorithms. It’s designed for high-speed operations on large volumes of text.

§Overview

The main component of BlitzText is the KeywordProcessor, which manages a trie-based data structure for storing and matching keywords. It supports various operations including keyword addition, extraction, and replacement, with options for case sensitivity, fuzzy matching, and parallel processing.
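To make the trie idea concrete, here is a minimal sketch of storing keywords in a character trie and scanning a text for the longest keyword at each position. It is an illustration only, not BlitzText's implementation: `TrieNode`, `TinyTrie`, `insert`, and `find_all` are hypothetical names, and the sketch ignores word boundaries, case folding, and fuzzy matching.

```rust
use std::collections::HashMap;

/// One node of a character trie. `keyword` is set on nodes that end a stored keyword.
#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    keyword: Option<String>,
}

/// Hypothetical stand-in for a keyword store; not BlitzText's actual type.
#[derive(Default)]
struct TinyTrie {
    root: TrieNode,
}

impl TinyTrie {
    /// Store `keyword` by walking (and creating) one node per character.
    fn insert(&mut self, keyword: &str) {
        let mut node = &mut self.root;
        for ch in keyword.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.keyword = Some(keyword.to_string());
    }

    /// Scan `text` left to right and return `(keyword, start, end)` for the
    /// longest stored keyword found at each position. Offsets are byte
    /// offsets and `end` is exclusive (an assumption of this sketch).
    fn find_all(&self, text: &str) -> Vec<(String, usize, usize)> {
        let mut found = Vec::new();
        let mut pos = 0;
        while pos < text.len() {
            let mut node = &self.root;
            let mut best: Option<(&String, usize)> = None;
            // Follow trie edges for as long as the text keeps matching.
            for (offset, ch) in text[pos..].char_indices() {
                match node.children.get(&ch) {
                    Some(next) => {
                        node = next;
                        if let Some(kw) = &node.keyword {
                            best = Some((kw, pos + offset + ch.len_utf8()));
                        }
                    }
                    None => break,
                }
            }
            match best {
                Some((kw, end)) => {
                    found.push((kw.clone(), pos, end));
                    pos = end; // resume scanning after the match
                }
                // No keyword starts here: advance by one character.
                None => pos += text[pos..].chars().next().map_or(1, |c| c.len_utf8()),
            }
        }
        found
    }
}
```

With "rust" and "programming" inserted, `find_all("I love rust programming")` returns the spans (7, 11) and (12, 23). Each lookup follows one trie edge per input character, so the cost of a scan depends on the text length rather than on the number of stored keywords.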

§Structs

§KeywordProcessor

The core struct for keyword operations.

§Methods
  • new - Creates a new KeywordProcessor with default settings.
  • with_options - Creates a new KeywordProcessor with specified options.
  • add_keyword - Adds a keyword to the processor.
  • remove_keyword - Removes a keyword from the processor.
  • extract_keywords - Extracts keywords from the given text.
  • replace_keywords - Replaces keywords in the given text.
  • parallel_extract_keywords_from_texts - Extracts keywords from multiple texts in parallel.

§KeywordMatch

Represents a matched keyword in the text.

§Fields
  • keyword: &str - The matched keyword.
  • similarity: f32 - The similarity score of the match (1.0 for exact matches).
  • start: usize - The start index of the match in the text.
  • end: usize - The end index of the match in the text.

§Examples

§Basic Usage

use blitztext::KeywordProcessor;

fn main() {
    let mut processor = KeywordProcessor::new();
    processor.add_keyword("rust", Some("Rust Lang"));
    processor.add_keyword("programming", Some("Coding"));

    let text = "I love rust programming";
    let matches = processor.extract_keywords(text, None);

    for m in matches {
        println!("Found '{}' at [{}, {}]", m.keyword, m.start, m.end);
    }

    let replaced = processor.replace_keywords(text, None);
    println!("Replaced text: {}", replaced);
    // Prints: Replaced text: I love Rust Lang Coding
}
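The replacement step can be pictured as rebuilding the text around the matched spans. The sketch below is a hypothetical illustration, not BlitzText's code: `Replacement` and `apply_replacements` are made-up names, and it assumes spans are half-open byte ranges, sorted by start, and non-overlapping.

```rust
/// Hypothetical span: the half-open byte range `[start, end)` and the text to put there.
struct Replacement<'a> {
    start: usize,
    end: usize,
    with: &'a str,
}

/// Rebuild `text`, swapping each span for its replacement.
/// Assumes `spans` is sorted by `start` and that spans do not overlap.
fn apply_replacements(text: &str, spans: &[Replacement<'_>]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut cursor = 0;
    for span in spans {
        out.push_str(&text[cursor..span.start]); // untouched text before the match
        out.push_str(span.with); // replacement instead of the matched keyword
        cursor = span.end;
    }
    out.push_str(&text[cursor..]); // tail after the last match
    out
}
```

Applied to "I love rust programming" with spans (7, 11, "Rust Lang") and (12, 23, "Coding"), this yields "I love Rust Lang Coding", the same result as the example above.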

§Fuzzy Matching

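// Reuses `processor` and `text` from Basic Usage; Some(0.8) enables fuzzy matching with a 0.8 threshold.
let matches = processor.extract_keywords(text, Some(0.8));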
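This page does not say how the similarity score is computed. As one plausible reading, the sketch below uses a normalized Levenshtein similarity, where 1.0 means identical strings, and accepts a candidate when its score reaches the threshold. The metric and the names `levenshtein`, `similarity`, and `is_fuzzy_match` are assumptions for illustration, not BlitzText's API.

```rust
/// Levenshtein edit distance between two strings, counted in characters.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and the first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut curr = vec![i; b.len() + 1]; // curr[0] = i deletions
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j] + 1) // deletion
                .min(curr[j - 1] + 1) // insertion
                .min(prev[j - 1] + cost); // substitution or match
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Similarity in [0.0, 1.0]: 1.0 for identical strings (assumed metric).
fn similarity(a: &str, b: &str) -> f32 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0; // two empty strings are identical
    }
    1.0 - levenshtein(a, b) as f32 / longest as f32
}

/// A candidate counts as a match when its similarity reaches the threshold.
fn is_fuzzy_match(candidate: &str, keyword: &str, threshold: f32) -> bool {
    similarity(candidate, keyword) >= threshold
}
```

Under this metric, "programing" against "programming" scores about 0.91 and passes a 0.8 threshold, while "rest" against "rust" scores 0.75 and does not.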

§Parallel Processing

// Reuses `processor` from the Basic Usage example.
let texts = vec!["Text 1", "Text 2", "Text 3"];
let results = processor.parallel_extract_keywords_from_texts(&texts, None);
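The general pattern behind processing several texts at once can be sketched with scoped threads from the standard library. This is an illustration under stated assumptions, not a description of how BlitzText schedules work: `find_keywords` is a simplified stand-in for real extraction, `parallel_find` is a hypothetical name, and spawning one thread per text is the simplest possible strategy rather than a tuned one.

```rust
use std::thread;

/// Stand-in for per-text work: the keywords that occur in `text`
/// as whitespace-separated words.
fn find_keywords<'a>(text: &str, keywords: &[&'a str]) -> Vec<&'a str> {
    keywords
        .iter()
        .copied()
        .filter(|kw| text.split_whitespace().any(|word| word == *kw))
        .collect()
}

/// Process each text on its own scoped thread and return the results
/// in the same order as the input texts.
fn parallel_find<'a>(texts: &[&str], keywords: &[&'a str]) -> Vec<Vec<&'a str>> {
    thread::scope(|scope| {
        // Spawn all workers first so they run concurrently.
        let handles: Vec<_> = texts
            .iter()
            .map(|&text| scope.spawn(move || find_keywords(text, keywords)))
            .collect();
        // Joining in spawn order keeps results aligned with `texts`.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because the keyword set is only read during extraction, every worker can share it by reference without locking.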

§Custom Non-Word Boundaries

use blitztext::KeywordProcessor;
let mut processor = KeywordProcessor::new();
processor.add_keyword("rust", None);
processor.add_keyword("programming", Some("coding"));

let text = "I-love-rust-programming-and-1coding2";

// Default behavior: '-' is a word separator
let matches = processor.extract_keywords(text, None);
assert_eq!(matches.len(), 2);

// Add '-' as a non-word boundary
processor.add_non_word_boundary('-');

// Now '-' is considered part of words
let matches = processor.extract_keywords(text, None);
assert_eq!(matches.len(), 0);
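In this terminology, a "non-word boundary" character is one that counts as part of a word, so a keyword touching it is no longer a whole-word match. The sketch below shows that rule in isolation. It assumes alphanumerics and '_' are word characters by default, which this page does not state, and `is_word_char` and `whole_word_starts` are hypothetical helpers rather than BlitzText functions.

```rust
use std::collections::HashSet;

/// True when `c` counts as part of a word: alphanumeric, '_' (assumed defaults),
/// or explicitly registered in `extra`.
fn is_word_char(c: char, extra: &HashSet<char>) -> bool {
    c.is_alphanumeric() || c == '_' || extra.contains(&c)
}

/// Byte offsets at which `keyword` occurs in `text` as a whole word,
/// i.e. with no word character directly before or after it.
fn whole_word_starts(text: &str, keyword: &str, extra: &HashSet<char>) -> Vec<usize> {
    text.match_indices(keyword)
        .filter(|&(start, matched)| {
            let before = text[..start].chars().next_back();
            let after = text[start + matched.len()..].chars().next();
            // A neighbouring word character means the keyword is embedded in a longer word.
            !before.map_or(false, |c| is_word_char(c, extra))
                && !after.map_or(false, |c| is_word_char(c, extra))
        })
        .map(|(start, _)| start)
        .collect()
}
```

For the text above, "rust" is found at offset 7 while `extra` is empty. Once '-' is added to `extra`, the hyphens on both sides count as word characters and the match disappears, which mirrors the drop from 2 matches to 0 in the example.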

§Performance

BlitzText is optimized for high-speed operations, making it suitable for processing large volumes of text. It uses a trie-based data structure for efficient matching and supports parallel processing for handling multiple texts simultaneously.

Structs§

  • CharSet - A fast set implementation for characters.
  • KeywordMatch - A struct representing a keyword match in a text.
  • KeywordProcessor - A struct for handling keyword matching, fuzzy matching, and replacement in text.