Crate chemfst

ChemFST

ChemFST is a high-performance chemical name search library that uses Finite State Transducers (FSTs) to search systematic and trivial names of chemical compounds in milliseconds. It is particularly useful for autocomplete features and for searching large chemical databases.
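The core idea behind prefix search on an automaton can be sketched with a plain trie. A trie is a deterministic automaton that shares common prefixes. An FST is also minimized so that it shares common suffixes, so the sketch below is a simplified stand-in and not ChemFST's implementation. The `Trie` type and its methods are hypothetical and use only the Rust standard library: a prefix query follows one transition per byte, then enumerates every final state reachable from the state it reached.

```rust
use std::collections::BTreeMap;

/// Hypothetical byte-level trie. Unlike a real FST, it does not merge
/// shared suffixes, so it uses more memory for the same key set.
#[derive(Default)]
struct Trie {
    children: BTreeMap<u8, Trie>,
    is_final: bool,
}

impl Trie {
    fn insert(&mut self, key: &str) {
        let mut node = self;
        for &b in key.as_bytes() {
            node = node.children.entry(b).or_default();
        }
        node.is_final = true;
    }

    /// Returns up to `limit` keys that start with `prefix`, in byte order.
    fn prefix_search(&self, prefix: &str, limit: usize) -> Vec<String> {
        // Follow one transition per byte of the prefix.
        let mut node = self;
        for &b in prefix.as_bytes() {
            match node.children.get(&b) {
                Some(next) => node = next,
                None => return Vec::new(), // no key has this prefix
            }
        }
        // Enumerate every final state reachable from the prefix state.
        let mut results = Vec::new();
        let mut buf = prefix.as_bytes().to_vec();
        node.collect(&mut buf, &mut results, limit);
        results
    }

    fn collect(&self, buf: &mut Vec<u8>, out: &mut Vec<String>, limit: usize) {
        if out.len() >= limit {
            return;
        }
        if self.is_final {
            out.push(String::from_utf8_lossy(buf).into_owned());
        }
        for (&b, child) in &self.children {
            if out.len() >= limit {
                return;
            }
            buf.push(b);
            child.collect(buf, out, limit);
            buf.pop();
        }
    }
}

fn main() {
    let mut trie = Trie::default();
    for name in ["acetone", "acetic acid", "benzene", "acetaldehyde"] {
        trie.insert(name);
    }
    // prints ["acetaldehyde", "acetic acid", "acetone"]
    println!("{:?}", trie.prefix_search("acet", 10));
}
```

Because `BTreeMap` keeps children sorted by byte, results come out in lexicographic order, the same order in which FST-based sets are typically enumerated.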

Features

  • Memory-efficient indexing using Finite State Transducers
  • Extremely fast prefix-based searches (autocomplete)
  • Case-insensitive substring searches
  • Memory-mapped file access for optimal performance
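Substring search is harder than prefix search because a match may begin anywhere inside a name, so it cannot simply follow one path from the start state. The sketch below is a deliberately naive stand-in that scans every name and compares lowercase forms; it is not how ChemFST matches patterns against the FST, and the name `substring_search_naive` is hypothetical.

```rust
/// Hypothetical naive stand-in: keeps names that contain `needle`,
/// ignoring case, and stops after `limit` matches.
fn substring_search_naive<'a>(names: &[&'a str], needle: &str, limit: usize) -> Vec<&'a str> {
    // Normalize the query once instead of once per name.
    let needle = needle.to_lowercase();
    names
        .iter()
        .copied()
        .filter(|name| name.to_lowercase().contains(&needle))
        .take(limit)
        .collect()
}

fn main() {
    let names = ["Benzene", "toluene", "Benzaldehyde", "ethanol"];
    // prints ["Benzene", "Benzaldehyde"]
    println!("{:?}", substring_search_naive(&names, "ENZ", 10));
}
```

This linear scan costs time proportional to the total length of all names on every query, which is the cost an automaton-based index aims to avoid.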

Example

use chemfst::{build_fst_set, load_fst_set, prefix_search, substring_search};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Build an FST index from a list of chemical names
    let input_path = "data/chemical_names.txt";
    let fst_path = "data/chemical_names.fst";
    build_fst_set(input_path, fst_path)?;

    // Load the index into memory efficiently
    let set = load_fst_set(fst_path)?;

    // Perform prefix search (autocomplete)
    let prefix_results = prefix_search(&set, "acet", 10);
    println!("Found {} chemicals starting with 'acet'", prefix_results.len());

    // Perform substring search
    let substring_results = substring_search(&set, "enz", 10)?;
    println!("Found {} chemicals containing 'enz'", substring_results.len());

    Ok(())
}
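FST builders such as the `fst` crate's `SetBuilder` require keys to be inserted in strictly increasing lexicographic (byte) order. The input format of `data/chemical_names.txt` and whether `build_fst_set` sorts its input are not stated here, so the helper below assumes one name per line and is a hypothetical preprocessing step, not part of the ChemFST API.

```rust
/// Hypothetical helper: turns file contents (assumed to hold one name per
/// line) into the sorted, duplicate-free key list FST builders expect.
fn prepare_keys(text: &str) -> Vec<String> {
    let mut keys: Vec<String> = text
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty()) // skip blank lines
        .map(String::from)
        .collect();
    keys.sort(); // String ordering compares bytes, matching FST key order
    keys.dedup(); // duplicates are adjacent after sorting
    keys
}

fn main() {
    let text = "toluene\nacetone\n\nbenzene\nacetone\n";
    // prints ["acetone", "benzene", "toluene"]
    println!("{:?}", prepare_keys(text));
}
```

Byte order places uppercase letters before lowercase ones, so "Benzene" sorts before "acetone"; any case folding would have to happen before sorting.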

Functions

  • build_fst_set: Creates an FST set from a list of chemical names in a text file.
  • load_fst_set: Memory-maps an FST set from disk.
  • prefix_search: Performs prefix-based autocomplete search.
  • preload_fst_set: Forces the operating system to load all pages of the FST into memory.
  • substring_search: Performs substring search using pattern matching on the FST set.
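Preloading a memory-mapped file commonly amounts to touching one byte in every page, so that the operating system faults each page in before the first query instead of during it. The sketch below shows that idea on a plain byte slice. The 4096-byte page size and the `touch_pages` name are assumptions (some platforms use larger pages, and real code would query the OS), and it does not claim to reflect how `preload_fst_set` is implemented.

```rust
use std::hint::black_box;

/// Assumed page size; real code would ask the operating system.
const PAGE_SIZE: usize = 4096;

/// Hypothetical sketch: reads one byte from every page of `bytes`. For a
/// memory-mapped region, each read forces the OS to fault that page in.
/// Returns the number of pages touched.
fn touch_pages(bytes: &[u8]) -> usize {
    let mut checksum: u8 = 0;
    let mut pages = 0;
    for offset in (0..bytes.len()).step_by(PAGE_SIZE) {
        checksum = checksum.wrapping_add(bytes[offset]);
        pages += 1;
    }
    // black_box discourages the compiler from optimizing the reads away.
    black_box(checksum);
    pages
}

fn main() {
    let data = vec![1u8; 10_000];
    // 10,000 bytes span offsets 0, 4096 and 8192, so this prints "touched 3 pages"
    println!("touched {} pages", touch_pages(&data));
}
```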