§ChemFST
ChemFST is a high-performance chemical name search library. It uses Finite State Transducers (FSTs)
to search the systematic and trivial names of chemical compounds in milliseconds.
It is particularly useful for autocomplete features and for searching large chemical databases.
§Features
- Memory-efficient indexing using Finite State Transducers
- Extremely fast prefix-based searches (autocomplete)
- Case-insensitive substring searches
- Memory-mapped file access for optimal performance
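To make the two search modes listed above concrete, here is a minimal sketch of what they return, written against a plain `BTreeSet` from the standard library rather than an FST. It is not ChemFST's implementation: the helper names `prefix_matches`, `substring_matches`, and `sample_names` are hypothetical, and the sample data is made up for illustration. Note that the prefix helper here is case-sensitive; only the substring helper folds case, mirroring the feature list.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: return up to `max_results` names that start with
/// `prefix`, in sorted order. Names sharing a prefix are contiguous in a
/// sorted set, so we seek to the first candidate and stop at the first miss.
fn prefix_matches(names: &BTreeSet<String>, prefix: &str, max_results: usize) -> Vec<String> {
    names
        .range(prefix.to_string()..)
        .take_while(|name| name.starts_with(prefix))
        .take(max_results)
        .cloned()
        .collect()
}

/// Hypothetical helper: return up to `max_results` names that contain
/// `needle`, ignoring case. This scans every name, unlike the prefix case.
fn substring_matches(names: &BTreeSet<String>, needle: &str, max_results: usize) -> Vec<String> {
    let needle = needle.to_lowercase();
    names
        .iter()
        .filter(|name| name.to_lowercase().contains(&needle))
        .take(max_results)
        .cloned()
        .collect()
}

/// Made-up sample data, used only for this illustration.
fn sample_names() -> BTreeSet<String> {
    ["acetic acid", "acetone", "benzene", "Benzoic acid", "ethanol"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}
```

Calling `prefix_matches(&sample_names(), "acet", 10)` yields the two names beginning with "acet", while `substring_matches(&sample_names(), "ENZ", 10)` finds both "Benzoic acid" and "benzene" regardless of case. An FST answers the same kind of query from a compact on-disk structure that shares common prefixes and suffixes between keys.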
§Example
use chemfst::{build_fst_set, load_fst_set, prefix_search, substring_search};
use std::error::Error;

fn main() -> Result<(), Box<dyn Error>> {
    // Build an FST index from a list of chemical names
    let input_path = "data/chemical_names.txt";
    let fst_path = "data/chemical_names.fst";
    build_fst_set(input_path, fst_path)?;

    // Load the index into memory efficiently
    let set = load_fst_set(fst_path)?;

    // Perform prefix search (autocomplete)
    let prefix_results = prefix_search(&set, "acet", 10);
    println!("Found {} chemicals starting with 'acet'", prefix_results.len());

    // Perform substring search
    let substring_results = substring_search(&set, "enz", 10)?;
    println!("Found {} chemicals containing 'enz'", substring_results.len());

    Ok(())
}
§Functions
- build_fst_set: Creates an FST Set from a list of chemical names in a text file.
- load_fst_set: Memory maps an FST set from disk.
- prefix_search: Performs prefix-based autocomplete search.
- preload_fst_set: Forces the operating system to load all pages of the FST into memory.
- substring_search: Performs substring search using pattern matching on the FST set.
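The idea behind preloading can be illustrated without the library. A memory-mapped file is paged in lazily, so the first queries may wait on disk reads; touching every page up front moves that cost to startup. The sketch below is an assumption-laden stand-in, not how `preload_fst_set` is implemented: the hypothetical helper `warm_page_cache` simply reads the index file once from start to end, which on most operating systems leaves its pages in the page cache.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Hypothetical helper: read the whole file once so the operating system
/// is likely to keep its pages in the page cache. Returns the number of
/// bytes read. The OS may still evict pages later under memory pressure.
fn warm_page_cache(path: &Path) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut buf = vec![0u8; 64 * 1024]; // read in 64 KiB chunks
    let mut total = 0u64;
    loop {
        match file.read(&mut buf) {
            Ok(0) => break, // end of file
            Ok(n) => total += n as u64,
            // Retry if a signal interrupted the read.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}
```

A service might call something like this once at startup, before accepting autocomplete requests, so that early queries do not wait on page faults.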