Crate terraphim_automata

Fast text matching and autocomplete engine for knowledge graphs.

terraphim_automata provides high-performance text processing using Aho-Corasick automata and finite-state transducers (FSTs). It powers Terraphim’s autocomplete and knowledge graph linking features.

§Features

  • Fast Autocomplete: Prefix-based search with fuzzy matching (Levenshtein/Jaro-Winkler)
  • Text Matching: Find and replace terms using Aho-Corasick automata
  • Link Generation: Convert matched terms to Markdown, HTML, or Wiki links
  • Paragraph Extraction: Extract context around matched terms
  • WASM Support: Browser-compatible autocomplete with TypeScript bindings
  • Remote Loading: Async loading of thesaurus files over HTTP (feature-gated)
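
To make the Levenshtein-based fuzzy matching mentioned in the first bullet concrete, here is a minimal sketch that uses only the standard library. The names `levenshtein` and `fuzzy_candidates` are hypothetical and are not part of this crate's API; the crate's own scoring, thresholds, and ranking may differ.

```rust
/// Classic two-row dynamic-programming Levenshtein edit distance,
/// counted in Unicode scalar values (chars), not bytes.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        curr[0] = i;
        for j in 1..=b.len() {
            let substitution = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let deletion = prev[j] + 1;
            let insertion = curr[j - 1] + 1;
            curr[j] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical helper: keep the terms within `max_edits` of the query,
/// closest first (ties broken alphabetically).
fn fuzzy_candidates<'a>(terms: &[&'a str], query: &str, max_edits: usize) -> Vec<&'a str> {
    let mut scored: Vec<(usize, &'a str)> = terms
        .iter()
        .map(|&t| (levenshtein(t, query), t))
        .filter(|(d, _)| *d <= max_edits)
        .collect();
    scored.sort();
    scored.into_iter().map(|(_, t)| t).collect()
}
```

With the terms `"rust"`, `"rust async"`, and `"python"`, the query `"rast"` and `max_edits = 1` keep only `"rust"`, since it is a single substitution away.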

§Architecture

  • Autocomplete Index: FST-based prefix search with metadata
  • Aho-Corasick Matcher: Multi-pattern matching for link generation
  • Thesaurus Builder: Parse knowledge graphs from JSON/Markdown
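
The prefix lookup performed by the autocomplete index can be pictured with a much simpler structure. The sketch below swaps the FST for a sorted `Vec<String>` searched with binary search; this is an illustrative stand-in, not how the crate stores its index, and `PrefixIndex` is a hypothetical name.

```rust
/// Hypothetical stand-in for an autocomplete index: a sorted, deduplicated
/// list of terms. A real FST also compresses shared prefixes and suffixes;
/// this sketch only reproduces the lookup behaviour.
struct PrefixIndex {
    terms: Vec<String>,
}

impl PrefixIndex {
    fn new<I, S>(terms: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        let mut terms: Vec<String> = terms.into_iter().map(Into::into).collect();
        terms.sort();
        terms.dedup();
        PrefixIndex { terms }
    }

    /// Return up to `limit` terms that start with `prefix`, in sorted order.
    fn search(&self, prefix: &str, limit: usize) -> Vec<&str> {
        // Index of the first term that is >= prefix. Because the list is
        // sorted, every match sits in one contiguous run starting there.
        let start = self.terms.partition_point(|t| t.as_str() < prefix);
        self.terms[start..]
            .iter()
            .take_while(|t| t.starts_with(prefix))
            .take(limit)
            .map(String::as_str)
            .collect()
    }
}
```

An index built from `"rust"`, `"rust async"`, `"ruby"`, and `"python"` returns `"rust"` and `"rust async"` for the prefix `"rus"`. The sorted-list approach costs one binary search plus a short scan per query, whereas an FST additionally shares storage between terms with common prefixes and suffixes.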

§Cargo Features

  • remote-loading: Enable async HTTP loading of thesaurus files (requires tokio)
  • tokio-runtime: Add tokio runtime support
  • typescript: Generate TypeScript definitions via tsify
  • wasm: Enable WebAssembly compilation

§Examples

§Autocomplete with Fuzzy Matching

use terraphim_automata::{build_autocomplete_index, fuzzy_autocomplete_search};
use terraphim_types::{Thesaurus, NormalizedTermValue, NormalizedTerm};

// Create a simple thesaurus
let mut thesaurus = Thesaurus::new("programming".to_string());
thesaurus.insert(
    NormalizedTermValue::from("rust"),
    NormalizedTerm::new(1, NormalizedTermValue::from("rust"))
);
thesaurus.insert(
    NormalizedTermValue::from("rust async"),
    NormalizedTerm::new(2, NormalizedTermValue::from("rust async"))
);

// Build autocomplete index
let index = build_autocomplete_index(thesaurus, None).unwrap();

// Fuzzy search (returns Result)
let results = fuzzy_autocomplete_search(&index, "rast", 0.8, Some(5)).unwrap();
assert!(!results.is_empty());

§Text Matching and Link Generation

use terraphim_automata::{load_thesaurus_from_json, replace_matches, LinkType};

let json = r#"{
  "name": "test",
  "data": {
    "rust": {
      "id": 1,
      "nterm": "rust programming",
      "url": "https://rust-lang.org"
    }
  }
}"#;

let thesaurus = load_thesaurus_from_json(json).unwrap();
let text = "I love rust!";

// Replace matches with Markdown links
let linked = replace_matches(text, thesaurus, LinkType::MarkdownLinks).unwrap();
let result = String::from_utf8(linked).unwrap();
println!("{}", result); // "I love [rust](https://rust-lang.org)!"
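
For a sense of what the three link styles listed under Features look like side by side, here is a small standalone sketch. The `LinkStyle` enum and `render_link` function are hypothetical: they do not mirror the variants of the crate's `LinkType` or its exact output, and the wiki form assumes the common `[[term]]` syntax.

```rust
/// Hypothetical link styles, named after the three output formats the
/// feature list mentions (Markdown, HTML, Wiki).
enum LinkStyle {
    Markdown,
    Html,
    Wiki,
}

/// Render one matched term as a link in the requested style.
fn render_link(term: &str, url: &str, style: &LinkStyle) -> String {
    match style {
        LinkStyle::Markdown => format!("[{}]({})", term, url),
        LinkStyle::Html => format!("<a href=\"{}\">{}</a>", url, term),
        // Assumption: wiki links carry only the page name, not the URL.
        LinkStyle::Wiki => format!("[[{}]]", term),
    }
}
```

Calling `render_link("rust", "https://rust-lang.org", &LinkStyle::Html)` yields `<a href="https://rust-lang.org">rust</a>`. Keeping the style as an enum means the matching step stays the same while only the final formatting varies.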

§Loading Thesaurus Files

use terraphim_automata::{AutomataPath, load_thesaurus};

// Load from local file
let local_path = AutomataPath::from_local("thesaurus.json");
let thesaurus = load_thesaurus(&local_path).await.unwrap();

// Load from remote URL (requires 'remote-loading' feature)
let remote_path = AutomataPath::from_remote("https://example.com/thesaurus.json").unwrap();
let thesaurus = load_thesaurus(&remote_path).await.unwrap();
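
In the example above, `from_local` returns a value directly while `from_remote` returns something that must be unwrapped. One plausible reason for such a split is that a remote location can be validated before any loading happens. The sketch below illustrates that idea with a hypothetical `ThesaurusLocation` type; it is an assumption about the design, not the crate's `AutomataPath` implementation, and the error type is simplified to `String`.

```rust
use std::path::PathBuf;

/// Hypothetical "local or remote" location type.
#[derive(Debug, PartialEq)]
enum ThesaurusLocation {
    Local(PathBuf),
    Remote(String),
}

impl ThesaurusLocation {
    /// Any path is accepted here; whether the file exists is only
    /// discovered when it is actually loaded.
    fn from_local(path: impl Into<PathBuf>) -> Self {
        ThesaurusLocation::Local(path.into())
    }

    /// Reject anything that is not an http(s) URL up front, so the
    /// caller gets an error before any network access is attempted.
    fn from_remote(url: &str) -> Result<Self, String> {
        if url.starts_with("http://") || url.starts_with("https://") {
            Ok(ThesaurusLocation::Remote(url.to_string()))
        } else {
            Err(format!("not an http(s) URL: {}", url))
        }
    }
}
```

Outside of short examples, propagating the error with `?` or matching on the `Result` is usually preferable to calling `unwrap()`.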

§WASM Support

Build for WebAssembly:

wasm-pack build --target web --features wasm

See the WASM example for browser usage.

Re-exports§

pub use self::builder::Logseq;
pub use self::builder::ThesaurusBuilder;
pub use autocomplete::build_autocomplete_index;
pub use autocomplete::deserialize_autocomplete_index;
pub use autocomplete::fuzzy_autocomplete_search_levenshtein;
pub use autocomplete::serialize_autocomplete_index;
pub use autocomplete::AutocompleteConfig;
pub use autocomplete::AutocompleteIndex;
pub use autocomplete::AutocompleteMetadata;
pub use autocomplete::AutocompleteResult;
pub use matcher::extract_paragraphs_from_automata;
pub use matcher::find_matches;
pub use matcher::replace_matches;
pub use matcher::LinkType;
pub use matcher::Matched;

Modules§

autocomplete
autocomplete_helpers
builder
matcher
url_protector
URL protection for text replacement.

Enums§

AutomataPath
Path to a thesaurus/automata file, either local or remote.
TerraphimAutomataError
Errors that can occur when working with automata and thesaurus operations.

Functions§

load_thesaurus
Load a thesaurus from a local file only (WASM-compatible version)
load_thesaurus_from_json
Load thesaurus from JSON string (sync version for WASM compatibility)
load_thesaurus_from_json_and_replace
Load thesaurus from JSON string and replace terms using streaming matcher

Type Aliases§

Result
Result type alias using TerraphimAutomataError.