pub struct FastText { /* private fields */ }
FastText model for learning word representations with character n-grams
Decomposes each word into character n-grams (subwords) and learns embeddings for both whole words and their constituent n-grams. This enables:
- Out-of-vocabulary word handling (any word can be represented via its n-grams)
- Morphological awareness (similar prefixes/suffixes produce similar vectors)
- Robustness to misspellings and rare word forms
Implementations
impl FastText
pub fn with_config(config: FastTextConfig) -> Self
Create a new FastText model with custom configuration
pub fn with_tokenizer(self, tokenizer: Box<dyn Tokenizer + Send + Sync>) -> Self
Set a custom tokenizer
pub fn extract_ngrams(&self, word: &str) -> Vec<String>
Extract character n-grams from a word
Wraps the word with boundary markers < and > before extracting.
For example, “fox” with min_n=3, max_n=4 produces:
3-grams: “<fo”, “fox”, “ox>”
4-grams: “<fox”, “fox>” (the fully wrapped form “<fox>” is five characters long, so it is not a 4-gram)
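The extraction described above can be sketched with a sliding window over the wrapped word. This is a minimal standalone illustration, not the crate's implementation: the function name `extract_ngrams_sketch` and the explicit `min_n`/`max_n` parameters are assumptions (the real method reads its range from the model's configuration).

```rust
/// Hypothetical stand-in for `extract_ngrams`: wraps `word` in `<` and `>`
/// and collects every character n-gram whose length is in `min_n..=max_n`.
fn extract_ngrams_sketch(word: &str, min_n: usize, max_n: usize) -> Vec<String> {
    // Work on chars rather than bytes so multi-byte characters stay intact.
    let wrapped: Vec<char> = format!("<{}>", word).chars().collect();
    let mut ngrams = Vec::new();
    for n in min_n..=max_n {
        // `windows(0)` panics, and windows longer than the input yield nothing.
        if n == 0 || n > wrapped.len() {
            continue;
        }
        for window in wrapped.windows(n) {
            ngrams.push(window.iter().collect::<String>());
        }
    }
    ngrams
}
```

Calling `extract_ngrams_sketch("fox", 3, 4)` yields the three 3-grams followed by the two 4-grams of `<fox>`.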
pub fn build_vocabulary(&mut self, texts: &[&str]) -> Result<()>
Build vocabulary from texts
pub fn get_word_vector(&self, word: &str) -> Result<Array1<f64>>
Get the embedding vector for a word (handles OOV words via subwords)
For in-vocabulary words, returns the average of the word vector and its n-gram vectors. For OOV words, returns the average of matching n-gram vectors.
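The averaging rule above might look like the following sketch. It assumes plain `Vec<f64>` vectors and `HashMap` lookup tables in place of the crate's `Array1<f64>` and private storage; `mean_vector` and `word_vector_sketch` are hypothetical names introduced only for illustration.

```rust
use std::collections::HashMap;

/// Element-wise mean of equally sized vectors; `None` if there are none.
fn mean_vector(vectors: &[&Vec<f64>]) -> Option<Vec<f64>> {
    let first = vectors.first()?;
    let mut sum = vec![0.0; first.len()];
    for v in vectors {
        for (acc, x) in sum.iter_mut().zip(v.iter()) {
            *acc += x;
        }
    }
    let count = vectors.len() as f64;
    Some(sum.into_iter().map(|x| x / count).collect())
}

/// Hypothetical lookup: averages the word vector (if the word is in the
/// vocabulary) together with the vectors of every n-gram that has one.
/// For an OOV word only the matching n-gram vectors contribute.
fn word_vector_sketch(
    word: &str,
    ngrams: &[String],
    word_vectors: &HashMap<String, Vec<f64>>,
    ngram_vectors: &HashMap<String, Vec<f64>>,
) -> Option<Vec<f64>> {
    let mut parts: Vec<&Vec<f64>> = Vec::new();
    if let Some(v) = word_vectors.get(word) {
        parts.push(v);
    }
    for g in ngrams {
        if let Some(v) = ngram_vectors.get(g) {
            parts.push(v);
        }
    }
    mean_vector(&parts)
}
```

The sketch returns `None` when neither the word nor any of its n-grams is known; how the real method reports that case through its `Result` is not stated in the source.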
pub fn most_similar(&self, word: &str, top_n: usize) -> Result<Vec<(String, f64)>>
Find most similar words to a given word
pub fn most_similar_by_vector(&self, vector: &Array1<f64>, top_n: usize, exclude_words: &[&str]) -> Result<Vec<(String, f64)>>
Find most similar words to a given vector
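A nearest-neighbour query of this shape can be sketched as "score, filter, sort, truncate". The source does not say which similarity measure or data layout the method uses, so the helper below (the hypothetical `rank_by_similarity`) takes the scoring function as a parameter and works on a plain slice of `(word, vector)` pairs.

```rust
/// Hypothetical ranking helper: scores every vocabulary entry against `query`,
/// skips excluded words, and returns the `top_n` best matches, highest first.
fn rank_by_similarity<F>(
    query: &[f64],
    vocab: &[(String, Vec<f64>)],
    top_n: usize,
    exclude_words: &[&str],
    similarity: F,
) -> Vec<(String, f64)>
where
    F: Fn(&[f64], &[f64]) -> f64,
{
    let mut scored: Vec<(String, f64)> = vocab
        .iter()
        .filter(|(word, _)| !exclude_words.contains(&word.as_str()))
        .map(|(word, vector)| (word.clone(), similarity(query, vector.as_slice())))
        .collect();
    // Sort descending by score; `total_cmp` gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_n);
    scored
}
```

Sorting the whole vocabulary is the simplest approach; for large vocabularies a bounded heap of size `top_n` would avoid the full sort.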
pub fn analogy(&self, a: &str, b: &str, c: &str, top_n: usize) -> Result<Vec<(String, f64)>>
Compute word analogy: a is to b as c is to ?
Uses vector arithmetic: result = b - a + c, then finds most similar words. Works with OOV words since FastText can compute vectors for any word.
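The arithmetic step `b - a + c` is an element-wise operation on the three word vectors. A minimal sketch, assuming plain slices instead of `Array1<f64>` and a hypothetical helper name `analogy_target`:

```rust
/// Computes the analogy target `b - a + c` element-wise.
/// Returns `None` if the three vectors differ in length.
fn analogy_target(a: &[f64], b: &[f64], c: &[f64]) -> Option<Vec<f64>> {
    if a.len() != b.len() || b.len() != c.len() {
        return None;
    }
    Some(
        a.iter()
            .zip(b)
            .zip(c)
            .map(|((a_i, b_i), c_i)| b_i - a_i + c_i)
            .collect(),
    )
}
```

The resulting vector would then be passed to a nearest-neighbour search. Analogy implementations commonly exclude `a`, `b`, and `c` from the candidates; whether this method does so is not stated in the source.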
pub fn word_similarity(&self, word1: &str, word2: &str) -> Result<f64>
Compute cosine similarity between two words
Both words can be OOV.
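Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. The sketch below operates on plain slices; returning `0.0` for a zero-norm vector is an assumption of this example, not documented behaviour of `word_similarity`.

```rust
/// Cosine similarity of two equally sized vectors.
/// Returns 0.0 when either vector has zero norm (an assumption of this sketch).
fn cosine_similarity(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

Values range from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction), independent of vector length.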
pub fn save<P: AsRef<Path>>(&self, path: P) -> Result<()>
Save the model to a file
Saves in a format that includes word vectors, n-gram info, and config. Uses a custom header format:
Line 1: FASTTEXT <vocab_size> <vector_size> <min_n> <max_n> <bucket_size>
Lines 2+: word vector_components…
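To make the header line concrete, the sketch below writes and parses a first line of the documented shape. The `Header` struct, `format_header`, and `parse_header` are hypothetical names for illustration; single-space separation and strict rejection of extra tokens are assumptions, and the layout of the vector lines is left out because the source elides it.

```rust
/// The five header fields named in the documented first line.
#[derive(Debug, PartialEq)]
struct Header {
    vocab_size: usize,
    vector_size: usize,
    min_n: usize,
    max_n: usize,
    bucket_size: usize,
}

/// Renders the header as `FASTTEXT <vocab_size> <vector_size> <min_n> <max_n> <bucket_size>`.
fn format_header(h: &Header) -> String {
    format!(
        "FASTTEXT {} {} {} {} {}",
        h.vocab_size, h.vector_size, h.min_n, h.max_n, h.bucket_size
    )
}

/// Parses a header line; `None` on a wrong tag, a non-numeric field,
/// or a field count other than five.
fn parse_header(line: &str) -> Option<Header> {
    let mut parts = line.split_whitespace();
    if parts.next()? != "FASTTEXT" {
        return None;
    }
    let nums = parts
        .map(|p| p.parse().ok())
        .collect::<Option<Vec<usize>>>()?;
    match nums.as_slice() {
        &[vocab_size, vector_size, min_n, max_n, bucket_size] => Some(Header {
            vocab_size,
            vector_size,
            min_n,
            max_n,
            bucket_size,
        }),
        _ => None,
    }
}
```

Checking the `FASTTEXT` tag first lets a loader reject files written in other embedding formats before reading any vectors.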
pub fn vocabulary_size(&self) -> usize
Get the vocabulary size
pub fn vector_size(&self) -> usize
Get the vector size
pub fn ngram_range(&self) -> (usize, usize)
Get the n-gram configuration (min_n, max_n)
pub fn ngram_count(&self) -> usize
Get the number of unique n-grams discovered
pub fn can_represent(&self, word: &str) -> bool
Check if a word can be represented (either in vocab or has matching n-grams)
pub fn get_vocabulary_words(&self) -> Vec<String>
Get all words in the vocabulary
Trait Implementations
impl WordEmbedding for FastText
fn find_similar(&self, word: &str, top_n: usize) -> Result<Vec<(String, f64)>>
fn solve_analogy(&self, a: &str, b: &str, c: &str, top_n: usize) -> Result<Vec<(String, f64)>>
fn vocab_size(&self) -> usize
Auto Trait Implementations
impl Freeze for FastText
impl !RefUnwindSafe for FastText
impl Send for FastText
impl Sync for FastText
impl Unpin for FastText
impl UnsafeUnpin for FastText
impl !UnwindSafe for FastText
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP where SS: SubsetOf<SP>
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.