pub struct KeywordExtractionConfig<'a> {
    pub sentence_embeddings_config: SentenceEmbeddingsConfig,
    pub tokenizer_stopwords: Option<HashSet<&'a str>>,
    pub tokenizer_pattern: Option<Regex>,
    pub tokenizer_forbidden_ngram_chars: Option<&'a [char]>,
    pub scorer_type: KeywordScorerType,
    pub ngram_range: (usize, usize),
    pub num_keywords: usize,
    pub diversity: Option<f64>,
    pub max_sum_candidates: Option<usize>,
}
Configuration for keyword extraction
Fields
sentence_embeddings_config: SentenceEmbeddingsConfig
SentenceEmbeddingsConfig defining the sentence embeddings model to use.
tokenizer_stopwords: Option<HashSet<&'a str>>
Optional set of stopwords to exclude from the keyword candidate list. Defaults to a list of English stopwords.
tokenizer_pattern: Option<Regex>
Optional regex pattern used for tokenization. Defaults to a pattern matching sequences of word characters.
tokenizer_forbidden_ngram_chars: Option<&'a [char]>
Optional list of characters that must not appear in n-grams (useful for filtering out n-grams that span punctuation marks).
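The stopword and forbidden-character options both act on candidate n-grams. The sketch below shows one plausible way to apply them; it is hypothetical and not the crate's implementation. To stay dependency-free it replaces the Regex tokenizer with a hand-written scanner over alphanumeric characters, it assumes an n-gram is dropped if any of its tokens is a stopword, and it drops any n-gram whose underlying text contains a forbidden character. The names `tokenize` and `candidate_ngrams` are illustrative only.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the regex tokenizer: returns the byte spans of
/// runs of alphanumeric characters or underscores.
fn tokenize(text: &str) -> Vec<(usize, usize)> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, c) in text.char_indices() {
        if c.is_alphanumeric() || c == '_' {
            // Open a new token if we are not already inside one.
            if start.is_none() {
                start = Some(i);
            }
        } else if let Some(s) = start.take() {
            // A non-word character closes the current token.
            spans.push((s, i));
        }
    }
    if let Some(s) = start {
        spans.push((s, text.len()));
    }
    spans
}

/// Returns the lowercased n-grams of exactly `n` tokens that contain no
/// stopword and whose original text contains no forbidden character.
fn candidate_ngrams(
    text: &str,
    n: usize,
    stopwords: &HashSet<&str>,
    forbidden: &[char],
) -> Vec<String> {
    let spans = tokenize(text);
    let mut out = Vec::new();
    if n == 0 {
        return out; // `windows(0)` would panic.
    }
    for window in spans.windows(n) {
        // Assumption: reject the n-gram if any token is a stopword.
        if window
            .iter()
            .any(|&(s, e)| stopwords.contains(text[s..e].to_lowercase().as_str()))
        {
            continue;
        }
        // Reject the n-gram if the text it covers (including the gaps between
        // tokens) contains a forbidden character such as a period.
        let (start, end) = (window[0].0, window[n - 1].1);
        if text[start..end].chars().any(|c| forbidden.contains(&c)) {
            continue;
        }
        let words: Vec<&str> = window.iter().map(|&(s, e)| &text[s..e]).collect();
        out.push(words.join(" ").to_lowercase());
    }
    out
}

fn main() {
    let stopwords: HashSet<&str> = ["is", "the"].into_iter().collect();
    let text = "Rust is fast. Borrow checker helps.";
    // "fast borrow" is rejected because the text between the tokens has a period.
    println!("{:?}", candidate_ngrams(text, 2, &stopwords, &['.', ',']));
}
```

Without the forbidden characters, the sketch would also emit "fast borrow", a bigram that crosses a sentence boundary.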
scorer_type: KeywordScorerType
KeywordScorerType used to rank keywords.
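To illustrate what a scorer does, the sketch below ranks candidates by plain cosine similarity between each candidate embedding and the document embedding. The short vectors stand in for sentence embeddings, the helper names are hypothetical, and the crate's KeywordScorerType variants may compute scores differently.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the indices of the `k` candidates most similar to the document,
/// most similar first.
fn rank_by_cosine(doc: &[f64], candidates: &[Vec<f64>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f64)> = candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (i, cosine(doc, c)))
        .collect();
    // Sort by descending score; `total_cmp` gives a total order on f64.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let doc = [1.0, 0.0];
    let candidates = vec![vec![0.0, 1.0], vec![1.0, 0.0], vec![1.0, 1.0]];
    println!("{:?}", rank_by_cosine(&doc, &candidates, 2)); // [1, 2]
}
```

Pure relevance ranking like this tends to return near-duplicate keywords, which is what the diversity-oriented rankers described below try to counter.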
ngram_range: (usize, usize)
Inclusive n-gram range for keyword candidates. For example, (1, 2) considers all 1-word and 2-word n-grams as keyword candidates.
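The inclusive semantics of ngram_range can be made concrete with a small hypothetical helper that enumerates every n-gram whose length n satisfies min_n <= n <= max_n. It works on pre-split tokens and is only an illustration of the range semantics, not the crate's tokenizer.

```rust
/// Hypothetical helper: all n-grams of `tokens` whose length n satisfies
/// min_n <= n <= max_n (both bounds inclusive), shortest n-grams first.
fn ngrams_in_range(tokens: &[&str], ngram_range: (usize, usize)) -> Vec<String> {
    let (min_n, max_n) = ngram_range;
    let mut out = Vec::new();
    // `windows(0)` would panic, so the lower bound is clamped to 1.
    for n in min_n.max(1)..=max_n {
        // `windows(n)` yields nothing when n exceeds the number of tokens.
        for window in tokens.windows(n) {
            out.push(window.join(" "));
        }
    }
    out
}

fn main() {
    let tokens = ["keyword", "extraction", "pipeline"];
    // (1, 2): three unigrams followed by two bigrams.
    println!("{:?}", ngrams_in_range(&tokens, (1, 2)));
}
```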
num_keywords: usize
Number of keywords to return.
diversity: Option<f64>
Optional diversity parameter for the MaximalMarginRelevance ranker; defaults to 0.5.
A higher diversity (closer to 1.0) favors more varied keywords, at the cost of lower relevance to the original document.
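A minimal sketch of Maximal Marginal Relevance over precomputed similarities, assuming the common formulation score = (1 - diversity) * sim(candidate, document) - diversity * max sim(candidate, selected keyword). Whether the crate uses exactly this weighting is an assumption; the function name `mmr` and its arguments are hypothetical. A `None` diversity falls back to 0.5, mirroring the documented default.

```rust
/// Hypothetical MMR selection.
/// `doc_sim[i]`: similarity of candidate i to the document.
/// `pair_sim[i][j]`: similarity between candidates i and j.
/// Returns up to `k` candidate indices in selection order.
fn mmr(doc_sim: &[f64], pair_sim: &[Vec<f64>], k: usize, diversity: Option<f64>) -> Vec<usize> {
    let diversity = diversity.unwrap_or(0.5); // documented default
    let n = doc_sim.len();
    if n == 0 || k == 0 {
        return Vec::new();
    }
    // Seed the selection with the candidate most relevant to the document.
    let first = (0..n)
        .max_by(|&a, &b| doc_sim[a].total_cmp(&doc_sim[b]))
        .unwrap();
    let mut selected = vec![first];
    let mut remaining: Vec<usize> = (0..n).filter(|&i| i != first).collect();
    while selected.len() < k && !remaining.is_empty() {
        // Relevance minus redundancy with the closest already-selected keyword.
        let (best_pos, _) = remaining
            .iter()
            .enumerate()
            .map(|(pos, &c)| {
                let redundancy = selected
                    .iter()
                    .map(|&s| pair_sim[c][s])
                    .fold(f64::NEG_INFINITY, f64::max);
                (pos, (1.0 - diversity) * doc_sim[c] - diversity * redundancy)
            })
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .unwrap();
        selected.push(remaining.remove(best_pos));
    }
    selected
}

fn main() {
    // Candidates 0 and 1 are near-duplicates; candidate 2 is less relevant but distinct.
    let doc_sim = [0.9, 0.85, 0.5];
    let pair_sim = vec![
        vec![1.0, 0.95, 0.1],
        vec![0.95, 1.0, 0.2],
        vec![0.1, 0.2, 1.0],
    ];
    println!("{:?}", mmr(&doc_sim, &pair_sim, 2, None)); // [0, 2]
    println!("{:?}", mmr(&doc_sim, &pair_sim, 2, Some(0.0))); // [0, 1]
}
```

With the default diversity the near-duplicate candidate 1 is penalized and candidate 2 is chosen instead; with diversity 0.0 the ranking reduces to pure relevance.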
max_sum_candidates: Option<usize>
Optional number of candidate sets used by the MaxSum ranker. Higher values are more likely to identify a global optimum for the ranker criterion, but also more likely to include sets that are less relevant to the input document. Larger values also carry a higher computational and memory cost (N² scaling).
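The trade-off described for max_sum_candidates can be seen in a hypothetical Max Sum Similarity sketch. It assumes the parameter bounds a pool of the most document-similar candidates, then exhaustively picks the k-subset of that pool with the lowest summed pairwise similarity. The crate's exact semantics for this parameter may differ, and `max_sum` and `search` are illustrative names.

```rust
/// Hypothetical Max Sum sketch: keep the `num_candidates` candidates most
/// similar to the document, then return the `k`-subset whose members have the
/// lowest summed pairwise similarity. Returns an empty Vec if no k-subset exists.
fn max_sum(doc_sim: &[f64], pair_sim: &[Vec<f64>], k: usize, num_candidates: usize) -> Vec<usize> {
    // Candidate indices by decreasing document similarity.
    let mut pool: Vec<usize> = (0..doc_sim.len()).collect();
    pool.sort_by(|&a, &b| doc_sim[b].total_cmp(&doc_sim[a]));
    // Assumption: the pool is never smaller than k.
    pool.truncate(num_candidates.max(k));
    let mut best: Option<(f64, Vec<usize>)> = None;
    let mut current = Vec::with_capacity(k);
    search(&pool, 0, k, pair_sim, &mut current, &mut best);
    best.map(|(_, set)| set).unwrap_or_default()
}

/// Recursively enumerates k-combinations of `pool[start..]`, keeping the one
/// with the smallest sum of pairwise similarities (first found wins ties).
fn search(
    pool: &[usize],
    start: usize,
    k: usize,
    pair_sim: &[Vec<f64>],
    current: &mut Vec<usize>,
    best: &mut Option<(f64, Vec<usize>)>,
) {
    if current.len() == k {
        let mut total = 0.0;
        for i in 0..k {
            for j in (i + 1)..k {
                total += pair_sim[current[i]][current[j]];
            }
        }
        if best.as_ref().map_or(true, |(b, _)| total < *b) {
            *best = Some((total, current.clone()));
        }
        return;
    }
    for idx in start..pool.len() {
        current.push(pool[idx]);
        search(pool, idx + 1, k, pair_sim, current, best);
        current.pop();
    }
}

fn main() {
    let doc_sim = [0.9, 0.8, 0.7, 0.1];
    let pair_sim = vec![
        vec![1.0, 0.9, 0.2, 0.0],
        vec![0.9, 1.0, 0.3, 0.0],
        vec![0.2, 0.3, 1.0, 0.0],
        vec![0.0, 0.0, 0.0, 1.0],
    ];
    println!("{:?}", max_sum(&doc_sim, &pair_sim, 2, 3)); // [0, 2]
    println!("{:?}", max_sum(&doc_sim, &pair_sim, 2, 4)); // [0, 3]
}
```

Widening the pool from 3 to 4 lets the barely relevant candidate 3 win because it is dissimilar to everything else, which is the relevance risk the field description mentions. In this sketch the pairwise similarity matrix grows as N², and the exhaustive subset search grows faster still.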
Trait Implementations
Auto Trait Implementations
impl<'a> Freeze for KeywordExtractionConfig<'a>
impl<'a> !RefUnwindSafe for KeywordExtractionConfig<'a>
impl<'a> Send for KeywordExtractionConfig<'a>
impl<'a> Sync for KeywordExtractionConfig<'a>
impl<'a> Unpin for KeywordExtractionConfig<'a>
impl<'a> !UnwindSafe for KeywordExtractionConfig<'a>
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.