pub struct CompressionSummarizer { /* private fields */ }
Sentence compression by dropping low-importance tokens.
Each token's importance is computed with a TF-IDF-inspired heuristic: term frequency within the sentence, combined with an inverse-document-frequency proxy derived from a small built-in stop-word list.
§Example
use scirs2_text::abstractive_summary::CompressionSummarizer;
let cs = CompressionSummarizer::new();
let compressed = cs.compress_sentence("The very quick brown fox jumped lazily", 0.6);
assert!(!compressed.is_empty());
Implementations§
impl CompressionSummarizer
pub fn with_stop_words(stop_words: HashSet<String>) -> Self
Create a CompressionSummarizer with a custom stop-word list.
pub fn importance_score(&self, token: &str, sentence_tokens: &[String]) -> f64
Compute the importance score of a single token given its sentence context.
Score components:
- Term frequency within the sentence.
- Stop-word penalty (×0.1 if the token is a stop word).
- Length bonus: longer tokens receive a slight boost.
- Capitalisation bonus: capitalised mid-sentence tokens receive a boost (heuristic for proper nouns).
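The four components listed above could be combined roughly as follows. This is a minimal sketch, not the crate's implementation: the function name `importance_sketch`, the explicit `stop_words` parameter, the length and capitalisation weights, and the way "mid-sentence" is detected are all assumptions made for illustration. Only the ×0.1 stop-word penalty comes from the documentation.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `importance_score`; weights are illustrative.
fn importance_sketch(
    token: &str,
    sentence_tokens: &[String],
    stop_words: &HashSet<String>,
) -> f64 {
    if sentence_tokens.is_empty() {
        return 0.0;
    }
    let lower = token.to_lowercase();

    // Term frequency: case-insensitive share of the sentence made up by this token.
    let count = sentence_tokens
        .iter()
        .filter(|t| t.to_lowercase() == lower)
        .count();
    let mut score = count as f64 / sentence_tokens.len() as f64;

    // Length bonus (assumed weight): +2% per character, capped at 10 characters.
    let len = token.chars().count().min(10) as f64;
    score *= 1.0 + 0.02 * len;

    // Capitalisation bonus (assumed weight): x1.5 for a capitalised token that
    // is not the first token of the sentence.
    let capitalised = token.chars().next().map_or(false, |c| c.is_uppercase());
    let is_first = sentence_tokens
        .first()
        .map_or(false, |t| t.as_str() == token);
    if capitalised && !is_first {
        score *= 1.5;
    }

    // Stop-word penalty: x0.1, as stated in the documentation.
    if stop_words.contains(&lower) {
        score *= 0.1;
    }
    score
}
```

With these assumed weights, a stop word such as "the" scores well below a content word of similar frequency, and a capitalised mid-sentence token outscores a lowercase token of the same length.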
pub fn compress_sentence(&self, sentence: &str, ratio: f64) -> String
Compresses sentence by keeping only a fraction of its tokens, given by ratio, choosing those with the highest importance scores.
ratio is clamped to (0.0, 1.0]. A ratio of 1.0 keeps all tokens.
Tokens are retained in their original order.
Returns an empty string if the sentence has no words.
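The behaviour described above (clamp the ratio, keep the top-scoring fraction, preserve original order) can be sketched as below. This is an illustration under stated assumptions rather than the crate's code: `compress_sketch` is a hypothetical name, the scoring function is passed in as a closure, whitespace tokenisation is assumed, and rounding the kept count up with `ceil` is a guess, since the documentation does not say how the fraction is rounded.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: keep the highest-scoring tokens, in original order.
fn compress_sketch(sentence: &str, ratio: f64, score: impl Fn(&str) -> f64) -> String {
    // Assumed tokenisation: split on whitespace.
    let tokens: Vec<&str> = sentence.split_whitespace().collect();
    if tokens.is_empty() {
        return String::new();
    }

    // Clamp ratio into (0.0, 1.0]; the tiny positive lower bound is an assumption.
    let ratio = ratio.clamp(f64::EPSILON, 1.0);

    // Number of tokens to keep: rounded up (assumed), at least 1, at most all.
    let keep = ((tokens.len() as f64 * ratio).ceil() as usize).clamp(1, tokens.len());

    // Rank token indices by descending score. The sort is stable, so earlier
    // tokens win ties.
    let mut ranked: Vec<usize> = (0..tokens.len()).collect();
    ranked.sort_by(|&a, &b| {
        score(tokens[b])
            .partial_cmp(&score(tokens[a]))
            .unwrap_or(Ordering::Equal)
    });

    // Take the top `keep` indices and restore the original token order.
    let mut kept: Vec<usize> = ranked.into_iter().take(keep).collect();
    kept.sort_unstable();

    kept.into_iter()
        .map(|i| tokens[i])
        .collect::<Vec<_>>()
        .join(" ")
}
```

Sorting the surviving indices a second time is what keeps the output in reading order; ranking by score alone would scramble the sentence.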
Trait Implementations§
Auto Trait Implementations§
impl Freeze for CompressionSummarizer
impl RefUnwindSafe for CompressionSummarizer
impl Send for CompressionSummarizer
impl Sync for CompressionSummarizer
impl Unpin for CompressionSummarizer
impl UnsafeUnpin for CompressionSummarizer
impl UnwindSafe for CompressionSummarizer
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.