pub struct Deduplicator { /* private fields */ }
LRU-based content deduplicator.
Tracks seen text fragments to detect duplicates during processing.
§Example
use html_cleaning::dedup::Deduplicator;
let mut dedup = Deduplicator::new(100);
assert!(!dedup.is_duplicate("first occurrence"));
assert!(!dedup.is_duplicate("first occurrence")); // seen once
assert!(!dedup.is_duplicate("first occurrence")); // seen twice
assert!(dedup.is_duplicate("first occurrence")); // now duplicate (>2)
Implementations§
impl Deduplicator
pub fn new(capacity: usize) -> Self
Creates a deduplicator with the specified capacity.
Uses the default threshold of 2: text that has already been seen more than 2 times is reported as a duplicate.
§Arguments
capacity - Maximum number of entries to track
pub fn with_threshold(capacity: usize, threshold: i32) -> Self
Creates a deduplicator with the given capacity and a custom duplicate threshold.
§Arguments
capacity - Maximum number of entries to track
threshold - Number of times text can appear before it is considered a duplicate
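To make the threshold concrete, here is a minimal sketch of the counting rule, assuming the check compares the number of earlier sightings against the threshold (this matches the struct-level example for the default threshold of 2, but it is an inference, not the crate's source). `verdicts_for_repeats` is a hypothetical helper, not part of `html_cleaning`.

```rust
/// Hypothetical helper, not part of `html_cleaning`: returns the verdicts
/// that `calls` consecutive checks of the same text would produce, assuming
/// a check reports a duplicate when the number of earlier sightings is
/// greater than `threshold`.
pub fn verdicts_for_repeats(threshold: i32, calls: usize) -> Vec<bool> {
    let mut seen: i32 = 0; // sightings recorded before the current call
    let mut verdicts = Vec::with_capacity(calls);
    for _ in 0..calls {
        verdicts.push(seen > threshold); // decide using earlier sightings only
        seen = seen.saturating_add(1); // then record this sighting
    }
    verdicts
}
```

Under this assumption, a threshold of 2 flags the fourth check of the same text, and a threshold of 0 would flag the second.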
pub fn is_duplicate(&mut self, text: &str) -> bool
Checks whether text is a duplicate and records this occurrence in the cache.
Returns true if text has been seen more than threshold times.
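The interaction between the bounded cache and the duplicate check can be sketched with the standard library alone. The `SketchDedup` type below is a hypothetical stand-in written for illustration; the real `Deduplicator` has private fields and its eviction details are not documented here, so treat the least-recently-used policy and the "forgotten after eviction" behavior as assumptions.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for illustration; NOT the crate's implementation.
pub struct SketchDedup {
    capacity: usize,
    threshold: i32,
    counts: HashMap<String, i32>,
    order: VecDeque<String>, // front = least recently used key
}

impl SketchDedup {
    pub fn with_threshold(capacity: usize, threshold: i32) -> Self {
        Self { capacity, threshold, counts: HashMap::new(), order: VecDeque::new() }
    }

    /// Reports a duplicate when earlier sightings exceed the threshold,
    /// then records this sighting (assumed semantics).
    pub fn is_duplicate(&mut self, text: &str) -> bool {
        let prior = self.counts.get(text).copied().unwrap_or(0);
        if prior > 0 {
            // Known key: drop its old position so it can move to the back.
            if let Some(pos) = self.order.iter().position(|k| k == text) {
                self.order.remove(pos);
            }
        } else if self.capacity > 0 && self.order.len() >= self.capacity {
            // New key and the cache is full: evict the least recently used key.
            if let Some(oldest) = self.order.pop_front() {
                self.counts.remove(&oldest);
            }
        }
        if self.capacity > 0 {
            self.order.push_back(text.to_string());
            self.counts.insert(text.to_string(), prior.saturating_add(1));
        }
        prior > self.threshold
    }
}
```

In this sketch, text that has been evicted is treated as unseen on its next check, so a small capacity can let repeats through when many distinct fragments arrive between occurrences.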
Auto Trait Implementations§
impl Freeze for Deduplicator
impl RefUnwindSafe for Deduplicator
impl Send for Deduplicator
impl Sync for Deduplicator
impl Unpin for Deduplicator
impl UnsafeUnpin for Deduplicator
impl UnwindSafe for Deduplicator
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.