pub struct DuplicateDetector { /* private fields */ }
Detects potential duplicates in a dataset.
Implementations
impl DuplicateDetector
pub fn new(similarity_threshold: f64, comparison_fields: Vec<String>) -> Self
Creates a new duplicate detector.
pub fn string_similarity(&self, a: &str, b: &str) -> f64
Calculates the Jaccard similarity between two strings.
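For reference, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below is not this crate's implementation: the function name `jaccard_similarity` is hypothetical, and the choice to compare lowercase, whitespace-separated tokens is an assumption, since the page does not say how strings are split into sets.

```rust
use std::collections::HashSet;

/// Jaccard similarity over lowercase, whitespace-separated tokens.
/// Assumption: the tokenization is illustrative; the real method may
/// compare characters or n-grams instead.
fn jaccard_similarity(a: &str, b: &str) -> f64 {
    let a_lower = a.to_lowercase();
    let b_lower = b.to_lowercase();
    let set_a: HashSet<&str> = a_lower.split_whitespace().collect();
    let set_b: HashSet<&str> = b_lower.split_whitespace().collect();

    // Treat two empty inputs as identical to avoid dividing by zero.
    if set_a.is_empty() && set_b.is_empty() {
        return 1.0;
    }

    let intersection = set_a.intersection(&set_b).count() as f64;
    let union = set_a.union(&set_b).count() as f64;
    intersection / union
}
```

Under these assumptions, "acme corp ltd" and "acme corp" share two of three distinct tokens, so the score is 2/3.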
pub fn are_duplicates<T: Duplicatable>(&self, a: &T, b: &T) -> bool
Checks if two records are potential duplicates.
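One plausible reading of this check is to score each comparison field, average the scores, and compare the mean against the similarity threshold. The sketch below illustrates that idea only. The `Duplicatable` trait's methods are not shown on this page, so `FieldAccess`, `Customer`, `token_jaccard`, `record_similarity`, and `are_likely_duplicates` are all hypothetical stand-ins, and averaging is an assumed aggregation rule.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's `Duplicatable` trait.
trait FieldAccess {
    fn field(&self, name: &str) -> Option<&str>;
}

/// Hypothetical record type used only for this illustration.
struct Customer {
    name: String,
    city: String,
}

impl FieldAccess for Customer {
    fn field(&self, name: &str) -> Option<&str> {
        match name {
            "name" => Some(self.name.as_str()),
            "city" => Some(self.city.as_str()),
            _ => None,
        }
    }
}

/// Jaccard similarity over lowercase whitespace tokens (assumed tokenization).
fn token_jaccard(a: &str, b: &str) -> f64 {
    let (a, b) = (a.to_lowercase(), b.to_lowercase());
    let sa: HashSet<&str> = a.split_whitespace().collect();
    let sb: HashSet<&str> = b.split_whitespace().collect();
    if sa.is_empty() && sb.is_empty() {
        return 1.0;
    }
    sa.intersection(&sb).count() as f64 / sa.union(&sb).count() as f64
}

/// Mean similarity across the compared fields; a field missing on either
/// record contributes 0.0.
fn record_similarity<T: FieldAccess>(a: &T, b: &T, fields: &[&str]) -> f64 {
    if fields.is_empty() {
        return 0.0;
    }
    let total: f64 = fields
        .iter()
        .map(|&f| match (a.field(f), b.field(f)) {
            (Some(x), Some(y)) => token_jaccard(x, y),
            _ => 0.0,
        })
        .sum();
    total / fields.len() as f64
}

/// Two records count as potential duplicates when the mean score
/// reaches the threshold.
fn are_likely_duplicates<T: FieldAccess>(a: &T, b: &T, fields: &[&str], threshold: f64) -> bool {
    record_similarity(a, b, fields) >= threshold
}
```

With a threshold of 0.8 and the fields `name` and `city`, two customers that differ only in letter case score 1.0 and are flagged, while records that share neither a full name nor a city fall well below the threshold.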
pub fn find_duplicates<T: Duplicatable>(
    &self,
    records: &[T],
) -> Vec<(usize, usize, f64)>
Finds all duplicate pairs in a collection.
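The return type suggests that each tuple holds the indices of two records plus their similarity score, although the page does not state this explicitly. A minimal sketch of such a pairwise scan follows; `find_pairs` and its `score` closure parameter are hypothetical and stand in for whatever comparison the detector performs internally.

```rust
/// Compares every unordered pair of records once and keeps those whose
/// score reaches the threshold. Returns (index_a, index_b, score) tuples
/// with index_a < index_b.
/// Assumption: this mirrors the shape of `find_duplicates`, not its
/// actual implementation.
fn find_pairs<T>(
    records: &[T],
    threshold: f64,
    score: impl Fn(&T, &T) -> f64,
) -> Vec<(usize, usize, f64)> {
    let mut pairs = Vec::new();
    for i in 0..records.len() {
        // Start at i + 1 so each pair is visited once and no record
        // is compared with itself.
        for j in (i + 1)..records.len() {
            let s = score(&records[i], &records[j]);
            if s >= threshold {
                pairs.push((i, j, s));
            }
        }
    }
    pairs
}
```

A scan like this performs n(n-1)/2 comparisons for n records, so it is best suited to small or pre-filtered collections.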
Auto Trait Implementations
impl Freeze for DuplicateDetector
impl RefUnwindSafe for DuplicateDetector
impl Send for DuplicateDetector
impl Sync for DuplicateDetector
impl Unpin for DuplicateDetector
impl UnwindSafe for DuplicateDetector
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.