pub struct Index { /* private fields */ }
MinHash + LSH index. Insert documents, then query with a fresh string.
Implementations
impl Index
pub fn insert(&mut self, id: impl Into<String>, text: &str) -> Result<()>
Insert a document. Duplicate ids are allowed but discouraged; the index does not deduplicate them itself.
pub fn signature(&self, text: &str) -> Vec<u32>
Compute the MinHash signature for a string. Pure function of text
and the index config; no insertion side effects.
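As a rough illustration of what a MinHash signature involves, the sketch below hashes character shingles and keeps one minimum per hash function. It is not this crate's implementation: `minhash_signature`, `base_hash`, `PRIME`, the shingle length, and the `(a * x + b) mod p` hash family are all assumptions made for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Prime modulus (2^61 - 1) for the illustrative hash family. Assumed, not from the crate.
const PRIME: u64 = (1 << 61) - 1;

/// Hash one shingle to a 64-bit value with the standard library hasher.
fn base_hash(shingle: &str) -> u64 {
    let mut h = DefaultHasher::new();
    shingle.hash(&mut h);
    h.finish()
}

/// Hypothetical signature routine: one minimum per (a, b) pair.
/// The result depends only on `text` and the parameters, never on index state.
fn minhash_signature(text: &str, a: &[u64], b: &[u64], shingle_len: usize) -> Vec<u32> {
    let k = shingle_len.max(1); // guard: `windows(0)` would panic
    let chars: Vec<char> = text.chars().collect();
    // Character shingles; a text no longer than one shingle is a single shingle.
    let shingles: Vec<String> = if chars.len() <= k {
        vec![chars.iter().collect()]
    } else {
        chars.windows(k).map(|w| w.iter().collect()).collect()
    };
    let hashes: Vec<u64> = shingles.iter().map(|s| base_hash(s) % PRIME).collect();
    a.iter()
        .zip(b)
        .map(|(&ai, &bi)| {
            hashes
                .iter()
                .map(|&x| {
                    // (a * x + b) mod p, computed in 128 bits to avoid overflow.
                    let v = (ai as u128 * x as u128 + bi as u128) % PRIME as u128;
                    v as u32 // keep the low 32 bits
                })
                .min()
                .unwrap_or(u32::MAX)
        })
        .collect()
}
```

Calling the function twice with the same text and the same `(a, b)` parameters returns the same vector, which is the "pure function" property described above.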
pub fn jaccard(sig_a: &[u32], sig_b: &[u32]) -> f64
Estimated Jaccard similarity between two signatures of the same length.
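The standard MinHash estimator is the fraction of positions at which two signatures agree. The sketch below shows that calculation as a free function named `estimate_jaccard`, a hypothetical name. How the real `jaccard` treats empty or mismatched signatures is not documented here, so the panic on mismatch and the `0.0` for empty input are assumptions.

```rust
/// Illustrative estimator: the share of positions where the signatures match.
fn estimate_jaccard(sig_a: &[u32], sig_b: &[u32]) -> f64 {
    // Assumption: mismatched lengths are treated as a caller error.
    assert_eq!(sig_a.len(), sig_b.len(), "signatures must have equal length");
    if sig_a.is_empty() {
        return 0.0; // assumption: no evidence of similarity
    }
    let matches = sig_a.iter().zip(sig_b).filter(|(x, y)| x == y).count();
    matches as f64 / sig_a.len() as f64
}
```

For example, two four-element signatures that agree in three positions give an estimate of 0.75.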
pub fn near_duplicates(&self, text: &str, min_similarity: f64) -> Vec<Hit>
Return all indexed documents whose estimated Jaccard similarity to
text is >= min_similarity. Sorted by similarity descending.
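The filter-and-sort contract can be sketched with a brute-force scan over stored signatures. This skips the LSH candidate step that the real index presumably uses, and the `Hit` fields shown here (`id`, `similarity`) are hypothetical, as is the function name `brute_force_near_duplicates`.

```rust
/// Hypothetical result type; the real `Hit` fields are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: String,
    similarity: f64,
}

/// Compare the query signature against every stored signature, keep those at or
/// above the threshold, and sort by similarity descending.
fn brute_force_near_duplicates(
    docs: &[(String, Vec<u32>)],
    query_sig: &[u32],
    min_similarity: f64,
) -> Vec<Hit> {
    let mut hits: Vec<Hit> = docs
        .iter()
        // Skip signatures that cannot be compared position by position.
        .filter(|(_, sig)| sig.len() == query_sig.len() && !sig.is_empty())
        .map(|(id, sig)| {
            let same = sig.iter().zip(query_sig).filter(|(x, y)| x == y).count();
            Hit { id: id.clone(), similarity: same as f64 / sig.len() as f64 }
        })
        .filter(|hit| hit.similarity >= min_similarity)
        .collect();
    // Highest similarity first.
    hits.sort_by(|l, r| r.similarity.total_cmp(&l.similarity));
    hits
}
```

A linear scan like this costs one comparison per indexed document, which is the cost that LSH banding is meant to avoid on large collections.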
pub fn save<P: AsRef<Path>>(&self, path: P) -> Result<()>
Persist the index to a JSON file. Stores cfg, the hash family
(a, b), and the per-doc signatures. The runtime bands map and
band_hasher are reconstructed on load (band_hasher is keyed off
cfg.seed, so band hashes round-trip identically).
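To illustrate why the bands map does not need to be stored, the sketch below rebuilds bucket assignments from signatures and a seed alone. The names `band_key` and `rebuild_bands`, the `rows_per_band` parameter, and the use of the standard library's `DefaultHasher` are assumptions for the example; the crate's actual band_hasher may work differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash one band of a signature, mixing in the seed and band index first so
/// the same inputs always map to the same bucket key.
fn band_key(seed: u64, band_index: usize, band: &[u32]) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    band_index.hash(&mut h);
    band.hash(&mut h);
    h.finish()
}

/// Rebuild the bucket -> document ids map from stored signatures.
fn rebuild_bands(
    signatures: &[(String, Vec<u32>)],
    seed: u64,
    rows_per_band: usize,
) -> HashMap<u64, Vec<String>> {
    let rows = rows_per_band.max(1); // guard: `chunks(0)` would panic
    let mut bands: HashMap<u64, Vec<String>> = HashMap::new();
    for (id, sig) in signatures {
        for (i, band) in sig.chunks(rows).enumerate() {
            bands.entry(band_key(seed, i, band)).or_default().push(id.clone());
        }
    }
    bands
}
```

Because the bucket keys depend only on the seed and the signatures, rebuilding after a load gives the same map as before the save, provided the hasher itself is stable across the two runs.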
Auto Trait Implementations
impl Freeze for Index
impl RefUnwindSafe for Index
impl Send for Index
impl Sync for Index
impl Unpin for Index
impl UnsafeUnpin for Index
impl UnwindSafe for Index
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.