Struct Index 

pub struct Index { /* private fields */ }

MinHash + LSH index. Insert documents, then query with a fresh string.
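This is a minimal sketch of the banding idea behind a MinHash + LSH index, assuming r rows per band. Each signature is cut into consecutive bands, each band becomes a bucket key, and any stored document that shares at least one bucket with the query is a candidate. The Buckets type, its use of raw band contents as keys (a real index would more likely hash each band), and candidate_probability are illustrative assumptions, not this crate's internals. candidate_probability is the standard banding estimate 1 - (1 - s^r)^b for the chance that a pair with Jaccard similarity s shares at least one of b bands.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical bucket table keyed by (band index, band contents).
struct Buckets {
    rows_per_band: usize, // r; must be >= 1 (chunks(0) panics)
    table: HashMap<(usize, Vec<u32>), Vec<String>>,
}

impl Buckets {
    fn new(rows_per_band: usize) -> Self {
        Buckets { rows_per_band, table: HashMap::new() }
    }

    /// Register a document's signature under one bucket per band.
    fn insert(&mut self, id: &str, sig: &[u32]) {
        for (i, band) in sig.chunks(self.rows_per_band).enumerate() {
            self.table.entry((i, band.to_vec())).or_default().push(id.to_string());
        }
    }

    /// Ids that share at least one band bucket with `sig` (the LSH candidates).
    fn candidates(&self, sig: &[u32]) -> HashSet<String> {
        let mut out = HashSet::new();
        for (i, band) in sig.chunks(self.rows_per_band).enumerate() {
            if let Some(ids) = self.table.get(&(i, band.to_vec())) {
                out.extend(ids.iter().cloned());
            }
        }
        out
    }
}

/// Probability that a pair with Jaccard similarity `s` collides in at least
/// one of `b` bands of `r` rows each: 1 - (1 - s^r)^b.
fn candidate_probability(s: f64, r: i32, b: i32) -> f64 {
    1.0 - (1.0 - s.powi(r)).powi(b)
}
```

Raising r makes the curve steeper and suppresses low-similarity candidates; raising b recovers recall at the cost of more buckets per document.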

Implementations

impl Index

pub fn new(cfg: Config) -> Result<Self>

Build an empty index.
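Because new returns a Result, construction can fail on a bad config. Apart from seed, the fields of Config are not documented on this page, so the sketch below assumes hypothetical num_perm, bands and shingle_size fields and shows the kind of checks an LSH constructor commonly performs, such as requiring the signature length to split evenly into bands. It is not the crate's actual validation logic.

```rust
/// Hypothetical config shape; only `seed` is confirmed by the docs.
#[derive(Debug, Clone)]
struct Config {
    num_perm: usize,     // signature length (number of hash functions), assumed
    bands: usize,        // number of LSH bands, assumed
    shingle_size: usize, // characters per shingle, assumed
    seed: u64,
}

#[derive(Debug, PartialEq)]
enum ConfigError {
    ZeroField(&'static str),
    UnevenBands { num_perm: usize, bands: usize },
}

/// Reject configs that could not produce a usable banded index.
fn validate(cfg: &Config) -> Result<(), ConfigError> {
    if cfg.num_perm == 0 {
        return Err(ConfigError::ZeroField("num_perm"));
    }
    if cfg.bands == 0 {
        return Err(ConfigError::ZeroField("bands"));
    }
    if cfg.shingle_size == 0 {
        return Err(ConfigError::ZeroField("shingle_size"));
    }
    // Every band needs the same number of rows: num_perm = bands * rows.
    if cfg.num_perm % cfg.bands != 0 {
        return Err(ConfigError::UnevenBands { num_perm: cfg.num_perm, bands: cfg.bands });
    }
    Ok(())
}
```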

pub fn config(&self) -> &Config

The active config (read-only).

pub fn len(&self) -> usize

Number of indexed documents.

pub fn is_empty(&self) -> bool

True iff no documents are indexed.

pub fn insert(&mut self, id: impl Into<String>, text: &str) -> Result<()>

Insert a document. Duplicate ids are allowed but discouraged; the index does not deduplicate them itself.
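Since insert does not deduplicate ids, a caller that wants at most one entry per id has to track ids itself. A minimal sketch, assuming the caller keeps a HashSet of ids it has already inserted; insert_once is a hypothetical helper, and its closure argument stands in for a call such as index.insert(id, text).

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of the crate): call `insert` only for ids
/// not seen before. Returns Ok(true) if it inserted, Ok(false) if skipped.
fn insert_once<E, F>(
    seen: &mut HashSet<String>,
    id: &str,
    text: &str,
    insert: F,
) -> Result<bool, E>
where
    F: FnOnce(&str, &str) -> Result<(), E>,
{
    if seen.contains(id) {
        return Ok(false);
    }
    insert(id, text)?;
    // Record the id only after the insert succeeded, so a failed insert can be retried.
    seen.insert(id.to_string());
    Ok(true)
}
```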

pub fn signature(&self, text: &str) -> Vec<u32>

Compute the MinHash signature for a string. The result depends only on text and the index config, and computing it does not insert anything into the index.
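The save documentation mentions a stored hash family (a, b), which matches the common universal-hash construction h_i(x) = (a_i * x + b_i) mod p: signature position i holds the minimum h_i over the hashed shingles of the text. The sketch below is one way to build such a signature under several assumptions that this page does not confirm: character k-shingles, an FNV-1a shingle hash, the Mersenne prime p = 2^61 - 1 with results truncated to 32 bits, and a splitmix64 generator that derives (a, b) from a seed.

```rust
use std::collections::HashSet;

const MERSENNE_61: u64 = (1 << 61) - 1;

/// splitmix64: a small deterministic generator, used here to derive (a, b).
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Hypothetical hash family: `num_perm` pairs with 1 <= a < p and 0 <= b < p.
fn hash_family(num_perm: usize, seed: u64) -> Vec<(u64, u64)> {
    let mut state = seed;
    (0..num_perm)
        .map(|_| {
            let a = splitmix64(&mut state) % (MERSENNE_61 - 1) + 1;
            let b = splitmix64(&mut state) % MERSENNE_61;
            (a, b)
        })
        .collect()
}

/// 64-bit FNV-1a over a shingle's UTF-8 bytes.
fn fnv1a(s: &str) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in s.bytes() {
        h ^= byte as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Distinct character k-shingles (k >= 1). A text shorter than k becomes a
/// single shingle; an empty text has no shingles.
fn shingles(text: &str, k: usize) -> HashSet<String> {
    let chars: Vec<char> = text.chars().collect();
    if chars.is_empty() {
        return HashSet::new();
    }
    if chars.len() < k {
        return HashSet::from([text.to_string()]);
    }
    chars.windows(k).map(|w| w.iter().collect::<String>()).collect()
}

/// For each (a, b): min over shingle hashes x of ((a*x + b) mod p), keeping the low 32 bits.
fn minhash_signature(text: &str, k: usize, family: &[(u64, u64)]) -> Vec<u32> {
    let xs: Vec<u64> = shingles(text, k).iter().map(|s| fnv1a(s) % MERSENNE_61).collect();
    family
        .iter()
        .map(|&(a, b)| {
            xs.iter()
                .map(|&x| ((a as u128 * x as u128 + b as u128) % MERSENNE_61 as u128) as u32)
                .min()
                .unwrap_or(u32::MAX) // no shingles (empty text)
        })
        .collect()
}
```

Because the family is derived entirely from the seed, the same text always yields the same signature, which is what makes the signature a pure function of the text and config.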

pub fn jaccard(sig_a: &[u32], sig_b: &[u32]) -> f64

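Estimate the Jaccard similarity between two signatures of the same length. This is an associated function (it takes no self), so it is called as Index::jaccard(&sig_a, &sig_b).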
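The standard MinHash estimator is the fraction of positions at which two signatures agree. A minimal sketch of that estimator follows; how the crate's jaccard handles mismatched or empty inputs is not documented here, so returning None in those cases is an assumption made for this example.

```rust
/// Fraction of positions at which two equal-length MinHash signatures agree.
/// Returns None for empty or mismatched-length inputs (an assumed policy).
fn estimate_jaccard(sig_a: &[u32], sig_b: &[u32]) -> Option<f64> {
    if sig_a.is_empty() || sig_a.len() != sig_b.len() {
        return None;
    }
    let equal = sig_a.iter().zip(sig_b).filter(|(x, y)| x == y).count();
    Some(equal as f64 / sig_a.len() as f64)
}
```

For example, [1, 2, 3, 4] and [1, 2, 9, 4] agree in 3 of 4 positions, giving an estimate of 0.75.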

pub fn near_duplicates(&self, text: &str, min_similarity: f64) -> Vec<Hit>

Return all indexed documents whose estimated Jaccard similarity to text is at least min_similarity, sorted by similarity in descending order.
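The filter-then-sort contract can be sketched as a brute-force scan over stored (id, signature) pairs. This is only an illustration of the documented ordering and threshold semantics: the Hit fields shown here are hypothetical, and an LSH index would normally score only band candidates rather than every document.

```rust
/// Hypothetical result type; the crate's Hit fields are not documented here.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    id: String,
    similarity: f64,
}

/// Score every stored signature against `query`, keep those with
/// similarity >= min_similarity, and return them best first.
/// Assumes all signatures have the same length as `query`.
fn rank_near_duplicates(
    query: &[u32],
    docs: &[(String, Vec<u32>)],
    min_similarity: f64,
) -> Vec<Hit> {
    let mut hits: Vec<Hit> = docs
        .iter()
        .map(|(id, sig)| {
            let same = query.iter().zip(sig).filter(|(a, b)| a == b).count();
            Hit { id: id.clone(), similarity: same as f64 / query.len().max(1) as f64 }
        })
        .filter(|h| h.similarity >= min_similarity)
        .collect();
    // total_cmp is a total order on f64, so no unwrap is needed; descending order.
    hits.sort_by(|a, b| b.similarity.total_cmp(&a.similarity));
    hits
}
```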

pub fn save<P: AsRef<Path>>(&self, path: P) -> Result<()>

Persist the index to a JSON file. The file stores cfg, the hash family (a, b), and the per-document signatures. The runtime bands map and band_hasher are not stored; they are rebuilt on load, and because band_hasher is keyed off cfg.seed, the rebuilt band hashes match the originals exactly.
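The reason the bands map need not be persisted can be shown with a small sketch: if the band hash depends only on the seed and the band contents, rebuilding from the stored signatures reproduces the same bucket keys every time. The Persisted struct, rows_per_band, and the seeded FNV-1a band_hash below are hypothetical stand-ins (the crate's band_hasher algorithm is not documented here), and JSON serialization is omitted to keep the example dependency-free.

```rust
use std::collections::HashMap;

/// Hypothetical persisted state (the hash family is omitted for brevity).
struct Persisted {
    seed: u64,
    rows_per_band: usize,          // must be >= 1
    docs: Vec<(String, Vec<u32>)>, // (id, signature)
}

/// Seeded FNV-1a band hash: the seed is mixed into the offset basis, so the
/// same seed and band contents always produce the same key.
fn band_hash(seed: u64, band: &[u32]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325 ^ seed;
    for v in band {
        for byte in v.to_le_bytes() {
            h ^= byte as u64;
            h = h.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
    h
}

/// Rebuild the (band index, band hash) -> ids map from stored signatures.
fn rebuild_bands(p: &Persisted) -> HashMap<(usize, u64), Vec<String>> {
    let mut bands: HashMap<(usize, u64), Vec<String>> = HashMap::new();
    for (id, sig) in &p.docs {
        for (i, band) in sig.chunks(p.rows_per_band).enumerate() {
            bands.entry((i, band_hash(p.seed, band))).or_default().push(id.clone());
        }
    }
    bands
}
```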

pub fn load<P: AsRef<Path>>(path: P) -> Result<Self>

Reverse of Index::save. Validates the loaded cfg (rejects configs that wouldn’t construct cleanly) before rebuilding the in-memory bands map.

Auto Trait Implementations

impl Freeze for Index

impl RefUnwindSafe for Index

impl Send for Index

impl Sync for Index

impl Unpin for Index

impl UnsafeUnpin for Index

impl UnwindSafe for Index

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V