Struct TextIndex

pub struct TextIndex { /* private fields */ }

Trigram-based text index supporting incremental insert and remove, plus exact-substring, regular-expression, and approximate regular-expression search.

Implementations

impl TextIndex

pub fn new() -> Self

Construct an empty index.

pub fn doc_count(&self) -> usize

Number of documents currently in the index.

pub fn postings(&self) -> &Postings

Borrow the inverted postings index.

pub fn docs(&self) -> &BTreeMap<u32, IndexedDoc>

Borrow the document store.

pub fn insert(&mut self, text: Vec<u8>) -> u32

Insert text and return the assigned doc id.

Doc ids are assigned monotonically and never recycled. Once a document is removed, its id is not handed out again.
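
The following is a minimal sketch of this insert path. It assumes a simplified layout: the doc store is a BTreeMap<u32, Vec<u8>>, and the postings index maps each 3-byte window to the set of doc ids that contain it. SketchIndex, next_id, and trigrams are hypothetical names. The real TextIndex fields are private, and Postings and IndexedDoc are not modeled here.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

type Trigram = [u8; 3];

// Hypothetical simplified index; the real TextIndex keeps its fields private.
#[derive(Default)]
struct SketchIndex {
    next_id: u32,                              // only ever increases
    docs: BTreeMap<u32, Vec<u8>>,              // doc store: id -> raw bytes
    postings: HashMap<Trigram, BTreeSet<u32>>, // trigram -> ids of docs containing it
}

// Every distinct 3-byte window of `text`.
fn trigrams(text: &[u8]) -> BTreeSet<Trigram> {
    text.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

impl SketchIndex {
    fn insert(&mut self, text: Vec<u8>) -> u32 {
        let id = self.next_id;
        // Advance the counter unconditionally; removals never rewind it, so ids are not reused.
        self.next_id = id.checked_add(1).expect("doc id space exhausted");
        for t in trigrams(&text) {
            self.postings.entry(t).or_default().insert(id);
        }
        self.docs.insert(id, text);
        id
    }
}
```

Keying the doc store by id in a BTreeMap makes iteration order equal to ascending id order, which is also insertion order because ids only grow.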

pub fn remove(&mut self, doc_id: u32) -> Option<Vec<u8>>

Remove the document with id doc_id and return its raw bytes, or None if no such document exists.

All of the document's trigram entries are removed from the postings index. Any trigram whose postings list becomes empty as a result is garbage-collected.
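
The removal path can be sketched under the same simplifying assumptions. SketchIndex is a hypothetical stand-in with a BTreeMap doc store and a HashMap-of-BTreeSet postings index, not the real private fields. The sketch re-derives the document's trigrams from its stored bytes, unlinks the id from each postings list, and drops any list that becomes empty.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

type Trigram = [u8; 3];

// Hypothetical simplified index; the real TextIndex keeps its fields private.
#[derive(Default)]
struct SketchIndex {
    next_id: u32,
    docs: BTreeMap<u32, Vec<u8>>,
    postings: HashMap<Trigram, BTreeSet<u32>>,
}

fn trigrams(text: &[u8]) -> BTreeSet<Trigram> {
    text.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

impl SketchIndex {
    fn insert(&mut self, text: Vec<u8>) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        for t in trigrams(&text) {
            self.postings.entry(t).or_default().insert(id);
        }
        self.docs.insert(id, text);
        id
    }

    fn remove(&mut self, doc_id: u32) -> Option<Vec<u8>> {
        // Unknown or already-removed id: nothing to unlink.
        let text = self.docs.remove(&doc_id)?;
        // Re-derive the doc's trigrams from its stored bytes to find its postings entries.
        for t in trigrams(&text) {
            let now_empty = match self.postings.get_mut(&t) {
                Some(list) => {
                    list.remove(&doc_id);
                    list.is_empty()
                }
                None => false,
            };
            if now_empty {
                // Garbage-collect the list so dead trigrams do not accumulate.
                self.postings.remove(&t);
            }
        }
        Some(text)
    }
}
```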

pub fn search_substring(&self, query: &[u8]) -> Vec<u32>

Search for documents whose text contains query as a contiguous byte substring.

Results are returned in insertion order, which is also ascending doc-id order because ids are assigned monotonically. A removed document is never returned, even if a buggy remove left stale entries in the postings index, because every candidate is re-verified against the doc store.

Queries shorter than MIN_TRIGRAM_QUERY_LEN cannot be resolved through the trigram index and fall back to a full scan.
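
Here is a sketch of the filter-then-verify strategy for substring search. It uses plain BTreeMap and HashMap stores rather than the real Postings type. MIN_TRIGRAM_QUERY_LEN is set to 3 purely as an assumption, since a query needs at least three bytes to contain a trigram. The crate's actual value is not stated on this page.

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

type Trigram = [u8; 3];

// Assumed value: a query needs at least 3 bytes to contain one trigram.
const MIN_TRIGRAM_QUERY_LEN: usize = 3;

fn trigrams(text: &[u8]) -> BTreeSet<Trigram> {
    text.windows(3).map(|w| [w[0], w[1], w[2]]).collect()
}

fn contains(hay: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so handle the empty needle first.
    needle.is_empty() || hay.windows(needle.len()).any(|w| w == needle)
}

fn search_substring(
    docs: &BTreeMap<u32, Vec<u8>>,
    postings: &HashMap<Trigram, BTreeSet<u32>>,
    query: &[u8],
) -> Vec<u32> {
    let candidates: BTreeSet<u32> = if query.len() < MIN_TRIGRAM_QUERY_LEN {
        // Too short for the trigram index: every doc is a candidate (full scan).
        docs.keys().copied().collect()
    } else {
        // Intersect the postings lists of all query trigrams.
        let mut acc: Option<BTreeSet<u32>> = None;
        for t in trigrams(query) {
            let Some(list) = postings.get(&t) else {
                return Vec::new(); // this trigram occurs in no doc at all
            };
            acc = Some(match acc {
                None => list.clone(),
                Some(prev) => prev.intersection(list).copied().collect(),
            });
        }
        acc.unwrap_or_default()
    };
    // Re-verify each candidate against the doc store. This drops trigram false positives
    // and any id whose document is gone, even if its postings entries were left behind.
    // BTreeSet iteration yields ids in ascending (= insertion) order.
    candidates
        .into_iter()
        .filter(|id| docs.get(id).is_some_and(|text| contains(text, query)))
        .collect()
}
```

Verification happens last and always consults the doc store. As a result, the postings index only has to over-approximate the matching set. It never has to be exact.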

pub fn search_regex(&self, pattern: &str) -> Result<Vec<u32>, RegexError>

Search for documents whose text matches pattern as a regular expression.

The query path is the same four-tier filter funnel as Self::search_substring, plus a Phase-2 prefix extraction step:

1. Parse pattern into the internal AST and extract the trigrams that any matching string MUST contain (see crate::prefix_extract).
2. Intersect those trigrams’ postings lists into a candidate doc-id set. If the AST cannot be lowered (named capture group, etc.) or yields no required trigrams, fall back to scanning every doc.
3. Recheck each candidate against its per-doc bloom filter (skipped on a full scan).
4. If the AST starts with ^literal, prune candidates whose first bytes do not equal the literal prefix.
5. Compile the pattern with regex::bytes::Regex and re-run it against each candidate’s stored bytes.
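
The five steps above can be sketched as a single filter chain. This is a hedged illustration, not the crate's code, and it makes three assumptions. First, the required trigrams are already extracted from the AST. Second, the per-doc bloom filter is a hypothetical one-word, one-hash filter. Third, the caller-supplied is_match closure stands in for regex::bytes::Regex::is_match, so the sketch needs only the standard library. Bloom, Doc, and funnel are illustrative names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet, HashMap};
use std::hash::{Hash, Hasher};

type Trigram = [u8; 3];

// Hypothetical per-doc bloom filter: one 64-bit word, one hashed bit per trigram.
#[derive(Default)]
struct Bloom(u64);

impl Bloom {
    fn bit(t: &Trigram) -> u64 {
        let mut h = DefaultHasher::new();
        t.hash(&mut h);
        1u64 << (h.finish() % 64)
    }
    fn add(&mut self, t: &Trigram) {
        self.0 |= Self::bit(t);
    }
    // May report false positives, never false negatives.
    fn may_contain(&self, t: &Trigram) -> bool {
        self.0 & Self::bit(t) != 0
    }
}

struct Doc {
    text: Vec<u8>,
    bloom: Bloom,
}

fn funnel(
    docs: &BTreeMap<u32, Doc>,
    postings: &HashMap<Trigram, BTreeSet<u32>>,
    required: &[Trigram],             // step 1 output: trigrams every match must contain
    anchored_prefix: Option<&[u8]>,   // Some(lit) when the pattern starts with ^lit
    is_match: impl Fn(&[u8]) -> bool, // stand-in for the compiled regex
) -> Vec<u32> {
    // Step 2: intersect postings lists, or scan every doc if nothing is required.
    let full_scan = required.is_empty();
    let candidates: BTreeSet<u32> = if full_scan {
        docs.keys().copied().collect()
    } else {
        let mut acc: Option<BTreeSet<u32>> = None;
        for t in required {
            let list = postings.get(t).cloned().unwrap_or_default();
            acc = Some(match acc {
                None => list,
                Some(prev) => prev.intersection(&list).copied().collect(),
            });
        }
        acc.unwrap_or_default()
    };
    candidates
        .into_iter()
        .filter_map(|id| docs.get(&id).map(|d| (id, d)))
        // Step 3: bloom recheck, skipped on a full scan.
        .filter(|(_, d)| full_scan || required.iter().all(|t| d.bloom.may_contain(t)))
        // Step 4: anchored-literal prefix prune.
        .filter(|(_, d)| anchored_prefix.map_or(true, |p| d.text.starts_with(p)))
        // Step 5: authoritative recheck against the stored bytes.
        .filter(|(_, d)| is_match(d.text.as_slice()))
        .map(|(id, _)| id)
        .collect()
}
```

The cheap filters run first, and each one only discards candidates that provably cannot match. The costly matcher therefore sees the smallest set that the earlier steps can justify.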

Results are returned in insertion order.

Errors

Returns RegexError::Parse if the pattern is syntactically invalid or uses a regex feature that the underlying regex crate does not support (lookarounds, backreferences, …). A pattern that parses cleanly but trips the prefix extractor’s unsupported-feature path (named capture groups) does NOT surface as an error: the search still runs, just via the slower full-scan + recheck path.

pub fn search_regex_approx(&self, pattern: &str, max_errors: u16) -> Result<Vec<u32>, TreError>

Search for documents that match pattern as an approximate POSIX extended regular expression with up to max_errors edit operations.

The path mirrors Self::search_regex, but the tier-4 recheck delegates to TreCompiledPattern instead of the regex::bytes::Regex matcher, because TRE is the only engine in the workspace that implements approximate-match semantics. To stay sound under the edit budget, the pre-recheck filter funnel uses the pigeonhole bound surviving_trigrams >= T - 3k (see ApproxFilter). Here T is the number of required trigrams and k is max_errors. A single edit overlaps at most three trigrams, so a true match loses at most 3k of them.
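
Here is a worked form of the bound, as a sketch. An 11-byte literal has T = 9 trigrams. With k = 1 a match must keep at least 6 of them, and with k = 3 the bound drops to 0, at which point the filter can no longer prune anything. min_surviving, passes_approx_filter, and literal_trigrams are hypothetical helpers, not the ApproxFilter API.

```rust
type Trigram = [u8; 3];

// Lower bound on how many of a pattern's `t` required trigrams a document within
// `k` edits must still contain: one edit overlaps at most 3 trigrams.
fn min_surviving(t: usize, k: usize) -> usize {
    t.saturating_sub(3 * k)
}

// Hypothetical candidate filter: count the required trigrams present anywhere in
// `doc` and compare against the pigeonhole bound.
fn passes_approx_filter(doc: &[u8], required: &[Trigram], k: usize) -> bool {
    let present = required
        .iter()
        .filter(|t| doc.windows(3).any(|w| w == &t[..]))
        .count();
    present >= min_surviving(required.len(), k)
}

// Distinct trigrams of a literal (hypothetical helper).
fn literal_trigrams(lit: &[u8]) -> Vec<Trigram> {
    let mut v: Vec<Trigram> = lit.windows(3).map(|w| [w[0], w[1], w[2]]).collect();
    v.sort_unstable();
    v.dedup();
    v
}
```

Consider the literal "needle", which has 4 trigrams, and the document text "nxedle", which is one substitution away. The substitution destroys "nee" and "eed", leaving 2 trigrams. That passes with k = 1, since the bound is 1, but fails with k = 0, since the bound is 4.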

For a ^literal... pattern, the anchor fast-path rejects every candidate whose first bytes are too far, in Hamming distance, from the literal prefix. For max_errors == 0 this is a byte-equality check. For max_errors >= 1 it is a Hamming-distance check over the prefix length, which is sound for the substitution-only case TRE optimises. For the insert/delete cases it is conservative in the sense that it always accepts a prefix whose Hamming distance is within max_errors.
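
A sketch of a Hamming-distance anchor check follows. It treats bytes past the end of a short document as mismatches. anchor_ok is a hypothetical name, and the sketch is not the crate's fast path.

```rust
// Hypothetical anchor check for a `^literal` prefix under an edit budget.
// max_errors == 0: exact byte equality. Otherwise: Hamming distance over
// prefix.len() bytes, counting bytes past the end of `doc` as mismatches.
fn anchor_ok(doc: &[u8], prefix: &[u8], max_errors: u16) -> bool {
    if max_errors == 0 {
        return doc.starts_with(prefix);
    }
    let mismatches = prefix
        .iter()
        .enumerate()
        .filter(|&(i, b)| doc.get(i) != Some(b))
        .count();
    mismatches <= usize::from(max_errors)
}
```

Hamming distance compares bytes position by position, so this sketch models only substitutions. A single insertion near the start shifts every later byte. For example, anchor_ok(b"GXET /index", b"GET ", 1) returns false because three positions differ, even though the document is one insertion away from the prefix. How the real fast path treats insertions and deletions is not shown here.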

When the surviving candidate set after filtering is large (>= PARALLEL_RECHECK_THRESHOLD), the per-doc TRE recheck is dispatched to a Rayon parallel iterator to fan the cost across CPU cores. TRE’s compiled regex_t is !Send, so each parallel worker compiles its own copy from the original pattern bytes.
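
Below is a std-only sketch of the threshold dispatch with per-worker compilation. It swaps Rayon for std::thread::scope so it runs without external crates. In place of TreCompiledPattern it uses a hypothetical Matcher: a plain substring matcher made !Send by a raw-pointer PhantomData. PARALLEL_RECHECK_THRESHOLD = 4 is an assumed value.

```rust
use std::marker::PhantomData;
use std::thread;

// Assumed value for illustration; the crate's real threshold is not stated here.
const PARALLEL_RECHECK_THRESHOLD: usize = 4;

// Hypothetical stand-in for a compiled TRE pattern. The raw-pointer marker makes it
// !Send (like TRE's regex_t), so a compiled value can never be moved to another thread.
struct Matcher {
    needle: Vec<u8>,
    _not_send: PhantomData<*const ()>,
}

impl Matcher {
    fn compile(pattern: &[u8]) -> Matcher {
        Matcher { needle: pattern.to_vec(), _not_send: PhantomData }
    }

    // Plain substring test; a real TRE matcher would apply the edit budget here.
    fn is_match(&self, text: &[u8]) -> bool {
        self.needle.is_empty()
            || text.windows(self.needle.len()).any(|w| w == self.needle.as_slice())
    }
}

// Recheck (doc id, stored bytes) candidates, splitting the work across `workers`
// scoped threads once the candidate set reaches the threshold.
fn recheck(pattern: &[u8], candidates: &[(u32, &[u8])], workers: usize) -> Vec<u32> {
    if candidates.len() < PARALLEL_RECHECK_THRESHOLD || workers <= 1 {
        let m = Matcher::compile(pattern);
        return candidates.iter().filter(|(_, t)| m.is_match(t)).map(|&(id, _)| id).collect();
    }
    let chunk = (candidates.len() + workers - 1) / workers; // ceiling division, >= 1 here
    let mut hits: Vec<u32> = thread::scope(|s| {
        let handles: Vec<_> = candidates
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    // Each worker compiles its own copy; only `pattern` (a &[u8]) crosses threads.
                    let m = Matcher::compile(pattern);
                    part.iter()
                        .filter(|(_, t)| m.is_match(t))
                        .map(|&(id, _)| id)
                        .collect::<Vec<u32>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    hits.sort_unstable(); // ascending doc-id order regardless of thread scheduling
    hits
}
```

Only the pattern bytes and the borrowed candidate slices cross thread boundaries, which is why the compiled matcher itself never needs to be Send.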

Results are returned in ascending document-id order.

Errors

Returns TreError::Compile if the pattern fails to compile under the given options.

Trait Implementations

impl Clone for TextIndex

fn clone(&self) -> TextIndex

Returns a duplicate of the value.

1.0.0 (const: unstable)

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TextIndex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for TextIndex

fn default() -> Self

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for TextIndex

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for TextIndex

fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where S: Serializer,

Serialize this value into the given Serde serializer.

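Auto Trait Implementations

impl Freeze for TextIndex

impl RefUnwindSafe for TextIndex

impl Send for TextIndex

impl Sync for TextIndex

impl Unpin for TextIndex

impl UnsafeUnpin for TextIndex

impl UnwindSafe for TextIndex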

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
