Struct TextIndex 

pub struct TextIndex { /* private fields */ }

Trigram-based text index supporting incremental insert, remove, and exact-substring search.

Implementations

impl TextIndex

pub fn new() -> Self

Construct an empty index.

pub fn doc_count(&self) -> usize

Number of documents currently in the index.

pub fn postings(&self) -> &Postings

Borrow the inverted postings index.

pub fn docs(&self) -> &BTreeMap<u32, IndexedDoc>

Borrow the document store.

pub fn insert(&mut self, text: Vec<u8>) -> u32

Insert text and return the assigned doc id.

Doc ids are assigned monotonically and are never recycled, so the id of a removed document is not handed out again.
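The id policy can be illustrated with a minimal sketch. IdSketch, its next_id counter, and the starting value of 0 are assumptions made for illustration; the real TextIndex keeps its fields private and may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the document store; not the crate's real type.
#[derive(Default)]
struct IdSketch {
    docs: BTreeMap<u32, Vec<u8>>,
    /// Assumed counter: only ever moves forward, even after removals.
    next_id: u32,
}

impl IdSketch {
    /// Store `text` under a fresh id and return that id.
    fn insert(&mut self, text: Vec<u8>) -> u32 {
        let id = self.next_id;
        self.next_id += 1;
        self.docs.insert(id, text);
        id
    }

    /// Removing a document frees its slot but does not rewind the counter.
    fn remove(&mut self, id: u32) -> Option<Vec<u8>> {
        self.docs.remove(&id)
    }
}
```

Because the counter is independent of the store's contents, an id returned by an earlier insert can be kept as a stable handle: after a remove it simply resolves to nothing.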

pub fn remove(&mut self, doc_id: u32) -> Option<Vec<u8>>

Remove the document with id doc_id and return its raw bytes, or None if no such document is in the index.

All trigram entries for the doc are pulled from the postings index. A trigram whose postings list becomes empty after removal is garbage collected.
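A rough sketch of the postings cleanup described above, under the assumption that postings map each trigram to a set of doc ids. PostingsSketch, add_doc, and remove_doc are hypothetical names; the crate's Postings type is not shown on this page.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical postings map: trigram -> ids of the docs that contain it.
type PostingsSketch = BTreeMap<[u8; 3], BTreeSet<u32>>;

/// Register every trigram of `text` under `doc_id`.
fn add_doc(postings: &mut PostingsSketch, doc_id: u32, text: &[u8]) {
    for w in text.windows(3) {
        postings.entry([w[0], w[1], w[2]]).or_default().insert(doc_id);
    }
}

/// Pull `doc_id` out of every trigram list derived from `text`,
/// and drop any list that ends up empty.
fn remove_doc(postings: &mut PostingsSketch, doc_id: u32, text: &[u8]) {
    for w in text.windows(3) {
        let key = [w[0], w[1], w[2]];
        let now_empty = match postings.get_mut(&key) {
            Some(ids) => {
                ids.remove(&doc_id);
                ids.is_empty()
            }
            None => false,
        };
        if now_empty {
            postings.remove(&key);
        }
    }
}
```

Dropping empty lists keeps the map's size proportional to the trigrams of the documents that are still stored, rather than to every trigram ever seen.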

pub fn search_substring(&self, query: &[u8]) -> Vec<u32>

Search for documents whose text contains query as a contiguous byte substring.

Results are returned in insertion order. Every candidate is re-verified against the doc store, so a removed document is never returned, even if a buggy remove left some of its postings entries behind.

Queries shorter than MIN_TRIGRAM_QUERY_LEN cannot be resolved through the trigram index and fall back to a full scan.
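The lookup, re-verification, and short-query fallback might look roughly like the sketch below. It assumes a trigram-to-id-set postings map and a threshold of 3 bytes; search_sketch, contains, and MIN_QUERY_LEN_SKETCH are hypothetical, and the actual value of MIN_TRIGRAM_QUERY_LEN is not stated here.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Assumed value for illustration; the crate's MIN_TRIGRAM_QUERY_LEN may differ.
const MIN_QUERY_LEN_SKETCH: usize = 3;

/// Naive contiguous byte-substring test (an empty needle matches everything).
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

fn search_sketch(
    docs: &BTreeMap<u32, Vec<u8>>,
    postings: &BTreeMap<[u8; 3], BTreeSet<u32>>,
    query: &[u8],
) -> Vec<u32> {
    let candidates: Vec<u32> = if query.len() < MIN_QUERY_LEN_SKETCH {
        // Too short to contain a trigram: every doc is a candidate.
        docs.keys().copied().collect()
    } else {
        // Intersect the postings lists of every trigram in the query.
        let mut acc: Option<BTreeSet<u32>> = None;
        for w in query.windows(3) {
            let ids = postings.get(&[w[0], w[1], w[2]]).cloned().unwrap_or_default();
            acc = Some(match acc {
                Some(prev) => prev.intersection(&ids).copied().collect(),
                None => ids,
            });
        }
        acc.unwrap_or_default().into_iter().collect()
    };
    // Re-verify against the doc store: ids with no stored doc (stale
    // postings) and trigram false positives are both filtered out here.
    candidates
        .into_iter()
        .filter(|id| docs.get(id).map_or(false, |text| contains(text, query)))
        .collect()
}
```

Candidates are visited in ascending id order, which matches insertion order as long as ids are handed out monotonically.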

pub fn search_regex(&self, pattern: &str) -> Result<Vec<u32>, RegexError>

Search for documents whose text matches pattern as a regular expression.

The query path is the same four-tier filter funnel as Self::search_substring, plus a Phase-2 prefix extraction step:

  1. Parse pattern into the internal AST and extract the trigrams that any matching string MUST contain (see crate::prefix_extract).
  2. Intersect those trigrams’ postings lists into a candidate doc-id set. If the AST cannot be lowered (named capture group, etc.) or yields no required trigrams, fall back to scanning every doc.
  3. Per-doc bloom filter recheck (skipped on full scan).
  4. Compile the pattern with regex::bytes::Regex and re-run it against each candidate’s stored bytes.

Results are returned in insertion order.
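Step 1 of the funnel can be sketched on a miniature AST. Everything here is hypothetical: the Ast enum and required_trigrams are illustrative stand-ins, not the crate's internal AST or the code in crate::prefix_extract. The sketch is deliberately conservative, since reporting too few required trigrams only widens the candidate set and never drops a true match.

```rust
use std::collections::BTreeSet;

/// Hypothetical miniature regex AST, for illustration only.
enum Ast {
    Lit(Vec<u8>),
    Concat(Vec<Ast>),
    Alt(Vec<Ast>),
    Star(Box<Ast>),
}

/// Trigrams that every string matching `ast` must contain.
fn required_trigrams(ast: &Ast) -> BTreeSet<[u8; 3]> {
    match ast {
        // A literal requires each of its own trigrams.
        Ast::Lit(bytes) => bytes.windows(3).map(|w| [w[0], w[1], w[2]]).collect(),
        // Every part must match, so each part's requirements carry over.
        // Trigrams spanning a boundary between parts are ignored here.
        Ast::Concat(parts) => parts.iter().flat_map(required_trigrams).collect(),
        // Only trigrams required by every branch are required overall.
        Ast::Alt(branches) => {
            let mut sets = branches.iter().map(required_trigrams);
            let first = sets.next().unwrap_or_default();
            sets.fold(first, |acc, s| acc.intersection(&s).copied().collect())
        }
        // Zero repetitions are allowed, so nothing is required.
        Ast::Star(_) => BTreeSet::new(),
    }
}
```

An empty result corresponds to the "yields no required trigrams" case in step 2, where the search falls back to scanning every doc.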

Errors

Returns RegexError::Parse if the pattern is syntactically invalid or uses a regex feature that the underlying regex crate does not support (lookarounds, backreferences, …). A pattern that parses cleanly but trips the prefix extractor’s unsupported-feature path (named capture groups) does NOT surface as an error: the search still runs, just via the slower full-scan + recheck path.

pub fn search_regex_approx(&self, pattern: &str, max_errors: u16) -> Result<Vec<u32>, TreError>

Search for documents that match pattern as an approximate POSIX extended regular expression with up to max_errors edit operations.

This is the Phase 3 entry point for the TRE-backed recheck. The current implementation does a full scan over the document store: every doc is fed to a single compiled TreCompiledPattern. It does not yet use the Phase-2 regex prefix extractor (see Self::search_regex) to restrict the scan to a trigram-postings-derived candidate set; the signature here is forward-compatible with that change.

Results are returned in ascending document-id order, which equals insertion order because doc ids are monotonic.
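To make "up to max_errors edit operations" concrete, here is a sketch of the full-scan shape using a swapped-in technique: Sellers' dynamic program for approximate substring matching of a plain literal. This is not TRE and does not handle regex syntax; approx_contains and search_approx_sketch are hypothetical names, and the edit operations assumed are insert, delete, and substitute with unit cost.

```rust
use std::collections::BTreeMap;

/// True if some substring of `text` is within `max_errors` edits
/// (insert, delete, substitute) of the literal `pattern`.
fn approx_contains(text: &[u8], pattern: &[u8], max_errors: u16) -> bool {
    let k = usize::from(max_errors);
    let m = pattern.len();
    // col[i] = fewest edits aligning pattern[..i] with a substring of the
    // text that ends at the current position. Initially the text is empty.
    let mut col: Vec<usize> = (0..=m).collect();
    if col[m] <= k {
        return true;
    }
    for &byte in text {
        // `diag` holds col[i - 1] as it was at the previous text position.
        // col[0] stays 0 because a match may start anywhere in the text.
        let mut diag = col[0];
        for i in 1..=m {
            let cost = usize::from(pattern[i - 1] != byte);
            let next = (diag + cost).min(col[i] + 1).min(col[i - 1] + 1);
            diag = col[i];
            col[i] = next;
        }
        if col[m] <= k {
            return true;
        }
    }
    false
}

/// Full scan: test every stored doc, returning ids in ascending order.
fn search_approx_sketch(
    docs: &BTreeMap<u32, Vec<u8>>,
    pattern: &[u8],
    max_errors: u16,
) -> Vec<u32> {
    docs.iter()
        .filter(|(_, text)| approx_contains(text, pattern, max_errors))
        .map(|(id, _)| *id)
        .collect()
}
```

Iterating a BTreeMap yields keys in ascending order, which is why a full scan naturally produces results in ascending doc-id order.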

Trait Implementations

impl Clone for TextIndex

fn clone(&self) -> TextIndex

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for TextIndex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for TextIndex

fn default() -> Self

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for TextIndex

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl Serialize for TextIndex

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬 This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,