Crate laurus

§Laurus

Laurus is a Rust library for building search engines with support for lexical, vector, and hybrid search. A single engine manages a BM25 inverted index alongside vector indexes (HNSW, Flat, IVF) and merges their results with a configurable fusion algorithm.

§Features

§Lexical Search (BM25-based Inverted Index)

  • BM25 scoring over an inverted index
  • Multiple query types (term, phrase, boolean, fuzzy, range, prefix, wildcard, etc.)
  • Field-level boosting
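
To make the scoring bullet concrete, here is a minimal sketch of the textbook BM25 contribution of one query term to one document. It does not use the Laurus API: `Bm25Params`, `bm25_term_score`, the `k1`/`b` values, and the Lucene-style IDF variant are all assumptions chosen for illustration, and Laurus may compute its scores differently.

```rust
/// Free parameters of BM25 (hypothetical struct, not a Laurus type).
#[derive(Clone, Copy)]
struct Bm25Params {
    /// Term-frequency saturation; 1.2 is a common default.
    k1: f64,
    /// Strength of document-length normalization; 0.75 is a common default.
    b: f64,
}

/// Score contribution of a single term in a single document.
///
/// * `tf`          - occurrences of the term in the document
/// * `doc_len`     - length of the document in tokens
/// * `avg_doc_len` - average document length in the index
/// * `doc_count`   - total number of documents
/// * `doc_freq`    - number of documents containing the term
fn bm25_term_score(
    tf: f64,
    doc_len: f64,
    avg_doc_len: f64,
    doc_count: f64,
    doc_freq: f64,
    params: Bm25Params,
) -> f64 {
    // Lucene-style IDF: the leading 1.0 keeps the value non-negative.
    let idf = (1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Documents longer than average are penalized, shorter ones rewarded.
    let length_norm = 1.0 - params.b + params.b * (doc_len / avg_doc_len);
    idf * (tf * (params.k1 + 1.0)) / (tf + params.k1 * length_norm)
}
```

A document's score for a multi-term query is the sum of these per-term contributions. In this sketch, field-level boosting would amount to multiplying each field's contribution by its boost factor.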

§Vector Search (HNSW, Flat, IVF Indexes)

  • HNSW index for fast approximate nearest-neighbor lookup
  • Flat index for exact brute-force search
  • IVF index for inverted-file based approximate search
  • Configurable distance metrics and quantization
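
As a point of reference for the index types above, the following sketch shows what exact brute-force (flat) search does: score the query against every stored vector and keep the top `k`. It is plain Rust rather than the Laurus API; `cosine_similarity` and `flat_search` are hypothetical names, and cosine similarity stands in for whichever distance metric is configured.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Exact nearest-neighbor search: compares the query with every vector.
/// Returns up to `k` pairs of (vector index, similarity), best first.
fn flat_search(query: &[f32], vectors: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Highest similarity first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

The cost grows linearly with the number of stored vectors. Approximate indexes such as HNSW and IVF give up exactness to avoid that full scan.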

§Hybrid Search with Configurable Fusion (RRF, WeightedSum)

  • Combine lexical and vector results with a configurable fusion algorithm
  • Reciprocal Rank Fusion (RRF) for rank-based merging
  • Weighted Sum fusion with automatic min-max score normalization
  • Unified query DSL that mixes lexical and vector clauses in a single string
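
The two fusion strategies named above can be sketched independently of Laurus. The code below is an illustration only: `rrf_fuse` and `min_max_normalize` are hypothetical helpers, the constant `k = 60` used in the usage note is a conventional choice rather than a confirmed Laurus default, and the handling of a zero score range is an assumption.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per
/// document, with ranks starting at 1. Only positions matter, not raw scores.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (position, doc_id) in ranking.iter().enumerate() {
            let rank = position as f64 + 1.0;
            *scores.entry((*doc_id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Best score first; ties broken by document id for a stable order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}

/// Min-max normalization: rescales scores into [0, 1] so that lexical and
/// vector scores become comparable before a weighted sum.
fn min_max_normalize(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().copied().fold(f64::INFINITY, f64::min);
    let max = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        // Assumption: when all scores are equal, map each of them to 1.0.
        .map(|s| if range == 0.0 { 1.0 } else { (s - min) / range })
        .collect()
}
```

Calling `rrf_fuse(&[vec!["a", "b", "c"], vec!["b", "c", "d"]], 60.0)` ranks `b` first, because it sits near the top of both lists. A weighted-sum fusion would instead compute `w_lexical * lexical + w_vector * vector` over the normalized scores of each document.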

§Embedding Integration (Local BERT/CLIP via candle, OpenAI API)

  • Local text embeddings via candle (BERT models)
  • OpenAI API embeddings
  • Multimodal CLIP embeddings (text + image)
  • Per-field embedder routing via PerFieldEmbedder

§Write-Ahead Log (WAL) for Durability

  • WAL-backed durability with crash recovery
  • Automatic replay of uncommitted changes on engine startup
  • Rollback on partial indexing failures to maintain consistency

§Pluggable Storage (In-memory, File-based with mmap)

  • In-memory storage (MemoryStorage) for testing and ephemeral indexes
  • File-based storage (FileStorage) with mmap-backed reads
  • Prefixed storage for logical partitioning within a single backend

§Text Analysis Pipeline (Tokenizers, Filters, Char Filters, Synonyms)

  • Pluggable analyzers: Standard, Simple, Keyword, per-field, and language-specific (English, Japanese)
  • Tokenizers: Unicode word, whitespace, regex, n-gram, Lindera (Japanese morphological analysis)
  • Token filters: lowercase, stop words, stemming (Porter, simple), synonym graph, boost, strip, limit
  • Char filters: Unicode normalization, character mapping, pattern replace, Japanese iteration marks
  • Custom analysis pipelines via PipelineAnalyzer
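
The stages listed above compose in a fixed order: char filters transform the raw text, a tokenizer splits it, and token filters rewrite or drop tokens. The sketch below mimics that flow with standard-library calls only. It is not `PipelineAnalyzer`, and the `analyze` function and its stop-word set are hypothetical.

```rust
use std::collections::HashSet;

/// A toy analysis pipeline: tokenize on non-alphanumeric characters,
/// lowercase every token, then drop stop words.
fn analyze(text: &str, stop_words: &HashSet<&str>) -> Vec<String> {
    text
        // Tokenizer: split wherever a character is not a letter or digit.
        .split(|c: char| !c.is_alphanumeric())
        .filter(|token| !token.is_empty())
        // Token filter 1: lowercase.
        .map(|token| token.to_lowercase())
        // Token filter 2: remove stop words.
        .filter(|token| !stop_words.contains(token.as_str()))
        .collect()
}
```

The same analyzer has to run at index time and at query time. If it does not, query terms will not match the terms stored in the inverted index.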

§Column Storage for Filtering

  • Column-oriented storage for efficient filtering and range queries on scalar fields

§Spelling Correction / “Did You Mean?” Suggestions

  • Levenshtein-distance based spelling correction
  • “Did you mean?” suggestions powered by index term frequencies
  • Configurable auto-correction thresholds and maximum edit distances
  • Query history learning for improved suggestion quality
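
To illustrate how edit distance and term frequencies can combine into a "did you mean?" suggestion, here is a self-contained sketch. It is not the `spelling` module's API: `levenshtein` and `suggest` are hypothetical functions, and preferring the more frequent term on a distance tie is an assumed policy.

```rust
/// Levenshtein distance: the minimum number of single-character insertions,
/// deletions, and substitutions needed to turn `a` into `b`.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a`
    // and the first j characters of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr.push(substitution.min(deletion).min(insertion));
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Picks the dictionary term closest to `word` within `max_distance`.
/// `dictionary` pairs each indexed term with its frequency; on equal
/// distance the more frequent term wins.
fn suggest<'a>(word: &str, dictionary: &[(&'a str, u64)], max_distance: usize) -> Option<&'a str> {
    dictionary
        .iter()
        .map(|(term, freq)| (levenshtein(word, term), *freq, *term))
        .filter(|(distance, _, _)| *distance <= max_distance)
        .min_by(|x, y| x.0.cmp(&y.0).then(y.1.cmp(&x.1)))
        .map(|(_, _, term)| term)
}
```

With a dictionary containing `("search", 10)` and `("perch", 2)`, `suggest("serch", ..., 2)` returns `Some("search")`. Both candidates are one edit away, so the frequency breaks the tie.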

Re-exports§

pub use analysis::analyzer::analyzer::Analyzer;
pub use embedding::embedder::EmbedInput;
pub use embedding::embedder::EmbedInputType;
pub use embedding::embedder::Embedder;
pub use embedding::per_field::PerFieldEmbedder;
pub use embedding::precomputed::PrecomputedEmbedder;
pub use lexical::core::field::BooleanOption;
pub use lexical::core::field::BytesOption;
pub use lexical::core::field::DateTimeOption;
pub use lexical::core::field::FloatOption;
pub use lexical::core::field::GeoOption;
pub use lexical::core::field::IntegerOption;
pub use lexical::core::field::TextOption;
pub use lexical::search::searcher::LexicalSearchParams;
pub use lexical::search::searcher::LexicalSearchQuery;
pub use lexical::search::searcher::LexicalSearchRequest;
pub use lexical::search::searcher::SortField;
pub use lexical::search::searcher::SortOrder;
pub use storage::Storage;
pub use storage::StorageConfig;
pub use storage::StorageFactory;
pub use vector::core::distance::DistanceMetric;
pub use vector::core::field::FlatOption;
pub use vector::core::field::HnswOption;
pub use vector::core::field::IvfOption;
pub use vector::core::quantization::QuantizationMethod;
pub use vector::store::request::QueryVector;
pub use vector::store::request::VectorScoreMode;
pub use vector::store::request::VectorSearchRequest;

Modules§

analysis
Text analysis module for Laurus.
embedding
Text and multimodal embedding support for Laurus vector search.
lexical
Lexical search implementation using inverted indexes.
spelling
Spelling correction and suggestion utilities for Laurus.
storage
Storage abstraction layer for Laurus.
store
Document storage module.
vector
Vector search implementation using approximate nearest neighbor algorithms.

Structs§

AnalyzerDefinition
A custom analyzer definition composed of a tokenizer and optional char/token filter chains.
DeletionConfig
Configuration for deletion management.
Document
Unified Document structure.
Engine
Unified Engine that manages both Lexical and Vector indices.
EngineBuilder
Builder for constructing an Engine with custom configuration.
EngineStats
Combined statistics from both the lexical and vector stores.
Schema
Schema for the unified engine.
SearchRequest
Unified search request that can contain lexical, vector, or both queries.
SearchRequestBuilder
Fluent builder for constructing a SearchRequest.
SearchResult
A single result from an Engine search.
UnifiedQueryParser
Unified query parser that composes lexical and vector parsers.

Enums§

CharFilterConfig
Configuration for a char filter component.
DataValue
The unified value type for fields in a document.
EmbedderDefinition
A declarative embedder definition stored in the schema.
FieldOption
Options for a single field in the unified schema.
FusionAlgorithm
Algorithm used to combine lexical and vector scores in hybrid search.
LaurusError
The main error type for Laurus operations.
TokenFilterConfig
Configuration for a token filter component.
TokenizerConfig
Configuration for a tokenizer component.

Constants§

VERSION
The crate version string, populated at compile time from Cargo.toml.

Type Aliases§

Result
Result type alias for operations that may fail with LaurusError.