§Laurus
Laurus is a Rust library for building search engines with lexical, vector, and hybrid search. It combines an inverted index, vector indexes, pluggable storage, and a text analysis pipeline behind a single engine.
§Features
§Lexical Search (BM25-based Inverted Index)
- BM25 scoring over an inverted index
- Multiple query types (term, phrase, boolean, fuzzy, range, prefix, wildcard, etc.)
- Field-level boosting
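As a rough illustration of the scoring named above, the following sketch computes the BM25 contribution of one query term to one document. It is not Laurus's implementation: the function name, the parameter names, and the choice of the Lucene-style IDF variant are assumptions made for the example.

```rust
/// BM25 score of a single term for a single document.
/// Hypothetical helper for illustration; not part of the Laurus API.
fn bm25_term_score(
    tf: f64,          // occurrences of the term in the document
    doc_len: f64,     // document length in tokens
    avg_doc_len: f64, // average document length in the collection
    n_docs: f64,      // number of documents in the collection
    doc_freq: f64,    // number of documents containing the term
    k1: f64,          // term-frequency saturation (commonly around 1.2)
    b: f64,           // length normalization strength (commonly around 0.75)
) -> f64 {
    // Assumed IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)),
    // which stays non-negative even for very common terms.
    let idf = (1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Documents longer than average are penalized when b > 0.
    let length_norm = 1.0 - b + b * doc_len / avg_doc_len;
    idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
}
```

A document's score for a multi-term query is the sum of these per-term scores; a field-level boost could be modeled as a multiplier on each field's contribution.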
§Vector Search (HNSW, Flat, IVF Indexes)
- HNSW index for fast approximate nearest-neighbor lookup
- Flat index for exact brute-force search
- IVF index for inverted-file based approximate search
- Configurable distance metrics and quantization
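To make the difference between exact and approximate search concrete, here is a minimal sketch of what a flat (brute-force) index does: score every stored vector against the query and keep the top k. The functions below are hypothetical and use cosine similarity as the assumed metric; they do not reflect Laurus's types or API. HNSW and IVF aim to return similar results while visiting only a fraction of the vectors.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact top-k search: compares the query with every stored vector.
/// Returns (vector index, similarity) pairs, best match first.
fn flat_search(vectors: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(v, query)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```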
§Hybrid Search with Configurable Fusion (RRF, WeightedSum)
- Combine lexical and vector results with a configurable fusion algorithm
- Reciprocal Rank Fusion (RRF) for rank-based merging
- Weighted Sum fusion with automatic min-max score normalization
- Unified query DSL that mixes lexical and vector clauses in a single string
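The two fusion strategies listed above can be sketched as follows. This is a standalone illustration under stated assumptions, not Laurus's code: the function names, the `u64` document ids, and the RRF constant `k` are all choices made for the example. RRF uses only ranks, so it needs no score normalization; the weighted sum first rescales each result list to the range [0, 1] with min-max normalization.

```rust
use std::collections::HashMap;

/// Sorts fused scores in descending order, breaking ties by document id.
fn sorted_desc(scores: HashMap<u64, f64>) -> Vec<(u64, f64)> {
    let mut out: Vec<(u64, f64)> = scores.into_iter().collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}

/// Reciprocal Rank Fusion: each list adds 1 / (k + rank) per document,
/// with rank starting at 1 for the best hit.
fn rrf(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    sorted_desc(scores)
}

/// Min-max normalization to [0, 1]. If all scores are equal, every
/// document gets 1.0 (an assumption made for this sketch).
fn min_max(scores: &[(u64, f64)]) -> Vec<(u64, f64)> {
    let min = scores.iter().map(|s| s.1).fold(f64::INFINITY, f64::min);
    let max = scores.iter().map(|s| s.1).fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|&(doc, s)| (doc, if range > 0.0 { (s - min) / range } else { 1.0 }))
        .collect()
}

/// Weighted sum of normalized lexical and vector scores.
fn weighted_sum(
    lexical: &[(u64, f64)],
    vector: &[(u64, f64)],
    w_lexical: f64,
    w_vector: f64,
) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for (doc, s) in min_max(lexical) {
        *scores.entry(doc).or_insert(0.0) += w_lexical * s;
    }
    for (doc, s) in min_max(vector) {
        *scores.entry(doc).or_insert(0.0) += w_vector * s;
    }
    sorted_desc(scores)
}
```

A document that appears in only one list still receives a score from that list, which is why both strategies can surface results that one retriever missed.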
§Embedding Integration (Local BERT/CLIP via candle, OpenAI API)
- Local text embeddings via candle (BERT models)
- OpenAI API embeddings
- Multimodal CLIP embeddings (text + image)
- Per-field embedder routing via PerFieldEmbedder
§Write-Ahead Log (WAL) for Durability
- WAL-backed durability with crash recovery
- Automatic replay of uncommitted changes on engine startup
- Rollback on partial indexing failures to maintain consistency
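The replay step can be pictured with a small in-memory model. The record layout, the sequence numbers, and the `replay` function below are hypothetical and exist only to illustrate the idea; the source does not describe Laurus's actual log format. The sketch assumes each change is logged with an increasing sequence number and that the index remembers the last sequence number it committed.

```rust
use std::collections::HashMap;

/// One logged change. Hypothetical layout for illustration only.
#[derive(Debug, Clone)]
enum WalRecord {
    Put { seq: u64, id: u64, body: String },
    Delete { seq: u64, id: u64 },
}

impl WalRecord {
    fn seq(&self) -> u64 {
        match self {
            WalRecord::Put { seq, .. } | WalRecord::Delete { seq, .. } => *seq,
        }
    }
}

/// Re-applies every record newer than `last_committed_seq`, in log order,
/// and returns the sequence number of the last record applied.
fn replay(
    index: &mut HashMap<u64, String>,
    wal: &[WalRecord],
    last_committed_seq: u64,
) -> u64 {
    let mut applied = last_committed_seq;
    for record in wal.iter().filter(|r| r.seq() > last_committed_seq) {
        match record {
            WalRecord::Put { id, body, .. } => {
                index.insert(*id, body.clone());
            }
            WalRecord::Delete { id, .. } => {
                index.remove(id);
            }
        }
        applied = record.seq();
    }
    applied
}
```

Records at or below the committed sequence number are skipped, so replaying the same log twice leaves the index unchanged.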
§Pluggable Storage (In-memory, File-based with mmap)
- In-memory storage (MemoryStorage) for testing and ephemeral indexes
- File-based storage (FileStorage) with mmap-backed reads
- Prefixed storage for logical partitioning within a single backend
§Text Analysis Pipeline (Tokenizers, Filters, Char Filters, Synonyms)
- Pluggable analyzers: Standard, Simple, Keyword, per-field, and language-specific (English, Japanese)
- Tokenizers: Unicode word, whitespace, regex, n-gram, Lindera (Japanese morphological analysis)
- Token filters: lowercase, stop words, stemming (Porter, simple), synonym graph, boost, strip, limit
- Char filters: Unicode normalization, character mapping, pattern replace, Japanese iteration marks
- Custom analysis pipelines via PipelineAnalyzer
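The stages listed above run in a fixed order: char filters rewrite the raw text, a tokenizer splits it, and token filters transform or drop the resulting tokens. The sketch below shows that flow using only the standard library. It does not use Laurus's analyzer types; the `analyze` function and its specific stage choices (punctuation mapping, whitespace tokenization, lowercasing, stop-word removal) are assumptions made for the example.

```rust
/// Minimal analysis pipeline: char filter -> tokenizer -> token filters.
/// Hypothetical helper for illustration; not part of the Laurus API.
fn analyze(text: &str, stop_words: &[&str]) -> Vec<String> {
    // Char filter: replace anything that is not alphanumeric or
    // whitespace with a space, so "quick-brown" splits into two tokens.
    let filtered: String = text
        .chars()
        .map(|c| if c.is_alphanumeric() || c.is_whitespace() { c } else { ' ' })
        .collect();

    filtered
        .split_whitespace() // tokenizer: split on whitespace
        .map(|token| token.to_lowercase()) // token filter: lowercase
        .filter(|token| !stop_words.contains(&token.as_str())) // token filter: stop words
        .collect()
}
```

Because the stop-word filter runs after lowercasing, the stop list only needs lowercase entries; reordering the filters would change the output.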
§Column Storage for Filtering
- Column-oriented storage for efficient filtering and range queries on scalar fields
§Spelling Correction / “Did You Mean?” Suggestions
- Levenshtein-distance based spelling correction
- “Did you mean?” suggestions powered by index term frequencies
- Configurable auto-correction thresholds and maximum edit distances
- Query history learning for improved suggestion quality
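A "did you mean?" lookup of the kind described above can be approximated with an edit-distance function and a term-frequency table. The sketch below is an assumption-laden illustration, not the crate's spelling module: `levenshtein` and `suggest` are hypothetical names, and the policy of preferring the smallest distance and then the highest frequency is one plausible ranking rule.

```rust
/// Levenshtein distance between two strings, counted in characters,
/// using the two-row dynamic programming formulation.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] is the distance between the processed prefix of a and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + if ca == cb { 0 } else { 1 };
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            let best = substitution.min(deletion).min(insertion);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Picks the indexed term closest to `word` within `max_distance`,
/// preferring the more frequent term when distances tie.
fn suggest<'a>(word: &str, terms: &'a [(&'a str, u64)], max_distance: usize) -> Option<&'a str> {
    terms
        .iter()
        .map(|&(term, freq)| (levenshtein(word, term), freq, term))
        .filter(|&(distance, _, _)| distance <= max_distance)
        .min_by(|x, y| x.0.cmp(&y.0).then(y.1.cmp(&x.1)))
        .map(|(_, _, term)| term)
}
```

Scanning every term is fine for a small vocabulary; a production index would typically narrow the candidates first, for example by prefix or n-gram overlap.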
Re-exports§
pub use analysis::analyzer::analyzer::Analyzer;
pub use embedding::embedder::EmbedInput;
pub use embedding::embedder::EmbedInputType;
pub use embedding::embedder::Embedder;
pub use embedding::per_field::PerFieldEmbedder;
pub use embedding::precomputed::PrecomputedEmbedder;
pub use lexical::core::field::BooleanOption;
pub use lexical::core::field::BytesOption;
pub use lexical::core::field::DateTimeOption;
pub use lexical::core::field::FloatOption;
pub use lexical::core::field::GeoOption;
pub use lexical::core::field::IntegerOption;
pub use lexical::core::field::TextOption;
pub use lexical::search::searcher::LexicalSearchParams;
pub use lexical::search::searcher::LexicalSearchQuery;
pub use lexical::search::searcher::LexicalSearchRequest;
pub use lexical::search::searcher::SortField;
pub use lexical::search::searcher::SortOrder;
pub use storage::Storage;
pub use storage::StorageConfig;
pub use storage::StorageFactory;
pub use vector::core::distance::DistanceMetric;
pub use vector::core::field::FlatOption;
pub use vector::core::field::HnswOption;
pub use vector::core::field::IvfOption;
pub use vector::core::quantization::QuantizationMethod;
pub use vector::store::request::QueryVector;
pub use vector::store::request::VectorScoreMode;
pub use vector::store::request::VectorSearchRequest;
Modules§
- analysis
- Text analysis module for Laurus.
- embedding
- Text and multimodal embedding support for Laurus vector search.
- lexical
- Lexical search implementation using inverted indexes.
- spelling
- Spelling correction and suggestion utilities for Laurus.
- storage
- Storage abstraction layer for Laurus.
- store
- Document storage module.
- vector
- Vector search implementation using approximate nearest neighbor algorithms.
Structs§
- AnalyzerDefinition
- A custom analyzer definition composed of a tokenizer and optional char/token filter chains.
- DeletionConfig
- Configuration for deletion management.
- Document
- Unified Document structure.
- Engine
- Unified Engine that manages both Lexical and Vector indices.
- EngineBuilder
- Builder for constructing an Engine with custom configuration.
- EngineStats
- Combined statistics from both the lexical and vector stores.
- Schema
- Schema for the unified engine.
- SearchRequest
- Unified search request that can contain lexical, vector, or both queries.
- SearchRequestBuilder
- Fluent builder for constructing a SearchRequest.
- SearchResult
- A single result from an Engine search.
- UnifiedQueryParser
- Unified query parser that composes lexical and vector parsers.
Enums§
- CharFilterConfig
- Configuration for a char filter component.
- DataValue
- The unified value type for fields in a document.
- EmbedderDefinition
- A declarative embedder definition stored in the schema.
- FieldOption
- Options for a single field in the unified schema.
- FusionAlgorithm
- Algorithm used to combine lexical and vector scores in hybrid search.
- LaurusError
- The main error type for Laurus operations.
- TokenFilterConfig
- Configuration for a token filter component.
- TokenizerConfig
- Configuration for a tokenizer component.
Constants§
- VERSION
- The crate version string, populated at compile time from Cargo.toml.
Type Aliases§
- Result
- Result type alias for operations that may fail with LaurusError.