Crate laurus


§Laurus

Laurus is a Rust library for building search engines with lexical, vector, and hybrid search. A single Engine manages both the lexical (inverted) index and the vector indexes, with embedding, storage, and text analysis as configurable components.

§Features

§Lexical Search (BM25-based Inverted Index)

BM25 scoring over an inverted index
Multiple query types (term, phrase, boolean, fuzzy, range, prefix, wildcard, etc.)
Field-level boosting
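To make the scoring concrete, here is a minimal, self-contained sketch of Okapi BM25 over a toy in-memory inverted index. It does not use the Laurus API. The parameters `k1 = 1.2` and `b = 0.75` are common textbook defaults assumed here; Laurus's own defaults and IDF variant may differ.

```rust
use std::collections::HashMap;

// Assumed textbook BM25 parameters, not necessarily Laurus's defaults.
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Score every pre-tokenized document against the query terms with BM25.
fn bm25_scores(docs: &[Vec<&str>], query: &[&str]) -> Vec<f64> {
    let n = docs.len() as f64;
    let avgdl = docs.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Toy inverted index: term -> list of (doc id, term frequency).
    let mut postings: HashMap<&str, Vec<(usize, usize)>> = HashMap::new();
    for (id, doc) in docs.iter().enumerate() {
        let mut tf: HashMap<&str, usize> = HashMap::new();
        for &t in doc {
            *tf.entry(t).or_insert(0) += 1;
        }
        for (t, f) in tf {
            postings.entry(t).or_default().push((id, f));
        }
    }

    let mut scores = vec![0.0; docs.len()];
    for &term in query {
        if let Some(list) = postings.get(term) {
            let df = list.len() as f64;
            // IDF with +1 inside the log so it never goes negative.
            let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
            for &(id, f) in list {
                let f = f as f64;
                let dl = docs[id].len() as f64;
                // Term-frequency saturation (k1) and length normalization (b).
                let tf_part = f * (K1 + 1.0) / (f + K1 * (1.0 - B + B * dl / avgdl));
                scores[id] += idf * tf_part;
            }
        }
    }
    scores
}

fn main() {
    let docs = vec![
        vec!["rust", "search", "engine"],
        vec!["vector", "search"],
        vec!["cooking", "recipes"],
    ];
    println!("{:?}", bm25_scores(&docs, &["search", "rust"]));
}
```

Rare terms ("rust") contribute more than common ones ("search"), and a document matching no query term scores zero.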

§Vector Search (HNSW, Flat, IVF Indexes)

HNSW index for fast approximate nearest-neighbor lookup
Flat index for exact brute-force search
IVF index for inverted-file based approximate search
Configurable distance metrics and quantization
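The flat index is the simplest of the three: an exact k-nearest-neighbor scan. The sketch below is a standalone illustration, not Laurus's `FlatOption` or `DistanceMetric`; the `Metric` enum and `flat_search` function are hypothetical names. HNSW and IVF trade this exactness for speed by visiting only part of the collection.

```rust
/// Hypothetical metric enum for this sketch (not Laurus's `DistanceMetric`).
#[derive(Clone, Copy)]
enum Metric {
    Euclidean,
    Cosine,
}

/// Smaller is closer for both metrics.
fn distance(metric: Metric, a: &[f32], b: &[f32]) -> f32 {
    match metric {
        Metric::Euclidean => a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f32>().sqrt(),
        Metric::Cosine => {
            let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
            let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
            let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
            // Cosine distance = 1 - cosine similarity.
            1.0 - dot / (na * nb)
        }
    }
}

/// Exact search: compare the query with every stored vector, keep the k closest.
fn flat_search(vectors: &[Vec<f32>], query: &[f32], k: usize, metric: Metric) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .map(|(i, v)| (i, distance(metric, v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

fn main() {
    let vectors = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![5.0, 5.0]];
    println!("{:?}", flat_search(&vectors, &[0.9, 0.1], 2, Metric::Euclidean));
}
```

The cost is O(n · d) per query, which is why approximate indexes are preferred for large collections.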

§Hybrid Search with Configurable Fusion (RRF, WeightedSum)

Combine lexical and vector results with a configurable fusion algorithm
Reciprocal Rank Fusion (RRF) for rank-based merging
Weighted Sum fusion with automatic min-max score normalization
Unified query DSL that mixes lexical and vector clauses in a single string
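Both fusion strategies can be sketched in a few lines of standalone Rust. This is not Laurus's `FusionAlgorithm` implementation: the RRF constant `k = 60` comes from the original RRF formulation and is assumed here, and the function names are hypothetical. RRF uses only ranks, so it needs no normalization; weighted sum mixes raw scores, which is why each side is min-max normalized to [0, 1] first.

```rust
use std::collections::HashMap;

/// RRF: each list adds 1 / (k + rank) per document, with rank starting at 1.
fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    sorted(scores)
}

/// Min-max normalize into [0, 1]; a list of identical scores maps to 1.0.
fn min_max(hits: &[(&str, f64)]) -> Vec<(String, f64)> {
    let min = hits.iter().map(|h| h.1).fold(f64::INFINITY, f64::min);
    let max = hits.iter().map(|h| h.1).fold(f64::NEG_INFINITY, f64::max);
    hits.iter()
        .map(|&(id, s)| {
            let n = if max > min { (s - min) / (max - min) } else { 1.0 };
            (id.to_string(), n)
        })
        .collect()
}

/// Weighted sum of the normalized lexical and vector scores.
fn weighted_sum(lexical: &[(&str, f64)], vector: &[(&str, f64)], w_lex: f64, w_vec: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (id, s) in min_max(lexical) {
        *scores.entry(id).or_insert(0.0) += w_lex * s;
    }
    for (id, s) in min_max(vector) {
        *scores.entry(id).or_insert(0.0) += w_vec * s;
    }
    sorted(scores)
}

/// Sort by descending fused score; ties are broken by id for determinism.
fn sorted(scores: HashMap<String, f64>) -> Vec<(String, f64)> {
    let mut v: Vec<(String, f64)> = scores.into_iter().collect();
    v.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    v
}

fn main() {
    let ranked_lists = vec![vec!["a", "b"], vec!["b", "c"]];
    println!("{:?}", rrf(&ranked_lists, 60.0));
    let lexical = [("a", 10.0), ("b", 5.0), ("c", 0.0)];
    let vector = [("c", 0.9), ("b", 0.5), ("a", 0.1)];
    println!("{:?}", weighted_sum(&lexical, &vector, 0.7, 0.3));
}
```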

§Embedding Integration (Local BERT/CLIP via candle, OpenAI API)

Local text embeddings via candle (BERT models)
OpenAI API embeddings
Multimodal CLIP embeddings (text + image)
Per-field embedder routing via PerFieldEmbedder
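Per-field routing means each field can be embedded by a different model, with a fallback for the rest. The sketch below shows the pattern with a hypothetical `TextEmbedder` trait and toy embedders standing in for BERT, CLIP, or OpenAI models; it is not the signature of Laurus's `Embedder` or `PerFieldEmbedder`.

```rust
use std::collections::HashMap;

/// Hypothetical embedder trait for this sketch only.
trait TextEmbedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy embedder producing (length, vowel count); stands in for a real model.
struct ToyEmbedder;
impl TextEmbedder for ToyEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        let vowels = text.chars().filter(|c| "aeiou".contains(*c)).count();
        vec![text.len() as f32, vowels as f32]
    }
}

/// Fallback embedder returning a zero vector of fixed dimension.
struct ZeroEmbedder(usize);
impl TextEmbedder for ZeroEmbedder {
    fn embed(&self, _text: &str) -> Vec<f32> {
        vec![0.0; self.0]
    }
}

/// Routes a field to its registered embedder, or to the default one.
struct FieldRouter {
    per_field: HashMap<String, Box<dyn TextEmbedder>>,
    default: Box<dyn TextEmbedder>,
}

impl FieldRouter {
    fn embed(&self, field: &str, text: &str) -> Vec<f32> {
        self.per_field.get(field).unwrap_or(&self.default).embed(text)
    }
}

fn main() {
    let mut per_field: HashMap<String, Box<dyn TextEmbedder>> = HashMap::new();
    per_field.insert("title".to_string(), Box::new(ToyEmbedder));
    let router = FieldRouter { per_field, default: Box::new(ZeroEmbedder(2)) };
    println!("{:?}", router.embed("title", "rust"));
    println!("{:?}", router.embed("body", "rust"));
}
```

Using trait objects keeps the router agnostic to which backend produced each vector.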

§Write-Ahead Log (WAL) for Durability

WAL-backed durability with crash recovery
Automatic replay of uncommitted changes on engine startup
Rollback on partial indexing failures to maintain consistency
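The recovery idea can be illustrated with an in-memory model: every write is appended to the log before it touches live state, a commit persists a snapshot and truncates the log, and startup rebuilds state by replaying the log on top of the last snapshot. This is a simplified, hypothetical sketch (`WalOp`, `Durable`, and the `Vec` standing in for an on-disk file are all assumptions), not Laurus's WAL format; it omits rollback and checksums.

```rust
use std::collections::BTreeMap;

/// Hypothetical logged operation.
#[derive(Clone, Debug)]
enum WalOp {
    Upsert(u64, String),
    Delete(u64),
}

fn apply(state: &mut BTreeMap<u64, String>, op: &WalOp) {
    match op {
        WalOp::Upsert(id, body) => {
            state.insert(*id, body.clone());
        }
        WalOp::Delete(id) => {
            state.remove(id);
        }
    }
}

/// What survives a crash: the last committed snapshot and the log since then.
struct Durable {
    snapshot: BTreeMap<u64, String>,
    wal: Vec<WalOp>,
}

impl Durable {
    /// Log first, then apply to the live (volatile) state.
    fn write(&mut self, live: &mut BTreeMap<u64, String>, op: WalOp) {
        self.wal.push(op.clone());
        apply(live, &op);
    }

    /// Commit: persist live state as the snapshot and truncate the log.
    fn commit(&mut self, live: &BTreeMap<u64, String>) {
        self.snapshot = live.clone();
        self.wal.clear();
    }

    /// Startup: snapshot plus replay of every uncommitted log entry.
    fn recover(&self) -> BTreeMap<u64, String> {
        let mut state = self.snapshot.clone();
        for op in &self.wal {
            apply(&mut state, op);
        }
        state
    }
}

fn main() {
    let mut durable = Durable { snapshot: BTreeMap::new(), wal: Vec::new() };
    let mut live = BTreeMap::new();
    durable.write(&mut live, WalOp::Upsert(1, "first".into()));
    durable.commit(&live);
    durable.write(&mut live, WalOp::Upsert(2, "second".into()));
    durable.write(&mut live, WalOp::Delete(1));
    drop(live); // simulate a crash: the in-memory state is lost
    println!("{:?}", durable.recover());
}
```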

§Pluggable Storage (In-memory, File-based with mmap)

In-memory storage (MemoryStorage) for testing and ephemeral indexes
File-based storage (FileStorage) with mmap-backed reads
Prefixed storage for logical partitioning within a single backend
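Prefixed storage can be pictured as a thin view that namespaces every key before delegating to a shared backend, so that, for example, lexical and vector data can live in one store without name collisions. The types below (`MemoryKv`, `Prefixed`) and the `prefix/name` key scheme are hypothetical and are not Laurus's `Storage` trait.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical in-memory backend: file name -> bytes.
#[derive(Default)]
struct MemoryKv {
    files: HashMap<String, Vec<u8>>,
}

/// A view over a shared backend that prefixes every key.
struct Prefixed {
    prefix: String,
    backend: Rc<RefCell<MemoryKv>>,
}

impl Prefixed {
    fn new(prefix: &str, backend: Rc<RefCell<MemoryKv>>) -> Self {
        Prefixed { prefix: prefix.to_string(), backend }
    }

    fn key(&self, name: &str) -> String {
        format!("{}/{}", self.prefix, name)
    }

    fn put(&self, name: &str, data: &[u8]) {
        let key = self.key(name);
        self.backend.borrow_mut().files.insert(key, data.to_vec());
    }

    fn get(&self, name: &str) -> Option<Vec<u8>> {
        let key = self.key(name);
        let backend = self.backend.borrow();
        backend.files.get(&key).cloned()
    }
}

fn main() {
    let backend = Rc::new(RefCell::new(MemoryKv::default()));
    let lexical = Prefixed::new("lexical", Rc::clone(&backend));
    let vectors = Prefixed::new("vector", Rc::clone(&backend));
    // Same logical name, no collision in the shared backend.
    lexical.put("segments", b"L");
    vectors.put("segments", b"V");
    println!("{:?}", lexical.get("segments"));
}
```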

§Text Analysis Pipeline (Tokenizers, Filters, Char Filters, Synonyms)

Pluggable analyzers: Standard, Simple, Keyword, per-field, and language-specific (English, Japanese)
Tokenizers: Unicode word, whitespace, regex, n-gram, Lindera (Japanese morphological analysis)
Token filters: lowercase, stop words, stemming (Porter, simple), synonym graph, boost, strip, limit
Char filters: Unicode normalization, character mapping, pattern replace, Japanese iteration marks
Custom analysis pipelines via PipelineAnalyzer
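An analysis pipeline runs in a fixed order: char filters rewrite the raw text, a tokenizer splits it, and token filters transform the token stream. The standalone sketch below mirrors that order with plain function pointers; it is not Laurus's `PipelineAnalyzer` API, and the plural-stripping filter is a naive stand-in for a real stemmer such as Porter.

```rust
type CharFilter = fn(&str) -> String;
type TokenFilter = fn(Vec<String>) -> Vec<String>;

/// Char filter: character mapping ('&' -> "and").
fn map_ampersand(s: &str) -> String {
    s.replace('&', " and ")
}

/// Tokenizer: split on any non-alphanumeric character.
fn tokenize(s: &str) -> Vec<String> {
    s.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

fn lowercase(tokens: Vec<String>) -> Vec<String> {
    tokens.into_iter().map(|t| t.to_lowercase()).collect()
}

fn stop_words(tokens: Vec<String>) -> Vec<String> {
    const STOP: [&str; 4] = ["a", "and", "the", "of"];
    tokens.into_iter().filter(|t| !STOP.contains(&t.as_str())).collect()
}

/// Naive stemming: drop a trailing 's' (but not "ss") from longer words.
fn strip_plural(tokens: Vec<String>) -> Vec<String> {
    tokens
        .into_iter()
        .map(|t| {
            if t.len() > 3 && t.ends_with('s') && !t.ends_with("ss") {
                t[..t.len() - 1].to_string()
            } else {
                t
            }
        })
        .collect()
}

struct Pipeline {
    char_filters: Vec<CharFilter>,
    token_filters: Vec<TokenFilter>,
}

impl Pipeline {
    fn analyze(&self, text: &str) -> Vec<String> {
        let mut s = text.to_string();
        for f in &self.char_filters {
            s = f(&s);
        }
        let mut tokens = tokenize(&s);
        for f in &self.token_filters {
            tokens = f(tokens);
        }
        tokens
    }
}

fn main() {
    let pipeline = Pipeline {
        char_filters: vec![map_ampersand as CharFilter],
        token_filters: vec![lowercase as TokenFilter, stop_words as TokenFilter, strip_plural as TokenFilter],
    };
    println!("{:?}", pipeline.analyze("Cats & Dogs of the World"));
}
```

Order matters: lowercasing before the stop-word filter is what lets "The" match "the".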

§Column Storage for Filtering

Column-oriented storage for efficient filtering and range queries on scalar fields
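Column-oriented storage keeps each field's values in one contiguous array, so a filter reads only the columns it needs. A minimal sketch with hypothetical `price` and `year` columns (not a Laurus type):

```rust
use std::ops::RangeInclusive;

/// One Vec per field; index i in every column belongs to document i.
struct Columns {
    price: Vec<f64>,
    year: Vec<i64>,
}

impl Columns {
    /// Return ids of documents with price <= price_max and year in year_range.
    fn filter(&self, price_max: f64, year_range: RangeInclusive<i64>) -> Vec<usize> {
        (0..self.price.len())
            .filter(|&i| self.price[i] <= price_max && year_range.contains(&self.year[i]))
            .collect()
    }
}

fn main() {
    let cols = Columns {
        price: vec![10.0, 25.0, 5.0, 40.0],
        year: vec![2020, 2021, 2019, 2022],
    };
    println!("{:?}", cols.filter(30.0, 2020..=2022));
}
```

The resulting id set can then be intersected with lexical or vector hits.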

§Spelling Correction / “Did You Mean?” Suggestions

Levenshtein-distance based spelling correction
“Did you mean?” suggestions powered by index term frequencies
Configurable auto-correction thresholds and maximum edit distances
Query history learning for improved suggestion quality
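The pieces above combine naturally: compute the Levenshtein distance from the query word to indexed terms, keep those within the edit limit, and rank by distance and then term frequency. This standalone sketch uses a hypothetical `did_you_mean` function and a plain `HashMap` of term frequencies in place of the index; Laurus's ranking and thresholds may differ, and query-history learning is not modeled.

```rust
use std::collections::HashMap;

/// Dynamic-programming Levenshtein distance over chars, two rows at a time.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];
    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            // deletion, insertion, substitution
            cur[j] = (prev[j] + 1).min(cur[j - 1] + 1).min(prev[j - 1] + cost);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Suggest the closest indexed term within `max_edits`; ties go to the higher
/// frequency, then alphabetical order. None if the word is already indexed.
fn did_you_mean<'a>(word: &str, term_freqs: &HashMap<&'a str, u64>, max_edits: usize) -> Option<&'a str> {
    if term_freqs.contains_key(word) {
        return None;
    }
    term_freqs
        .iter()
        .map(|(&term, &freq)| (term, levenshtein(word, term), freq))
        .filter(|&(_, d, _)| d <= max_edits)
        .min_by(|a, b| a.1.cmp(&b.1).then(b.2.cmp(&a.2)).then(a.0.cmp(b.0)))
        .map(|(term, _, _)| term)
}

fn main() {
    let term_freqs: HashMap<&str, u64> =
        [("search", 50), ("starch", 3), ("speech", 10)].into_iter().collect();
    println!("{:?}", did_you_mean("serch", &term_freqs, 2));
}
```

A real implementation would avoid scanning every term, for example by walking a trie or automaton bounded by the edit limit.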

Re-exports§

pub use analysis::analyzer::analyzer::Analyzer;
pub use embedding::embedder::EmbedInput;
pub use embedding::embedder::EmbedInputType;
pub use embedding::embedder::Embedder;
pub use embedding::per_field::PerFieldEmbedder;
pub use embedding::precomputed::PrecomputedEmbedder;
pub use engine::search::VectorSearchQuery;
pub use lexical::core::field::BooleanOption;
pub use lexical::core::field::BytesOption;
pub use lexical::core::field::DateTimeOption;
pub use lexical::core::field::FloatOption;
pub use lexical::core::field::Geo3dOption;
pub use lexical::core::field::GeoOption;
pub use lexical::core::field::IntegerOption;
pub use lexical::core::field::TextOption;
pub use lexical::search::searcher::LexicalSearchParams;
pub use lexical::search::searcher::LexicalSearchQuery;
pub use lexical::search::searcher::LexicalSearchRequest;
pub use lexical::search::searcher::SortField;
pub use lexical::search::searcher::SortOrder;
pub use storage::Storage;
pub use storage::StorageConfig;
pub use storage::StorageFactory;
pub use vector::core::distance::DistanceMetric;
pub use vector::core::field::FlatOption;
pub use vector::core::field::HnswOption;
pub use vector::core::field::IvfOption;
pub use vector::core::quantization::QuantizationMethod;
pub use vector::store::request::QueryPayload;
pub use vector::store::request::QueryVector;
pub use vector::store::request::VectorScoreMode;
pub use vector::store::request::VectorSearchParams;
pub use vector::store::request::VectorSearchRequest;

Modules§

analysis: Text analysis module for Laurus.
embedding: Text and multimodal embedding support for Laurus vector search.
lexical: Lexical search implementation using inverted indexes.
spelling: Spelling correction and suggestion utilities for Laurus.
storage: Storage abstraction layer for Laurus.
store: Document storage module.
util: Shared utility modules used across Laurus components.
vector: Vector search implementation using approximate nearest neighbor algorithms.

Structs§

AnalyzerDefinition: A custom analyzer definition composed of a tokenizer and optional char/token filter chains.
DeletionConfig: Configuration for deletion management.
Document: Unified Document structure.
Engine: Unified Engine that manages both Lexical and Vector indices.
EngineBuilder: Builder for constructing an Engine with custom configuration.
EngineStats: Combined statistics from both the lexical and vector stores.
GeoEcefPoint: A 3D point in Earth-Centered Earth-Fixed (ECEF) Cartesian coordinates.
GeoPoint: A geographical point on the Earth’s surface, in WGS84 latitude / longitude degrees.
LexicalSearchOptions: Parameters controlling lexical search behavior.
Schema: Schema for the unified engine.
SearchRequest: Unified search request combining query specification with pagination, options, and fusion settings.
SearchRequestBuilder: Fluent builder for constructing a SearchRequest.
SearchResult: A single result from an Engine search.
UnifiedQueryParser: Unified query parser that composes lexical and vector parsers.
VectorSearchOptions: Parameters controlling vector search behavior.

Enums§

AnalyzerSpec: Reference to an analyzer for a text field.
BuiltinAnalyzerSpec: Parameterized built-in analyzer presets.
CharFilterConfig: Configuration for a char filter component.
DataValue: The unified value type for fields in a document.
DynamicFieldPolicy: Policy for fields that are not declared in the schema.
EmbedderDefinition: A declarative embedder definition stored in the schema.
FieldOption: Options for a single field in the unified schema.
FusionAlgorithm: Algorithm used to combine lexical and vector scores in hybrid search.
HybridMode: Controls how lexical and vector results are combined in hybrid search.
InferredValue: Result of attempting to infer a DataValue and FieldOption from a raw JSON value.
LaurusError: The main error type for Laurus operations.
SearchQuery: Unified search query specification.
TokenFilterConfig: Configuration for a token filter component.
TokenizerConfig: Configuration for a tokenizer component.

Constants§

VERSION: The crate version string, populated at compile time from Cargo.toml.

Functions§

infer_from_json: Infer a DataValue and FieldOption from a JSON value.
infer_option_from_data_value: Infer a FieldOption from an existing DataValue.

Type Aliases§

Result: Result type alias for operations that may fail with LaurusError.
