lucivy-core
High-level Rust API for ld-lucivy — BM25 full-text search with cross-token fuzzy matching, substring search, regex, and highlights.
This is the recommended way to use Lucivy from Rust. It wraps ld-lucivy with schema management, query building, index handles, and snapshot export/import.
Also available as Python, Node.js, and WASM packages.
Install
[]
= "0.1"
Quick start
use LucivyHandle;
use ;
use StdFsDirectory;
use Arc;
// Define schema
let config = SchemaConfig ;
// Create index
let dir = open.unwrap;
let handle = create.unwrap;
// Add documents (via the IndexWriter)
let title = handle.field.unwrap;
let body = handle.field.unwrap;
handle.reader.reload.unwrap;
// Search
let query = QueryConfig ;
let built = build_query.unwrap;
let searcher = handle.reader.searcher;
let top_docs = searcher.search.unwrap;
Query types
All queries are built via QueryConfig structs, serializable to/from JSON.
contains — substring, fuzzy, regex (cross-token)
Searches stored text, not individual tokens. Handles multi-word phrases, substrings, typos, and regex across token boundaries.
// Substring — matches "programming", "programmer", etc.
let q = QueryConfig ;
// Fuzzy (catches typos, distance=1 by default)
let q = QueryConfig ;
// Regex on stored text (cross-token)
let q = QueryConfig ;
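To make "distance=1" concrete, here is a minimal, standalone sketch of Levenshtein edit distance. It assumes the fuzzy distance counts single-character insertions, deletions, and substitutions; the README does not state which metric lucivy-core uses, and `edit_distance` is a hypothetical helper, not part of the crate.

```rust
/// Levenshtein edit distance between two strings, counted in chars.
/// Illustrative only: not the lucivy-core implementation.
fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j]
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = cur[j] + 1;
            cur.push(substitution.min(deletion).min(insertion));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

Under this assumption, a query for "programing" (one missing letter) is within distance 1 of "programming", so a fuzzy contains with the default distance would be expected to match it.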
contains_split — one word = one contains, OR'd together
let q = QueryConfig ;
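The "one word = one contains, OR'd together" rule can be sketched in plain Rust. This models only the matching semantics, assuming whitespace splitting and case-insensitive comparison; `contains_split_matches` is a hypothetical function, and the real query also scores results with BM25.

```rust
/// Returns true if any whitespace-separated word of `query` occurs
/// as a substring of `text` (case-insensitive).
/// Illustrative model of the semantics, not the lucivy-core code path.
fn contains_split_matches(text: &str, query: &str) -> bool {
    let text_lc = text.to_lowercase();
    query
        .split_whitespace()
        // Each word acts as its own substring test; `any` is the OR.
        .any(|word| text_lc.contains(&word.to_lowercase()))
}
```

With this model, the query "python rust" matches a document containing only "Rust", because a single matching word is enough.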
boolean — combine queries with must / should / must_not
let q = QueryConfig ;
keyword / range — for non-text fields
// Exact keyword match
let q = QueryConfig ;
// Via filters on any query
let q = QueryConfig ;
Highlights
All query types support byte-offset highlights via HighlightSink.
use HighlightSink;
use Arc;
let sink = new;
let built = build_query.unwrap;
// After search, read highlights:
let highlights = sink.take; // HashMap<String, Vec<(u32, u32)>>
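Once the highlight map is available, the byte offsets can be turned into text snippets. The sketch below assumes the map is keyed by field name and that each pair is a (start, end) byte range with an exclusive end; neither detail is confirmed above, and `highlighted_snippets` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Extracts the highlighted slices of `text` for one field.
/// Assumes (start, end) byte offsets with an exclusive end.
fn highlighted_snippets<'a>(
    text: &'a str,
    field: &str,
    highlights: &HashMap<String, Vec<(u32, u32)>>,
) -> Vec<&'a str> {
    highlights
        .get(field)
        .map(|spans| {
            spans
                .iter()
                // `str::get` returns None for offsets that are out of range
                // or not on a char boundary, so bad spans are skipped.
                .filter_map(|&(start, end)| text.get(start as usize..end as usize))
                .collect()
        })
        .unwrap_or_default()
}
```

Using `str::get` instead of direct indexing avoids a panic if an offset lands inside a multi-byte UTF-8 character.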
Snapshots (export / import)
Portable .luce binary format — export an index, import it elsewhere.
use snapshot;
use Path;
// Export to bytes
let data = export_index.unwrap;
// Import from bytes
let restored = import_index.unwrap;
How contains works
Every text field gets 3 sub-fields automatically:
| Sub-field | Tokenizer | Purpose |
|---|---|---|
| {name} | stemmed or lowercase | BM25 scoring |
| {name}._raw | lowercase only | contains verification (precision) |
| {name}._ngram | character trigrams | contains candidate generation |
The contains query uses trigram-accelerated substring search:
- Candidate collection via trigram intersection on ._ngram
- Verification on stored text (fuzzy or regex)
- BM25 scoring
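The first two steps can be sketched over an in-memory list of documents. This is a simplified model, not the lucivy-core implementation: it recomputes trigrams per document instead of using an inverted index, handles exact substrings only, and skips BM25 scoring. `trigrams` and `contains_search` are hypothetical names.

```rust
use std::collections::HashSet;

/// Character trigrams of the lowercased text.
fn trigrams(text: &str) -> HashSet<String> {
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Returns the indices of documents that contain `needle` as a substring.
fn contains_search(docs: &[&str], needle: &str) -> Vec<usize> {
    let needle_lc = needle.to_lowercase();
    let needle_grams = trigrams(&needle_lc);
    docs.iter()
        .enumerate()
        // Step 1: a candidate must contain every trigram of the needle.
        .filter(|(_, doc)| needle_grams.is_subset(&trigrams(doc)))
        // Step 2: verify against the full text, since the trigrams may
        // all be present without being adjacent.
        .filter(|(_, doc)| doc.to_lowercase().contains(&needle_lc))
        .map(|(i, _)| i)
        .collect()
}
```

The verification step matters: "abcd bcde" contains every trigram of "abcde" (abc, bcd, cde) yet does not contain "abcde" itself, so the trigram filter alone would report a false positive.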
Lineage
Fork of tantivy v0.26.0 (via izihawa/tantivy).
License
MIT. See LICENSE.