lucivy-core

High-level Rust API for ld-lucivy — BM25 full-text search with cross-token fuzzy matching, substring search, regex, and highlights.

This is the recommended way to use Lucivy from Rust. It wraps ld-lucivy with schema management, query building, index handles, and snapshot export/import.

Also available as Python, Node.js, and WASM packages.

Install

[dependencies]
lucivy-core = "0.1"

Quick start

use lucivy_core::handle::LucivyHandle;
use lucivy_core::query::{SchemaConfig, FieldDef, QueryConfig};
use lucivy_core::directory::StdFsDirectory;
use std::sync::Arc;

// Define schema
let config = SchemaConfig {
    fields: vec![
        FieldDef { name: "title".into(), field_type: "text".into(), ..Default::default() },
        FieldDef { name: "body".into(), field_type: "text".into(), ..Default::default() },
        FieldDef { name: "tag".into(), field_type: "keyword".into(), ..Default::default() },
        FieldDef { name: "year".into(), field_type: "u64".into(),
                   indexed: Some(true), fast: Some(true), ..Default::default() },
    ],
    tokenizer: None,
    stemmer: None,
};

// Create index
let dir = StdFsDirectory::open("./my_index").unwrap();
let handle = LucivyHandle::create(dir, &config).unwrap();

// Add documents (via the IndexWriter)
let title = handle.field("title").unwrap();
let body = handle.field("body").unwrap();
{
    let mut writer = handle.writer.lock().unwrap();
    let mut doc = ld_lucivy::TantivyDocument::new();
    doc.add_text(title, "Rust Programming");
    doc.add_text(body, "Systems programming with memory safety");
    writer.add_document(doc).unwrap();
    writer.commit().unwrap();
}
handle.reader.reload().unwrap();

// Search
let query = QueryConfig {
    query_type: "contains".into(),
    field: Some("body".into()),
    value: Some("programming".into()),
    ..Default::default()
};
let built = lucivy_core::query::build_query(
    &query, &handle.schema, &handle.index,
    &handle.raw_field_pairs, &handle.ngram_field_pairs, None,
).unwrap();

let searcher = handle.reader.searcher();
let top_docs = searcher.search(
    &built,
    &ld_lucivy::collector::TopDocs::with_limit(10),
).unwrap();

Query types

All queries are built via QueryConfig structs, serializable to/from JSON.

contains — substring, fuzzy, regex (cross-token)

Searches stored text, not individual tokens. Handles multi-word phrases, substrings, typos, and regex across token boundaries.

// Substring — matches "programming", "programmer", etc.
let q = QueryConfig {
    query_type: "contains".into(),
    field: Some("body".into()),
    value: Some("program".into()),
    ..Default::default()
};

// Fuzzy (catches typos, distance=1 by default)
let q = QueryConfig {
    query_type: "contains".into(),
    field: Some("body".into()),
    value: Some("programing languag".into()),
    distance: Some(1),
    ..Default::default()
};

// Regex on stored text (cross-token)
let q = QueryConfig {
    query_type: "contains".into(),
    field: Some("body".into()),
    value: Some("program.*language".into()),
    regex: Some(true),
    ..Default::default()
};

contains_split — one word = one contains, OR'd together

let q = QueryConfig {
    query_type: "contains_split".into(),
    field: Some("body".into()),
    value: Some("rust safety".into()),
    ..Default::default()
};

boolean — combine queries with must / should / must_not

let q = QueryConfig {
    query_type: "boolean".into(),
    must: Some(vec![
        QueryConfig {
            query_type: "contains".into(),
            field: Some("body".into()),
            value: Some("rust".into()),
            ..Default::default()
        },
    ]),
    must_not: Some(vec![
        QueryConfig {
            query_type: "contains".into(),
            field: Some("body".into()),
            value: Some("javascript".into()),
            ..Default::default()
        },
    ]),
    ..Default::default()
};

keyword / range — for non-text fields

// Exact keyword match
let q = QueryConfig {
    query_type: "keyword".into(),
    field: Some("tag".into()),
    value: Some("rust".into()),
    ..Default::default()
};

// Via filters on any query
let q = QueryConfig {
    query_type: "contains".into(),
    field: Some("body".into()),
    value: Some("programming".into()),
    filters: Some(vec![
        lucivy_core::query::FilterClause {
            field: Some("year".into()),
            op: "gte".into(),
            value: Some(serde_json::json!(2023)),
            ..Default::default()
        },
    ]),
    ..Default::default()
};

Highlights

All query types support byte-offset highlights via HighlightSink.

use ld_lucivy::query::HighlightSink;
use std::sync::Arc;

let sink = Arc::new(HighlightSink::new());
let built = lucivy_core::query::build_query(
    &query, &handle.schema, &handle.index,
    &handle.raw_field_pairs, &handle.ngram_field_pairs,
    Some(sink.clone()),
).unwrap();

// After search, read highlights:
let highlights = sink.take(); // HashMap<String, Vec<(u32, u32)>>

Snapshots (export / import)

Portable .luce binary format — export an index, import it elsewhere.

use lucivy_core::snapshot;
use std::path::Path;

// Export to bytes
let data = snapshot::export_index(&handle, Path::new("./my_index")).unwrap();

// Import from bytes
let restored = snapshot::import_index(&data, Path::new("./restored_index")).unwrap();

How contains works

Every text field gets 3 sub-fields automatically:

Sub-field	Tokenizer	Purpose
`{name}`	stemmed or lowercase	BM25 scoring
`{name}._raw`	lowercase only	contains verification (precision)
`{name}._ngram`	character trigrams	contains candidate generation

The contains query uses trigram-accelerated substring search:

Candidate collection via trigram intersection on ._ngram
Verification on stored text (fuzzy or regex)
BM25 scoring

Lineage

Fork of tantivy v0.26.0 (via izihawa/tantivy).

License

MIT. See LICENSE.

lucivy-core 0.1.1