Crate frankensearch


§frankensearch

Two-tier hybrid search for Rust: sub-millisecond initial results, quality-refined rankings in ~150ms.

frankensearch combines lexical (Tantivy BM25) and semantic (vector cosine similarity) search via Reciprocal Rank Fusion, with a two-tier progressive embedding model that delivers results in two phases:

  1. Phase 1 (Initial): Fast embedder (potion-128M, 256d, ~0.57ms) produces results immediately via brute-force vector search + optional BM25 fusion.
  2. Phase 2 (Refined): Quality embedder (MiniLM-L6-v2, 384d, ~128ms) re-scores the top candidates for higher relevance.

Consumers receive results progressively via SearchPhase callbacks, so UIs can display fast results while quality refinement runs in the background.
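Here is a minimal, dependency-free sketch of that two-phase flow. `Phase`, `search_progressive` and the scoring closures are hypothetical stand-ins. The real `SearchPhase` has `Initial`, `Refined` and `RefinementFailed` variants, but its payloads and the searcher's callback signature are not reproduced here. To keep the example short, documents double as their own IDs.

```rust
/// Hypothetical stand-in for the crate's `SearchPhase`; payloads are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Phase {
    /// Fast-tier ranking, delivered first.
    Initial(Vec<(String, f32)>),
    /// Quality-tier re-ranking of the same top candidates.
    Refined(Vec<(String, f32)>),
    /// The quality tier failed; the consumer keeps the Initial results.
    RefinementFailed(String),
}

/// Sorts (doc_id, score) pairs by descending score (stable for ties).
fn rank(mut hits: Vec<(String, f32)>) -> Vec<(String, f32)> {
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}

/// Scores every doc with the cheap `fast` scorer and reports the top `k`.
/// It then re-scores only those `k` with the expensive `quality` scorer and reports again.
pub fn search_progressive<F, Q, C>(docs: &[&str], k: usize, fast: F, quality: Q, mut on_phase: C)
where
    F: Fn(&str) -> f32,
    Q: Fn(&str) -> Result<f32, String>,
    C: FnMut(Phase),
{
    let initial = rank(docs.iter().map(|&d| (d.to_string(), fast(d))).collect());
    let top: Vec<(String, f32)> = initial.into_iter().take(k).collect();
    on_phase(Phase::Initial(top.clone()));

    // Phase 2: only the top-k candidates pay for the slow scorer.
    let mut refined = Vec::with_capacity(top.len());
    for (id, _) in &top {
        match quality(id.as_str()) {
            Ok(score) => refined.push((id.clone(), score)),
            Err(e) => {
                on_phase(Phase::RefinementFailed(e));
                return;
            }
        }
    }
    on_phase(Phase::Refined(rank(refined)));
}
```

A UI can render the `Initial` list immediately and swap in the `Refined` list when it arrives. On `RefinementFailed`, it simply keeps the first list.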

§Quick Start

Build an index and search it (requires only the default hash feature):

use std::sync::Arc;
use frankensearch::prelude::*;
use frankensearch::{Embedder, EmbedderStack, HashEmbedder, IndexBuilder, TwoTierIndex};

asupersync::test_utils::run_test_with_cx(|cx| async move {
    // Build an index with a 256d fast tier and a 384d quality tier.
    // HashEmbedder needs no model downloads, so this runs with default features.
    let fast = Arc::new(HashEmbedder::default_256()) as Arc<dyn Embedder>;
    let quality = Arc::new(HashEmbedder::default_384()) as Arc<dyn Embedder>;
    let stack = EmbedderStack::from_parts(fast, Some(quality));

    let _stats = IndexBuilder::new("./my_index")
        .with_embedder_stack(stack)
        .add_document("doc-1", "Rust ownership and borrowing")
        .add_document("doc-2", "Python garbage collection")
        .build(&cx)
        .await
        .expect("build index");

    // Search: reopen the index from disk. The searcher embeds queries with its
    // own fast-tier embedder, which should match the one used at build time.
    let fast = Arc::new(HashEmbedder::default_256()) as Arc<dyn Embedder>;
    let index = Arc::new(
        TwoTierIndex::open("./my_index", TwoTierConfig::default()).expect("open index"),
    );
    let searcher = TwoTierSearcher::new(index, fast, TwoTierConfig::default());
    let (results, _metrics) = searcher
        .search_collect(&cx, "memory management", 10)
        .await
        .expect("search");

    for result in &results {
        println!("{}: {:.4}", result.doc_id, result.score);
    }
});

§Architecture

 Query ─┬─► Fast Embed (256d) ─► Vector Search ─┐
        │                                       ├─► RRF Fusion ─► Phase 1 Results
        └─► Tantivy BM25 (optional) ────────────┘
                                                       │
                                             Quality Embed (384d)
                                                       │
                                                  Score Blend
                                                       │
                                                 Phase 2 Results
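The RRF Fusion step can be illustrated with the textbook formula. Each document scores the sum of 1/(k + rank) over every list it appears in, with ranks counted from 1. The constant k = 60 comes from the original RRF paper and is an assumption here. The crate's `RrfConfig` and `rrf_fuse` may use different defaults or tie-breaking, and `rrf` below is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over any number of best-first lists of doc IDs.
/// Ranks are 1-based; a larger `k` flattens the advantage of top positions.
pub fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by doc ID so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Scores depend only on ranks, so BM25 scores and cosine similarities never need to be put on a common scale before fusion.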

§Crate Layout

| Crate | Purpose |
|---|---|
| frankensearch-core | Types, traits, errors, config |
| frankensearch-embed | Embedder implementations (hash, model2vec, fastembed) |
| frankensearch-index | FSVI vector index format, brute-force + HNSW search |
| frankensearch-fusion | RRF fusion, blending, TwoTierSearcher orchestration |
| frankensearch-lexical | Tantivy BM25 backend (feature-gated) |
| frankensearch-rerank | FlashRank cross-encoder (feature-gated) |

§Key Types

  • IndexBuilder — Build a search index from documents
  • TwoTierSearcher — Progressive two-phase search orchestrator
  • TwoTierConfig — Search configuration (blend factor, budgets, fast-only mode)
  • TwoTierMetrics — Per-search timing and diagnostic metrics
  • SearchPhase — Progressive result delivery (Initial / Refined / RefinementFailed)
  • EmbedderStack — Fast + optional quality embedder pair
  • VectorIndex — Low-level FSVI vector index reader
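`TwoTierConfig` exposes a blend factor for the Score Blend step. One plausible reading is a linear interpolation between the tiers, and the sketch below assumes exactly that. Here `alpha` weights the quality score, `1 - alpha` weights the fast score, and any candidate the quality tier did not score keeps its fast score. The name `blend` and this formula are assumptions, not the documented behavior of `blend_two_tier`.

```rust
/// Hypothetical linear blend of fast-tier and quality-tier scores.
/// `alpha` is clamped to [0, 1]; 0 reproduces the fast ranking, 1 trusts quality fully.
pub fn blend(fast: &[(&str, f32)], quality: &[(&str, f32)], alpha: f32) -> Vec<(String, f32)> {
    let alpha = alpha.clamp(0.0, 1.0);
    let mut out: Vec<(String, f32)> = fast
        .iter()
        .map(|&(id, f)| {
            let score = match quality.iter().find(|(q, _)| *q == id) {
                Some(&(_, q)) => alpha * q + (1.0 - alpha) * f,
                None => f, // not re-scored: fall back to the fast score
            };
            (id.to_string(), score)
        })
        .collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```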

§Performance

Measured on a single core (no GPU), 10K document corpus:

| Operation | Embedder / method | Latency |
|---|---|---|
| Hash embed (256d) | FNV-1a | ~11 μs |
| Fast embed (256d) | potion-128M | ~0.57 ms |
| Quality embed (384d) | MiniLM-L6-v2 | ~128 ms |
| Vector search (10K docs, top-10) | brute-force | ~2 ms |
| RRF fusion (500 + 500) | n/a | ~1 ms |
| Full pipeline (hash, 10K docs) | hash only | ~3 ms |
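For scale, brute-force search over 10K documents at 256 dimensions costs 10,000 × 256 = 2,560,000 multiply-adds per query. That is why it stays in the low milliseconds without an ANN index. The minimal sketch below shows that scan. It assumes vectors are already L2-normalized, so cosine similarity equals the dot product. The name `top_k` is hypothetical.

```rust
/// Brute-force top-k over unit-length vectors: cosine similarity is just the dot product.
/// Returns (corpus index, score) pairs, best first.
pub fn top_k(query: &[f32], corpus: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = corpus
        .iter()
        .enumerate()
        .map(|(i, v)| (i, v.iter().zip(query).map(|(a, b)| a * b).sum::<f32>()))
        .collect();
    // A full sort is O(n log n); a bounded min-heap would be O(n log k),
    // but the sort keeps the sketch short.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```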

§Feature Flags

| Feature | Description |
|---|---|
| hash | FNV-1a hash embedder (default, zero dependencies) |
| model2vec | potion-128M static embedder (fast tier, ~0.57 ms) |
| fastembed | MiniLM-L6-v2 ONNX embedder (quality tier, ~128 ms) |
| lexical | Tantivy BM25 full-text search |
| rerank | FlashRank cross-encoder reranking |
| ann | HNSW approximate nearest-neighbor index |
| download | Model auto-download from Hugging Face via asupersync |
| storage | FrankenSQLite document metadata + embedding queue |
| durability | RaptorQ self-healing for persistent index artifacts |
| fts5 | FrankenSQLite FTS5 lexical backend wiring |
| semantic | hash + model2vec + fastembed |
| hybrid | semantic + lexical |
| persistent | hybrid + storage |
| durable | persistent + durability |
| full | durable + rerank + ann + download |
| full-fts5 | full + fts5 |

Recommended feature sets:

  • Development/testing: default (hash only, no downloads)
  • Production semantic: semantic + download
  • Persistent hybrid search: persistent
  • Maximum durability: durable or full
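The default `hash` feature embeds text without any model. The crate only states that `HashEmbedder` uses FNV-1a. The tokenization, signed bucketing and normalization below are assumptions chosen to illustrate the general feature-hashing technique, and `fnv1a64` and `hash_embed` are hypothetical names. The constants are the standard 64-bit FNV offset basis and prime.

```rust
/// 64-bit FNV-1a over raw bytes.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Feature-hashing sketch: each lowercased whitespace token adds +1 or -1
/// to one of `dim` buckets, and then the vector is L2-normalized.
pub fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dimension must be positive");
    let mut v = vec![0.0f32; dim];
    for token in text.split_whitespace() {
        let h = fnv1a64(token.to_lowercase().as_bytes());
        let bucket = (h % dim as u64) as usize;
        // Use the top hash bit as a sign so bucket collisions partly cancel out.
        let sign = if h >> 63 == 0 { 1.0 } else { -1.0 };
        v[bucket] += sign;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}
```

Hash embeddings capture token overlap rather than meaning. That suits tests and offline development, while semantic relevance comes from the `model2vec` and `fastembed` tiers.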

§Async Runtime

frankensearch uses asupersync exclusively — not tokio. All async methods take &Cx (capability context) as their first parameter. The Cx is provided by the consumer’s asupersync runtime; frankensearch never creates its own runtime.

§Re-exports

pub use frankensearch_core as core;
pub use frankensearch_embed as embed;
pub use frankensearch_fusion as fusion;
pub use frankensearch_index as index;

§Modules

prelude
Convenience re-exports for common usage.

§Structs

BootstrapCi
Confidence interval estimated via bootstrap resampling.
BootstrapComparison
Comparison of two paired score distributions via bootstrap.
Cx
Capability context for structured concurrency (from asupersync).
DaemonFallbackEmbedder
Embedder wrapper that uses the daemon when available and falls back to a local embedder.
DaemonFallbackReranker
Reranker wrapper that uses the daemon when available and falls back to a local reranker.
DaemonRetryConfig
Retry/backoff configuration for daemon requests.
DefaultCanonicalizer
Default canonicalization pipeline.
DimReduceEmbedder
Matryoshka Representation Learning (MRL) dimension-reduction wrapper.
DocumentFingerprint
Content-aware fingerprint for deciding whether document embeddings should be refreshed.
EmbedderRegistry
Runtime registry wrapper with configured model-data root.
EmbedderStack
Resolved fast/quality embedder stack for progressive search.
EmbeddingMetrics
Structured telemetry for an embedding operation.
FederatedConfig
Configuration for federated search behavior.
FederatedHit
A fused hit returned by federated search.
FederatedSearcher
Multi-index search orchestrator with scatter-gather fusion.
FusedHit
A hit from hybrid fusion (lexical + semantic combined via RRF).
HashEmbedder
Zero-dependency hash-based embedder.
InMemoryTwoTierIndex
In-memory two-tier index wrapping fast and optional quality InMemoryVectorIndex.
InMemoryVectorIndex
Fully-resident in-memory vector index with f16 quantization.
IndexBuildStats
Statistics from a completed index build.
IndexBuilder
Fluent builder for creating frankensearch indexes.
IndexMetrics
Structured telemetry for index update operations.
IndexProgress
Progress update during index building.
IndexableDocument
A document to be indexed for search.
ModelInfo
Static metadata describing an embedder implementation.
NoOpMetricsExporter
No-op exporter used when no telemetry sink is attached.
NoopDaemonClient
No-op daemon client used when daemon config is missing.
PhaseMetrics
Diagnostic metrics for a search phase.
QualityComparison
Multi-metric quality comparison report.
QualityMetricComparison
Comparison result for a single quality metric.
QualityMetricSamples
Per-metric paired score samples for quality comparison.
RankChanges
Tracks how rankings changed between initial and refined phases.
RegisteredEmbedder
Static embedder metadata entry.
RerankDocument
A document for reranking: pairs a document ID with its text content.
RerankScore
A reranking score for a single document.
RrfConfig
RRF fusion parameters.
ScoredResult
The final scored search result delivered to consumers.
SearchMetrics
Structured telemetry for a completed search request.
SyncEmbedderAdapter
Adapts a SyncEmbed implementor into a full async Embedder.
SyncRerankerAdapter
Adapts a SyncRerank implementor into a full async Reranker.
SyncSearchIterator
Iterator over progressive phases produced by SyncTwoTierSearcher.
SyncTwoTierSearcher
Progressive synchronous searcher backed by InMemoryTwoTierIndex.
TwoTierConfig
Configuration for the two-tier progressive search pipeline.
TwoTierIndex
Dual-index container used by progressive search orchestration.
TwoTierIndexBuilder
Builder for writing fast and optional quality FSVI indices.
TwoTierMetrics
Diagnostics from a two-tier search execution.
TwoTierSearcher
Progressive two-tier search orchestrator.
VectorHit
A raw hit from vector similarity search.
VectorIndex
Low-level FSVI vector index reader.
VectorIndexWriter
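`DimReduceEmbedder` and `truncate_embedding` rely on Matryoshka Representation Learning. An MRL model is trained so that a prefix of its embedding is itself a usable embedding. The `truncate_embedding` description calls for truncation followed by re-normalization, and that step can be sketched on plain slices. `truncate_and_normalize` is a hypothetical name, not the crate function.

```rust
/// Keeps the first `target_dim` components of an MRL embedding and rescales
/// the result to unit length, so cosine scores remain comparable.
pub fn truncate_and_normalize(embedding: &[f32], target_dim: usize) -> Vec<f32> {
    let mut v: Vec<f32> = embedding.iter().take(target_dim).copied().collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```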

§Enums

DaemonError
Daemon request failure details.
FederatedFusion
Fusion methods supported by FederatedSearcher.
HashAlgorithm
Hash algorithm selection for the HashEmbedder.
ModelCategory
Classification of an embedding model by its speed/quality tradeoff.
ModelTier
Tier assignment in the progressive two-tier pipeline.
QualityMetric
Supported metric kinds for multi-metric quality comparisons.
QueryClass
Classification of a search query by type.
ScoreSource
Which search backend produced a result.
SearchError
Unified error type covering all failure modes across the frankensearch search pipeline.
SearchMode
Search mode selector.
SearchPhase
Progressive search phases for three-tier display.
TwoTierAvailability
Availability classification for two-tier search.

§Constants

DEFAULT_SEMANTIC_CHANGE_THRESHOLD
Default semantic-change threshold used by DocumentFingerprint::needs_reembedding_default.
SIGNIFICANT_CHAR_COUNT_CHANGE_THRESHOLD
Character-count change ratio that always triggers re-embedding.
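`DocumentFingerprint` and the two threshold constants decide when a changed document is worth re-embedding. The sketch below is a hypothetical model of that decision. An exact hash short-circuits unchanged content, and a large character-count change always triggers re-embedding. Otherwise, a token-set Jaccard distance stands in for the semantic-change test. The threshold values, the Jaccard proxy and all names are assumptions, because this page shows neither the constants' values nor the fingerprint's contents.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// Hypothetical thresholds; the crate's actual constant values are not shown here.
const CHAR_COUNT_CHANGE_RATIO: f64 = 0.5;
const TOKEN_CHANGE_THRESHOLD: f64 = 0.2;

/// Hypothetical fingerprint: content hash, length and lowercased token set.
/// DefaultHasher output can change between Rust releases, so a fingerprint
/// persisted to disk would need a stable hash instead.
pub struct Fingerprint {
    hash: u64,
    char_count: usize,
    tokens: HashSet<String>,
}

impl Fingerprint {
    pub fn new(text: &str) -> Self {
        let mut h = DefaultHasher::new();
        text.hash(&mut h);
        Fingerprint {
            hash: h.finish(),
            char_count: text.chars().count(),
            tokens: text.split_whitespace().map(|t| t.to_lowercase()).collect(),
        }
    }

    /// Returns true when `new` differs enough from `self` to justify re-embedding.
    pub fn needs_reembedding(&self, new: &Fingerprint) -> bool {
        if self.hash == new.hash {
            return false; // identical content
        }
        let old_len = self.char_count.max(1) as f64;
        let ratio = (new.char_count as f64 - self.char_count as f64).abs() / old_len;
        if ratio >= CHAR_COUNT_CHANGE_RATIO {
            return true; // a large size change always triggers a refresh
        }
        // Token-set Jaccard distance as a cheap proxy for "semantic change".
        let inter = self.tokens.intersection(&new.tokens).count() as f64;
        let union = self.tokens.union(&new.tokens).count() as f64;
        let distance = if union == 0.0 { 0.0 } else { 1.0 - inter / union };
        distance >= TOKEN_CHANGE_THRESHOLD
    }
}
```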

§Traits

Canonicalizer
Trait for text preprocessing before embedding.
DaemonClient
Abstract daemon client.
Embedder
Core trait for text embedding models.
LexicalSearch
Trait for full-text lexical search backends.
MetricsExporter
Trait for exporting search/index/embed telemetry to external consumers.
Reranker
Core trait for cross-encoder reranking models.
SyncEmbed
Synchronous embedding interface for host projects that call embedders from non-async contexts.
SyncLexicalSearch
Optional synchronous lexical backend used by SyncTwoTierSearcher.
SyncRerank
Synchronous reranking interface for host projects that call rerankers from non-async contexts.
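A `Canonicalizer` preprocesses text before embedding, which lets trivially different inputs (extra whitespace, letter case) produce the same vector. The trait below is a hypothetical stand-in. This page documents neither the real `Canonicalizer` signature nor what `DefaultCanonicalizer` does.

```rust
/// Hypothetical preprocessing trait in the spirit of `Canonicalizer`;
/// the real trait's method names and signature may differ.
pub trait Canonicalize {
    fn canonicalize(&self, text: &str) -> String;
}

/// Collapses runs of whitespace, lowercases, and caps length before embedding.
pub struct SimpleCanonicalizer {
    pub max_chars: usize,
}

impl Canonicalize for SimpleCanonicalizer {
    fn canonicalize(&self, text: &str) -> String {
        let collapsed = text.split_whitespace().collect::<Vec<_>>().join(" ");
        collapsed.to_lowercase().chars().take(self.max_chars).collect()
    }
}
```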

§Functions

blend_two_tier
Blend fast-tier and quality-tier vector hits into a single ranking.
bootstrap_ci
Compute a bootstrap confidence interval for the mean of scores.
bootstrap_compare
Compare two paired score distributions via bootstrap.
candidate_count
Compute how many candidates to fetch from each source.
cosine_similarity
Computes cosine similarity between two vectors.
l2_normalize
L2-normalizes a vector to unit length.
map_at_k
Mean Average Precision at K.
mrr
Mean Reciprocal Rank.
ndcg_at_k
Normalized Discounted Cumulative Gain at K.
quality_comparison
Produce a multi-metric quality comparison report using paired bootstrap tests.
recall_at_k
Recall at K.
rrf_fuse
Fuse lexical and semantic search results using Reciprocal Rank Fusion.
truncate_embedding
Truncates an embedding to a target dimension and re-normalizes.
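The evaluation helpers `mrr`, `recall_at_k` and `ndcg_at_k` compute standard information-retrieval metrics. The textbook versions below assume binary relevance. The crate's functions may take different argument types or support graded relevance, so treat these signatures as assumptions.

```rust
use std::collections::HashSet;

/// 1 / rank of the first relevant hit (1-based), or 0.0 if none is retrieved.
pub fn reciprocal_rank(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|d| relevant.contains(d))
        .map_or(0.0, |i| 1.0 / (i + 1) as f64)
}

/// Mean Reciprocal Rank over (ranking, relevant set) pairs, one per query.
pub fn mrr(runs: &[(Vec<&str>, HashSet<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    runs.iter().map(|(r, rel)| reciprocal_rank(r, rel)).sum::<f64>() / runs.len() as f64
}

/// Fraction of relevant documents that appear in the top `k`.
pub fn recall_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    let hits = ranked.iter().take(k).filter(|d| relevant.contains(*d)).count();
    hits as f64 / relevant.len() as f64
}

/// nDCG@k with binary relevance: a relevant hit at 1-based position p
/// contributes 1 / log2(p + 1), normalized by the best achievable DCG.
pub fn ndcg_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    let dcg: f64 = ranked
        .iter()
        .take(k)
        .enumerate()
        .filter(|(_, d)| relevant.contains(*d))
        .map(|(i, _)| 1.0 / ((i + 2) as f64).log2())
        .sum();
    let ideal_hits = relevant.len().min(k);
    let idcg: f64 = (0..ideal_hits).map(|i| 1.0 / ((i + 2) as f64).log2()).sum();
    if idcg == 0.0 { 0.0 } else { dcg / idcg }
}
```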

§Type Aliases

SearchFuture
Boxed future carrying a SearchResult<T>.
SearchResult
Convenience alias used throughout the frankensearch crate hierarchy.
SharedMetricsExporter
Shared handle for dynamic telemetry exporters.