§frankensearch
Two-tier hybrid search for Rust: sub-millisecond initial results, quality-refined rankings in ~150ms.
frankensearch combines lexical (Tantivy BM25) and semantic (vector cosine similarity) search via Reciprocal Rank Fusion, with a two-tier progressive embedding model that delivers results in two phases:
- Phase 1 (Initial): Fast embedder (potion-128M, 256d, ~0.57ms) produces results immediately via brute-force vector search + optional BM25 fusion.
- Phase 2 (Refined): Quality embedder (MiniLM-L6-v2, 384d, ~128ms) re-scores the top candidates for higher relevance.
Consumers receive results progressively via SearchPhase callbacks, so UIs
can display fast results while quality refinement runs in the background.
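
The Reciprocal Rank Fusion step mentioned above can be sketched in a few lines. This is a minimal, standalone illustration and not the crate's `rrf_fuse` implementation: the function name `rrf_fuse_sketch`, its signature, and the smoothing constant `k = 60` (a commonly used default) are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Fuse best-first ranked lists of document IDs with Reciprocal Rank Fusion.
/// Hypothetical helper for illustration; `k` is the RRF smoothing constant.
fn rrf_fuse_sketch(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, doc_id) in list.iter().enumerate() {
            // The RRF formula uses 1-based ranks: score += 1 / (k + rank).
            *scores.entry((*doc_id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by document ID so the
    // output order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling it with a lexical list `["doc-1", "doc-2", "doc-3"]` and a semantic list `["doc-2", "doc-4", "doc-1"]` ranks `doc-2` first, because it sits near the top of both lists. Documents that appear in only one list still receive a score, which is why fusion works when the lexical backend is optional.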
§Quick Start
Build an index and search it (requires only the default hash feature):

    use std::sync::Arc;
    use frankensearch::prelude::*;
    use frankensearch::{EmbedderStack, HashEmbedder, IndexBuilder, TwoTierIndex};
    use frankensearch_core::traits::Embedder;

    asupersync::test_utils::run_test_with_cx(|cx| async move {
        // Build an index
        let fast = Arc::new(HashEmbedder::default_256()) as Arc<dyn Embedder>;
        let quality = Arc::new(HashEmbedder::default_384()) as Arc<dyn Embedder>;
        let stack = EmbedderStack::from_parts(fast, Some(quality));
        let stats = IndexBuilder::new("./my_index")
            .with_embedder_stack(stack)
            .add_document("doc-1", "Rust ownership and borrowing")
            .add_document("doc-2", "Python garbage collection")
            .build(&cx)
            .await
            .expect("build index");

        // Search
        let fast = Arc::new(HashEmbedder::default_256()) as Arc<dyn Embedder>;
        let index = Arc::new(TwoTierIndex::open("./my_index", TwoTierConfig::default()).unwrap());
        let searcher = TwoTierSearcher::new(index, fast, TwoTierConfig::default());
        let (results, metrics) = searcher
            .search_collect(&cx, "memory management", 10)
            .await
            .expect("search");
        for result in &results {
            println!("{}: {:.4}", result.doc_id, result.score);
        }
    });

§Architecture

    Query ─┬─► Fast Embed (256d) ─► Vector Search ─┐
           │                                       ├─► RRF Fusion ─► Phase 1 Results
           └─► Tantivy BM25 (optional) ────────────┘
                                                            │
                                                  Quality Embed (384d)
                                                            │
                                                       Score Blend
                                                            │
                                                     Phase 2 Results

§Crate Layout
| Crate | Purpose |
|---|---|
| frankensearch-core | Types, traits, errors, config |
| frankensearch-embed | Embedder implementations (hash, model2vec, fastembed) |
| frankensearch-index | FSVI vector index format, brute-force + HNSW search |
| frankensearch-fusion | RRF fusion, blending, TwoTierSearcher orchestration |
| frankensearch-lexical | Tantivy BM25 backend (feature-gated) |
| frankensearch-rerank | FlashRank cross-encoder (feature-gated) |
§Key Types
- IndexBuilder: Build a search index from documents
- TwoTierSearcher: Progressive two-phase search orchestrator
- TwoTierConfig: Search configuration (blend factor, budgets, fast-only mode)
- TwoTierMetrics: Per-search timing and diagnostic metrics
- SearchPhase: Progressive result delivery (Initial / Refined / RefinementFailed)
- EmbedderStack: Fast + optional quality embedder pair
- VectorIndex: Low-level FSVI vector index reader
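
To make the progressive delivery and the blend factor concrete, here is a small standalone sketch. Everything in it is hypothetical: `PhaseSketch` only mirrors the three `SearchPhase` variant names listed above, and the linear blend `(1 - blend) * fast + blend * quality` is an assumed formula, not necessarily the one the crate uses.

```rust
/// Hypothetical stand-in for a progressive search phase (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
enum PhaseSketch {
    Initial(Vec<(String, f32)>),
    Refined(Vec<(String, f32)>),
    RefinementFailed(String),
}

/// Assumed linear blend of a fast-tier and a quality-tier score.
fn blend_scores(fast: f32, quality: f32, blend: f32) -> f32 {
    (1.0 - blend) * fast + blend * quality
}

/// Deliver fast results first, then a refined ranking, through a callback.
/// `quality_scores` is `None` when the (hypothetical) quality pass fails.
fn search_progressive(
    fast_hits: Vec<(String, f32)>,
    quality_scores: Option<Vec<f32>>,
    blend: f32,
    mut on_phase: impl FnMut(PhaseSketch),
) {
    // Phase 1: hand the fast-tier ranking to the caller immediately.
    on_phase(PhaseSketch::Initial(fast_hits.clone()));

    // Phase 2: re-score the same candidates, or report why that was not possible.
    match quality_scores {
        Some(quality) if quality.len() == fast_hits.len() => {
            let mut refined: Vec<(String, f32)> = fast_hits
                .into_iter()
                .zip(quality)
                .map(|((id, fast), q)| (id, blend_scores(fast, q, blend)))
                .collect();
            refined.sort_by(|a, b| b.1.total_cmp(&a.1));
            on_phase(PhaseSketch::Refined(refined));
        }
        _ => on_phase(PhaseSketch::RefinementFailed(
            "quality scores unavailable".to_string(),
        )),
    }
}
```

A UI would render the `Initial` payload as soon as the callback fires, then swap in the `Refined` ranking or keep the initial one if it receives `RefinementFailed`.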
§Performance
Measured on a single core (no GPU), 10K document corpus:
| Operation | Embedder | Latency |
|---|---|---|
| Hash embed (256d) | FNV-1a | ~11 μs |
| Fast embed (256d) | potion-128M | ~0.57 ms |
| Quality embed (384d) | MiniLM-L6-v2 | ~128 ms |
| Vector search (10K, top-10) | brute-force | ~2 ms |
| RRF fusion (500+500) | - | ~1 ms |
| Full pipeline (hash, 10K) | hash only | ~3 ms |
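
For readers unfamiliar with the brute-force row in the table above, the idea is simply to score every stored vector against the query and keep the best `k`. The sketch below is a standalone illustration with hypothetical names (`cosine`, `brute_force_top_k`); it does not read the FSVI format and says nothing about how the crate achieves the quoted latencies.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every document vector against the query and keep the `k` best hits.
fn brute_force_top_k(query: &[f32], docs: &[(&str, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = docs
        .iter()
        .map(|(id, vector)| (id.to_string(), cosine(query, vector)))
        .collect();
    // Sort by descending similarity, then drop everything past the top k.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A full sort is O(n log n); for large corpora a bounded heap of size `k` would avoid sorting every score, but the simple version is easier to follow.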
§Feature Flags
| Feature | Description |
|---|---|
| hash | FNV-1a hash embedder (default, zero dependencies) |
| model2vec | potion-128M static embedder (fast tier, ~0.57ms) |
| fastembed | MiniLM-L6-v2 ONNX embedder (quality tier, ~128ms) |
| lexical | Tantivy BM25 full-text search |
| rerank | FlashRank cross-encoder reranking |
| ann | HNSW approximate nearest-neighbor index |
| download | Model auto-download from HuggingFace via asupersync |
| storage | FrankenSQLite document metadata + embedding queue |
| durability | RaptorQ self-healing for persistent index artifacts |
| fts5 | Enables FrankenSQLite FTS5 lexical backend wiring |
| semantic | hash + model2vec + fastembed |
| hybrid | semantic + lexical |
| persistent | hybrid + storage |
| durable | persistent + durability |
| full | durable + rerank + ann + download |
| full-fts5 | full + fts5 |
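
The default `hash` feature needs no model files because a hash embedder derives its vector directly from the text. The following is a rough sketch of that idea under stated assumptions: the FNV-1a constants are the standard 64-bit ones, but the tokenization (lowercased whitespace split), the bucket and sign scheme, and the names `fnv1a_64` and `hash_embed` are illustrative guesses and may differ from what `HashEmbedder` actually does.

```rust
/// Standard 64-bit FNV-1a hash of a byte string.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hypothetical hash embedding: each lowercased token adds +1 or -1 to one of
/// `dim` buckets chosen by its hash, and the result is L2-normalized.
fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    let mut vector = vec![0.0_f32; dim];
    if dim == 0 {
        return vector;
    }
    for token in text.split_whitespace() {
        let hash = fnv1a_64(token.to_lowercase().as_bytes());
        let bucket = (hash % dim as u64) as usize;
        // Use the top hash bit as a sign so colliding tokens can cancel out.
        let sign = if (hash >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        vector[bucket] += sign;
    }
    let norm = vector.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut vector {
            *x /= norm;
        }
    }
    vector
}
```

Vectors produced this way capture token overlap rather than meaning, which fits the table's positioning of `hash` as a dependency-free default and the model-backed features as the semantic tiers.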
§Recommended Feature Combinations
- Development/testing: default (hash only, no downloads)
- Production semantic: semantic + download
- Persistent hybrid search: persistent
- Maximum durability: durable or full
§Async Runtime
frankensearch uses asupersync exclusively — not
tokio. All async methods take &Cx (capability context) as their first
parameter. The Cx is provided by the consumer’s asupersync runtime;
frankensearch never creates its own runtime.
Re-exports§
pub use frankensearch_core as core;
pub use frankensearch_embed as embed;
pub use frankensearch_fusion as fusion;
pub use frankensearch_index as index;
Modules§
- prelude: Convenience re-exports for common usage.
Structs§
- BootstrapCi: Confidence interval estimated via bootstrap resampling.
- BootstrapComparison: Comparison of two paired score distributions via bootstrap.
- Cx: Capability context for structured concurrency (from asupersync).
- DaemonFallbackEmbedder: Embedder wrapper that uses the daemon when available and falls back to a local embedder.
- DaemonFallbackReranker: Reranker wrapper that uses the daemon when available and falls back to a local reranker.
- DaemonRetryConfig: Retry/backoff configuration for daemon requests.
- DefaultCanonicalizer: Default canonicalization pipeline.
- DimReduceEmbedder: MRL dimension reduction wrapper.
- DocumentFingerprint: Content-aware fingerprint for deciding whether document embeddings should be refreshed.
- EmbedderRegistry: Runtime registry wrapper with configured model-data root.
- EmbedderStack: Resolved fast/quality embedder stack for progressive search.
- EmbeddingMetrics: Structured telemetry for an embedding operation.
- FederatedConfig: Configuration for federated search behavior.
- FederatedHit: A fused hit returned by federated search.
- FederatedSearcher: Multi-index search orchestrator with scatter-gather fusion.
- FusedHit: A hit from hybrid fusion (lexical + semantic combined via RRF).
- HashEmbedder: Zero-dependency hash-based embedder.
- InMemoryTwoTierIndex: In-memory two-tier index wrapping fast and optional quality InMemoryVectorIndex.
- InMemoryVectorIndex: Fully-resident in-memory vector index with f16 quantization.
- IndexBuildStats: Statistics from a completed index build.
- IndexBuilder: Fluent builder for creating frankensearch indexes.
- IndexMetrics: Structured telemetry for index update operations.
- IndexProgress: Progress update during index building.
- IndexableDocument: A document to be indexed for search.
- ModelInfo: Static metadata describing an embedder implementation.
- NoOpMetricsExporter: No-op exporter used when no telemetry sink is attached.
- NoopDaemonClient: No-op daemon client used when daemon config is missing.
- PhaseMetrics: Diagnostic metrics for a search phase.
- QualityComparison: Multi-metric quality comparison report.
- QualityMetricComparison: Comparison result for a single quality metric.
- QualityMetricSamples: Per-metric paired score samples for quality comparison.
- RankChanges: Tracks how rankings changed between initial and refined phases.
- RegisteredEmbedder: Static embedder metadata entry.
- RerankDocument: A document for reranking; pairs a document ID with its text content.
- RerankScore: A reranking score for a single document.
- RrfConfig: RRF fusion parameters.
- ScoredResult: The final scored search result delivered to consumers.
- SearchMetrics: Structured telemetry for a completed search request.
- SyncEmbedderAdapter: Adapts a SyncEmbed implementor into a full async Embedder.
- SyncRerankerAdapter: Adapts a SyncRerank implementor into a full async Reranker.
- SyncSearchIterator: Iterator over progressive phases produced by SyncTwoTierSearcher.
- SyncTwoTierSearcher: Progressive synchronous searcher backed by InMemoryTwoTierIndex.
- TwoTierConfig: Configuration for the two-tier progressive search pipeline.
- TwoTierIndex: Dual-index container used by progressive search orchestration.
- TwoTierIndexBuilder: Builder for writing fast and optional quality FSVI indices.
- TwoTierMetrics: Diagnostics from a two-tier search execution.
- TwoTierSearcher: Progressive two-tier search orchestrator.
- VectorHit: A raw hit from vector similarity search.
- VectorIndex
- VectorIndexWriter
Enums§
- DaemonError: Daemon request failure details.
- FederatedFusion: Fusion methods supported by FederatedSearcher.
- HashAlgorithm: Hash algorithm selection for the HashEmbedder.
- ModelCategory: Classification of an embedding model by its speed/quality tradeoff.
- ModelTier: Tier assignment in the progressive two-tier pipeline.
- QualityMetric: Supported metric kinds for multi-metric quality comparisons.
- QueryClass: Classification of a search query by type.
- ScoreSource: Which search backend produced a result.
- SearchError: Unified error type covering all failure modes across the frankensearch search pipeline.
- SearchMode: Search mode selector.
- SearchPhase: Progressive search phases for three-tier display.
- TwoTierAvailability: Availability classification for two-tier search.
Constants§
- DEFAULT_SEMANTIC_CHANGE_THRESHOLD: Default semantic-change threshold used by DocumentFingerprint::needs_reembedding_default.
- SIGNIFICANT_CHAR_COUNT_CHANGE_THRESHOLD: Character-count change ratio that always triggers re-embedding.
Traits§
- Canonicalizer: Trait for text preprocessing before embedding.
- DaemonClient: Abstract daemon client.
- Embedder: Core trait for text embedding models.
- LexicalSearch: Trait for full-text lexical search backends.
- MetricsExporter: Trait for exporting search/index/embed telemetry to external consumers.
- Reranker: Core trait for cross-encoder reranking models.
- SyncEmbed: Synchronous embedding interface for host projects that call embedders from non-async contexts.
- SyncLexicalSearch: Optional synchronous lexical backend used by SyncTwoTierSearcher.
- SyncRerank: Synchronous reranking interface for host projects that call rerankers from non-async contexts.
Functions§
- blend_two_tier: Blend fast-tier and quality-tier vector hits into a single ranking.
- bootstrap_ci: Compute a bootstrap confidence interval for the mean of scores.
- bootstrap_compare: Compare two paired score distributions via bootstrap.
- candidate_count: Compute how many candidates to fetch from each source.
- cosine_similarity: Computes cosine similarity between two vectors.
- l2_normalize: L2-normalizes a vector to unit length.
- map_at_k: Mean Average Precision at K.
- mrr: Mean Reciprocal Rank.
- ndcg_at_k: Normalized Discounted Cumulative Gain at K.
- quality_comparison: Produce a multi-metric quality comparison report using paired bootstrap tests.
- recall_at_k: Recall at K.
- rrf_fuse: Fuse lexical and semantic search results using Reciprocal Rank Fusion.
- truncate_embedding: Truncates an embedding to a target dimension and re-normalizes.
Type Aliases§
- SearchFuture: Boxed future carrying a SearchResult<T>.
- SearchResult: Convenience alias used throughout the frankensearch crate hierarchy.
- SharedMetricsExporter: Shared handle for dynamic telemetry exporters.