lethe-core-rust 0.1.0

High-performance hybrid retrieval engine combining BM25 lexical search with vector similarity using z-score fusion. It features a hero configuration tuned for parity with the SPLADE baseline, gamma boosting for code and error contexts, and a comprehensive chunking pipeline.

A high-performance hybrid retrieval engine that combines BM25 lexical search with vector similarity using z-score fusion. Lethe Core provides state-of-the-art context selection for conversational AI and retrieval-augmented generation (RAG) systems.

Features

  • Hybrid Retrieval: Combines BM25 lexical search with vector similarity for optimal relevance
  • Z-Score Fusion: Normalizes and fuses scores using statistical z-score transformation (α=0.5, β=0.5)
  • Hero Configuration: Pre-tuned parameters achieving parity with SPLADE baseline performance
  • Gamma Boosting: Context-aware score boosting for code, errors, and technical content
  • Chunking Pipeline: Intelligent text segmentation with sentence-level granularity
  • Async-First: Built on Tokio for high-performance concurrent operations
  • Extensible: Modular architecture supporting custom embedding services and repositories

Quick Start

Add this to your Cargo.toml:

[dependencies]
lethe-core-rust = "0.1.0"

Basic Usage

use lethe_core_rust::{get_hero_config, apply_zscore_fusion, Candidate};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get the hero configuration (tuned for SPLADE parity)
    let config = get_hero_config();
    println!("Hero config: α={}, β={}, k_final={}", 
             config.alpha, config.beta, config.k_final);
    
    // Example BM25 candidates
    let bm25_candidates = vec![
        Candidate {
            doc_id: "doc1".to_string(),
            score: 0.8,
            text: Some("Rust async programming tutorial".to_string()),
            kind: Some("bm25".to_string()),
        },
        Candidate {
            doc_id: "doc2".to_string(), 
            score: 0.6,
            text: Some("Python async examples".to_string()),
            kind: Some("bm25".to_string()),
        },
    ];
    
    // Example vector candidates
    let vector_candidates = vec![
        Candidate {
            doc_id: "doc1".to_string(),
            score: 0.9,
            text: Some("Rust async programming tutorial".to_string()),
            kind: Some("vector".to_string()),
        },
        Candidate {
            doc_id: "doc3".to_string(),
            score: 0.7,
            text: Some("Async programming concepts".to_string()),
            kind: Some("vector".to_string()),
        },
    ];
    
    // Apply z-score fusion with hero configuration (α=0.5)
    let results = apply_zscore_fusion(bm25_candidates, vector_candidates, 0.5);
    
    println!("Hybrid results:");
    for (i, result) in results.iter().enumerate() {
        println!("{}. {} (score: {:.3})", i + 1, result.doc_id, result.score);
    }
    
    Ok(())
}

Advanced Usage with Full Pipeline

use lethe_core_rust::{
    HybridRetrievalService, HybridRetrievalConfig, 
    ChunkingService, ChunkingConfig
};
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create chunking service
    let chunking_config = ChunkingConfig::default();
    let chunking_service = ChunkingService::new(chunking_config);
    
    // Chunk some text
    let text = "This is a sample document about Rust programming. \
                Rust is a systems programming language. \
                It provides memory safety without garbage collection.";
                
    let chunks = chunking_service.chunk_text(
        text,
        "session-123", 
        uuid::Uuid::new_v4(), // requires the `uuid` crate with its `v4` feature enabled
        0
    ).await?;
    
    println!("Created {} chunks", chunks.len());
    
    // Set up hybrid retrieval with hero configuration
    let config = HybridRetrievalConfig::hero();
    // Mock service for demonstration; the `_` prefix avoids an unused-variable warning
    let _service = HybridRetrievalService::mock_for_testing();
    
    println!("Hero configuration loaded:");
    println!("  α (BM25 weight): {}", config.alpha);
    println!("  β (Vector weight): {}", config.beta);
    println!("  k_initial (pool size): {}", config.k_initial);
    println!("  k_final (results): {}", config.k_final);
    println!("  Diversification: {}", config.diversify_method);
    
    Ok(())
}
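
The chunking step above is described as working at sentence-level granularity. As a rough, dependency-free illustration of that idea, the sketch below splits text on terminal punctuation followed by whitespace. The function `split_sentences` is hypothetical and not part of lethe-core-rust; ChunkingService may apply different and more robust rules.

```rust
/// Hypothetical helper, not part of lethe-core-rust: splits text into
/// sentences at '.', '!' or '?' when followed by whitespace or end of input.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    let mut chars = text.chars().peekable();

    while let Some(c) = chars.next() {
        current.push(c);
        let is_terminal = matches!(c, '.' | '!' | '?');
        // A boundary requires whitespace (or end of input) after the
        // punctuation, so "v1.2" is not split in the middle.
        let at_boundary = chars.peek().map_or(true, |next| next.is_whitespace());
        if is_terminal && at_boundary {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }

    // Keep any trailing text that lacks terminal punctuation.
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        sentences.push(trimmed.to_string());
    }
    sentences
}
```

Calling `split_sentences` on the three-sentence sample text above would yield three entries, one per sentence. Abbreviations such as "e.g. " would still be split incorrectly by this simple rule.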

Z-Score Fusion Algorithm

Lethe Core uses z-score fusion to normalize and combine scores from different retrieval methods in four steps:

  1. Score Normalization: BM25 scores are normalized to [0, 1]; cosine similarities are mapped from [-1, 1] to [0, 1]
  2. Z-Score Calculation: Each score set is converted to z-scores (mean = 0, std = 1)
  3. Weighted Fusion: Scores are combined using hybrid_score = α * z_bm25 + β * z_vector
  4. Gamma Boosting: Context-aware multipliers for code, error, and technical content (disabled in the hero configuration)
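
Steps 2 and 3 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: the names `z_scores` and `fuse_zscores` are hypothetical, candidates are simplified to `(doc_id, score)` pairs, the population standard deviation is used, and a document missing from one list is assumed to contribute a z-score of 0 for that list.

```rust
use std::collections::HashMap;

/// Converts raw scores to z-scores (mean 0, population std 1).
/// Returns all zeros when the scores have no spread.
fn z_scores(scores: &[f64]) -> Vec<f64> {
    if scores.is_empty() {
        return Vec::new();
    }
    let n = scores.len() as f64;
    let mean = scores.iter().sum::<f64>() / n;
    let variance = scores.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / n;
    let std = variance.sqrt();
    if std == 0.0 {
        return vec![0.0; scores.len()];
    }
    scores.iter().map(|s| (s - mean) / std).collect()
}

/// Hypothetical fusion helper: hybrid_score = alpha * z_bm25 + beta * z_vector.
/// Assumption: a document absent from one list adds 0 for that list.
fn fuse_zscores(
    bm25: &[(&str, f64)],
    vector: &[(&str, f64)],
    alpha: f64,
    beta: f64,
) -> Vec<(String, f64)> {
    let bm25_z = z_scores(&bm25.iter().map(|(_, s)| *s).collect::<Vec<_>>());
    let vector_z = z_scores(&vector.iter().map(|(_, s)| *s).collect::<Vec<_>>());

    let mut fused: HashMap<String, f64> = HashMap::new();
    for ((id, _), z) in bm25.iter().zip(&bm25_z) {
        *fused.entry(id.to_string()).or_insert(0.0) += alpha * z;
    }
    for ((id, _), z) in vector.iter().zip(&vector_z) {
        *fused.entry(id.to_string()).or_insert(0.0) += beta * z;
    }

    // Highest fused score first; ties are broken by document ID.
    let mut ranked: Vec<(String, f64)> = fused.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

With the candidates from the Basic Usage example and α = β = 0.5, doc1 ranks first in this sketch because it scores above the mean in both lists.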

Hero Configuration

The hero configuration provides pre-tuned parameters validated against SPLADE baseline performance:

  • α = 0.5, β = 0.5: Equal weighting of lexical and semantic signals
  • k_initial = 200: Large candidate pool for comprehensive coverage
  • k_final = 5: Focused results optimized for Recall@5 metrics
  • Diversification = "splade": Advanced diversification matching the baseline method
  • Gamma boosting disabled: Clean z-score fusion without latent multipliers
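
To make the roles of these parameters concrete, here is a small sketch that mirrors the values listed above in a local struct and shows how k_final might trim a ranked pool. `HeroParams`, its `gamma_boosting` field, and `top_k` are hypothetical names for illustration; the crate's actual type is HybridRetrievalConfig, and its field types are assumed here.

```rust
/// Hypothetical mirror of the hero parameters listed above.
/// Field types are assumptions; see HybridRetrievalConfig for the real type.
#[derive(Debug, Clone, PartialEq)]
struct HeroParams {
    alpha: f64,                    // BM25 weight
    beta: f64,                     // vector weight
    k_initial: usize,              // candidate pool size
    k_final: usize,                // number of results returned
    diversify_method: &'static str,
    gamma_boosting: bool,          // hypothetical flag name
}

impl HeroParams {
    fn hero() -> Self {
        Self {
            alpha: 0.5,
            beta: 0.5,
            k_initial: 200,
            k_final: 5,
            diversify_method: "splade",
            gamma_boosting: false,
        }
    }
}

/// Keeps the `k` highest-scoring candidates, best first.
fn top_k(mut candidates: Vec<(String, f64)>, k: usize) -> Vec<(String, f64)> {
    candidates.sort_by(|a, b| b.1.total_cmp(&a.1));
    candidates.truncate(k);
    candidates
}
```

In this sketch, each retriever would contribute up to `k_initial` candidates, and `top_k(fused, params.k_final)` would keep the five best after fusion.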

Performance Characteristics

  • Concurrent Retrieval: BM25 and vector search run in parallel using tokio::try_join!
  • Memory Efficient: Streaming chunking and lazy evaluation
  • Scalable: Handles large document collections with efficient indexing
  • Fast Fusion: O(n) z-score calculation and fusion
  • Low Latency: Sub-millisecond fusion for typical candidate sets
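
The concurrent retrieval pattern can be illustrated without an async runtime. The sketch below swaps in `std::thread::scope` as a dependency-free stand-in for `tokio::try_join!`; unlike `try_join!`, it waits for both searches to finish before reporting an error. The functions `bm25_search`, `vector_search`, and `retrieve_both` are hypothetical stubs, not crate APIs.

```rust
use std::thread;

type Hits = Vec<(String, f64)>;

/// Hypothetical stub standing in for a lexical search backend.
fn bm25_search(query: &str) -> Result<Hits, String> {
    if query.is_empty() {
        return Err("empty query".to_string());
    }
    Ok(vec![("doc1".to_string(), 0.8), ("doc2".to_string(), 0.6)])
}

/// Hypothetical stub standing in for a vector search backend.
fn vector_search(query: &str) -> Result<Hits, String> {
    if query.is_empty() {
        return Err("empty query".to_string());
    }
    Ok(vec![("doc1".to_string(), 0.9), ("doc3".to_string(), 0.7)])
}

/// Runs both searches on separate threads and returns both result sets,
/// or the first error encountered (BM25 is checked first).
fn retrieve_both(query: &str) -> Result<(Hits, Hits), String> {
    thread::scope(|s| -> Result<(Hits, Hits), String> {
        let lexical = s.spawn(|| bm25_search(query));
        let semantic = s.spawn(|| vector_search(query));

        // The first `?` handles a panicked thread, the second a search error.
        let lexical = lexical
            .join()
            .map_err(|_| "BM25 search panicked".to_string())??;
        let semantic = semantic
            .join()
            .map_err(|_| "vector search panicked".to_string())??;
        Ok((lexical, semantic))
    })
}
```

In an async codebase the same shape would typically be written as two futures passed to `tokio::try_join!`, which also returns early as soon as either future fails.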

Feature Flags

[dependencies]
lethe-core-rust = { version = "0.1.0", features = ["ollama"] }

  • default: Basic functionality with fallback embedding service
  • fallback: Fallback embedding service for testing (included in default)
  • ollama: Integration with Ollama embedding service

Architecture

Lethe Core follows a modular architecture:

  • lethe-shared: Common types, errors, and utilities
  • lethe-domain: Core business logic and services
  • lethe-infrastructure: External integrations and adapters

Key Types

  • Candidate: Search result with document ID, score, and metadata
  • HybridRetrievalConfig: Configuration for retrieval parameters
  • HybridRetrievalService: Main service orchestrating hybrid search
  • Chunk: Text segment with tokenization and metadata
  • EmbeddingVector: Vector representation with similarity operations
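
As an illustration of the kind of similarity operation EmbeddingVector is described as providing, here is a minimal cosine similarity sketch over plain slices. Both functions are hypothetical and independent of the crate; the linear map in `normalize_cosine` is an assumption about how the [-1, 1] to [0, 1] normalization mentioned in the Z-Score Fusion Algorithm section could be done.

```rust
/// Hypothetical free function; EmbeddingVector's real methods may differ.
/// Returns None for empty, mismatched, or zero-length (zero-norm) vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Maps a cosine similarity from [-1, 1] to [0, 1] with a linear rescale.
/// Assumption: the crate's normalization may use a different mapping.
fn normalize_cosine(similarity: f32) -> f32 {
    (similarity + 1.0) / 2.0
}
```

Returning `Option` keeps the zero-vector and length-mismatch cases explicit instead of producing NaN.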

Examples

The repository includes several examples:

  • Basic Usage: Simple z-score fusion example
  • Hero Configuration: Using pre-tuned optimal parameters
  • Custom Config: Advanced configuration with custom parameters
  • Full Pipeline: End-to-end chunking and retrieval

Testing

Run the test suite:

cargo test

Run with logging enabled:

RUST_LOG=debug cargo test -- --nocapture

Golden Parity Snapshots

The hero configuration has been validated against golden parity snapshots to ensure results consistent with the TypeScript implementation. Key validation points:

  • Z-score mathematical properties (mean ≈ 0, std ≈ 1)
  • Fusion calculation accuracy to 10 decimal places
  • Candidate ordering and scoring consistency
  • Performance parity with SPLADE baseline
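
The first validation point can be checked mechanically. The sketch below is not the crate's test suite; `mean_and_std`, `z_scores`, and `has_zscore_properties` are hypothetical helpers, and the population standard deviation is assumed.

```rust
/// Returns the mean and population standard deviation of `values`.
/// An empty slice yields (0.0, 0.0).
fn mean_and_std(values: &[f64]) -> (f64, f64) {
    if values.is_empty() {
        return (0.0, 0.0);
    }
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    (mean, variance.sqrt())
}

/// Converts scores to z-scores; returns zeros when there is no spread.
fn z_scores(values: &[f64]) -> Vec<f64> {
    let (mean, std) = mean_and_std(values);
    if std == 0.0 {
        return vec![0.0; values.len()];
    }
    values.iter().map(|v| (v - mean) / std).collect()
}

/// True when `z` has mean within `tol` of 0 and std within `tol` of 1.
fn has_zscore_properties(z: &[f64], tol: f64) -> bool {
    let (mean, std) = mean_and_std(z);
    mean.abs() <= tol && (std - 1.0).abs() <= tol
}
```

A snapshot test could call `has_zscore_properties(&z_scores(&scores), 1e-10)` for each recorded score set. Degenerate inputs with no spread fail the check by design, since their standard deviation is 0.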

Contributing

Contributions are welcome! Please ensure:

  1. All tests pass: cargo test
  2. Code is formatted: cargo fmt
  3. Linting passes: cargo clippy
  4. Documentation builds: cargo doc --no-deps

License

This project is licensed under the MIT License - see the LICENSE file for details.

Related Projects

  • Lethe - The main Lethe project
  • SPLADE - Sparse lexical and expansion model baseline

Citation

If you use Lethe Core in your research, please cite:

@software{lethe_core_rust,
  title={Lethe Core: High-Performance Hybrid Retrieval with Z-Score Fusion},
  author={Nathan Rice},
  year={2024},
  url={https://github.com/nrice/lethe}
}