skanda_engine 0.1.0

A zero-dependency, ultra-high-performance retrieval engine designed for the next generation of RAG.
Documentation

🚀 The Paradigm Shift

Current Retrieval-Augmented Generation (RAG) systems heavily rely on Vector Embeddings. While useful, embeddings are computationally expensive to generate, lose exact syntactic detail during translation, and struggle with massive scale (e.g., billions of lines of code or text).

Skanda flips the model on its head by leveraging the LLM's own internal knowledge before retrieval.

  1. 🧠 Predict: Instead of blindly embedding your entire corpus, we ask the LLM to predict the exact syntactic "footprints" (rare words, code patterns, specific function names, or phrases) that would logically appear near the answer.
  2. ⚡ Retrieve: Skanda’s Rust-based engine takes these footprints and performs massively parallel, SIMD-accelerated exact searches across billions of lines in mere milliseconds.
  3. 🎯 Rank: Results are dynamically ranked using IDF-weighted Positional Proximity (via BitSet-based dilation) to ensure the LLM receives the densest, most contextually rich clusters of relevant information.

✨ Key Features

  • Minimal Dependencies: Built heavily on the Rust standard library, using highly targeted crates like Rayon for multi-threading and Serde for API responses. No Tokio runtime overhead. Just raw performance.
  • Massive Scale: Employs efficient Varint + Delta encoding to keep indices remarkably small and fully in-memory.
  • Blistering Speed: Leverages SIMD-accelerated (AVX2/SSE2) exact matching for near-instantaneous retrieval.
  • Hyper Accuracy: Uses advanced IDF-weighted proximity scoring to discover the most contextually relevant snippets across your data.

🛠️ Quickstart Guide

1. Build the Engine

Compile the binary for maximum performance:

cargo build --release

2. Index Your Data

Create a highly compressed index of your target directory:

./skanda index <directory_to_index> index.bin

3. Search (with Typo Tolerance)

Execute a rapid search for your predicted footprints.

./skanda search index.bin "your predicted footprints" --fuzzy

Note: The --fuzzy flag enables Levenshtein-tolerant expansion. If your LLM predicts token_type_embeddings but the actual corpus uses token_type_ids, Skanda's fuzzy matching will still locate the correct context.

4. Deploy the JSON API (Bridge)

Expose the Skanda engine as a lightweight HTTP microservice:

./skanda serve index.bin 8080

Query the API directly:

curl "http://localhost:8080/search?q=footprint1+footprint2&fuzzy=true"


🤖 LLM Integration

Supercharge your agents by giving your LLM direct access to Skanda. Simply provide the following tool definition:

{
  "name": "skanda_search",
  "description": "Search billions of lines of text in milliseconds by predicting syntactic footprints (rare words or patterns).",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Space-separated rare words or specific code/text patterns likely to appear near the answer."
      }
    },
    "required": ["query"]
  }
}

🚀 Benchmarks

We constantly monitor performance regressions using criterion. Based on testing with Alice's Adventures in Wonderland (1 full novel text file), Skanda Engine remains blazing fast even after adding robust safeguards:

Operation Query/Task Time per Iteration
Indexing Full File parsing, chunking, and writing to disk ~8.39 ms (milliseconds)
Exact Search Query: "rabbit hole" ~65.25 µs (microseconds)
Fuzzy Search Query: "rabbit hole", fuzzy match enabled ~214.98 µs (microseconds)

These results establish a very solid baseline showing zero significant performance degradation after fixing the memory-mapped loading and SIMD behaviors.


📜 License

This project is licensed under the MIT License.