🚀 The Paradigm Shift
Current Retrieval-Augmented Generation (RAG) systems heavily rely on Vector Embeddings. While useful, embeddings are computationally expensive to generate, lose exact syntactic detail during translation, and struggle with massive scale (e.g., billions of lines of code or text).
Skanda flips the model on its head by leveraging the LLM's own internal knowledge before retrieval.
- 🧠 Predict: Instead of blindly embedding your entire corpus, we ask the LLM to predict the exact syntactic "footprints" (rare words, code patterns, specific function names, or phrases) that would logically appear near the answer.
- ⚡ Retrieve: Skanda’s Rust-based engine takes these footprints and performs massively parallel, SIMD-accelerated exact searches across billions of lines in mere milliseconds.
- 🎯 Rank: Results are dynamically ranked using IDF-weighted Positional Proximity (via BitSet-based dilation) to ensure the LLM receives the densest, most contextually rich clusters of relevant information.
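The Rank step above can be illustrated with a small sketch. This is not Skanda's actual implementation: the `BitSet` type, the `rank_lines` function, the smoothed IDF formula, and the use of plain substring matching in place of the SIMD search are all assumptions made for illustration. Each footprint's hit lines are marked in a bitset, the bitset is dilated by a line radius, and every line accumulates the IDF of each footprint that lands nearby.

```rust
/// Hypothetical fixed-size bitset with one bit per corpus line.
#[derive(Clone)]
struct BitSet {
    words: Vec<u64>,
    len: usize,
}

impl BitSet {
    fn new(len: usize) -> Self {
        BitSet { words: vec![0; (len + 63) / 64], len }
    }

    fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    fn get(&self, i: usize) -> bool {
        self.words[i / 64] & (1u64 << (i % 64)) != 0
    }

    /// Dilation: every set bit spreads to its neighbours within `radius` lines.
    fn dilate(&self, radius: usize) -> BitSet {
        let mut out = BitSet::new(self.len);
        for i in 0..self.len {
            if self.get(i) {
                let lo = i.saturating_sub(radius);
                let hi = (i + radius).min(self.len - 1);
                for j in lo..=hi {
                    out.set(j);
                }
            }
        }
        out
    }
}

/// Score each line by the summed IDF of the footprints found within
/// `radius` lines of it, and return (line index, score) pairs, best first.
fn rank_lines(lines: &[&str], footprints: &[&str], radius: usize) -> Vec<(usize, f64)> {
    let n = lines.len();
    let mut scores = vec![0.0f64; n];
    for fp in footprints {
        let mut hits = BitSet::new(n);
        let mut df = 0usize; // number of lines containing this footprint
        for (i, line) in lines.iter().enumerate() {
            if line.contains(fp) {
                hits.set(i);
                df += 1;
            }
        }
        if df == 0 {
            continue;
        }
        // Smoothed IDF (an assumed formula): rarer footprints weigh more.
        let idf = ((n as f64 + 1.0) / (df as f64 + 1.0)).ln() + 1.0;
        let near = hits.dilate(radius);
        for i in 0..n {
            if near.get(i) {
                scores[i] += idf;
            }
        }
    }
    let mut ranked: Vec<(usize, f64)> = scores
        .into_iter()
        .enumerate()
        .filter(|(_, s)| *s > 0.0)
        .collect();
    // Highest score first; ties broken by line order.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked
}
```

Calling `rank_lines(&lines, &["token_type_ids", "parse"], 1)` puts lines that sit within one line of both footprints ahead of lines near only one of them, which is the "densest cluster" behaviour described above.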
✨ Key Features
- Minimal Dependencies: Built heavily on the Rust standard library, using highly targeted crates like Rayon for multi-threading and Serde for API responses. No Tokio runtime overhead. Just raw performance.
- Massive Scale: Employs efficient Varint + Delta encoding to keep indices remarkably small and fully in-memory.
- Blistering Speed: Leverages SIMD-accelerated (AVX2/SSE2) exact matching for near-instantaneous retrieval.
- Hyper Accuracy: Uses advanced IDF-weighted proximity scoring to discover the most contextually relevant snippets across your data.
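To make the Varint + Delta idea concrete, here is a minimal sketch, assuming a LEB128-style varint and an ascending list of line numbers per term. Skanda's on-disk format is not documented in this README, so the function names and byte layout below are illustrative only. Storing the gap between consecutive positions keeps most values small, and small values fit in a single byte.

```rust
/// Append `value` as a LEB128-style varint: 7 payload bits per byte,
/// with the high bit set when more bytes follow.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Read one varint starting at `*pos` and advance `*pos` past it.
/// Returns None if the input is truncated or the varint is too long.
fn read_varint(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // more than 10 bytes cannot fit in a u64
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// Encode a posting list as varint-encoded gaps.
/// Assumes `sorted` is in ascending order.
fn encode_postings(sorted: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &p in sorted {
        write_varint(p - prev, &mut out);
        prev = p;
    }
    out
}

/// Decode the gaps back into absolute positions.
fn decode_postings(bytes: &[u8]) -> Option<Vec<u64>> {
    let mut pos = 0;
    let mut prev = 0u64;
    let mut out = Vec::new();
    while pos < bytes.len() {
        prev = prev.checked_add(read_varint(bytes, &mut pos)?)?;
        out.push(prev);
    }
    Some(out)
}
```

In this sketch the postings `[3, 130, 131, 100_000]` become the gaps `3, 127, 1, 99_869` and occupy 6 bytes, compared with 32 bytes as raw `u64` values.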
🛠️ Quickstart Guide
1. Build the Engine
Compile the binary for maximum performance:
2. Index Your Data
Create a highly compressed index of your target directory:
3. Search (with Typo Tolerance)
Execute a rapid search for your predicted footprints.
Note: The --fuzzy flag enables Levenshtein-tolerant expansion. If your LLM predicts token_type_embeddings but the actual corpus uses token_type_ids, Skanda's fuzzy matching will still locate the correct context.
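As a rough illustration of Levenshtein-tolerant expansion, the sketch below computes the edit distance between a predicted footprint and each term in a vocabulary, then keeps the terms within a threshold. The `fuzzy_expand` helper, the vocabulary slice, and the `max_distance` parameter are hypothetical; how Skanda bounds or applies the distance internally is not described here.

```rust
/// Two-row dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and first j chars of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Hypothetical helper: expand a predicted footprint to every vocabulary
/// term within `max_distance` edits of it.
fn fuzzy_expand<'a>(predicted: &str, vocabulary: &[&'a str], max_distance: usize) -> Vec<&'a str> {
    vocabulary
        .iter()
        .copied()
        .filter(|term| levenshtein(predicted, term) <= max_distance)
        .collect()
}
```

With a vocabulary of `token_type_ids`, `attention_mask`, and `position_ids`, the misspelled prediction `token_typ_ids` expands to `token_type_ids` at `max_distance = 2`, and the exact search can then run on the corrected term.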
4. Deploy the JSON API (Bridge)
Expose the Skanda engine as a lightweight HTTP microservice:
Query the API directly:
🤖 LLM Integration
Supercharge your agents by giving your LLM direct access to Skanda. Simply provide the following tool definition:
🚀 Benchmarks
We track performance regressions using criterion. In tests on Alice's Adventures in Wonderland (a single novel-length text file), the Skanda engine remains fast even after the addition of robust safeguards:
| Operation | Query/Task | Time per Iteration |
|---|---|---|
| Indexing | Full File parsing, chunking, and writing to disk | ~8.39 ms (milliseconds) |
| Exact Search | Query: "rabbit hole" | ~65.25 µs (microseconds) |
| Fuzzy Search | Query: "rabbit hole", fuzzy match enabled | ~214.98 µs (microseconds) |
These results serve as a baseline: we observed no significant performance degradation after fixing the memory-mapped loading and SIMD behaviors.
📜 License
This project is licensed under the MIT License.