Oak Semantic Search
Semantic search for source code, built on AST-aware chunking and vector embeddings.
🎯 Overview
Oak Semantic Search goes beyond traditional keyword search by understanding the structure and meaning of your code. It uses oak-core to split source code into meaningful units (such as functions, classes, and documentation), embeds each unit as a vector, and indexes those vectors in a vector database for similarity search.
✨ Features
- AST-Aware Chunking: Intelligently splits code based on logical boundaries (Definitions, Statements, etc.) rather than simple line counts.
- Embedding Integration: Built-in support for fastembed to generate high-quality vector representations of code.
- Vector DB Support: Designed to work with vectordb (LanceDB) for efficient similarity search.
- Contextual Search: Find code by describing its functionality in natural language.
- Role-Based Indexing: Prioritizes definitions and documentation for better search relevance.
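To make the contextual-search and role-based indexing features above more concrete, here is a minimal, dependency-free sketch of ranking chunks by cosine similarity with a per-role weight. It is an illustration only, not the library's implementation: `Role`, `Chunk`, `role_weight`, `rank`, and the weight values are all hypothetical, and the embeddings are assumed to be precomputed (for example by fastembed).

```rust
/// Hypothetical chunk roles, loosely mirroring the roles named in this README.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    Definition,
    Documentation,
    Statement,
}

/// Hypothetical indexed unit: source text, its role, and a precomputed embedding.
#[derive(Debug, Clone)]
pub struct Chunk {
    pub text: String,
    pub role: Role,
    pub embedding: Vec<f32>,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero-length vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Assumed weights that favor definitions and documentation over statements.
pub fn role_weight(role: Role) -> f32 {
    match role {
        Role::Definition => 1.0,
        Role::Documentation => 0.9,
        Role::Statement => 0.7,
    }
}

/// Returns the `top_k` chunks with the highest role-weighted similarity, best first.
pub fn rank<'a>(query: &[f32], chunks: &'a [Chunk], top_k: usize) -> Vec<(&'a Chunk, f32)> {
    let mut scored: Vec<(&Chunk, f32)> = chunks
        .iter()
        .map(|c| (c, cosine_similarity(query, &c.embedding) * role_weight(c.role)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(top_k);
    scored
}
```

With this weighting, a definition and a statement that are equally similar to the query are returned with the definition first. A real index would delegate the nearest-neighbor step to the vector database rather than scanning every chunk.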
🚀 Quick Start
Basic usage of the SemanticSearcher:
use SemanticSearcher;
async
📋 Examples
Intelligent Code Chunking
The library uses a ChunkCollector to extract meaningful pieces of code:
// Internally, it identifies nodes with roles like:
// - UniversalElementRole::Definition
// - UniversalElementRole::Statement
// - UniversalElementRole::Documentation
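The idea of collecting chunks by role can be sketched on a toy syntax tree. Everything below is hypothetical and independent of oak-core: `NodeRole`, `Node`, and `ToyChunkCollector` are stand-ins invented for illustration, and the real `ChunkCollector` may traverse and split nodes differently. The sketch assumes that a node with a chunkable role is emitted whole, and that only role-less container nodes are descended into.

```rust
/// Hypothetical node roles; `Other` marks containers that are not chunks themselves.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NodeRole {
    Definition,
    Statement,
    Documentation,
    Other,
}

/// Hypothetical syntax-tree node carrying its source text and children.
#[derive(Debug, Clone)]
pub struct Node {
    pub role: NodeRole,
    pub text: String,
    pub children: Vec<Node>,
}

/// Toy collector: gathers (role, text) pairs in source order.
pub struct ToyChunkCollector {
    pub chunks: Vec<(NodeRole, String)>,
}

impl ToyChunkCollector {
    pub fn new() -> Self {
        Self { chunks: Vec::new() }
    }

    /// Depth-first walk. A node with a chunkable role becomes one chunk and is
    /// not split further; container nodes are searched for nested chunks.
    pub fn visit(&mut self, node: &Node) {
        match node.role {
            NodeRole::Definition | NodeRole::Statement | NodeRole::Documentation => {
                self.chunks.push((node.role, node.text.clone()));
            }
            NodeRole::Other => {
                for child in &node.children {
                    self.visit(child);
                }
            }
        }
    }
}
```

Keeping a definition as a single chunk means a function body is never cut in half by an arbitrary line limit, which is the point of chunking on logical boundaries.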
🔧 Advanced Features
Custom Embedding Models
Oak Semantic Search leverages fastembed, allowing you to choose from various pre-trained models optimized for code or general text.
Integration with MCP
The library implements the SemanticSearch trait, making it compatible with the Model Context Protocol (MCP) for AI agent integration.
🏗️ Integration
- Oak MCP: Powers the semantic search tool in AI-assisted coding environments.
- Documentation Portals: Enhances documentation with "search by meaning" capabilities.
- Code Discovery: Helps developers find relevant code patterns in large monorepos.
📊 Performance
- Fast Indexing: Concurrent embedding generation for high throughput.
- Scalable Search: Vector-based retrieval remains fast even with millions of code chunks.
- Efficient Storage: Optimized vector storage with minimal disk footprint.
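How the library parallelizes embedding is not shown in this README. One common pattern, sketched here with only the standard library, is to split the chunk texts into batches and embed each batch on its own thread. `toy_embed` and `embed_concurrently` are hypothetical names, and `toy_embed` is a trivial stand-in for a real embedding model call.

```rust
use std::thread;

/// Stand-in embedder: a 2-dimensional vector of (byte length, space count).
/// A real pipeline would call an embedding model here instead.
pub fn toy_embed(text: &str) -> Vec<f32> {
    let len = text.len() as f32;
    let spaces = text.bytes().filter(|b| *b == b' ').count() as f32;
    vec![len, spaces]
}

/// Embeds `texts` using up to `workers` threads, preserving input order.
pub fn embed_concurrently(texts: &[String], workers: usize) -> Vec<Vec<f32>> {
    if texts.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that every text lands in exactly one batch.
    let batch_size = (texts.len() + workers - 1) / workers;

    thread::scope(|scope| {
        // Spawn one scoped thread per batch; scoped threads may borrow `texts`.
        let handles: Vec<_> = texts
            .chunks(batch_size)
            .map(|batch| {
                scope.spawn(move || batch.iter().map(|t| toy_embed(t)).collect::<Vec<_>>())
            })
            .collect();

        // Join in spawn order so the output lines up with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("embedding worker panicked"))
            .collect()
    })
}
```

Batching by thread keeps per-text overhead low. Whether it helps in practice depends on the embedding backend, since many model runtimes already parallelize internally.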
🤝 Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Oak Semantic Search - Understanding the meaning behind the code 🚀