# Aurora Semantic
A local, embedded semantic search engine for source code, designed to be bundled directly inside desktop IDEs.
Pure Rust implementation using ONNX Runtime for embedding inference - no Python required.
## Features
- 🔍 **Hybrid Search** - Combines lexical (keyword) and semantic (AI) search
- 🚀 **Fast Indexing** - Parallel file processing with progress reporting
- 💾 **Persistent Indexes** - Save and reload indexes efficiently
- 🎯 **Smart Chunking** - Extracts functions, classes, and structs by language
- 📁 **Ignore Rules** - Respects `.gitignore` and custom patterns
- 🔌 **Extensible** - Trait-based design for custom embedders
- 🦀 **Pure Rust** - No Python; uses ONNX Runtime
## Quick Start
### Installation

```bash
# Clone and build
git clone <repository-url>
cd aurora-semantic
cargo build --release

# Install CLI
cargo install --path .
```
### CLI Usage

Subcommand names below follow the comments in this section; check the installed binary's `--help` output for the exact flags.

```bash
# Index a codebase (uses hash embeddings by default)
aurora index /path/to/project

# Index with an ONNX model for semantic search
aurora index /path/to/project --model models/jina-code

# Search
aurora search "connection pool"

# Search with options
aurora search "connection pool" --mode hybrid --limit 10

# List indexed workspaces
aurora list

# Show statistics
aurora stats

# Delete a workspace
aurora delete /path/to/project
```
### Library Usage

The API names below are illustrative; the crate's actual exports may differ.

```rust
use aurora_semantic::Engine;
use std::path::PathBuf;

async fn index_and_search() -> anyhow::Result<()> {
    let mut engine = Engine::default();
    engine.index_workspace(PathBuf::from("."), None).await?;
    let _results = engine.search("parse config").await?;
    Ok(())
}
```
## Using ONNX Models
Aurora uses ONNX Runtime for local model inference. You need to download models yourself.
### Recommended Models
| Model | Dimension | Use Case | Download |
|---|---|---|---|
| `jina-embeddings-v2-base-code` | 768 | Code search (30 languages) | HuggingFace |
| `all-MiniLM-L6-v2` | 384 | General text | HuggingFace |
| `bge-small-en-v1.5` | 384 | English text | HuggingFace |
### Model Directory Structure

```text
models/jina-code/
├── model.onnx       # ONNX model file
├── tokenizer.json   # HuggingFace tokenizer
└── config.json      # Optional: model config
```
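Before loading, you can sanity-check that a directory has this layout. The helper below is an illustrative sketch, not part of Aurora's API:

```rust
use std::fs;
use std::path::Path;

// Illustrative helper (not part of Aurora's API): verify a model directory
// contains the files Aurora expects before attempting to load it.
fn looks_like_model_dir(dir: &Path) -> bool {
    // model.onnx and tokenizer.json are required; config.json is optional.
    dir.join("model.onnx").exists() && dir.join("tokenizer.json").exists()
}

fn main() {
    let dir = std::env::temp_dir().join("aurora-model-check");
    fs::create_dir_all(&dir).unwrap();
    fs::write(dir.join("model.onnx"), b"").unwrap();
    fs::write(dir.join("tokenizer.json"), b"").unwrap();
    assert!(looks_like_model_dir(&dir));
    println!("model directory looks valid");
}
```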
### Downloading Models

**Option 1:** Download pre-exported ONNX models directly (the exact filenames depend on the model repository):

```bash
# Jina Code model (recommended for code search)
huggingface-cli download jinaai/jina-embeddings-v2-base-code --local-dir models/jina-code
```

**Option 2:** Export with `optimum`:

```bash
pip install "optimum[exporters]"

# Export Jina Code model
optimum-cli export onnx --model jinaai/jina-embeddings-v2-base-code --trust-remote-code models/jina-code/

# Export sentence-transformers model
optimum-cli export onnx --model sentence-transformers/all-MiniLM-L6-v2 models/minilm/
```
### Loading Models in Code

Assuming the ONNX embedder type is named `OnnxEmbedder` (adjust to the crate's actual exports):

```rust
use aurora_semantic::OnnxEmbedder;

// Simple: load from directory
let embedder = OnnxEmbedder::from_directory("models/jina-code")?;

// Advanced: custom settings
let embedder = OnnxEmbedder::from_directory("models/jina-code")
    .with_max_length(8192) // Jina supports up to 8192 tokens
    .with_dimension(768)
    .load()?;
```
## Project Structure

```text
aurora-semantic/
├── src/
│   ├── lib.rs              # Public API exports
│   ├── types.rs            # Core types (Document, Chunk, SearchResult)
│   ├── config.rs           # Configuration types
│   ├── error.rs            # Error types
│   ├── bin/
│   │   └── aurora.rs       # CLI binary
│   ├── chunker/
│   │   ├── mod.rs          # Code chunking trait + default
│   │   └── strategies.rs   # Chunking strategies
│   ├── embeddings/
│   │   ├── mod.rs          # Embedder trait
│   │   ├── providers.rs    # ONNX + Hash embedders
│   │   └── pooling.rs      # Pooling strategies
│   ├── search/
│   │   ├── mod.rs          # Search coordination
│   │   ├── lexical.rs      # Tantivy-based search
│   │   ├── semantic.rs     # Vector similarity search
│   │   └── query.rs        # Query types + filters
│   ├── storage/
│   │   ├── mod.rs          # Storage trait
│   │   ├── disk.rs         # Disk persistence
│   │   └── metadata.rs     # Workspace metadata
│   ├── ignore/
│   │   └── mod.rs          # File filtering
│   └── engine/
│       └── mod.rs          # Main Engine API
├── Cargo.toml
└── README.md
```
## Configuration
### EngineConfig

The builder methods below match this section's original names; the argument types (`ChunkingConfig`, `SearchConfig`, `IgnoreConfig`) are assumptions:

```rust
use aurora_semantic::{ChunkingConfig, EngineConfig, IgnoreConfig, SearchConfig};

let config = EngineConfig::new()
    .with_chunking(ChunkingConfig::default())
    .with_search(SearchConfig::default())
    .with_ignore(IgnoreConfig::default());
```
### Search Modes
- Lexical - Keyword-based search using Tantivy (fast, exact matches)
- Semantic - Embedding similarity search (understands meaning)
- Hybrid - Combines both with configurable weights (default)
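In hybrid mode, the weights trade keyword precision against semantic recall. The sketch below shows how two scores can be blended; the function names and the 50/50 default are assumptions, not Aurora's actual internals:

```rust
// Cosine similarity between two embedding vectors (semantic score).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Blend a lexical and a semantic score with a configurable weight.
fn hybrid_score(lexical: f32, semantic: f32, lexical_weight: f32) -> f32 {
    lexical_weight * lexical + (1.0 - lexical_weight) * semantic
}

fn main() {
    let sem = cosine_similarity(&[1.0, 0.0], &[1.0, 0.0]);
    assert!((sem - 1.0).abs() < 1e-6);
    let score = hybrid_score(0.8, sem, 0.5);
    assert!((score - 0.9).abs() < 1e-6);
    println!("hybrid score: {score}");
}
```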
The type names below (`SearchQuery`, `SearchMode`) are assumptions based on `search/query.rs`:

```rust
use aurora_semantic::{SearchMode, SearchQuery};

let query = SearchQuery::new("parse config file")
    .mode(SearchMode::Hybrid)
    .limit(10)
    .min_score(0.3)
    .filter(my_filter); // e.g. restrict by path or language
```
## Supported Languages
Aurora extracts semantic chunks (functions, classes, etc.) from:
- Rust
- Python
- JavaScript / TypeScript
- Go
- Java
- C / C++
Other languages use generic line-based chunking.
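The fallback can be pictured as fixed-size line windows with a small overlap so context is not cut mid-thought. This is an illustrative sketch, not Aurora's actual chunker:

```rust
// Illustrative fallback chunker (not Aurora's implementation): split text
// into windows of `chunk_lines` lines, overlapping by `overlap` lines.
fn line_chunks(text: &str, chunk_lines: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_lines);
    let lines: Vec<&str> = text.lines().collect();
    let step = chunk_lines - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + chunk_lines).min(lines.len());
        chunks.push(lines[start..end].join("\n"));
        if end == lines.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let text = (1..=10).map(|i| format!("line {i}")).collect::<Vec<_>>().join("\n");
    let chunks = line_chunks(&text, 4, 1);
    assert_eq!(chunks.len(), 3); // windows: 1-4, 4-7, 7-10
    println!("{} chunks", chunks.len());
}
```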
## Environment Variables
| Variable | Description | Default |
|---|---|---|
| `AURORA_MODEL_PATH` | Path to ONNX model directory | None |
| `RUST_LOG` | Logging level | `info` |
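Reading these variables is plain `std::env`; nothing Aurora-specific is needed:

```rust
use std::env;

// Read a variable with a fallback default.
fn var_or(name: &str, default: &str) -> String {
    env::var(name).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // AURORA_MODEL_PATH has no default; when unset, hash embeddings are used.
    let model_path = env::var("AURORA_MODEL_PATH").ok();
    // RUST_LOG falls back to "info".
    let log_level = var_or("RUST_LOG", "info");
    println!("model: {model_path:?}, log level: {log_level}");
}
```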
## Developer Integration
### Implementing a Custom Embedder

A sketch of plugging in your own embedder; the `Embedder` trait's method signatures live in `src/embeddings/mod.rs`:

```rust
use aurora_semantic::Embedder;

struct MyEmbedder;

// impl Embedder for MyEmbedder { ... }

// Use it
let engine = Engine::with_embedder(Box::new(MyEmbedder))?;
```
### Progress Reporting

The callback shape below is illustrative; the progress type and its field names are assumptions:

```rust
let progress_callback = Box::new(|progress: IndexProgress| {
    println!("indexed {}/{} files", progress.completed, progress.total);
});

engine.index_workspace(path, Some(progress_callback)).await?;
```
### GPU Acceleration
Aurora supports GPU acceleration via ONNX Runtime execution providers:
| Feature | Platform | Requirements |
|---|---|---|
| `cuda` | NVIDIA GPUs | CUDA 11.x/12.x toolkit |
| `tensorrt` | NVIDIA GPUs | TensorRT 8.x |
| `directml` | Windows (AMD/Intel/NVIDIA) | DirectX 12 |
| `coreml` | macOS (Apple Silicon) | macOS 11+ |
Build with GPU support by enabling the matching Cargo feature:

```bash
# NVIDIA CUDA
cargo build --release --features cuda

# Windows DirectML (works with any GPU)
cargo build --release --features directml

# macOS Apple Silicon
cargo build --release --features coreml
```
The embedder automatically uses GPU when available, falling back to CPU otherwise.
## Performance
- Indexing: ~1,000 files/second without embeddings; ~100 files/second with ONNX on CPU
- Indexing with GPU: ~500+ files/second with ONNX
- Search: <10 ms lexical, <50 ms semantic
- Memory: ~100 MB base + ~1 KB per indexed chunk
- Disk: ~2x source size for a full index
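The memory figures above give a quick capacity estimate; for example, a 50,000-chunk workspace lands around 150 MB. A back-of-envelope sketch (constants taken from the list above, which are approximate):

```rust
// Back-of-envelope memory estimate from the figures above:
// ~100 MB base + ~1 KB per indexed chunk (both approximate).
fn estimated_memory_mb(chunks: u64) -> u64 {
    const BASE_MB: u64 = 100;
    const BYTES_PER_CHUNK: u64 = 1024;
    BASE_MB + (chunks * BYTES_PER_CHUNK) / (1024 * 1024)
}

fn main() {
    // A 50,000-chunk workspace stays well under 200 MB.
    let mb = estimated_memory_mb(50_000);
    assert!(mb < 200);
    println!("50k chunks ≈ {mb} MB");
}
```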
## License
MIT
## Contributing
Contributions welcome! Please read CONTRIBUTING.md first.