§Aurora Semantic
A local, embedded semantic search engine for source code, designed to be bundled directly inside desktop IDEs.
§Features
- Workspace Indexing: Index entire codebases with progress reporting
- Smart Chunking: Extract meaningful code segments (functions, classes, etc.)
- Ignore Rules: Respect .gitignore and custom patterns
- Persistent Indexes: Save and reload indexes efficiently
- Lexical Search: Fast keyword-based search using Tantivy
- Semantic Search: ONNX-based embedding similarity search
- Hybrid Search: Combined lexical and semantic search
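As a rough illustration of how lexical and semantic scores might be combined, the sketch below computes cosine similarity between two embedding vectors and blends it with a keyword score. The `hybrid_score` function, its `alpha` weight, and the assumption that the lexical score is already normalized to [0, 1] are hypothetical and are not taken from Aurora's implementation.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns 0.0 when the lengths differ or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical fusion rule (an assumption, not Aurora's documented method):
/// a weighted sum of a lexical score, assumed to lie in [0, 1], and a
/// semantic similarity score. `alpha` is the weight given to the lexical side.
fn hybrid_score(lexical: f32, semantic: f32, alpha: f32) -> f32 {
    let alpha = alpha.clamp(0.0, 1.0);
    alpha * lexical + (1.0 - alpha) * semantic
}
```

With `alpha = 1.0` this reduces to pure lexical ranking and with `alpha = 0.0` to pure semantic ranking; the crate's own weighting may differ.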
§Quick Start
use aurora_semantic::{Engine, EngineConfig, WorkspaceConfig, SearchQuery, ModelConfig};
use std::path::PathBuf;

#[tokio::main]
async fn main() -> aurora_semantic::Result<()> {
    // Load your ONNX model
    let model = ModelConfig::from_directory("./models/jina-code").load()?;

    // Create engine
    let config = EngineConfig::new(PathBuf::from(".aurora"));
    let engine = Engine::with_embedder(config, model)?;

    // Index a workspace
    let ws_config = WorkspaceConfig::new(PathBuf::from("./my-project"));
    let workspace_id = engine.index_workspace(ws_config, None).await?;

    // Search for code
    let results = engine.search_text(&workspace_id, "authentication")?;
    for result in results {
        println!("{}: {} (score: {:.2})",
            result.document.relative_path.display(),
            result.chunk.symbol_name.as_deref().unwrap_or("unknown"),
            result.score
        );
    }

    Ok(())
}
§Using Your Own ONNX Model
Aurora uses ONNX Runtime for embedding generation. To use semantic search:
- Download an ONNX model (e.g., jina-embeddings-v2-base-code)
- Place model.onnx and tokenizer.json in a directory
- Point Aurora to that directory
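Before pointing Aurora at a model directory, it can help to confirm that both files named in the steps above are present. The following is a minimal sketch using only the standard library; `missing_model_files` is a hypothetical helper and not part of the aurora_semantic API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical pre-flight check (not part of aurora_semantic): returns the
/// paths of the expected model files that do not exist in `dir`.
fn missing_model_files(dir: &Path) -> Vec<PathBuf> {
    ["model.onnx", "tokenizer.json"]
        .iter()
        .map(|name| dir.join(name))
        // Keep only entries that are not regular files on disk.
        .filter(|path| !path.is_file())
        .collect()
}
```

An empty result means both files were found; otherwise the returned paths can be reported to the user before any model loading is attempted.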
use aurora_semantic::{ModelConfig, OnnxEmbedder};

// Load from directory
let embedder = OnnxEmbedder::from_directory("./models/jina-code")?;

// Or with custom settings
let embedder = ModelConfig::from_directory("./models/jina-code")
    .with_max_length(8192) // Jina supports 8k context
    .load()?;
Structs§
- Chunk
- A chunk of source code extracted from a document.
- ChunkId
- Unique identifier for a chunk within a document.
- ChunkingConfig
- Configuration for code chunking.
- DefaultChunker
- Default chunker using regex-based parsing.
- DiskStorage
- File-based storage implementation.
- Document
- Represents a source code document.
- DocumentId
- Unique identifier for a document (source file).
- EmbeddingConfig
- Configuration for embedding generation.
- Engine
- The main semantic search engine.
- EngineConfig
- Main configuration for the semantic search engine.
- ExecutionProviderInfo
- Information about the execution provider (CPU/GPU) being used.
- FileFilter
- File filter for determining which files to index.
- FileWalker
- Walk a directory and yield files that should be indexed.
- HashEmbedder
- Simple hash-based embedder for testing (no model required).
- Highlight
- A highlighted portion of text showing a match.
- IgnoreConfig
- Configuration for ignore patterns.
- IndexProgress
- Progress information during indexing.
- JinaCodeConfig
- Configuration for loading a Jina Code Embeddings model.
- JinaCodeEmbedder
- Jina Code Embeddings 1.5B specialized embedder.
- LanguageStats
- Statistics for a single language in a workspace.
- ModelConfig
- Configuration for loading an embedding model.
- OnnxEmbedder
- ONNX-based embedding model using ONNX Runtime.
- PerformanceConfig
- Performance tuning configuration.
- SearchConfig
- Configuration for search behavior.
- SearchFilter
- Filters to apply to search results.
- SearchQuery
- A search query with options.
- SearchResult
- A search result with relevance score.
- WorkspaceConfig
- Configuration for a specific workspace.
- WorkspaceId
- Unique identifier for a workspace.
- WorkspaceMetadata
- Metadata about a workspace index.
- WorkspaceStats
- Statistics about an indexed workspace.
Enums§
- ChunkType
- Type of code chunk.
- EmbeddingMode
- Controls whether embed() uses query or passage instruction prefix.
- EmbeddingTask
- Embedding task types for Jina Code Embeddings 1.5B.
- Error
- Main error type for the aurora-semantic crate.
- IndexPhase
- Phases of the indexing process.
- Language
- Supported programming languages.
- MatchType
- Type of match that produced a search result.
- MatryoshkaDimension
- Matryoshka embedding dimensions supported by Jina Code 1.5B.
- PoolingStrategy
- Strategy for pooling token embeddings into a single vector.
- SearchMode
- Search mode selection.
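To make the pooling and Matryoshka entries above more concrete, here is a minimal sketch of two common techniques: mean pooling over non-padding tokens, and shortening an embedding by keeping its leading components and re-normalizing. Both functions are illustrative assumptions; this page does not list the variants of PoolingStrategy or MatryoshkaDimension, and the crate may implement them differently.

```rust
/// Mean-pool per-token embeddings into a single vector, skipping tokens
/// whose attention mask is 0 (padding). Illustrative only.
fn mean_pool(token_embeddings: &[Vec<f32>], attention_mask: &[u8]) -> Vec<f32> {
    let dim = token_embeddings.first().map_or(0, |t| t.len());
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0usize;
    for (token, &mask) in token_embeddings.iter().zip(attention_mask) {
        if mask == 0 {
            continue;
        }
        for (acc, value) in pooled.iter_mut().zip(token) {
            *acc += *value;
        }
        count += 1;
    }
    if count > 0 {
        for acc in pooled.iter_mut() {
            *acc /= count as f32;
        }
    }
    pooled
}

/// Matryoshka-style shortening: keep the first `dim` components and
/// L2-normalize the result so cosine similarity stays meaningful.
fn truncate_and_normalize(embedding: &[f32], dim: usize) -> Vec<f32> {
    let mut out: Vec<f32> = embedding.iter().take(dim).copied().collect();
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in out.iter_mut() {
            *x /= norm;
        }
    }
    out
}
```

Shorter vectors reduce index size and comparison cost, typically at some loss of retrieval quality, which is the trade-off a Matryoshka dimension setting usually exposes.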
Constants§
- VERSION
- Library version.
Traits§
- Chunker
- Trait for code chunking implementations.
- Embedder
- Trait for embedding generators.
- Storage
- Trait for storage backends.
Type Aliases§
- Result
- Result type alias using our Error.