Crate aurora_semantic

§Aurora Semantic

A local, embedded semantic search engine for source code, designed to be bundled directly inside desktop IDEs.

§Features

  • Workspace Indexing: Index entire codebases with progress reporting
  • Smart Chunking: Extract meaningful code segments (functions, classes, etc.)
  • Ignore Rules: Respect .gitignore and custom patterns
  • Persistent Indexes: Save and reload indexes efficiently
  • Lexical Search: Fast keyword-based search using Tantivy
  • Semantic Search: ONNX-based embedding similarity search
  • Hybrid Search: Combined lexical and semantic search
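
Hybrid search has to merge a lexical ranking and a semantic ranking into one list. The crate's own fusion strategy is not described here, so the following is only a sketch of one common approach, reciprocal rank fusion (RRF). The function name `reciprocal_rank_fusion`, the string chunk identifiers, and the parameter `k` are illustrative assumptions, not part of the aurora_semantic API.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists of chunk identifiers with reciprocal rank fusion.
/// NOTE: hypothetical helper for illustration; not the crate's API.
/// `k` dampens the influence of the top ranks; 60.0 is a commonly used value.
pub fn reciprocal_rank_fusion(
    lexical: &[&str],
    semantic: &[&str],
    k: f64,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [lexical, semantic] {
        for (rank, id) in list.iter().enumerate() {
            // `enumerate` is 0-based; RRF uses 1-based ranks, hence the + 1.0.
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by identifier for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

An identifier that appears in both lists accumulates two contributions, so it tends to outrank one that appears in only a single list at a similar position.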

§Quick Start

use aurora_semantic::{Engine, EngineConfig, WorkspaceConfig, ModelConfig};
use std::path::PathBuf;

#[tokio::main]
async fn main() -> aurora_semantic::Result<()> {
    // Load your ONNX model
    let model = ModelConfig::from_directory("./models/jina-code").load()?;

    // Create engine
    let config = EngineConfig::new(PathBuf::from(".aurora"));
    let engine = Engine::with_embedder(config, model)?;

    // Index a workspace
    let ws_config = WorkspaceConfig::new(PathBuf::from("./my-project"));
    let workspace_id = engine.index_workspace(ws_config, None).await?;

    // Search for code
    let results = engine.search_text(&workspace_id, "authentication")?;

    for result in results {
        println!("{}: {} (score: {:.2})",
            result.document.relative_path.display(),
            result.chunk.symbol_name.as_deref().unwrap_or("unknown"),
            result.score
        );
    }

    Ok(())
}

§Using Your Own ONNX Model

Aurora uses ONNX Runtime for embedding generation. To use semantic search:

  1. Download an ONNX model (e.g., jina-embeddings-v2-base-code)
  2. Place model.onnx and tokenizer.json in a directory
  3. Point Aurora to that directory

use aurora_semantic::{ModelConfig, OnnxEmbedder};

// Load from directory
let embedder = OnnxEmbedder::from_directory("./models/jina-code")?;

// Or with custom settings
let embedder = ModelConfig::from_directory("./models/jina-code")
    .with_max_length(8192)  // Jina supports 8k context
    .load()?;
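
The steps above require both model.onnx and tokenizer.json to sit in the same directory. Before handing that directory to the loader, a small pre-flight check can report which file is missing. This is a sketch using only the standard library; `missing_model_files` is a hypothetical helper and is not provided by the crate.

```rust
use std::path::{Path, PathBuf};

/// Return the required model files that are absent from `dir`.
/// NOTE: hypothetical helper; the two file names come from the steps above.
pub fn missing_model_files(dir: &Path) -> Vec<PathBuf> {
    ["model.onnx", "tokenizer.json"]
        .iter()
        .map(|name| dir.join(name))
        // Keep only the paths that do not point at an existing regular file.
        .filter(|path| !path.is_file())
        .collect()
}
```

An empty result means both files are present; otherwise the returned paths can be surfaced in an error message before any model loading is attempted.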

Structs§

Chunk
A chunk of source code extracted from a document.
ChunkId
Unique identifier for a chunk within a document.
ChunkingConfig
Configuration for code chunking.
DefaultChunker
Default chunker using regex-based parsing.
DiskStorage
File-based storage implementation.
Document
Represents a source code document.
DocumentId
Unique identifier for a document (source file).
EmbeddingConfig
Configuration for embedding generation.
Engine
The main semantic search engine.
EngineConfig
Main configuration for the semantic search engine.
ExecutionProviderInfo
Information about the execution provider (CPU/GPU) being used.
FileFilter
File filter for determining which files to index.
FileWalker
Walk a directory and yield files that should be indexed.
HashEmbedder
Simple hash-based embedder for testing (no model required).
Highlight
A highlighted portion of text showing a match.
IgnoreConfig
Configuration for ignore patterns.
IndexProgress
Progress information during indexing.
JinaCodeConfig
Configuration for loading a Jina Code Embeddings model.
JinaCodeEmbedder
Jina Code Embeddings 1.5B specialized embedder.
LanguageStats
Statistics for a single language in a workspace.
ModelConfig
Configuration for loading an embedding model.
OnnxEmbedder
ONNX-based embedding model using ONNX Runtime.
PerformanceConfig
Performance tuning configuration.
SearchConfig
Configuration for search behavior.
SearchFilter
Filters to apply to search results.
SearchQuery
A search query with options.
SearchResult
A search result with relevance score.
WorkspaceConfig
Configuration for a specific workspace.
WorkspaceId
Unique identifier for a workspace.
WorkspaceMetadata
Metadata about a workspace index.
WorkspaceStats
Statistics about an indexed workspace.
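
To illustrate what a hash-based embedder such as HashEmbedder is for, here is a minimal sketch of the general technique: hash each token into a bucket of a fixed-size vector and normalize. It is an assumption about the idea, not the crate's implementation; `hash_embed` and its whitespace tokenization are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map text to a fixed-size, L2-normalized vector by hashing tokens into buckets.
/// NOTE: hypothetical sketch; HashEmbedder's real algorithm may differ.
pub fn hash_embed(text: &str, dim: usize) -> Vec<f32> {
    assert!(dim > 0, "dimension must be positive");
    let mut v = vec![0.0f32; dim];
    for token in text.split_whitespace() {
        // DefaultHasher::new() uses fixed keys, so results repeat within one
        // build; its algorithm is not guaranteed stable across Rust releases.
        let mut hasher = DefaultHasher::new();
        token.to_lowercase().hash(&mut hasher);
        let bucket = (hasher.finish() % dim as u64) as usize;
        v[bucket] += 1.0;
    }
    // Normalize to unit length so that a dot product equals cosine similarity.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```

Vectors like this carry no semantic meaning, but they are deterministic and cheap, which is enough to exercise indexing and search code paths in tests without a model file.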

Enums§

ChunkType
Type of code chunk.
EmbeddingMode
Controls whether embed() uses the query or the passage instruction prefix.
EmbeddingTask
Embedding task types for Jina Code Embeddings 1.5B.
Error
Main error type for the aurora-semantic crate.
IndexPhase
Phases of the indexing process.
Language
Supported programming languages.
MatchType
Type of match that produced a search result.
MatryoshkaDimension
Matryoshka embedding dimensions supported by Jina Code 1.5B.
PoolingStrategy
Strategy for pooling token embeddings into a single vector.
SearchMode
Search mode selection.
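
Two of the enums above name techniques that are easier to picture with code. The sketch below shows mean pooling (one possible PoolingStrategy) and Matryoshka-style truncation, where a prefix of the embedding is kept and renormalized. Both functions are hypothetical illustrations under those assumptions; the variants and behavior of the crate's enums are not documented in this listing.

```rust
/// Mean-pool token embeddings, skipping positions whose attention mask is 0.
/// NOTE: hypothetical helper, not the crate's API.
pub fn mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let dim = tokens.first().map_or(0, |t| t.len());
    let mut out = vec![0.0f32; dim];
    let mut count = 0.0f32;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m == 0 {
            continue; // padding token: excluded from the average
        }
        for (o, x) in out.iter_mut().zip(tok) {
            *o += x;
        }
        count += 1.0;
    }
    if count > 0.0 {
        for o in &mut out {
            *o /= count;
        }
    }
    out
}

/// Keep the first `dim` components of `v` and L2-normalize the result.
/// NOTE: hypothetical helper, not the crate's API.
pub fn truncate_and_normalize(v: &[f32], dim: usize) -> Vec<f32> {
    let mut out: Vec<f32> = v.iter().take(dim).copied().collect();
    let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut out {
            *x /= norm;
        }
    }
    out
}
```

Truncation trades some retrieval quality for smaller vectors, which can matter for index size on disk when the engine is embedded in a desktop application.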

Constants§

VERSION
Library version.

Traits§

Chunker
Trait for code chunking implementations.
Embedder
Trait for embedding generators.
Storage
Trait for storage backends.

Type Aliases§

Result
Result type alias using our Error.