Crate apohara_indexer

Apohara code indexer (soft-fork): sqlite-vec storage + blake3 feature-hashing embeddings + tree-sitter parsing.

This crate is a LIB-ONLY soft-fork of apohara-indexer from SuarezPM/Apohara-Catalyst. The storage, parser, and embeddings modules are preserved verbatim; the binary entry point (main.rs) is intentionally dropped.

The new engine modules land in subsequent steps: step 1 adds walker (gitignore-aware traversal) and chunker (symbol/module/window chunking); schema, search, and incremental follow later.
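To make "feature-hashing embeddings" concrete, here is a minimal sketch of the general technique: each token is hashed to a bucket of a fixed-size vector, a hash bit picks the sign, and the result is L2-normalized. This is an illustration only, not the crate's implementation. It assumes std's DefaultHasher in place of blake3 so that it compiles without dependencies, and the names SKETCH_DIM and sketch_feature_hash_embed are hypothetical (the crate's own entry points are feature_hash_embed and EMBED_DIM, whose exact signature and value are not shown on this page).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical dimension for this sketch; the crate exposes its own `EMBED_DIM`.
const SKETCH_DIM: usize = 64;

/// Feature-hashing ("hashing trick") embedding over pre-split tokens.
/// Assumption: std's `DefaultHasher` stands in for blake3 here.
fn sketch_feature_hash_embed(tokens: &[&str]) -> Vec<f32> {
    let mut v = vec![0.0f32; SKETCH_DIM];
    for tok in tokens {
        // `DefaultHasher::new()` uses fixed keys, so repeated calls agree.
        let mut hasher = DefaultHasher::new();
        tok.hash(&mut hasher);
        let bits = hasher.finish();

        // Low bits choose the bucket, the top bit chooses the sign.
        let idx = (bits % SKETCH_DIM as u64) as usize;
        let sign = if (bits >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        v[idx] += sign;
    }

    // L2-normalize so that a dot product behaves like cosine similarity.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}
```

The same input always yields the same vector, which is what makes this kind of embedding deterministic and usable without a model download; the trade-off is that unrelated tokens can collide in one bucket.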

Re-exports

pub use storage::ensure_vec_extension_registered;
pub use storage::insert_chunk_full;
pub use storage::insert_chunk_full_with;
pub use storage::knn_query;
pub use storage::knn_query_with;
pub use storage::open_db;
pub use storage::open_db_with;
pub use storage::write_file_structural;
pub use storage::IndexedChunk;
pub use storage::KnnHit;
pub use storage::SymbolData;
pub use storage::EMBED_DIM;
pub use embedder::active_embedder;
pub use embedder::resolve_embedder_choice;
pub use embedder::Embedder;
pub use embedder::EmbedderChoice;
pub use embedder::FeatureHashEmbedder;
pub use embedder::EMBED_MODEL_ENV;
pub use embedder::FEATURE_HASH_ID;
pub use registry::load as load_registry;
pub use registry::register;
pub use registry::registry_path;
pub use registry::save as save_registry;
pub use registry::Registry;
pub use schema::migrate;
pub use schema::read_embedder_meta;
pub use schema::verify_embedder_meta;
pub use schema::write_embedder_meta;
pub use schema::META_EMBEDDER_DIM;
pub use schema::META_EMBEDDER_ID;
pub use schema::META_SCHEMA_VERSION;
pub use schema::MIGRATION_PLACEHOLDER_REPO_ID;
pub use schema::SCHEMA_VERSION;
pub use search::apply_structural_boost;
pub use search::bm25_query;
pub use search::classify_query_weights;
pub use search::dedup_content;
pub use search::dedup_overlapping;
pub use search::hydrate;
pub use search::load_embeddings;
pub use search::mmr_rerank;
pub use search::resolve_weights;
pub use search::rrf_fuse;
pub use search::rrf_fuse_weighted;
pub use search::vector_query;
pub use search::vector_query_with;
pub use search::ExportRow;
pub use search::HydratedHit;
pub use search::ImportRow;
pub use search::MMR_LAMBDA;
pub use search::RRF_K;
pub use search::STRUCTURAL_BOOST;
pub use tokens::code_tokens;
pub use embeddings::feature_hash_embed;
pub use parser::detect_language;
pub use parser::parse_file;
pub use parser::parse_imports_exports;
pub use parser::parse_source;
pub use parser::parse_source_imports_exports;
pub use parser::parse_source_spans;
pub use parser::ExportStatement;
pub use parser::FunctionSignature;
pub use parser::ImportStatement;
pub use parser::Language;
pub use parser::SymbolKind;
pub use walker::walk_repo;
pub use walker::WalkedFile;
pub use chunker::chunk_file;
pub use chunker::chunk_id;
pub use chunker::ChunkKind;
pub use chunker::ChunkSpec;
pub use incremental::index_repo;
pub use incremental::index_repo_with;
pub use incremental::reindex;
pub use incremental::reindex_with;
pub use incremental::ReindexReport;

Modules

chunker
Splits file contents into indexable chunks.
embedder
Pluggable embedding behind the Embedder trait.
embeddings
Deterministic feature-hashing embeddings (blake3-based).
incremental
Incremental (and full) reindex engine.
parser
registry
Multi-repo sidecar registry: a path -> index.db map (Decision E1).
schema
Idempotent schema migration run on every open AFTER crate::storage::open_db.
search
Query-time retrieval: lexical (BM25 over FTS5), vector (sqlite-vec KNN), Reciprocal Rank Fusion of the two, and hydration of a fused hit into a displayable record.
storage
sqlite-vec backed storage for code chunks + their embeddings.
tokens
Code-aware tokenizer shared by the index-time and query-time paths.
walker
Gitignore-aware filesystem traversal for indexing.
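
The search module's description mentions Reciprocal Rank Fusion of the lexical and vector result lists. As a rough illustration of that technique in general, the sketch below scores each id as the sum of 1 / (k + rank) over every list it appears in, with rank starting at 1. It does not use or reproduce the crate's rrf_fuse; the names sketch_rrf_fuse and SKETCH_RRF_K are hypothetical, and k = 60 is only the conventional default for RRF, not necessarily the value of the crate's RRF_K.

```rust
use std::collections::HashMap;

/// Conventional RRF constant; an assumption, not the crate's `RRF_K`.
const SKETCH_RRF_K: f64 = 60.0;

/// Fuse several ranked id lists (best first) into one ranking.
/// Each appearance contributes 1 / (k + rank), with rank starting at 1.
fn sketch_rrf_fuse(lists: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (pos, id) in list.iter().enumerate() {
            let rank = pos as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (SKETCH_RRF_K + rank);
        }
    }

    // Highest fused score first; ties broken by id so the order is stable.
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Given a lexical list [a, b, c] and a vector list [b, d, a], the fused order is b, a, d, c: ids found by both retrievers rise above ids found by only one, without having to put BM25 scores and vector distances on a common scale.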

Functions

sqlite_version
Bundled SQLite version string (e.g. "3.46.0").