Apohara code indexer (soft-fork): sqlite-vec storage + blake3 feature-hashing embeddings + tree-sitter parsing.
This crate is a LIB-ONLY soft-fork of apohara-indexer from
SuarezPM/Apohara-Catalyst. The storage, parser, and embeddings modules are
preserved verbatim; the binary entry point (main.rs) is intentionally
dropped.
The new engine modules land in subsequent steps. Step 1 adds walker
(gitignore-aware traversal) and chunker (symbol/module/window chunking);
schema/search/incremental follow later.
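The feature-hashing idea named in the summary can be sketched as follows. This is a minimal illustration and not the crate's `feature_hash_embed`: the function names, the tokenization rule, and the dimension `DIM` are hypothetical, and the standard library's `DefaultHasher` stands in for blake3 so the sketch needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical dimension for this sketch; the crate exposes its own `EMBED_DIM`.
const DIM: usize = 16;

/// Hash one token to 64 bits. `DefaultHasher` is a stand-in (assumption):
/// its algorithm is not guaranteed stable across Rust releases, which is
/// one reason a persistent index would use a fixed hash such as blake3.
fn hash_token(token: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

/// Feature-hash `text` into a fixed-size, L2-normalized vector.
/// Each token picks a bucket from its hash and adds +1 or -1 there.
fn feature_hash_sketch(text: &str) -> [f32; DIM] {
    let mut v = [0.0f32; DIM];
    let tokens = text
        .split(|c: char| !c.is_alphanumeric() && c != '_')
        .filter(|t| !t.is_empty());
    for token in tokens {
        let h = hash_token(&token.to_lowercase());
        let bucket = (h % DIM as u64) as usize;
        // The top bit of the hash chooses the sign, so collisions tend to cancel.
        let sign = if (h >> 63) & 1 == 0 { 1.0 } else { -1.0 };
        v[bucket] += sign;
    }
    // Normalize to unit length unless the vector is all zeros.
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
    v
}
```

Because the vector depends only on the hash of each token, the same text always maps to the same embedding within a build, with no model weights to load.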
Re-exports
pub use storage::ensure_vec_extension_registered;
pub use storage::insert_chunk_full;
pub use storage::insert_chunk_full_with;
pub use storage::knn_query;
pub use storage::knn_query_with;
pub use storage::open_db;
pub use storage::open_db_with;
pub use storage::write_file_structural;
pub use storage::IndexedChunk;
pub use storage::KnnHit;
pub use storage::SymbolData;
pub use storage::EMBED_DIM;
pub use embedder::active_embedder;
pub use embedder::resolve_embedder_choice;
pub use embedder::Embedder;
pub use embedder::EmbedderChoice;
pub use embedder::FeatureHashEmbedder;
pub use embedder::EMBED_MODEL_ENV;
pub use embedder::FEATURE_HASH_ID;
pub use registry::load as load_registry;
pub use registry::register;
pub use registry::registry_path;
pub use registry::save as save_registry;
pub use registry::Registry;
pub use schema::migrate;
pub use schema::read_embedder_meta;
pub use schema::verify_embedder_meta;
pub use schema::write_embedder_meta;
pub use schema::META_EMBEDDER_DIM;
pub use schema::META_EMBEDDER_ID;
pub use schema::META_SCHEMA_VERSION;
pub use schema::MIGRATION_PLACEHOLDER_REPO_ID;
pub use schema::SCHEMA_VERSION;
pub use search::apply_structural_boost;
pub use search::bm25_query;
pub use search::classify_query_weights;
pub use search::dedup_content;
pub use search::dedup_overlapping;
pub use search::hydrate;
pub use search::load_embeddings;
pub use search::mmr_rerank;
pub use search::resolve_weights;
pub use search::rrf_fuse;
pub use search::rrf_fuse_weighted;
pub use search::vector_query;
pub use search::vector_query_with;
pub use search::ExportRow;
pub use search::HydratedHit;
pub use search::ImportRow;
pub use search::MMR_LAMBDA;
pub use search::RRF_K;
pub use search::STRUCTURAL_BOOST;
pub use tokens::code_tokens;
pub use embeddings::feature_hash_embed;
pub use parser::detect_language;
pub use parser::parse_file;
pub use parser::parse_imports_exports;
pub use parser::parse_source;
pub use parser::parse_source_imports_exports;
pub use parser::parse_source_spans;
pub use parser::ExportStatement;
pub use parser::FunctionSignature;
pub use parser::ImportStatement;
pub use parser::Language;
pub use parser::SymbolKind;
pub use walker::walk_repo;
pub use walker::WalkedFile;
pub use chunker::chunk_file;
pub use chunker::chunk_id;
pub use chunker::ChunkKind;
pub use chunker::ChunkSpec;
pub use incremental::index_repo;
pub use incremental::index_repo_with;
pub use incremental::reindex;
pub use incremental::reindex_with;
pub use incremental::ReindexReport;
Modules
- chunker
- Splits file contents into indexable chunks.
- embedder
- Pluggable embedding behind the Embedder trait.
- embeddings
- Deterministic feature-hashing embeddings (blake3-based).
- incremental
- Incremental (and full) reindex engine.
- parser
- registry
- Multi-repo sidecar registry: a path -> index.db map (Decision E1).
- schema
- Idempotent schema migration run on every open AFTER crate::storage::open_db.
- search
- Query-time retrieval: lexical (BM25 over FTS5), vector (sqlite-vec KNN), Reciprocal Rank Fusion of the two, and hydration of a fused hit into a displayable record.
- storage
- sqlite-vec backed storage for code chunks + their embeddings.
- tokens
- Code-aware tokenizer shared by the index-time and query-time paths.
- walker
- Gitignore-aware filesystem traversal for indexing.
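The search module fuses a lexical ranking and a vector ranking with Reciprocal Rank Fusion. A minimal sketch of that technique follows. It does not reproduce the crate's `rrf_fuse` signature: the name `rrf_fuse_sketch`, the string ids, and the constant `K = 60` (a value commonly used for RRF) are assumptions, and the crate's own `RRF_K` may differ.

```rust
use std::collections::HashMap;

/// Smoothing constant (assumed value); the crate exposes its own `RRF_K`.
const K: f64 = 60.0;

/// Fuse several ranked lists of ids (best first) into one ranking.
/// Each id scores the sum over lists of 1 / (K + rank), with rank starting at 1.
fn rrf_fuse_sketch(lists: &[Vec<&str>]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            let rank = (i as f64) + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (K + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap()
            .then_with(|| a.0.cmp(&b.0))
    });
    fused
}
```

Given a lexical list `["a", "b", "c"]` and a vector list `["b", "d", "a"]`, the fused order is `b, a, d, c`: ids that appear in both lists outrank ids that appear in only one, and only ranks are used, so the BM25 and KNN scores never need to share a scale.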
Functions
- sqlite_version
- Bundled SQLite version string (e.g. "3.46.0").