pub struct DistributedQueryEngine { /* private fields */ }
Distributed query engine implementing two-phase TF-IDF.
This engine executes queries across multiple shards by:
- First collecting term frequencies from all shards
- Computing global document frequencies
- Re-executing queries with the global DF for accurate scoring
- Merging and normalizing results across shards
Implementations
impl DistributedQueryEngine
pub fn new(config: DistributedHybridConfig) -> Self
Create a new distributed query engine with the given configuration.
pub fn with_defaults() -> Self
Create a query engine with default configuration.
pub fn config(&self) -> &DistributedHybridConfig
Get the configuration.
pub fn get_local_term_frequencies(
    &self,
    shard: &ShardedColony,
    terms: &[String],
) -> HashMap<String, u64>
Phase 1: Get term frequencies from a shard.
Counts how many documents in this shard contain each query term. Despite the method name, these per-term counts are the shard's local document frequencies (DF), which Phase 2 aggregates across shards.
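As a rough illustration of the counting described above, the following sketch computes per-term document counts over a toy shard. `ToyShard` and `local_document_frequencies` are hypothetical stand-ins invented for this example; they are not part of this crate, and the real `ShardedColony` storage may look quite different.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a shard: each document is a list of tokens.
pub struct ToyShard {
    pub docs: Vec<Vec<String>>,
}

/// For each query term, count how many documents in the shard contain it.
/// Terms that match no document are reported with a count of zero.
pub fn local_document_frequencies(
    shard: &ToyShard,
    terms: &[String],
) -> HashMap<String, u64> {
    // Start every term at zero; duplicate query terms collapse to one key.
    let mut df: HashMap<String, u64> =
        terms.iter().map(|t| (t.clone(), 0)).collect();

    for doc in &shard.docs {
        // Deduplicate tokens so a document counts at most once per term.
        let unique: HashSet<&str> = doc.iter().map(String::as_str).collect();
        for (term, count) in df.iter_mut() {
            if unique.contains(term.as_str()) {
                *count += 1;
            }
        }
    }
    df
}
```

Note that a term repeated inside one document still contributes only one to the count, which is what distinguishes a document frequency from a raw term frequency.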
pub fn aggregate_global_df(
    &self,
    local_dfs: Vec<HashMap<String, u64>>,
) -> HashMap<String, u64>
Phase 2: Aggregate global document frequencies.
Combines local document frequencies from all shards to compute the global DF for each term across the entire distributed graph.
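One plausible way to combine the per-shard maps is a term-wise sum, sketched below. This assumes each document lives in exactly one shard, so that adding the local counts yields the global count; `sum_document_frequencies` is a hypothetical free function, not the method documented here.

```rust
use std::collections::HashMap;

/// Sum per-shard document frequencies term by term.
/// Assumes shards hold disjoint sets of documents.
pub fn sum_document_frequencies(
    local_dfs: Vec<HashMap<String, u64>>,
) -> HashMap<String, u64> {
    let mut global: HashMap<String, u64> = HashMap::new();
    for local in local_dfs {
        for (term, count) in local {
            // Terms missing from a shard simply contribute nothing.
            *global.entry(term).or_insert(0) += count;
        }
    }
    global
}
```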
pub fn execute_local_query(
    &self,
    shard: &ShardedColony,
    request: &LocalQueryRequest,
) -> LocalQueryResult
Phase 3: Execute local query with global DF.
Computes TF-IDF scores for nodes in a single shard using the global document frequencies for accurate IDF computation.
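To make the role of the global DF concrete, here is a minimal scoring sketch. The smoothed IDF formula, the raw-count TF, and the names `idf` and `tf_idf_score` are all assumptions made for illustration; the engine's actual weighting is not specified on this page.

```rust
use std::collections::HashMap;

/// Smoothed inverse document frequency (an assumed formula):
/// ln((1 + N) / (1 + df)) + 1, where N is the global document count.
pub fn idf(total_docs: u64, df: u64) -> f64 {
    ((1.0 + total_docs as f64) / (1.0 + df as f64)).ln() + 1.0
}

/// Score one document against the query terms, using raw term counts
/// as TF and the *global* document frequencies for IDF.
pub fn tf_idf_score(
    doc_tokens: &[&str],
    terms: &[&str],
    global_df: &HashMap<String, u64>,
    total_docs: u64,
) -> f64 {
    terms
        .iter()
        .map(|term| {
            // TF: how often the term occurs in this document.
            let tf = doc_tokens.iter().filter(|t| *t == term).count() as f64;
            // DF: global count; unseen terms are treated as df = 0.
            let df = global_df.get(*term).copied().unwrap_or(0);
            tf * idf(total_docs, df)
        })
        .sum()
}
```

Because every shard uses the same `global_df` and `total_docs`, a given term receives the same IDF weight everywhere, which is what makes scores from different shards comparable.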
pub fn merge_results(&self, results: Vec<LocalQueryResult>) -> Vec<ScoredNode>
Phase 4: Merge results from all shards.
Combines results from multiple shards, normalizes scores across shards, sorts by score, and returns the top-k results.
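The merge step can be pictured with the sketch below. `ToyScoredNode` and `merge_top_k` are hypothetical, and dividing by the maximum score is only one possible normalization chosen for this example; the page does not say which scheme the engine uses or where `k` comes from (presumably the configuration).

```rust
/// Hypothetical stand-in for a scored result node.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyScoredNode {
    pub id: u64,
    pub score: f64,
}

/// Flatten per-shard results, scale scores into [0, 1] by the global
/// maximum (an assumed normalization), sort descending, keep the top k.
pub fn merge_top_k(per_shard: Vec<Vec<ToyScoredNode>>, k: usize) -> Vec<ToyScoredNode> {
    let mut all: Vec<ToyScoredNode> = per_shard.into_iter().flatten().collect();

    // Largest score across all shards; stays 0.0 if there are no results.
    let max = all.iter().map(|n| n.score).fold(0.0_f64, f64::max);
    if max > 0.0 {
        for node in &mut all {
            node.score /= max;
        }
    }

    // Highest score first; total_cmp gives a total order over f64.
    all.sort_by(|a, b| b.score.total_cmp(&a.score));
    all.truncate(k);
    all
}
```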
pub fn distributed_query(
    &self,
    shards: &[&ShardedColony],
    query_text: &str,
) -> Vec<ScoredNode>
Execute a full distributed query across multiple shards.
This is the main entry point for distributed queries. It coordinates the four steps of the two-phase TF-IDF algorithm:
- Collects local term frequencies from each shard
- Aggregates them into global document frequencies
- Executes local queries on each shard with global DF
- Merges and normalizes results
Arguments
- shards - Slice of shard references to query
- query_text - The raw query text to search for
Returns
A vector of scored nodes, sorted by relevance (highest first).
pub fn local_query(
    &self,
    shard: &ShardedColony,
    query_text: &str,
) -> Vec<ScoredNode>
Execute a query on a single shard (for non-distributed use).
This is useful for testing or when the data resides in a single shard.