pub struct AskConfig {
pub model: String,
pub ollama_endpoint: String,
pub k_summary: u32,
pub k_raw: u32,
pub escalation_threshold: f64,
pub mmr_threshold: f64,
pub max_context_tokens: u32,
pub response_tokens: u32,
pub timeout_secs: u32,
pub min_score: f64,
pub continue_history_turns: u32,
pub rewriter_timeout_secs: u32,
pub compress_hits_enabled: bool,
pub summarize_hits_enabled: bool,
pub summarize_model: Option<String>,
pub backend: Option<BackendConfig>,
pub rewriter_backend: Option<BackendConfig>,
}
Fields
model: String
ollama_endpoint: String
k_summary: u32
k_raw: u32
escalation_threshold: f64
mmr_threshold: f64
max_context_tokens: u32
response_tokens: u32
timeout_secs: u32
min_score: f64
continue_history_turns: u32
rewriter_timeout_secs: u32
Separate, shorter timeout for the rewriter LLM call (Phase 3.3).
Rewriter output is small (~80 tokens) and falling back to the raw
question on failure is non-fatal, so we don’t want to burn the full
timeout_secs budget waiting on a slow/unreachable Ollama before
the user sees any response.
compress_hits_enabled: bool
summarize_hits_enabled: bool
summarize_model: Option<String>
backend: Option<BackendConfig>
Per-stage backend override for the answer-generation model.
None = synthesize from legacy model + ollama_endpoint.
rewriter_backend: Option<BackendConfig>
Per-stage backend override for the query rewriter.
None = synthesize an Ollama BackendConfig over the legacy model +
ollama_endpoint with rewriter_timeout_secs baked in.
Implementations
impl AskConfig
pub fn synthesize_backend(&self) -> BackendConfig
Returns the effective backend for the answer-generation model.
A per-stage backend override wins; otherwise the legacy fields
(model, ollama_endpoint) are synthesized into an Ollama BackendConfig.
timeout_secs is baked in from self.timeout_secs so the answer call
inherits the user’s per-call budget (rather than the factory’s 120s
default). When the user supplied an explicit backend override
with its own timeout_secs, that wins: we only synthesize when
self.backend is None.
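
The override-or-synthesize rule described above can be sketched roughly as follows. This is a minimal illustration and not the crate's actual source: the shape of `BackendConfig` (`kind`, `model`, `endpoint`, `timeout_secs`), the `BackendKind` enum, and the trimmed-down `AskConfig` are all assumptions introduced so the example compiles on its own.

```rust
// Hypothetical, simplified stand-ins for the real types. The field names on
// BackendConfig and the BackendKind enum are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum BackendKind {
    Ollama,
    Other(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct BackendConfig {
    pub kind: BackendKind,
    pub model: String,
    pub endpoint: String,
    pub timeout_secs: u32,
}

// Only the fields relevant to answer-backend resolution are kept here.
pub struct AskConfig {
    pub model: String,
    pub ollama_endpoint: String,
    pub timeout_secs: u32,
    pub backend: Option<BackendConfig>,
}

impl AskConfig {
    pub fn synthesize_backend(&self) -> BackendConfig {
        match &self.backend {
            // An explicit override is returned untouched, including its
            // own timeout_secs.
            Some(explicit) => explicit.clone(),
            // No override: build an Ollama backend from the legacy fields
            // and bake in the user's per-call budget.
            None => BackendConfig {
                kind: BackendKind::Ollama,
                model: self.model.clone(),
                endpoint: self.ollama_endpoint.clone(),
                timeout_secs: self.timeout_secs,
            },
        }
    }
}
```

Under these assumptions, a config with `backend: None` and `timeout_secs: 30` yields an Ollama backend with a 30-second timeout, while a config carrying an explicit override returns that override unchanged even if the top-level `timeout_secs` differs.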
pub fn synthesize_rewriter_backend(&self) -> BackendConfig
Returns the effective backend for the query rewriter.
A per-stage rewriter_backend override wins.
When self.rewriter_backend is None, this synthesizes its OWN
Ollama BackendConfig with self.rewriter_timeout_secs baked in;
it does NOT fall through to synthesize_backend(). The rewriter
has a much tighter latency budget than the answer call (rewriter
output is small and falling back to the raw question on timeout
is non-fatal), so we don’t want a slow Ollama burning the full
timeout_secs budget before the user sees any response.
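
A rough sketch of how the rewriter path differs from the answer path is shown below. It is not the crate's real implementation: `BackendConfig`'s fields, the `BackendKind` enum, and the reduced `AskConfig` are hypothetical stand-ins chosen to keep the example self-contained. The point it illustrates is that the synthesized rewriter backend takes its timeout from `rewriter_timeout_secs`, never from `timeout_secs`.

```rust
// Hypothetical, simplified stand-ins for the real types. The field names on
// BackendConfig and the BackendKind enum are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum BackendKind {
    Ollama,
    Other(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct BackendConfig {
    pub kind: BackendKind,
    pub model: String,
    pub endpoint: String,
    pub timeout_secs: u32,
}

// Only the fields relevant to rewriter-backend resolution are kept here.
pub struct AskConfig {
    pub model: String,
    pub ollama_endpoint: String,
    pub timeout_secs: u32,
    pub rewriter_timeout_secs: u32,
    pub rewriter_backend: Option<BackendConfig>,
}

impl AskConfig {
    pub fn synthesize_rewriter_backend(&self) -> BackendConfig {
        match &self.rewriter_backend {
            // An explicit rewriter override is returned as-is.
            Some(explicit) => explicit.clone(),
            // No override: build a dedicated Ollama backend. The timeout is
            // the short rewriter budget, not the answer call's timeout_secs.
            None => BackendConfig {
                kind: BackendKind::Ollama,
                model: self.model.clone(),
                endpoint: self.ollama_endpoint.clone(),
                timeout_secs: self.rewriter_timeout_secs,
            },
        }
    }
}
```

With `timeout_secs: 60` and `rewriter_timeout_secs: 5`, the synthesized rewriter backend in this sketch carries a 5-second timeout, so a slow or unreachable Ollama costs at most that short budget before the caller falls back to the raw question.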