pub struct SearchConfig {
    pub enabled: bool,
    pub searxng_url: Option<String>,
    pub timeout_ms: u64,
    pub default_limit: u32,
    pub max_limit: u32,
    pub research_engines: Vec<String>,
    pub github_engines: Vec<String>,
    pub rerank_enabled: bool,
    pub query_expand: bool,
    pub query_expand_variants: usize,
    pub multi_round: bool,
    pub passage_select: bool,
    pub page2_fallback: bool,
    pub answer_calibrated: bool,
    pub answer_guarded: bool,
    pub use_structured_sources: bool,
    pub wikidata_lookup: bool,
    pub snippet_fallback: bool,
}
Configuration for the /v1/search endpoint and its SearXNG backend.
When searxng_url is unset the endpoint returns HTTP 503 with
error_code: "search_disabled" — the route remains mounted so that
startup doesn’t have to know whether search will ever be configured.
Fields
enabled: bool
Master switch. Defaults to true; set to false to refuse all
/v1/search requests even if searxng_url is configured.
searxng_url: Option<String>
Base URL of the SearXNG instance (e.g. http://searxng:8080).
None (the default) disables the endpoint with a clear error.
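The interaction between enabled and searxng_url can be sketched as a small gate function. This is a minimal illustration, not the crate's actual handler: SearchGate, SearchError, and resolve_backend are hypothetical names, and reusing the same 503 response for enabled = false is an assumption (the documentation only states the 503 / "search_disabled" behavior for an unset URL).

```rust
/// Hypothetical stand-in for the two fields involved in the gate.
struct SearchGate {
    enabled: bool,
    searxng_url: Option<String>,
}

/// Hypothetical error shape; only the 503 status and the
/// "search_disabled" code come from the documentation.
#[derive(Debug, PartialEq)]
struct SearchError {
    status: u16,
    error_code: &'static str,
}

/// Returns the backend base URL to call, or the error to send back.
fn resolve_backend(cfg: &SearchGate) -> Result<&str, SearchError> {
    let disabled = SearchError {
        status: 503,
        error_code: "search_disabled",
    };
    if !cfg.enabled {
        // Assumption: the master switch answers with the same error
        // as an unset URL.
        return Err(disabled);
    }
    // An unset URL disables the endpoint while the route stays mounted.
    cfg.searxng_url.as_deref().ok_or(disabled)
}
```

Because the check runs per request, the route can be mounted unconditionally at startup, which matches the behavior described for the endpoint.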
timeout_ms: u64
End-to-end timeout for the SearXNG call, in milliseconds.
default_limit: u32
Value used for limit when the request omits it.
max_limit: u32
Hard cap on limit per request. SaaS uses 20.
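A minimal sketch of how default_limit and max_limit could combine into the limit actually used. The function name effective_limit is hypothetical, and clamping an oversized request (rather than rejecting it) is an assumption; the documentation only says max_limit is a hard cap.

```rust
/// Resolves the limit for one request.
/// Assumption: values above `max_limit` are clamped, not rejected.
fn effective_limit(requested: Option<u32>, default_limit: u32, max_limit: u32) -> u32 {
    requested
        // Fall back to the configured default when the request omits `limit`.
        .unwrap_or(default_limit)
        // Never exceed the hard cap, even if the default is misconfigured.
        .min(max_limit)
}
```

With a cap of 20, a request asking for 50 results would be served 20 under this clamping assumption.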
research_engines: Vec<String>
SearXNG engines invoked when the request includes categories: ["research"].
Defaults match the SaaS implementation.
github_engines: Vec<String>
SearXNG engines invoked when the request includes categories: ["github"].
rerank_enabled: bool
Re-rank the flat result pool for the LLM answer / summarize path
(RRF + junk/coverage/geo filter + BM25 + domain dedupe) instead of the
raw SearXNG-score sort. Defaults to true. The plain (non-LLM) path is
unaffected and keeps SaaS byte-parity regardless of this flag.
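The RRF step named above is reciprocal rank fusion, where each document scores the sum of 1 / (k + rank) over the ranked lists it appears in. The sketch below shows only that step, under assumptions: rrf_fuse is a hypothetical name, the input is taken to be one ranked URL list per engine, and k = 60 is the commonly used constant rather than a value taken from this crate. The junk/coverage/geo filter, BM25, and domain dedupe stages are not shown.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over several ranked lists of URLs.
/// `k` dampens the weight of top ranks; 60.0 is a common choice.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, url) in ranking.iter().enumerate() {
            // Ranks are 1-based in the usual RRF formula.
            let rank = (idx + 1) as f64;
            *scores.entry(*url).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(url, score)| (url.to_string(), score))
        .collect();
    // Highest fused score first; ties broken by URL for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A URL that ranks reasonably well in several lists can outrank one that tops a single list, which is the property that makes RRF a useful alternative to sorting by a raw per-engine score.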
query_expand: bool
Multi-query expansion for the LLM answer / summarize path: before the
SearXNG fetch, generate an entity/keyword-focused rewrite of the query,
fetch both the original and the rewrite, and UNION the candidate pools
(recall can only increase — the original’s results are always kept).
Targets “retrieval-miss” failures where the answer’s source never
surfaced for the user’s phrasing. Costs one extra small LLM call + one
extra SearXNG fetch. Defaults to false (gated); the plain path and the
answer layer are untouched, so precision/SaaS-parity are preserved.
query_expand_variants: usize
Number of LLM-generated query rewrites to fetch + union when
query_expand is on. 1 reproduces the original single-variant
behavior. Higher values request more DIVERSE reformulations
(abbreviation/acronym-expanded, keyword-focused) and fetch their pools
in parallel, raising recall on retrieval-miss queries (e.g. an
unexpanded acronym whose page never surfaced) at the cost of one extra
SearXNG fetch each. Clamped to MAX_QUERY_EXPAND_VARIANTS in the route.
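The union and clamp behavior described for query_expand and query_expand_variants can be sketched as follows. The names clamp_variants and union_pools are hypothetical, candidates are reduced to bare URLs for brevity, and the constant's value of 3 is a placeholder: the real MAX_QUERY_EXPAND_VARIANTS lives in the route and its value is not given here.

```rust
use std::collections::HashSet;

/// Placeholder value (assumption); the real constant is defined in the route.
const MAX_QUERY_EXPAND_VARIANTS: usize = 3;

/// Caps the configured number of rewrites at the route-level maximum.
fn clamp_variants(configured: usize) -> usize {
    configured.min(MAX_QUERY_EXPAND_VARIANTS)
}

/// Unions candidate pools by URL with the original pool first, so every
/// distinct original result is kept and rewrites can only add new URLs.
fn union_pools(original: &[String], rewrites: &[Vec<String>]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut merged = Vec::new();
    for url in original.iter().chain(rewrites.iter().flatten()) {
        // `insert` returns false for URLs already in the merged pool.
        if seen.insert(url.as_str()) {
            merged.push(url.clone());
        }
    }
    merged
}
```

Iterating the original pool first is what makes the merge recall-monotone: a rewrite that returns nothing useful leaves the merged pool identical to the original one.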
multi_round: bool
Adaptive multi-round retrieval (the “evidence-scout” loop). When the
round-1 answer ABSTAINS (sources lacked the fact), an LLM scout reads the
round-1 evidence and emits targeted follow-up queries (acronym-expanded,
exact-entity, predicate/date-specific); their results are scraped, unioned
into the pool, and the answer is re-synthesized ONCE. Bounded (one extra
round, capped follow-up queries) so worst-case stays within the request
deadline. Only fires on abstention, so most queries keep the single-shot
fast path. Recall-only + monotone-safe: a still-abstaining round-2 is
discarded, keeping round-1. Targets “the answer page never entered the
first pool” — the dominant remaining miss. Defaults to false (gated).
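The control flow of the loop (fire only on abstention, run at most one extra round, discard a still-abstaining round 2) can be sketched like this. Answer and answer_with_scout are hypothetical names, and the closure stands in for the whole scout, scrape, union, and re-synthesize pipeline, which is not shown.

```rust
/// Hypothetical answer type for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Answer {
    Committed(String),
    Abstained,
}

/// Runs at most one extra round, and only when round 1 abstained.
/// `run_round2` stands in for scout queries + scrape + re-synthesis.
fn answer_with_scout<F>(round1: Answer, multi_round: bool, run_round2: F) -> Answer
where
    F: FnOnce() -> Answer,
{
    // Fast path: flag off, or round 1 already committed to an answer.
    if !multi_round || round1 != Answer::Abstained {
        return round1;
    }
    match run_round2() {
        // A still-abstaining round 2 is discarded; round 1 is kept.
        Answer::Abstained => round1,
        committed => committed,
    }
}
```

Taking the second round as a FnOnce closure makes the "re-synthesized ONCE" bound structural: the function cannot invoke it twice.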
passage_select: bool
Passage-level relevance gate for the LLM answer path: split each scraped
source into passages and feed the answer LLM only the query-relevant
ones (DeepSeek-scored, no new ML deps). Subtractive — removes noise, never
adds sources or forces commits; falls back to the full source on any
failure (byte-identical to off), so it is monotone-safe. Defaults to
false (gated); answer prompt + plain path untouched.
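A minimal sketch of the subtractive gate with its fall-back-to-full-source rule. Several details are assumptions made for illustration: select_passages is a hypothetical name, passages are split on blank lines, the scorer closure stands in for the DeepSeek relevance call, and returning the full source when no passage is kept is a guess at how the empty case is handled.

```rust
/// Keeps only the passages the scorer marks as relevant.
/// `scorer` stands in for the LLM relevance call and may fail.
fn select_passages<F>(source: &str, scorer: F) -> String
where
    F: Fn(&str) -> Result<bool, String>,
{
    let mut kept: Vec<&str> = Vec::new();
    // Assumption: passages are separated by blank lines.
    for passage in source.split("\n\n") {
        match scorer(passage) {
            Ok(true) => kept.push(passage),
            Ok(false) => {}
            // Any failure returns the untouched source, identical to "off".
            Err(_) => return source.to_string(),
        }
    }
    if kept.is_empty() {
        // Assumption: an empty selection also falls back to the full source.
        source.to_string()
    } else {
        kept.join("\n\n")
    }
}
```

The function only ever returns the source itself or a subset of its passages, which is the sense in which the gate is subtractive.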
page2_fallback: bool
Page-2 fallback for the LLM answer / summarize path: if the reranked
(junk-filtered, deduped) candidate pool comes back thinner than the
answer needs (< answer_top_n), fetch the SAME query’s SearXNG page 2
once and union it in, then re-rank. The trigger is evaluated POST-rerank,
so a junk-heavy first page does not suppress it; the extra fetch only
fires on already-under-yielding queries (QPS never doubles across the
corpus). Recall-only + abstention is untouched (a sparse page1+page2 pool
still abstains). Defaults to false (gated); requires rerank_enabled.
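The trigger condition described above reduces to a small predicate evaluated after the first rerank. The function name should_fetch_page2 and its parameter names are hypothetical; answer_top_n is the threshold named in the description.

```rust
/// Decides whether to fetch SearXNG page 2 for the same query.
/// `reranked_pool_len` is the pool size AFTER junk filtering and dedupe.
fn should_fetch_page2(
    page2_fallback: bool,
    rerank_enabled: bool,
    reranked_pool_len: usize,
    answer_top_n: usize,
) -> bool {
    // The flag requires reranking, and only under-yielding pools qualify.
    page2_fallback && rerank_enabled && reranked_pool_len < answer_top_n
}
```

Measuring the pool after reranking is the important detail: a first page with many raw results can still trigger the fallback if most of them were filtered out as junk or duplicates.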
answer_calibrated: bool
Calibrated answer path (gated): reduce recoverable OVER-abstentions by
(a) feeding more sources to the answer LLM by default (top_n 5->8, so an
answer in result #6-8 or behind a failed top-5 scrape still reaches it) and
(b) swapping the answer prompt’s abstention rule for an anti-hedge variant:
commit when the sources DO contain the answer (even indirectly / one
inference step), abstain ONLY when they genuinely lack it. The “use ONLY
sources” grounding is untouched, so this is the precise inverse of the
cycle-1 blunt “always commit” failure (which forced commits on no-source
cases). Default false; A/B with an INCORRECT-guard before flip.
answer_guarded: bool
Moat-hardening abstention (gated). Appends a clause making the answer
model (a) REJECT a false/unverifiable premise instead of answering as
though it were true, (b) report when sources CONFLICT rather than picking
one confidently, and (c) abstain when not confident. Targets the
adversarial failure SealQA Seal-0 exposed: 32% confident-WRONG
(hallucination) on conflicting-source / false-premise questions, where
the “use ONLY sources” rule alone is insufficient. Complements (does not
replace) answer_calibrated. Default false; A/B requires Seal-0
hallucination DOWN with SimpleQA accuracy NOT regressed before flip.
use_structured_sources: bool
Use SearXNG structured sources (gated, W0). SearXNG’s infoboxes[] /
answers[] arrays carry Wikidata/Wikipedia knowledge-panel facts
(entity attributes like religion/capital/director) that the results[]
transform path discards. With this on, those facts are parsed and pinned
as a high-trust source at the FRONT of the answer pool (still
UNTRUSTED-wrapped — widens evidence, never bypasses the safety wrapper).
Targets the obscure-entity recall gap (PopQA). Default false; A/B on
diag500 gold-in-sources with the wrong-non-abstain invariant before flip.
wikidata_lookup: bool
Deterministic Wikidata entity-relation lookup (gated, W3). For
<relation> of <entity> questions (PopQA’s obscure long tail that web
search can’t surface), classify -> wbsearchentities -> property fetch and
pin the fact as a structured source (UNTRUSTED-wrapped, runs in parallel
with SearXNG, 3s-bounded, any error falls through). Free open data, no
AI, no SPARQL hot-path. Default false; A/B on diag500 PopQA accuracy +
the wrong-non-abstain invariant before flip.
snippet_fallback: bool
Snippet fallback for the LLM answer path (gated): when a top-N result’s
scrape failed (empty markdown), the result is normally dropped from the
answer pool — if it was the answer-bearing page, crw abstains though
retrieval succeeded (diagnosed Pattern A). With this on, such results
fall back to their SearXNG description snippet as a thin source instead
of vanishing. The snippet is verbatim upstream text, so it cannot inject
a fact not already present — near-zero INCORRECT exposure. Default false.
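The difference between dropping a failed scrape and falling back to its snippet can be sketched as a filter over the top-N results. ScrapedResult and answer_sources are hypothetical names, the field layout is an assumption, and treating whitespace-only markdown as a failed scrape is a guess at what "empty markdown" covers.

```rust
/// Hypothetical shape of one top-N result after scraping.
struct ScrapedResult {
    url: String,
    /// Scraped page content; empty when the scrape failed.
    markdown: String,
    /// Description snippet returned by SearXNG for this result.
    snippet: String,
}

/// Builds the (url, text) pairs handed to the answer LLM.
fn answer_sources(results: &[ScrapedResult], snippet_fallback: bool) -> Vec<(String, String)> {
    results
        .iter()
        .filter_map(|r| {
            if !r.markdown.trim().is_empty() {
                // Normal case: the scrape succeeded.
                Some((r.url.clone(), r.markdown.clone()))
            } else if snippet_fallback && !r.snippet.trim().is_empty() {
                // Failed scrape: keep the verbatim snippet as a thin source.
                Some((r.url.clone(), r.snippet.clone()))
            } else {
                // Flag off (or no snippet): the result is dropped.
                None
            }
        })
        .collect()
}
```

With the flag off, the function reproduces the drop-on-failure behavior, so enabling it can only add sources to the answer pool.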
Trait Implementations
impl Clone for SearchConfig
fn clone(&self) -> SearchConfig
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.