pub struct RemoteMultimodalConfigs {
pub api_url: String,
pub api_key: Option<String>,
pub model_name: String,
pub system_prompt: Option<String>,
pub system_prompt_extra: Option<String>,
pub user_message_extra: Option<String>,
pub cfg: RemoteMultimodalConfig,
pub prompt_url_gate: Option<PromptUrlGate>,
pub concurrency_limit: Option<usize>,
pub vision_model: Option<ModelEndpoint>,
pub text_model: Option<ModelEndpoint>,
pub vision_route_mode: VisionRouteMode,
pub model_pool: Vec<ModelEndpoint>,
pub use_chrome_ai: bool,
pub chrome_ai_max_user_chars: usize,
pub semaphore: OnceLock<Arc<Semaphore>>,
pub relevance_credits: Arc<AtomicU32>,
pub url_prefilter_cache: Arc<DashMap<String, bool>>,
pub proxies: Option<Vec<String>>,
}
Top-level configuration bundle for remote multimodal automation.
This struct combines all the settings needed to drive the
RemoteMultimodalEngine:
- API connection (api_url, api_key, model_name)
- Prompt configuration (system_prompt, system_prompt_extra, user_message_extra)
- Runtime configuration (RemoteMultimodalConfig)
- URL gating (PromptUrlGate)
- Dual-model routing (vision_model, text_model, vision_route_mode)
- Chrome AI (use_chrome_ai, chrome_ai_max_user_chars)
- Skills (feature-gated skill_registry, s3_skill_source)
- Concurrency (concurrency_limit, lazy semaphore)
- Relevance tracking (relevance_credits, url_prefilter_cache)
§Example
use spider_agent::automation::RemoteMultimodalConfigs;
let mm = RemoteMultimodalConfigs::new(
"https://openrouter.ai/api/v1/chat/completions",
"qwen/qwen-2-vl-72b-instruct",
)
.with_api_key("sk-or-...")
.with_concurrency_limit(5);

§Fields
api_url: String
OpenAI-compatible chat completions URL.

api_key: Option<String>
Optional bearer key for Authorization: Bearer ...

model_name: String
Model name/id for the target endpoint.

system_prompt: Option<String>
Optional base system prompt (None => engine default).

system_prompt_extra: Option<String>
Optional extra system instructions appended at runtime.

user_message_extra: Option<String>
Optional extra user instructions appended at runtime.

cfg: RemoteMultimodalConfig
Runtime knobs (capture policies, retry, looping, etc.).

prompt_url_gate: Option<PromptUrlGate>
Optional URL gating and per-URL overrides.

concurrency_limit: Option<usize>
Optional concurrency limit for remote inference calls.
vision_model: Option<ModelEndpoint>
Optional vision model endpoint for dual-model routing.
When set alongside text_model, the engine routes per-round
based on VisionRouteMode.

text_model: Option<ModelEndpoint>
Optional text-only model endpoint for dual-model routing.

vision_route_mode: VisionRouteMode
Routing mode controlling when the vision vs. text model is used.
model_pool: Vec<ModelEndpoint>
Optional pool of model endpoints for per-round complexity routing.
When 3+ models are provided, the engine automatically routes simple rounds to cheap/fast models and complex rounds to powerful/expensive models, with zero extra LLM calls for the routing decision.
Pools with 0-2 models are ignored (existing single/dual routing applies).
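The tiering idea can be sketched outside the crate. The following is a hypothetical illustration only: `PoolModel`, `relative_cost`, and `route` are invented names, and the real engine assigns tiers via its auto_policy rather than a bare min/max over cost.

```rust
// Hypothetical sketch: assign pooled models to cost tiers by a relative
// cost score, then route each round without any extra LLM call.
#[derive(Debug, Clone)]
struct PoolModel {
    name: &'static str,
    relative_cost: u32, // any consistent unit, e.g. price per 1M tokens
}

/// Pick a model for the round: cheapest for simple rounds,
/// most expensive (assumed most capable) for complex rounds.
fn route(pool: &[PoolModel], complex: bool) -> Option<&PoolModel> {
    if pool.len() < 3 {
        return None; // pools with 0-2 models fall back to single/dual routing
    }
    if complex {
        pool.iter().max_by_key(|m| m.relative_cost)
    } else {
        pool.iter().min_by_key(|m| m.relative_cost)
    }
}

fn main() {
    let pool = vec![
        PoolModel { name: "cheap-8b", relative_cost: 1 },
        PoolModel { name: "mid-32b", relative_cost: 5 },
        PoolModel { name: "big-72b", relative_cost: 20 },
    ];
    println!("simple  -> {}", route(&pool, false).unwrap().name);
    println!("complex -> {}", route(&pool, true).unwrap().name);
}
```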
use_chrome_ai: bool
Use Chrome's built-in LanguageModel API (Gemini Nano) for inference.
When true, the automation loop evaluates JavaScript on the page via
page.evaluate(), calling LanguageModel.create() + session.prompt()
instead of making HTTP API calls. This enables running the agent
without any external API key.
When left false (the default), Chrome AI is still used as a last-resort
fallback if both api_url and api_key are empty.
Requires Chrome with built-in AI enabled:
- chrome://flags/#optimization-guide-on-device-model → Enabled
- chrome://flags/#prompt-api-for-gemini-nano → Enabled
chrome_ai_max_user_chars: usize
Maximum user-prompt characters for Chrome AI inference.
Gemini Nano has limited context compared to cloud models. This budget controls the max length of the user message (HTML context, URL, title, task instructions). When the user prompt exceeds this limit, the HTML context section is truncated while preserving task instructions and memory.
Default: 6000 chars. Only used when Chrome AI is the active inference path.
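The budget policy described above can be sketched in plain Rust. This is a hypothetical stand-in: `fit_user_prompt` is an invented name, and the real engine also preserves memory and structures the prompt into multiple sections.

```rust
// Hypothetical sketch: shrink only the HTML context section when the
// assembled user prompt would exceed the character budget.
fn fit_user_prompt(html_ctx: &str, instructions: &str, max_chars: usize) -> String {
    // Reserve room for the instructions plus a separating newline.
    let reserved = instructions.chars().count() + 1;
    let html_budget = max_chars.saturating_sub(reserved);
    // Truncate the HTML context on a char boundary; instructions survive intact.
    let truncated: String = html_ctx.chars().take(html_budget).collect();
    format!("{truncated}\n{instructions}")
}

fn main() {
    let prompt = fit_user_prompt("<div>very long page</div>", "Task: find pricing", 24);
    assert!(prompt.chars().count() <= 24);
    assert!(prompt.ends_with("Task: find pricing"));
    println!("{prompt}");
}
```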
semaphore: OnceLock<Arc<Semaphore>>
Semaphore used for concurrency limiting.

relevance_credits: Arc<AtomicU32>
Counter for pages deemed irrelevant; each unit = one budget credit to restore.

url_prefilter_cache: Arc<DashMap<String, bool>>
Cache of URL path → relevance classification, to avoid re-classifying.

proxies: Option<Vec<String>>
Optional HTTP proxy URLs for LLM API requests.
When set, the engine routes all outbound LLM HTTP calls through these
proxies (e.g. ["http://localhost:8080"]). Useful for debugging
request/response payloads with an intercepting proxy like mitmproxy.
§Implementations
impl RemoteMultimodalConfigs
pub fn new(api_url: impl Into<String>, model_name: impl Into<String>) -> Self
Create a new remote multimodal config bundle.
This sets the minimum required fields:
- api_url: the OpenAI-compatible /v1/chat/completions endpoint
- model_name: the model identifier understood by that endpoint
All other fields fall back to Default::default().
§Example
use spider_agent::automation::RemoteMultimodalConfigs;
let mm = RemoteMultimodalConfigs::new(
"http://localhost:11434/v1/chat/completions",
"qwen2.5-vl",
);

pub fn get_or_init_semaphore(&self) -> Option<Arc<Semaphore>>
Get (and lazily init) the shared semaphore from concurrency_limit.
This is safe to call concurrently; OnceLock handles the race.
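The lazy-init pattern can be sketched with std alone. This is a hypothetical illustration: `Semaphore` below is a plain stand-in struct (the real field holds a tokio semaphore), but the OnceLock race behavior is the same.

```rust
use std::sync::{Arc, OnceLock};

// Stand-in for tokio::sync::Semaphore so this sketch runs with std only.
struct Semaphore { permits: usize }

struct Configs {
    concurrency_limit: Option<usize>,
    semaphore: OnceLock<Arc<Semaphore>>,
}

impl Configs {
    // Mirrors the documented behavior: None when no limit is configured,
    // otherwise a shared semaphore initialized exactly once, even if
    // several threads call this concurrently.
    fn get_or_init_semaphore(&self) -> Option<Arc<Semaphore>> {
        let limit = self.concurrency_limit?;
        Some(Arc::clone(self.semaphore.get_or_init(|| {
            Arc::new(Semaphore { permits: limit })
        })))
    }
}

fn main() {
    let cfg = Configs { concurrency_limit: Some(5), semaphore: OnceLock::new() };
    let a = cfg.get_or_init_semaphore().unwrap();
    let b = cfg.get_or_init_semaphore().unwrap();
    // Both calls observe the same shared instance.
    assert!(Arc::ptr_eq(&a, &b));
    println!("permits = {}", a.permits);
}
```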
pub fn with_api_key(self, key: impl Into<String>) -> Self
Attach an optional API key for authenticated endpoints.
When set, the engine will send:
Authorization: Bearer <api_key>
pub fn with_system_prompt(self, prompt: impl Into<String>) -> Self
Set the base system prompt for the model.
- Some(prompt) uses your prompt as the base system prompt.
- None means the engine should use its built-in default system prompt.
pub fn with_system_prompt_extra(self, extra: impl Into<String>) -> Self
Append additional system-level instructions.
This is appended after the base system prompt and before any runtime config summary the engine might embed.
pub fn with_user_message_extra(self, extra: impl Into<String>) -> Self
Append additional user instructions for the task.
This is appended to the user message after the captured page context.
pub fn with_cfg(self, cfg: RemoteMultimodalConfig) -> Self
Replace the runtime automation configuration.
pub fn with_prompt_url_gate(self, gate: PromptUrlGate) -> Self
Set optional URL gating and per-URL overrides.
pub fn with_concurrency_limit(self, limit: usize) -> Self
Set an optional concurrency limit for remote inference calls.
pub fn with_proxies(self, proxies: Option<Vec<String>>) -> Self
Set HTTP proxy URLs for LLM API requests.
All outbound LLM HTTP calls will be routed through these proxies. Useful for debugging with an intercepting proxy (e.g. mitmproxy).
§Example
use spider_agent::automation::RemoteMultimodalConfigs;
let mm = RemoteMultimodalConfigs::new(
"http://localhost:11434/v1/chat/completions",
"qwen2.5-vl",
)
.with_proxies(Some(vec!["http://localhost:8080".to_string()]));

pub fn with_extra_ai_data(self, enabled: bool) -> Self
Enable extraction mode to return structured data from pages.
pub fn with_extraction_prompt(self, prompt: impl Into<String>) -> Self
Set a custom extraction prompt.
pub fn with_screenshot(self, enabled: bool) -> Self
Enable screenshot capture after automation completes.
pub fn with_extraction_schema(self, schema: ExtractionSchema) -> Self
Set a JSON schema for structured extraction output.
pub fn model_supports_vision(&self) -> bool
Check if the configured model supports vision/multimodal input.
Uses the supports_vision function to detect based on model name.
pub fn should_include_screenshot(&self) -> bool
Determine whether to include screenshots in LLM requests.
This respects the include_screenshot config override:
- Some(true): always include screenshots
- Some(false): never include screenshots
- None: auto-detect based on model name
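The tri-state resolution can be sketched as a free function. This is a hypothetical stand-in: the name check below is a simplified proxy for the crate's supports_vision heuristic, not its actual detection logic.

```rust
// Hypothetical sketch of the tri-state override resolution.
fn should_include_screenshot(override_flag: Option<bool>, model_name: &str) -> bool {
    match override_flag {
        Some(force) => force, // explicit override always wins
        None => {
            // Auto-detect: crude substring check standing in for the
            // crate's real vision-capability heuristic.
            let m = model_name.to_ascii_lowercase();
            m.contains("vision") || m.contains("-vl")
        }
    }
}

fn main() {
    assert!(should_include_screenshot(Some(true), "text-only-model"));
    assert!(!should_include_screenshot(Some(false), "qwen2.5-vl"));
    assert!(should_include_screenshot(None, "qwen2.5-vl"));
    println!("ok");
}
```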
pub fn filter_screenshot<'a>(&self, screenshot: Option<&'a str>) -> Option<&'a str>
Filter screenshot based on model capabilities.
Returns the screenshot if the model supports vision and screenshots
are enabled, otherwise returns None.
pub fn with_vision_model(self, endpoint: ModelEndpoint) -> Self
Set the vision model endpoint for dual-model routing.
pub fn with_text_model(self, endpoint: ModelEndpoint) -> Self
Set the text model endpoint for dual-model routing.
pub fn with_vision_route_mode(self, mode: VisionRouteMode) -> Self
Set the vision routing mode.
pub fn with_dual_models(self, vision: ModelEndpoint, text: ModelEndpoint) -> Self
Convenience: set both vision and text model endpoints at once.
pub fn with_model_pool(self, pool: Vec<ModelEndpoint>) -> Self
Set a pool of model endpoints for per-round complexity routing.
When 3+ models are provided, the engine uses [auto_policy] to
assign models to cost tiers, then picks cheap models for simple
rounds and expensive models for complex rounds.
pub fn with_relevance_gate(self, prompt: Option<String>) -> Self
Enable relevance gating with optional custom criteria prompt.
pub fn with_url_prefilter(self, batch_size: Option<usize>) -> Self
Enable URL-level pre-filtering before HTTP fetch.
Requires relevance_gate to also be enabled.
pub fn with_chrome_ai(self, enabled: bool) -> Self
Enable Chrome built-in AI (LanguageModel / Gemini Nano) for inference.
When enabled, the engine uses page.evaluate() to call Chrome’s
LanguageModel.create() + session.prompt() instead of HTTP API calls.
No API key is required.
Even when not explicitly enabled, Chrome AI is used as a last-resort
fallback if both api_url and api_key are empty.
pub fn with_chrome_ai_max_user_chars(self, chars: usize) -> Self
Set the maximum user-prompt character budget for Chrome AI inference.
pub fn should_use_chrome_ai(&self) -> bool
Whether Chrome AI should be used for inference in this configuration.
Returns true when explicitly enabled OR when no API endpoint is
configured (last-resort fallback).
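The documented decision reduces to a small predicate. A hypothetical free-function sketch (the real method reads the fields on `self`):

```rust
// Hypothetical sketch: explicit opt-in wins; otherwise Chrome AI is the
// last resort when no API endpoint is configured at all.
fn should_use_chrome_ai(use_chrome_ai: bool, api_url: &str, api_key: Option<&str>) -> bool {
    use_chrome_ai || (api_url.is_empty() && api_key.map_or(true, str::is_empty))
}

fn main() {
    // Explicitly enabled: used regardless of endpoint configuration.
    assert!(should_use_chrome_ai(true, "https://example.invalid/v1", Some("sk")));
    // No endpoint configured: last-resort fallback kicks in.
    assert!(should_use_chrome_ai(false, "", None));
    println!("ok");
}
```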
pub fn with_automation_timeout_ms(self, ms: u64) -> Self
Set the overall automation timeout in milliseconds.
Overrides the default page-request-based timeout for the multimodal automation loop. Useful for slow inference hardware.
pub fn with_api_url(self, url: impl Into<String>) -> Self
Override the API URL after construction.
pub fn with_model_name(self, name: impl Into<String>) -> Self
Override the model name after construction.
pub fn with_include_html(self, include: bool) -> Self
Set whether to include cleaned HTML in the model input.
pub fn with_html_max_bytes(self, bytes: usize) -> Self
Set the maximum number of bytes of cleaned HTML to include.
pub fn with_include_url(self, include: bool) -> Self
Set whether to include the current URL in the model input.
pub fn with_include_title(self, include: bool) -> Self
Set whether to include the document title in the model input.
pub fn with_include_screenshot(self, include: Option<bool>) -> Self
Set whether to include screenshots in LLM requests.
- Some(true): always include screenshots.
- Some(false): never include screenshots.
- None: auto-detect based on model name.
pub fn with_temperature(self, temp: f32) -> Self
Set the sampling temperature used by the model.
pub fn with_max_tokens(self, tokens: u16) -> Self
Set the maximum tokens the model is allowed to generate.
pub fn with_request_json_object(self, enabled: bool) -> Self
Set whether to request response_format: {"type":"json_object"}.
pub fn with_best_effort_json_extract(self, enabled: bool) -> Self
Enable best-effort JSON extraction (strip fences / extract {...}).
pub fn with_reasoning_effort(self, effort: ReasoningEffort) -> Self
Set explicit reasoning effort for supported models.
pub fn with_thinking_budget(self, budget: u32) -> Self
Set the token budget for Anthropic extended thinking.
pub fn with_max_rounds(self, rounds: usize) -> Self
Set the maximum number of plan/execute/re-capture rounds.
pub fn with_retry(self, retry: RetryPolicy) -> Self
Set the retry policy for model output parsing and execution failures.
pub fn with_capture_profile(self, profile: CaptureProfile) -> Self
Add a capture profile to the list.
pub fn with_model_policy(self, policy: ModelPolicy) -> Self
Set the model selection policy.
pub fn with_post_plan_wait_ms(self, ms: u64) -> Self
Set the wait time (ms) after executing a plan before re-capturing state.
pub fn with_max_inflight_requests(self, max: usize) -> Self
Set the maximum number of concurrent LLM HTTP requests.
pub fn with_tool_calling_mode(self, mode: ToolCallingMode) -> Self
Set the tool calling mode for structured action output.
pub fn with_html_diff_mode(self, mode: HtmlDiffMode) -> Self
Set the HTML diff mode for condensed page state.
pub fn with_planning_mode(self, config: PlanningModeConfig) -> Self
Enable planning mode with the given configuration.
pub fn with_synthesis_config(self, config: SynthesisConfig) -> Self
Enable multi-page synthesis with the given configuration.
pub fn with_confidence_strategy(self, strategy: ConfidenceRetryStrategy) -> Self
Set the confidence-based retry strategy.
pub fn with_self_healing(self, config: SelfHealingConfig) -> Self
Enable self-healing with the given configuration.
pub fn with_concurrent_execution(self, enabled: bool) -> Self
Enable or disable concurrent execution of independent actions.
pub fn with_max_skills_per_round(self, max: usize) -> Self
Set the maximum number of skills to inject per round.
pub fn with_max_skill_context_chars(self, max: usize) -> Self
Set the maximum characters for skill context injection per round.
pub fn automation_timeout(&self) -> Option<Duration>
Return the configured automation timeout as a Duration, if set.
pub fn has_dual_model_routing(&self) -> bool
Whether dual-model routing is active
(at least one of vision_model / text_model is configured).
pub fn resolve_model_for_round(&self, use_vision: bool) -> (&str, &str, Option<&str>)
Resolve the (api_url, model_name, api_key) triple for the current round.
- use_vision == true → prefer vision_model, fall back to primary.
- use_vision == false → prefer text_model, fall back to primary.
Fields left as None on the chosen ModelEndpoint inherit from
the parent (self.api_url / self.api_key).
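The inheritance rule can be sketched with simplified stand-in structs. This is a hypothetical illustration only: the field sets below are trimmed to the ones this rule touches, not the crate's real definitions.

```rust
// Hypothetical sketch: None fields on the chosen ModelEndpoint
// inherit the parent config's values.
#[derive(Default)]
struct ModelEndpoint {
    api_url: Option<String>,
    model_name: String,
    api_key: Option<String>,
}

struct Configs {
    api_url: String,
    model_name: String,
    api_key: Option<String>,
    vision_model: Option<ModelEndpoint>,
    text_model: Option<ModelEndpoint>,
}

impl Configs {
    fn resolve_model_for_round(&self, use_vision: bool) -> (&str, &str, Option<&str>) {
        let chosen = if use_vision {
            self.vision_model.as_ref()
        } else {
            self.text_model.as_ref()
        };
        match chosen {
            Some(ep) => (
                ep.api_url.as_deref().unwrap_or(&self.api_url),     // inherit parent URL
                &ep.model_name,
                ep.api_key.as_deref().or(self.api_key.as_deref()),  // inherit parent key
            ),
            // No endpoint configured for this route: use the primary model.
            None => (&self.api_url, &self.model_name, self.api_key.as_deref()),
        }
    }
}

fn main() {
    let cfg = Configs {
        api_url: "https://primary/v1/chat/completions".into(),
        model_name: "primary-model".into(),
        api_key: Some("parent-key".into()),
        vision_model: None,
        text_model: Some(ModelEndpoint {
            model_name: "small-text".into(),
            ..Default::default()
        }),
    };
    // Text round: the endpoint sets only the model; URL and key are inherited.
    let (url, model, key) = cfg.resolve_model_for_round(false);
    assert_eq!(
        (url, model, key),
        ("https://primary/v1/chat/completions", "small-text", Some("parent-key"))
    );
    // Vision round with no vision_model falls back to the primary model.
    assert_eq!(cfg.resolve_model_for_round(true).1, "primary-model");
    println!("ok");
}
```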
pub fn should_use_vision_this_round(&self, round_idx: usize, stagnated: bool, action_stuck_rounds: usize, force_vision: bool) -> bool
Decide whether to use vision this round, based on the configured
VisionRouteMode and current loop state.
force_vision is an explicit per-round override (e.g. from request_vision).
§Trait Implementations
impl Clone for RemoteMultimodalConfigs
fn clone(&self) -> RemoteMultimodalConfigs
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.

impl Debug for RemoteMultimodalConfigs
impl Default for RemoteMultimodalConfigs
impl<'de> Deserialize<'de> for RemoteMultimodalConfigs
where RemoteMultimodalConfigs: Default
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>
impl PartialEq for RemoteMultimodalConfigs
impl Serialize for RemoteMultimodalConfigs
impl Eq for RemoteMultimodalConfigs
§Auto Trait Implementations
impl !Freeze for RemoteMultimodalConfigs
impl !RefUnwindSafe for RemoteMultimodalConfigs
impl Send for RemoteMultimodalConfigs
impl Sync for RemoteMultimodalConfigs
impl Unpin for RemoteMultimodalConfigs
impl UnsafeUnpin for RemoteMultimodalConfigs
impl !UnwindSafe for RemoteMultimodalConfigs
§Blanket Implementations
impl<T> BorrowMut<T> for T
where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where T: Clone
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.