Struct RemoteMultimodalConfigs 

Source
pub struct RemoteMultimodalConfigs {
    pub api_url: String,
    pub api_key: Option<String>,
    pub model_name: String,
    pub system_prompt: Option<String>,
    pub system_prompt_extra: Option<String>,
    pub user_message_extra: Option<String>,
    pub cfg: RemoteMultimodalConfig,
    pub prompt_url_gate: Option<PromptUrlGate>,
    pub concurrency_limit: Option<usize>,
    pub vision_model: Option<ModelEndpoint>,
    pub text_model: Option<ModelEndpoint>,
    pub vision_route_mode: VisionRouteMode,
    pub model_pool: Vec<ModelEndpoint>,
    pub use_chrome_ai: bool,
    pub chrome_ai_max_user_chars: usize,
    pub semaphore: OnceLock<Arc<Semaphore>>,
    pub relevance_credits: Arc<AtomicU32>,
    pub url_prefilter_cache: Arc<DashMap<String, bool>>,
    pub proxies: Option<Vec<String>>,
}

Top-level configuration bundle for remote multimodal automation.

This struct combines all the settings needed to drive the RemoteMultimodalEngine:

  • API connection (api_url, api_key, model_name)
  • Prompt configuration (system_prompt, system_prompt_extra, user_message_extra)
  • Runtime configuration (RemoteMultimodalConfig)
  • URL gating (PromptUrlGate)
  • Dual-model routing (vision_model, text_model, vision_route_mode)
  • Chrome AI (use_chrome_ai, chrome_ai_max_user_chars)
  • Skills (feature-gated skill_registry, s3_skill_source)
  • Concurrency (concurrency_limit, lazy semaphore)
  • Relevance tracking (relevance_credits, url_prefilter_cache)

§Example

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_api_key("sk-or-...")
.with_concurrency_limit(5);

Fields§

§api_url: String

OpenAI-compatible chat completions URL.

§api_key: Option<String>

Optional bearer key for Authorization: Bearer ...

§model_name: String

Model name/id for the target endpoint.

§system_prompt: Option<String>

Optional base system prompt (None => engine default).

§system_prompt_extra: Option<String>

Optional extra system instructions appended at runtime.

§user_message_extra: Option<String>

Optional extra user instructions appended at runtime.

§cfg: RemoteMultimodalConfig

Runtime knobs (capture policies, retry, looping, etc.)

§prompt_url_gate: Option<PromptUrlGate>

Optional URL gating and per-URL overrides.

§concurrency_limit: Option<usize>

Optional concurrency limit for remote inference calls.

§vision_model: Option<ModelEndpoint>

Optional vision model endpoint for dual-model routing. When set alongside text_model, the engine routes per-round based on VisionRouteMode.

§text_model: Option<ModelEndpoint>

Optional text-only model endpoint for dual-model routing.

§vision_route_mode: VisionRouteMode

Routing mode controlling when vision vs text model is used.

§model_pool: Vec<ModelEndpoint>

Optional pool of model endpoints for per-round complexity routing.

When 3+ models are provided, the engine automatically routes simple rounds to cheap/fast models and complex rounds to powerful/expensive models — with zero extra LLM calls for the routing decision.

Pools with 0-2 models are ignored (existing single/dual routing applies).

§use_chrome_ai: bool

Use Chrome’s built-in LanguageModel API (Gemini Nano) for inference.

When true, the automation loop evaluates JavaScript on the page via page.evaluate() calling LanguageModel.create() + session.prompt() instead of making HTTP API calls. This enables running the agent without any external API key.

When left false (default), Chrome AI is still used as a last-resort fallback if both api_url and api_key are empty.

Requires Chrome with built-in AI enabled:

  • chrome://flags/#optimization-guide-on-device-model → Enabled
  • chrome://flags/#prompt-api-for-gemini-nano → Enabled
§chrome_ai_max_user_chars: usize

Maximum user-prompt characters for Chrome AI inference.

Gemini Nano has limited context compared to cloud models. This budget controls the max length of the user message (HTML context, URL, title, task instructions). When the user prompt exceeds this limit, the HTML context section is truncated while preserving task instructions and memory.

Default: 6000 chars. Only used when Chrome AI is the active inference path.

§semaphore: OnceLock<Arc<Semaphore>>

Semaphore control for concurrency limiting.

§relevance_credits: Arc<AtomicU32>

Counter of pages deemed irrelevant; each unit represents one budget credit to restore.

§url_prefilter_cache: Arc<DashMap<String, bool>>

Cache mapping URL paths to their relevance classification, so the same path is not re-classified.

§proxies: Option<Vec<String>>

Optional HTTP proxy URLs for LLM API requests.

When set, the engine routes all outbound LLM HTTP calls through these proxies (e.g. ["http://localhost:8080"]). Useful for debugging request/response payloads with an intercepting proxy like mitmproxy.

Implementations§

Source§

impl RemoteMultimodalConfigs

Source

pub fn new(api_url: impl Into<String>, model_name: impl Into<String>) -> Self

Create a new remote multimodal config bundle.

This sets the minimum required fields:

  • api_url: the OpenAI-compatible /v1/chat/completions endpoint
  • model_name: the model identifier understood by that endpoint

All other fields fall back to Default::default.

§Example
use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
);
Source

pub fn get_or_init_semaphore(&self) -> Option<Arc<Semaphore>>

Get (and lazily init) the shared semaphore from concurrency_limit. This is safe to call concurrently; OnceLock handles the race.
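
§Example

A sketch of gating one remote inference call on the shared semaphore, assuming the Semaphore is tokio's sync::Semaphore and the surrounding code is async:

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_concurrency_limit(2);

// Returns None when no concurrency_limit is configured.
if let Some(sem) = mm.get_or_init_semaphore() {
    // Hold a permit for the duration of one remote inference call.
    let _permit = sem.acquire().await.expect("semaphore closed");
    // ... perform the LLM request here ...
}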

Source

pub fn with_api_key(self, key: impl Into<String>) -> Self

Attach an optional API key for authenticated endpoints.

When set, the engine will send: Authorization: Bearer <api_key>

Source

pub fn with_system_prompt(self, prompt: impl Into<String>) -> Self

Set the base system prompt for the model.

  • Some(prompt) uses your prompt as the base system prompt.
  • None means the engine should use its built-in default system prompt.
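
§Example

A minimal sketch layering a custom base prompt with extra system and user instructions (the prompt text is illustrative):

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
)
.with_system_prompt("You are a careful web-automation agent.")
.with_system_prompt_extra("Never submit forms that contain payment details.")
.with_user_message_extra("Prefer English-language pages when both are offered.");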
Source

pub fn with_system_prompt_extra(self, extra: impl Into<String>) -> Self

Append additional system-level instructions.

This is appended after the base system prompt and before any runtime config summary the engine might embed.

Source

pub fn with_user_message_extra(self, extra: impl Into<String>) -> Self

Append additional user instructions for the task.

This is appended to the user message after the captured page context.

Source

pub fn with_cfg(self, cfg: RemoteMultimodalConfig) -> Self

Replace the runtime automation configuration.

Source

pub fn with_prompt_url_gate(self, gate: PromptUrlGate) -> Self

Set optional URL gating and per-URL overrides.

Source

pub fn with_concurrency_limit(self, limit: usize) -> Self

Set an optional concurrency limit for remote inference calls.

Source

pub fn with_proxies(self, proxies: Option<Vec<String>>) -> Self

Set HTTP proxy URLs for LLM API requests.

All outbound LLM HTTP calls will be routed through these proxies. Useful for debugging with an intercepting proxy (e.g. mitmproxy).

§Example
use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
)
.with_proxies(Some(vec!["http://localhost:8080".to_string()]));
Source

pub fn with_extra_ai_data(self, enabled: bool) -> Self

Enable extraction mode to return structured data from pages.

Source

pub fn with_extraction_prompt(self, prompt: impl Into<String>) -> Self

Set a custom extraction prompt.
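
§Example

A sketch enabling extraction mode with a custom prompt (the prompt text is illustrative):

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_extra_ai_data(true)
.with_extraction_prompt("Return the product name, price, and availability.");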

Source

pub fn with_screenshot(self, enabled: bool) -> Self

Enable screenshot capture after automation completes.

Source

pub fn with_extraction_schema(self, schema: ExtractionSchema) -> Self

Set a JSON schema for structured extraction output.

Source

pub fn model_supports_vision(&self) -> bool

Check if the configured model supports vision/multimodal input.

Uses the supports_vision function to detect based on model name.

Source

pub fn should_include_screenshot(&self) -> bool

Determine whether to include screenshots in LLM requests.

This respects the include_screenshot config override:

  • Some(true): Always include screenshots
  • Some(false): Never include screenshots
  • None: Auto-detect based on model name
Source

pub fn filter_screenshot<'a>(&self, screenshot: Option<&'a str>) -> Option<&'a str>

Filter screenshot based on model capabilities.

Returns the screenshot if the model supports vision and screenshots are enabled, otherwise returns None.

Source

pub fn with_vision_model(self, endpoint: ModelEndpoint) -> Self

Set the vision model endpoint for dual-model routing.

Source

pub fn with_text_model(self, endpoint: ModelEndpoint) -> Self

Set the text model endpoint for dual-model routing.

Source

pub fn with_vision_route_mode(self, mode: VisionRouteMode) -> Self

Set the vision routing mode.

Source

pub fn with_dual_models(self, vision: ModelEndpoint, text: ModelEndpoint) -> Self

Convenience: set both vision and text model endpoints at once.

Source

pub fn with_model_pool(self, pool: Vec<ModelEndpoint>) -> Self

Set a pool of model endpoints for per-round complexity routing.

When 3+ models are provided, the engine uses the auto_policy helper to assign models to cost tiers, then picks cheap models for simple rounds and expensive models for complex rounds.

Source

pub fn with_relevance_gate(self, prompt: Option<String>) -> Self

Enable relevance gating with optional custom criteria prompt.

Source

pub fn with_url_prefilter(self, batch_size: Option<usize>) -> Self

Enable URL-level pre-filtering before HTTP fetch. Requires relevance_gate to also be enabled.
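
§Example

A sketch enabling relevance gating with a custom criterion and URL pre-filtering (the criterion text and batch size are illustrative):

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_relevance_gate(Some("Only pages about pricing or plans are relevant.".to_string()))
.with_url_prefilter(Some(16));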

Source

pub fn with_chrome_ai(self, enabled: bool) -> Self

Enable Chrome built-in AI (LanguageModel / Gemini Nano) for inference.

When enabled, the engine uses page.evaluate() to call Chrome’s LanguageModel.create() + session.prompt() instead of HTTP API calls. No API key is required.

Even when not explicitly enabled, Chrome AI is used as a last-resort fallback if both api_url and api_key are empty.
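
§Example

A minimal sketch enabling Chrome AI with a tighter prompt budget (the 4000-character budget is illustrative); no api_url or api_key is needed:

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::default()
    .with_chrome_ai(true)
    .with_chrome_ai_max_user_chars(4000);

assert!(mm.should_use_chrome_ai());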

Source

pub fn with_chrome_ai_max_user_chars(self, chars: usize) -> Self

Set the maximum user-prompt character budget for Chrome AI inference.

Source

pub fn should_use_chrome_ai(&self) -> bool

Whether Chrome AI should be used for inference in this configuration.

Returns true when explicitly enabled OR when no API endpoint is configured (last-resort fallback).

Source

pub fn with_automation_timeout_ms(self, ms: u64) -> Self

Set the overall automation timeout in milliseconds.

Overrides the default page-request-based timeout for the multimodal automation loop. Useful for slow inference hardware.
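
§Example

A sketch extending the loop timeout for slow local inference, assuming automation_timeout mirrors the configured milliseconds:

use std::time::Duration;
use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
)
.with_automation_timeout_ms(120_000);

assert_eq!(mm.automation_timeout(), Some(Duration::from_millis(120_000)));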

Source

pub fn with_api_url(self, url: impl Into<String>) -> Self

Override the API URL after construction.

Source

pub fn with_model_name(self, name: impl Into<String>) -> Self

Override the model name after construction.

Source

pub fn with_include_html(self, include: bool) -> Self

Set whether to include cleaned HTML in the model input.

Source

pub fn with_html_max_bytes(self, bytes: usize) -> Self

Set the maximum number of bytes of cleaned HTML to include.

Source

pub fn with_include_url(self, include: bool) -> Self

Set whether to include the current URL in the model input.

Source

pub fn with_include_title(self, include: bool) -> Self

Set whether to include the document title in the model input.

Source

pub fn with_include_screenshot(self, include: Option<bool>) -> Self

Set whether to include screenshots in LLM requests.

  • Some(true): Always include screenshots.
  • Some(false): Never include screenshots.
  • None: Auto-detect based on model name.
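
§Example

A short sketch forcing screenshots off and checking the resolved behavior:

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
)
.with_include_screenshot(Some(false));

// The explicit override wins regardless of what the model name suggests.
assert!(!mm.should_include_screenshot());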
Source

pub fn with_temperature(self, temp: f32) -> Self

Set the sampling temperature used by the model.

Source

pub fn with_max_tokens(self, tokens: u16) -> Self

Set the maximum tokens the model is allowed to generate.

Source

pub fn with_request_json_object(self, enabled: bool) -> Self

Set whether to request response_format: {"type":"json_object"}.

Source

pub fn with_best_effort_json_extract(self, enabled: bool) -> Self

Enable best-effort JSON extraction (strip fences / extract {...}).
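
§Example

A sketch tightening sampling and requesting JSON output (the values are illustrative):

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_temperature(0.2)
.with_max_tokens(2048)
.with_request_json_object(true)
.with_best_effort_json_extract(true);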

Source

pub fn with_reasoning_effort(self, effort: ReasoningEffort) -> Self

Set explicit reasoning effort for supported models.

Source

pub fn with_thinking_budget(self, budget: u32) -> Self

Set the token budget for Anthropic extended thinking.

Source

pub fn with_max_rounds(self, rounds: usize) -> Self

Set the maximum number of plan/execute/re-capture rounds.

Source

pub fn with_retry(self, retry: RetryPolicy) -> Self

Set the retry policy for model output parsing and execution failures.

Source

pub fn with_capture_profile(self, profile: CaptureProfile) -> Self

Add a capture profile to the list.

Source

pub fn with_model_policy(self, policy: ModelPolicy) -> Self

Set the model selection policy.

Source

pub fn with_post_plan_wait_ms(self, ms: u64) -> Self

Set the wait time (ms) after executing a plan before re-capturing state.

Source

pub fn with_max_inflight_requests(self, max: usize) -> Self

Set the maximum number of concurrent LLM HTTP requests.
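
§Example

A sketch tuning the automation loop: fewer rounds, a short settle wait after each plan, and capped parallel LLM requests (the values are illustrative):

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_max_rounds(6)
.with_post_plan_wait_ms(500)
.with_max_inflight_requests(4);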

Source

pub fn with_tool_calling_mode(self, mode: ToolCallingMode) -> Self

Set the tool calling mode for structured action output.

Source

pub fn with_html_diff_mode(self, mode: HtmlDiffMode) -> Self

Set the HTML diff mode for condensed page state.

Source

pub fn with_planning_mode(self, config: PlanningModeConfig) -> Self

Enable planning mode with the given configuration.

Source

pub fn with_synthesis_config(self, config: SynthesisConfig) -> Self

Enable multi-page synthesis with the given configuration.

Source

pub fn with_confidence_strategy(self, strategy: ConfidenceRetryStrategy) -> Self

Set the confidence-based retry strategy.

Source

pub fn with_self_healing(self, config: SelfHealingConfig) -> Self

Enable self-healing with the given configuration.

Source

pub fn with_concurrent_execution(self, enabled: bool) -> Self

Enable or disable concurrent execution of independent actions.

Source

pub fn with_max_skills_per_round(self, max: usize) -> Self

Set the maximum number of skills to inject per round.

Source

pub fn with_max_skill_context_chars(self, max: usize) -> Self

Set the maximum characters for skill context injection per round.

Source

pub fn automation_timeout(&self) -> Option<Duration>

Return the configured automation timeout as a Duration, if set.

Source

pub fn has_dual_model_routing(&self) -> bool

Whether dual-model routing is active (at least one of vision_model / text_model is configured).

Source

pub fn resolve_model_for_round(&self, use_vision: bool) -> (&str, &str, Option<&str>)

Resolve the (api_url, model_name, api_key) triple for the current round.

  • use_vision == true → prefer vision_model, fall back to primary.
  • use_vision == false → prefer text_model, fall back to primary.

Fields left as None on the chosen ModelEndpoint inherit from the parent (self.api_url / self.api_key).

Source

pub fn should_use_vision_this_round(&self, round_idx: usize, stagnated: bool, action_stuck_rounds: usize, force_vision: bool) -> bool

Decide whether to use vision this round, based on the configured VisionRouteMode and current loop state.

force_vision is an explicit per-round override (e.g. from request_vision).
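
§Example

A sketch of how a driving loop might query routing each round; with a single-model setup, resolution falls back to the primary endpoint:

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
);

let use_vision = mm.should_use_vision_this_round(0, false, 0, false);
let (api_url, model, api_key) = mm.resolve_model_for_round(use_vision);

assert_eq!(api_url, "https://openrouter.ai/api/v1/chat/completions");
assert_eq!(model, "qwen/qwen-2-vl-72b-instruct");
assert!(api_key.is_none());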

Trait Implementations§

Source§

impl Clone for RemoteMultimodalConfigs

Source§

fn clone(&self) -> RemoteMultimodalConfigs

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl Debug for RemoteMultimodalConfigs

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for RemoteMultimodalConfigs

Source§

fn default() -> Self

Returns the “default value” for a type. Read more
Source§

impl<'de> Deserialize<'de> for RemoteMultimodalConfigs

Source§

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer. Read more
Source§

impl PartialEq for RemoteMultimodalConfigs

Source§

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Serialize for RemoteMultimodalConfigs

Source§

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer. Read more
Source§

impl Eq for RemoteMultimodalConfigs
