Struct RemoteMultimodalConfigs 

pub struct RemoteMultimodalConfigs {
    pub api_url: String,
    pub api_key: Option<String>,
    pub model_name: String,
    pub system_prompt: Option<String>,
    pub system_prompt_extra: Option<String>,
    pub user_message_extra: Option<String>,
    pub cfg: RemoteMultimodalConfig,
    pub prompt_url_gate: Option<PromptUrlGate>,
    pub concurrency_limit: Option<usize>,
    pub vision_model: Option<ModelEndpoint>,
    pub text_model: Option<ModelEndpoint>,
    pub vision_route_mode: VisionRouteMode,
    pub use_chrome_ai: bool,
    pub chrome_ai_max_user_chars: usize,
    pub semaphore: OnceLock<Arc<Semaphore>>,
    pub relevance_credits: Arc<AtomicU32>,
    pub url_prefilter_cache: Arc<DashMap<String, bool>>,
}

Top-level configuration bundle for remote multimodal automation.

This struct combines all the settings needed to drive the RemoteMultimodalEngine:

  • API connection (api_url, api_key, model_name)
  • Prompt configuration (system_prompt, system_prompt_extra, user_message_extra)
  • Runtime configuration (RemoteMultimodalConfig)
  • URL gating (PromptUrlGate)
  • Dual-model routing (vision_model, text_model, vision_route_mode)
  • Chrome AI (use_chrome_ai, chrome_ai_max_user_chars)
  • Skills (feature-gated skill_registry, s3_skill_source)
  • Concurrency (concurrency_limit, lazy semaphore)
  • Relevance tracking (relevance_credits, url_prefilter_cache)

§Example

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_api_key("sk-or-...")
.with_concurrency_limit(5);

Fields§

§api_url: String

OpenAI-compatible chat completions URL.

§api_key: Option<String>

Optional bearer key, sent as Authorization: Bearer <api_key>.

§model_name: String

Model name/id for the target endpoint.

§system_prompt: Option<String>

Optional base system prompt (None => engine default).

§system_prompt_extra: Option<String>

Optional extra system instructions appended at runtime.

§user_message_extra: Option<String>

Optional extra user instructions appended at runtime.

§cfg: RemoteMultimodalConfig

Runtime knobs (capture policies, retry, looping, etc.)

§prompt_url_gate: Option<PromptUrlGate>

Optional URL gating and per-URL overrides.

§concurrency_limit: Option<usize>

Optional concurrency limit for remote inference calls.

§vision_model: Option<ModelEndpoint>

Optional vision model endpoint for dual-model routing. When set alongside text_model, the engine routes per-round based on VisionRouteMode.

§text_model: Option<ModelEndpoint>

Optional text-only model endpoint for dual-model routing.

§vision_route_mode: VisionRouteMode

Routing mode that controls when the vision model is used instead of the text model.

§use_chrome_ai: bool

Use Chrome’s built-in LanguageModel API (Gemini Nano) for inference.

When true, the automation loop evaluates JavaScript on the page via page.evaluate() calling LanguageModel.create() + session.prompt() instead of making HTTP API calls. This enables running the agent without any external API key.

When left false (default), Chrome AI is still used as a last-resort fallback if both api_url and api_key are empty.

Requires Chrome with built-in AI enabled:

  • chrome://flags/#optimization-guide-on-device-model → Enabled
  • chrome://flags/#prompt-api-for-gemini-nano → Enabled

§chrome_ai_max_user_chars: usize

Maximum user-prompt characters for Chrome AI inference.

Gemini Nano has limited context compared to cloud models. This budget controls the max length of the user message (HTML context, URL, title, task instructions). When the user prompt exceeds this limit, the HTML context section is truncated while preserving task instructions and memory.

Default: 6000 chars. Only used when Chrome AI is the active inference path.
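
The budget rule above can be sketched roughly as follows. This is a minimal illustration, not the engine's implementation: the function name fit_user_prompt, the two-part prompt layout, and the "\n\n" separator are all assumptions made for the example.

```rust
/// Hypothetical sketch (not part of spider_agent): keep `task` intact and
/// trim `html` so the combined prompt stays within `max_chars` characters.
fn fit_user_prompt(html: &str, task: &str, max_chars: usize) -> String {
    // Assumed separator between the HTML context and the task instructions.
    const SEPARATOR: &str = "\n\n";
    // Characters reserved for the parts that must survive truncation.
    let reserved = task.chars().count() + SEPARATOR.chars().count();
    // The remainder is the HTML budget. If the task alone exceeds the limit,
    // the budget is zero and the result is still longer than `max_chars`.
    let html_budget = max_chars.saturating_sub(reserved);
    // Counting chars (not bytes) never splits a multi-byte character.
    let trimmed_html: String = html.chars().take(html_budget).collect();
    format!("{trimmed_html}{SEPARATOR}{task}")
}
```

With the default budget of 6000, a 10,000-character HTML context and a short task would come out at exactly 6000 characters, with the task text still at the end.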

§semaphore: OnceLock<Arc<Semaphore>>

Lazily initialized semaphore used to enforce concurrency_limit.

§relevance_credits: Arc<AtomicU32>

Counter of pages deemed irrelevant; each unit represents one budget credit to restore.

§url_prefilter_cache: Arc<DashMap<String, bool>>

Cache mapping each URL path to its relevance classification, so that a path is not classified twice.
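
One way these two fields could cooperate is sketched below. How the engine actually combines them is not documented on this page, so treat the flow as an assumption: RelevanceTracker, is_relevant, and take_credits are hypothetical names, and a Mutex<HashMap> stands in for DashMap to keep the example std-only.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical reduced tracker holding only the two relevance fields.
#[derive(Clone, Default)]
struct RelevanceTracker {
    relevance_credits: Arc<AtomicU32>,
    // Stand-in for `Arc<DashMap<String, bool>>`.
    url_prefilter_cache: Arc<Mutex<HashMap<String, bool>>>,
}

impl RelevanceTracker {
    /// Returns the cached verdict for `path`, or runs `classify` once and
    /// stores the result. Assumption: an irrelevant verdict adds one credit.
    fn is_relevant(&self, path: &str, classify: impl FnOnce(&str) -> bool) -> bool {
        let mut cache = self.url_prefilter_cache.lock().unwrap();
        if let Some(&hit) = cache.get(path) {
            return hit; // cache hit: the classifier is not called again
        }
        let relevant = classify(path);
        if !relevant {
            self.relevance_credits.fetch_add(1, Ordering::Relaxed);
        }
        cache.insert(path.to_owned(), relevant);
        relevant
    }

    /// Drains the accumulated credits so the caller can restore that much budget.
    fn take_credits(&self) -> u32 {
        self.relevance_credits.swap(0, Ordering::Relaxed)
    }
}
```

Because both fields sit behind Arc, clones of the tracker share the same counter and cache, which mirrors how the real fields are declared.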

Implementations§

impl RemoteMultimodalConfigs

pub fn new(api_url: impl Into<String>, model_name: impl Into<String>) -> Self

Create a new remote multimodal config bundle.

This sets the minimum required fields:

  • api_url: the OpenAI-compatible /v1/chat/completions endpoint
  • model_name: the model identifier understood by that endpoint

All other fields fall back to Default::default.

§Example

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
);

pub fn get_or_init_semaphore(&self) -> Option<Arc<Semaphore>>

Get (and lazily init) the shared semaphore from concurrency_limit. This is safe to call concurrently; OnceLock handles the race.
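
The lazy-initialization pattern described here can be sketched with the standard library alone. In the sketch below, Permits is a stand-in for an async semaphore such as tokio::sync::Semaphore, Limits is a hypothetical reduced config, and clamping the limit to at least one permit is an assumption rather than documented behavior.

```rust
use std::sync::{Arc, OnceLock};

/// Stand-in for an async semaphore; it only records the permit count
/// so the sketch needs no external crates.
#[derive(Debug)]
struct Permits(usize);

/// Hypothetical reduced config with only the two relevant fields.
struct Limits {
    concurrency_limit: Option<usize>,
    semaphore: OnceLock<Arc<Permits>>,
}

impl Limits {
    /// Returns `None` when no limit is set. Otherwise the shared handle is
    /// built once, and every later call returns a clone of the same `Arc`.
    fn get_or_init_semaphore(&self) -> Option<Arc<Permits>> {
        let limit = self.concurrency_limit?;
        // Assumption: clamp to one permit so a limit of 0 cannot block forever.
        let shared = self
            .semaphore
            .get_or_init(|| Arc::new(Permits(limit.max(1))));
        Some(Arc::clone(shared))
    }
}
```

If several threads race on the first call, OnceLock::get_or_init runs exactly one initializer and all callers observe the same value, which is why no extra locking is needed.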

pub fn with_api_key(self, key: impl Into<String>) -> Self

Attach an optional API key for authenticated endpoints.

When set, the engine will send: Authorization: Bearer <api_key>
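
As a small illustration of the header format, the sketch below builds the Authorization value from an optional key. The helper authorization_header is hypothetical and not part of spider_agent, and treating an empty key as absent is an assumption.

```rust
/// Hypothetical helper: formats the `Authorization` header value
/// for an optional API key.
fn authorization_header(api_key: Option<&str>) -> Option<String> {
    api_key
        .filter(|k| !k.is_empty()) // assumption: an empty key means "no key"
        .map(|k| format!("Bearer {k}"))
}
```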

pub fn with_system_prompt(self, prompt: impl Into<String>) -> Self

Set the base system prompt for the model.

This stores Some(prompt) in the system_prompt field:

  • Some(prompt) uses your prompt as the base system prompt.
  • None (the default, when this method is not called) means the engine uses its built-in default system prompt.

pub fn with_system_prompt_extra(self, extra: impl Into<String>) -> Self

Append additional system-level instructions.

This is appended after the base system prompt and before any runtime config summary the engine might embed.
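
The ordering described above might look like the following. This is only a sketch of the assembly order: compose_system_prompt, the engine_default and config_summary parameters, and the "\n\n" separator are placeholders chosen for the example, not engine internals.

```rust
/// Hypothetical sketch of the assembly order for the system prompt.
fn compose_system_prompt(
    base: Option<&str>,
    extra: Option<&str>,
    engine_default: &str,
    config_summary: Option<&str>,
) -> String {
    // 1. Base prompt: the caller's prompt, or the engine default when unset.
    let mut parts = vec![base.unwrap_or(engine_default)];
    // 2. Extra system instructions come right after the base prompt.
    parts.extend(extra);
    // 3. Any runtime config summary goes last.
    parts.extend(config_summary);
    // Assumed separator between sections.
    parts.join("\n\n")
}
```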

pub fn with_user_message_extra(self, extra: impl Into<String>) -> Self

Append additional user instructions for the task.

This is appended to the user message after the captured page context.

pub fn with_cfg(self, cfg: RemoteMultimodalConfig) -> Self

Replace the runtime automation configuration.

pub fn with_prompt_url_gate(self, gate: PromptUrlGate) -> Self

Set optional URL gating and per-URL overrides.

pub fn with_concurrency_limit(self, limit: usize) -> Self

Set an optional concurrency limit for remote inference calls.

pub fn with_extra_ai_data(self, enabled: bool) -> Self

Enable or disable extraction mode, which returns structured data from pages.

pub fn with_extraction_prompt(self, prompt: impl Into<String>) -> Self

Set a custom extraction prompt.

pub fn with_screenshot(self, enabled: bool) -> Self

Enable or disable screenshot capture after automation completes.

pub fn with_extraction_schema(self, schema: ExtractionSchema) -> Self

Set a JSON schema for structured extraction output.

pub fn model_supports_vision(&self) -> bool

Check if the configured model supports vision/multimodal input.

Uses the supports_vision function, which detects vision support from the model name.

pub fn should_include_screenshot(&self) -> bool

Determine whether to include screenshots in LLM requests.

This respects the include_screenshot config override:

  • Some(true): Always include screenshots
  • Some(false): Never include screenshots
  • None: Auto-detect based on model name

pub fn filter_screenshot<'a>( &self, screenshot: Option<&'a str>, ) -> Option<&'a str>

Filter screenshot based on model capabilities.

Returns the screenshot if the model supports vision and screenshots are enabled, otherwise returns None.
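
The tri-state override and the filter can be sketched together as free functions. Everything here is illustrative: looks_like_vision_model is a crude stand-in for the crate's supports_vision (whose real matching rules are not shown on this page), and the sketch assumes filter_screenshot simply defers to should_include_screenshot.

```rust
/// Hypothetical stand-in for `supports_vision`: a naive name-based guess.
fn looks_like_vision_model(model_name: &str) -> bool {
    let name = model_name.to_ascii_lowercase();
    name.contains("vl") || name.contains("vision")
}

/// Tri-state override: `Some(_)` wins, `None` falls back to auto-detection.
fn should_include_screenshot(include_screenshot: Option<bool>, model_name: &str) -> bool {
    include_screenshot.unwrap_or_else(|| looks_like_vision_model(model_name))
}

/// Passes the screenshot through only when it should be sent to the model.
fn filter_screenshot<'a>(
    include_screenshot: Option<bool>,
    model_name: &str,
    screenshot: Option<&'a str>,
) -> Option<&'a str> {
    if should_include_screenshot(include_screenshot, model_name) {
        screenshot
    } else {
        None
    }
}
```

Returning the borrowed screenshot unchanged, as the real signature does, avoids copying what is typically a large base64 string.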

pub fn with_vision_model(self, endpoint: ModelEndpoint) -> Self

Set the vision model endpoint for dual-model routing.

pub fn with_text_model(self, endpoint: ModelEndpoint) -> Self

Set the text model endpoint for dual-model routing.

pub fn with_vision_route_mode(self, mode: VisionRouteMode) -> Self

Set the vision routing mode.

pub fn with_dual_models( self, vision: ModelEndpoint, text: ModelEndpoint, ) -> Self

Convenience: set both vision and text model endpoints at once.

pub fn with_relevance_gate(self, prompt: Option<String>) -> Self

Enable relevance gating with optional custom criteria prompt.

pub fn with_url_prefilter(self, batch_size: Option<usize>) -> Self

Enable URL-level pre-filtering before HTTP fetch. Requires relevance_gate to also be enabled.

pub fn with_chrome_ai(self, enabled: bool) -> Self

Enable Chrome built-in AI (LanguageModel / Gemini Nano) for inference.

When enabled, the engine uses page.evaluate() to call Chrome’s LanguageModel.create() + session.prompt() instead of HTTP API calls. No API key is required.

Even when not explicitly enabled, Chrome AI is used as a last-resort fallback if both api_url and api_key are empty.

pub fn with_chrome_ai_max_user_chars(self, chars: usize) -> Self

Set the maximum user-prompt character budget for Chrome AI inference.

pub fn should_use_chrome_ai(&self) -> bool

Whether Chrome AI should be used for inference in this configuration.

Returns true when explicitly enabled OR when no API endpoint is configured (last-resort fallback).
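
A minimal sketch of this decision follows, using a hypothetical reduced struct named Endpoint. It assumes that "no API endpoint is configured" means an empty api_url together with a missing or empty api_key, as the field documentation for use_chrome_ai suggests; the real check may differ.

```rust
/// Hypothetical reduced config; only the fields that drive the decision.
struct Endpoint {
    api_url: String,
    api_key: Option<String>,
    use_chrome_ai: bool,
}

impl Endpoint {
    fn should_use_chrome_ai(&self) -> bool {
        // Assumption: a missing key and an empty key are treated alike.
        let no_key = self.api_key.as_deref().map_or(true, str::is_empty);
        // Explicit opt-in, or last-resort fallback when nothing is configured.
        self.use_chrome_ai || (self.api_url.is_empty() && no_key)
    }
}
```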

pub fn has_dual_model_routing(&self) -> bool

Whether dual-model routing is active (at least one of vision_model / text_model is configured).

pub fn resolve_model_for_round( &self, use_vision: bool, ) -> (&str, &str, Option<&str>)

Resolve the (api_url, model_name, api_key) triple for the current round.

  • use_vision == true → prefer vision_model, fall back to primary.
  • use_vision == false → prefer text_model, fall back to primary.

Fields left as None on the chosen ModelEndpoint inherit from the parent (self.api_url / self.api_key).
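
The fallback and inheritance rules can be sketched as below. The field layout of ModelEndpoint is not shown on this page, so EndpointSketch (a required model_name plus optional api_url and api_key) is an assumed shape, and Routing is a hypothetical reduced config.

```rust
/// Hypothetical mirror of `ModelEndpoint`; the real field layout is assumed.
struct EndpointSketch {
    model_name: String,
    api_url: Option<String>,
    api_key: Option<String>,
}

/// Hypothetical reduced config: the primary endpoint plus optional overrides.
struct Routing {
    api_url: String,
    api_key: Option<String>,
    model_name: String,
    vision_model: Option<EndpointSketch>,
    text_model: Option<EndpointSketch>,
}

impl Routing {
    /// Dual-model routing is active when at least one override is set.
    fn has_dual_model_routing(&self) -> bool {
        self.vision_model.is_some() || self.text_model.is_some()
    }

    /// Returns `(api_url, model_name, api_key)` for the round.
    fn resolve_model_for_round(&self, use_vision: bool) -> (&str, &str, Option<&str>) {
        let chosen = if use_vision { &self.vision_model } else { &self.text_model };
        match chosen {
            // Override present: unset URL/key fields inherit from the parent.
            Some(ep) => (
                ep.api_url.as_deref().unwrap_or(self.api_url.as_str()),
                ep.model_name.as_str(),
                ep.api_key.as_deref().or(self.api_key.as_deref()),
            ),
            // No override for this kind of round: fall back to the primary model.
            None => (
                self.api_url.as_str(),
                self.model_name.as_str(),
                self.api_key.as_deref(),
            ),
        }
    }
}
```

Returning borrowed string slices, as the documented signature does, lets the engine resolve an endpoint every round without allocating.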

pub fn should_use_vision_this_round( &self, round_idx: usize, stagnated: bool, action_stuck_rounds: usize, force_vision: bool, ) -> bool

Decide whether to use vision this round, based on the configured VisionRouteMode and current loop state.

force_vision is an explicit per-round override (e.g. from request_vision).
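
To make the shape of this decision concrete, here is a sketch with invented routing modes. The real VisionRouteMode variants are not listed on this page, so DemoRouteMode, its two variants, and the threshold of three stuck rounds are purely hypothetical; only the force_vision override reflects the documented behavior.

```rust
/// Hypothetical routing modes, for illustration only.
enum DemoRouteMode {
    /// Every round uses the vision model.
    AlwaysVision,
    /// Text by default; vision on the first round and when the loop looks stuck.
    VisionWhenStuck,
}

fn should_use_vision(
    mode: &DemoRouteMode,
    round_idx: usize,
    stagnated: bool,
    action_stuck_rounds: usize,
    force_vision: bool,
) -> bool {
    // An explicit per-round request always wins.
    if force_vision {
        return true;
    }
    match mode {
        DemoRouteMode::AlwaysVision => true,
        // Assumed threshold: three consecutive stuck rounds trigger vision.
        DemoRouteMode::VisionWhenStuck => {
            round_idx == 0 || stagnated || action_stuck_rounds >= 3
        }
    }
}
```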

Trait Implementations§

impl Clone for RemoteMultimodalConfigs

fn clone(&self) -> RemoteMultimodalConfigs

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for RemoteMultimodalConfigs

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for RemoteMultimodalConfigs

fn default() -> Self

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for RemoteMultimodalConfigs

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for RemoteMultimodalConfigs

fn eq(&self, other: &Self) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for RemoteMultimodalConfigs

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl Eq for RemoteMultimodalConfigs

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)

Performs copy-assignment from self to dest.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,