pub struct RemoteMultimodalConfigs {
pub api_url: String,
pub api_key: Option<String>,
pub model_name: String,
pub system_prompt: Option<String>,
pub system_prompt_extra: Option<String>,
pub user_message_extra: Option<String>,
pub cfg: RemoteMultimodalConfig,
pub prompt_url_gate: Option<PromptUrlGate>,
pub concurrency_limit: Option<usize>,
pub vision_model: Option<ModelEndpoint>,
pub text_model: Option<ModelEndpoint>,
pub vision_route_mode: VisionRouteMode,
pub use_chrome_ai: bool,
pub chrome_ai_max_user_chars: usize,
pub semaphore: OnceLock<Arc<Semaphore>>,
pub relevance_credits: Arc<AtomicU32>,
pub url_prefilter_cache: Arc<DashMap<String, bool>>,
}

Top-level configuration bundle for remote multimodal automation.

This struct combines all the settings needed to drive the RemoteMultimodalEngine:

- API connection (api_url, api_key, model_name)
- Prompt configuration (system_prompt, system_prompt_extra, user_message_extra)
- Runtime configuration (RemoteMultimodalConfig)
- URL gating (PromptUrlGate)
- Dual-model routing (vision_model, text_model, vision_route_mode)
- Chrome AI (use_chrome_ai, chrome_ai_max_user_chars)
- Skills (feature-gated skill_registry, s3_skill_source)
- Concurrency (concurrency_limit, lazy semaphore)
- Relevance tracking (relevance_credits, url_prefilter_cache)
§Example

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "https://openrouter.ai/api/v1/chat/completions",
    "qwen/qwen-2-vl-72b-instruct",
)
.with_api_key("sk-or-...")
.with_concurrency_limit(5);

Fields§
§api_url: String
OpenAI-compatible chat completions URL.

§api_key: Option<String>
Optional bearer key, sent as Authorization: Bearer <api_key>.

§model_name: String
Model name/id for the target endpoint.

§system_prompt: Option<String>
Optional base system prompt (None => engine default).

§system_prompt_extra: Option<String>
Optional extra system instructions appended at runtime.

§user_message_extra: Option<String>
Optional extra user instructions appended at runtime.

§cfg: RemoteMultimodalConfig
Runtime knobs (capture policies, retry, looping, etc.).

§prompt_url_gate: Option<PromptUrlGate>
Optional URL gating and per-URL overrides.

§concurrency_limit: Option<usize>
Optional concurrency limit for remote inference calls.

§vision_model: Option<ModelEndpoint>
Optional vision model endpoint for dual-model routing. When set alongside
text_model, the engine routes per round based on VisionRouteMode.

§text_model: Option<ModelEndpoint>
Optional text-only model endpoint for dual-model routing.

§vision_route_mode: VisionRouteMode
Routing mode controlling when the vision vs. text model is used.

§use_chrome_ai: bool
Use Chrome's built-in LanguageModel API (Gemini Nano) for inference.

When true, the automation loop evaluates JavaScript on the page via
page.evaluate(), calling LanguageModel.create() + session.prompt()
instead of making HTTP API calls. This enables running the agent
without any external API key.

When left false (the default), Chrome AI is still used as a last-resort
fallback if both api_url and api_key are empty.

Requires Chrome with built-in AI enabled:

- chrome://flags/#optimization-guide-on-device-model → Enabled
- chrome://flags/#prompt-api-for-gemini-nano → Enabled

§chrome_ai_max_user_chars: usize
Maximum user-prompt characters for Chrome AI inference.

Gemini Nano has limited context compared to cloud models. This budget
controls the max length of the user message (HTML context, URL, title,
task instructions). When the user prompt exceeds this limit, the HTML
context section is truncated while preserving task instructions and memory.

Default: 6000 chars. Only used when Chrome AI is the active inference path.

§semaphore: OnceLock<Arc<Semaphore>>
Lazily initialized semaphore used for concurrency limiting.

§relevance_credits: Arc<AtomicU32>
Counter for pages deemed irrelevant; each unit is one budget credit to restore.

§url_prefilter_cache: Arc<DashMap<String, bool>>
Cache of URL path → relevance classification, to avoid re-classifying.
Implementations§

impl RemoteMultimodalConfigs

pub fn new(api_url: impl Into<String>, model_name: impl Into<String>) -> Self

Create a new remote multimodal config bundle.

This sets the minimum required fields:

- api_url: the OpenAI-compatible /v1/chat/completions endpoint
- model_name: the model identifier understood by that endpoint

All other fields fall back to Default::default.

§Example

use spider_agent::automation::RemoteMultimodalConfigs;

let mm = RemoteMultimodalConfigs::new(
    "http://localhost:11434/v1/chat/completions",
    "qwen2.5-vl",
);

pub fn get_or_init_semaphore(&self) -> Option<Arc<Semaphore>>
Get (and lazily init) the shared semaphore from concurrency_limit.
This is safe to call concurrently; OnceLock handles the race.
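The lazy-init pattern behind this method can be sketched with std types alone. A minimal, dependency-free sketch is below; the `Semaphore` here is a stand-in for the real async semaphore (likely tokio's), and the `Cfg` struct mirrors only the two relevant fields:

```rust
use std::sync::{Arc, OnceLock};

// Stand-in for the real async semaphore type, so the sketch stays
// dependency-free. Only the permit count matters for the pattern.
struct Semaphore {
    permits: usize,
}

struct Cfg {
    concurrency_limit: Option<usize>,
    semaphore: OnceLock<Arc<Semaphore>>,
}

impl Cfg {
    // Lazily create the shared semaphore. OnceLock::get_or_init makes
    // concurrent first calls race-free; every caller gets the same Arc.
    fn get_or_init_semaphore(&self) -> Option<Arc<Semaphore>> {
        let limit = self.concurrency_limit?;
        Some(Arc::clone(self.semaphore.get_or_init(|| {
            Arc::new(Semaphore { permits: limit })
        })))
    }
}

fn main() {
    let cfg = Cfg { concurrency_limit: Some(5), semaphore: OnceLock::new() };
    let a = cfg.get_or_init_semaphore().unwrap();
    let b = cfg.get_or_init_semaphore().unwrap();
    // Both calls observe the same semaphore instance.
    assert!(Arc::ptr_eq(&a, &b));
    println!("permits = {}", a.permits);
}
```

With `concurrency_limit` set to `None`, the method returns `None` and no semaphore is ever created, which matches the field being optional.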
pub fn with_api_key(self, key: impl Into<String>) -> Self

Attach an optional API key for authenticated endpoints.

When set, the engine will send:

Authorization: Bearer <api_key>
pub fn with_system_prompt(self, prompt: impl Into<String>) -> Self

Set the base system prompt for the model.

- Some(prompt) uses your prompt as the base system prompt.
- None means the engine should use its built-in default system prompt.
pub fn with_system_prompt_extra(self, extra: impl Into<String>) -> Self
Append additional system-level instructions.
This is appended after the base system prompt and before any runtime config summary the engine might embed.
pub fn with_user_message_extra(self, extra: impl Into<String>) -> Self
Append additional user instructions for the task.
This is appended to the user message after the captured page context.
pub fn with_cfg(self, cfg: RemoteMultimodalConfig) -> Self
Replace the runtime automation configuration.
pub fn with_prompt_url_gate(self, gate: PromptUrlGate) -> Self
Set optional URL gating and per-URL overrides.
pub fn with_concurrency_limit(self, limit: usize) -> Self
Set an optional concurrency limit for remote inference calls.
pub fn with_extra_ai_data(self, enabled: bool) -> Self
Enable extraction mode to return structured data from pages.
pub fn with_extraction_prompt(self, prompt: impl Into<String>) -> Self
Set a custom extraction prompt.
pub fn with_screenshot(self, enabled: bool) -> Self
Enable screenshot capture after automation completes.
pub fn with_extraction_schema(self, schema: ExtractionSchema) -> Self
Set a JSON schema for structured extraction output.
pub fn model_supports_vision(&self) -> bool

Check if the configured model supports vision/multimodal input.

Uses the supports_vision function to detect support based on the model name.
pub fn should_include_screenshot(&self) -> bool

Determine whether to include screenshots in LLM requests.

This respects the include_screenshot config override:

- Some(true): always include screenshots
- Some(false): never include screenshots
- None: auto-detect based on the model name
pub fn filter_screenshot<'a>(
    &self,
    screenshot: Option<&'a str>,
) -> Option<&'a str>

Filter the screenshot based on model capabilities.

Returns the screenshot if the model supports vision and screenshots
are enabled; otherwise returns None.
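The interplay of the tri-state override and vision auto-detection can be restated as free functions. This is an illustrative sketch, not the crate's code: the name-based vision check is a toy heuristic standing in for the real supports_vision function, and the parameter names are assumptions.

```rust
// Toy heuristic standing in for the crate's supports_vision check.
fn model_supports_vision(model_name: &str) -> bool {
    model_name.contains("vl") || model_name.contains("vision")
}

// Tri-state override: an explicit Some(..) wins, None falls back to
// auto-detection from the model name.
fn should_include_screenshot(include_screenshot: Option<bool>, model_name: &str) -> bool {
    match include_screenshot {
        Some(explicit) => explicit,
        None => model_supports_vision(model_name),
    }
}

// Pass the screenshot through only when it will actually be used.
fn filter_screenshot<'a>(
    include_screenshot: Option<bool>,
    model_name: &str,
    screenshot: Option<&'a str>,
) -> Option<&'a str> {
    if should_include_screenshot(include_screenshot, model_name) {
        screenshot
    } else {
        None
    }
}

fn main() {
    // Vision model, no override: the screenshot passes through.
    assert_eq!(filter_screenshot(None, "qwen2.5-vl", Some("b64...")), Some("b64..."));
    // Text-only model, no override: the screenshot is dropped.
    assert_eq!(filter_screenshot(None, "llama-3-8b", Some("b64...")), None);
    // An explicit override always wins.
    assert_eq!(filter_screenshot(Some(false), "qwen2.5-vl", Some("b64...")), None);
    println!("ok");
}
```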
pub fn with_vision_model(self, endpoint: ModelEndpoint) -> Self
Set the vision model endpoint for dual-model routing.
pub fn with_text_model(self, endpoint: ModelEndpoint) -> Self
Set the text model endpoint for dual-model routing.
pub fn with_vision_route_mode(self, mode: VisionRouteMode) -> Self
Set the vision routing mode.
pub fn with_dual_models(self, vision: ModelEndpoint, text: ModelEndpoint) -> Self
Convenience: set both vision and text model endpoints at once.
pub fn with_relevance_gate(self, prompt: Option<String>) -> Self
Enable relevance gating with optional custom criteria prompt.
pub fn with_url_prefilter(self, batch_size: Option<usize>) -> Self
Enable URL-level pre-filtering before HTTP fetch.
Requires relevance_gate to also be enabled.
pub fn with_chrome_ai(self, enabled: bool) -> Self
Enable Chrome built-in AI (LanguageModel / Gemini Nano) for inference.
When enabled, the engine uses page.evaluate() to call Chrome’s
LanguageModel.create() + session.prompt() instead of HTTP API calls.
No API key is required.
Even when not explicitly enabled, Chrome AI is used as a last-resort
fallback if both api_url and api_key are empty.
pub fn with_chrome_ai_max_user_chars(self, chars: usize) -> Self
Set the maximum user-prompt character budget for Chrome AI inference.
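The budgeting behavior described for chrome_ai_max_user_chars (truncate only the HTML context, keep the task instructions) can be sketched as a small free function. The function name, argument layout, and separator are illustrative assumptions, not the crate's internals:

```rust
// Sketch of the character-budget idea: when the assembled user prompt is
// too long for Gemini Nano, shrink only the HTML context section and keep
// the task instructions intact.
fn budget_user_prompt(task: &str, html_context: &str, max_chars: usize) -> String {
    let full = format!("{task}\n\n{html_context}");
    if full.chars().count() <= max_chars {
        return full;
    }
    // Reserve room for the task text plus the two-char separator,
    // then truncate the HTML section to whatever budget remains.
    let task_len = task.chars().count() + 2;
    let html_budget = max_chars.saturating_sub(task_len);
    let truncated: String = html_context.chars().take(html_budget).collect();
    format!("{task}\n\n{truncated}")
}

fn main() {
    let prompt = budget_user_prompt("Find the pricing page", &"x".repeat(10_000), 6000);
    // The result fits the budget and still starts with the instructions.
    assert!(prompt.chars().count() <= 6000);
    assert!(prompt.starts_with("Find the pricing page"));
    println!("len = {}", prompt.chars().count());
}
```

Counting `chars()` rather than bytes keeps the budget meaningful for multi-byte UTF-8 content; the real implementation may count differently.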
pub fn should_use_chrome_ai(&self) -> bool
Whether Chrome AI should be used for inference in this configuration.
Returns true when explicitly enabled OR when no API endpoint is
configured (last-resort fallback).
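The decision rule is simple enough to restate as a free function over the three inputs it depends on. This is a minimal restatement of the documented behavior, with the method's fields flattened into parameters:

```rust
// Chrome AI runs when explicitly enabled, or as a last-resort fallback
// when neither an API URL nor an API key is configured.
fn should_use_chrome_ai(use_chrome_ai: bool, api_url: &str, api_key: Option<&str>) -> bool {
    use_chrome_ai || (api_url.is_empty() && api_key.map_or(true, |k| k.is_empty()))
}

fn main() {
    // Explicitly enabled: always used, even with an endpoint configured.
    assert!(should_use_chrome_ai(true, "https://api.example/v1", Some("sk-...")));
    // Nothing configured: last-resort fallback kicks in.
    assert!(should_use_chrome_ai(false, "", None));
    // Endpoint configured and not enabled: stick to HTTP inference.
    assert!(!should_use_chrome_ai(false, "https://api.example/v1", None));
    println!("ok");
}
```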
pub fn has_dual_model_routing(&self) -> bool
Whether dual-model routing is active
(at least one of vision_model / text_model is configured).
pub fn resolve_model_for_round(
    &self,
    use_vision: bool,
) -> (&str, &str, Option<&str>)

Resolve the (api_url, model_name, api_key) triple for the current round.

- use_vision == true → prefer vision_model, falling back to the primary model.
- use_vision == false → prefer text_model, falling back to the primary model.
Fields left as None on the chosen ModelEndpoint inherit from
the parent (self.api_url / self.api_key).
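The fallback-and-inherit rule can be sketched with stub structs. Field names follow the docs above, but the exact shapes of ModelEndpoint and the config are assumptions made for illustration:

```rust
// Stub endpoint: fields left as None inherit from the parent config.
#[derive(Default)]
struct ModelEndpoint {
    api_url: Option<String>,
    model_name: String,
    api_key: Option<String>,
}

struct Cfg {
    api_url: String,
    model_name: String,
    api_key: Option<String>,
    vision_model: Option<ModelEndpoint>,
    text_model: Option<ModelEndpoint>,
}

impl Cfg {
    fn resolve_model_for_round(&self, use_vision: bool) -> (&str, &str, Option<&str>) {
        let ep = if use_vision { self.vision_model.as_ref() } else { self.text_model.as_ref() };
        match ep {
            // Chosen endpoint: take what it sets, inherit the rest.
            Some(e) => (
                e.api_url.as_deref().unwrap_or(self.api_url.as_str()),
                e.model_name.as_str(),
                e.api_key.as_deref().or(self.api_key.as_deref()),
            ),
            // No endpoint for this branch: fall back to the primary model.
            None => (self.api_url.as_str(), self.model_name.as_str(), self.api_key.as_deref()),
        }
    }
}

fn main() {
    let cfg = Cfg {
        api_url: "https://primary/v1/chat/completions".into(),
        model_name: "primary-model".into(),
        api_key: Some("sk-primary".into()),
        vision_model: Some(ModelEndpoint {
            model_name: "vision-model".into(),
            ..Default::default()
        }),
        text_model: None,
    };
    // The vision endpoint set only a model name, so URL and key are inherited.
    let (url, model, key) = cfg.resolve_model_for_round(true);
    assert_eq!((url, model, key),
        ("https://primary/v1/chat/completions", "vision-model", Some("sk-primary")));
    // No text endpoint configured: the text round uses the primary model.
    assert_eq!(cfg.resolve_model_for_round(false).1, "primary-model");
    println!("ok");
}
```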
pub fn should_use_vision_this_round(
    &self,
    round_idx: usize,
    stagnated: bool,
    action_stuck_rounds: usize,
    force_vision: bool,
) -> bool
Decide whether to use vision this round, based on the configured
VisionRouteMode and current loop state.
force_vision is an explicit per-round override (e.g. from request_vision).
Trait Implementations§

impl Clone for RemoteMultimodalConfigs
impl Debug for RemoteMultimodalConfigs
impl Default for RemoteMultimodalConfigs
impl<'de> Deserialize<'de> for RemoteMultimodalConfigs where RemoteMultimodalConfigs: Default
impl PartialEq for RemoteMultimodalConfigs
impl Serialize for RemoteMultimodalConfigs
impl Eq for RemoteMultimodalConfigs
Auto Trait Implementations§
impl !Freeze for RemoteMultimodalConfigs
impl !RefUnwindSafe for RemoteMultimodalConfigs
impl Send for RemoteMultimodalConfigs
impl Sync for RemoteMultimodalConfigs
impl Unpin for RemoteMultimodalConfigs
impl !UnwindSafe for RemoteMultimodalConfigs
Blanket Implementations§

impl<T> BorrowMut<T> for T where T: ?Sized
impl<T> CloneToUninit for T where T: Clone
impl<Q, K> Equivalent<K> for Q