spider_agent 2.51.207

A concurrent-safe multimodal agent for web automation and research.
# Spider Agent

A concurrent-safe multimodal agent for web automation and research.

## Features

- **Concurrent-safe**: Designed to be wrapped in `Arc` for multi-task access
- **Feature-gated**: Only include dependencies you need
- **Multiple LLM providers**: OpenAI, OpenAI-compatible APIs
- **Multiple search providers**: Serper, Brave, Bing, Tavily
- **HTML extraction**: Clean and extract structured data from web pages
- **Research synthesis**: Combine search + extraction + LLM synthesis

### Advanced Automation Features

- **Tool Calling Schema**: OpenAI-compatible function calling for reliable action parsing
- **HTML Diff Mode**: 50-70% token reduction by sending only page changes after first round
- **Planning Mode**: Multi-step planning reduces LLM round-trips
- **Parallel Synthesis**: Analyze N pages in a single LLM call
- **Confidence Tracking**: Smarter retry decisions based on LLM confidence scores
- **Self-Healing Selectors**: Auto-repair failed selectors with LLM diagnosis
- **Schema Generation**: Auto-generate JSON schemas from example outputs
- **Concurrent Chains**: Execute independent actions in parallel with dependency graphs
- **Embedded Scripting**: LLM-callable pure-Rust Python (`rustpython-vm`) and JavaScript (`boa_engine`) interpreters via the `scripting` feature — dedicated thread pool off tokio's blocking pool, no mutexes, no deadlocks, sandboxed filesystem and opt-in HTTP

## Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
spider_agent = { version = "2", features = ["openai", "search_serper"] }
```

## Quick Start

```rust
use spider_agent::Agent;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let agent = Arc::new(Agent::builder()
        .with_openai("sk-...", "gpt-4o-mini")
        .with_search_serper("serper-key")
        .build()?);

    // Search
    let results = agent.search("rust web frameworks").await?;
    println!("Found {} results", results.results.len());

    // Extract from the first result, if the search returned any
    if let Some(first) = results.results.first() {
        let html = agent.fetch(&first.url).await?.html;
        let data = agent.extract(&html, "Extract framework name and features").await?;
        println!("{}", serde_json::to_string_pretty(&data)?);
    }

    Ok(())
}
```

## Concurrent Execution

```rust
use spider_agent::Agent;
use std::sync::Arc;

let agent = Arc::new(Agent::builder()
    .with_openai("sk-...", "gpt-4o")
    .with_search_serper("serper-key")
    .with_max_concurrent_llm_calls(10)
    .build()?);

// Execute multiple searches concurrently
let queries = vec!["rust async", "rust web frameworks", "rust databases"];
let mut handles = Vec::new();

for query in queries {
    let agent = agent.clone();
    let query = query.to_string();
    handles.push(tokio::spawn(async move {
        agent.search(&query).await
    }));
}

// Collect results
for handle in handles {
    let result = handle.await??;
    println!("Found {} results", result.results.len());
}
```

## Research with Synthesis

```rust
use spider_agent::{Agent, ResearchOptions};

let agent = Agent::builder()
    .with_openai("sk-...", "gpt-4o")
    .with_search_serper("serper-key")
    .build()?;

let research = agent.research(
    "How do Tokio and async-std compare?",
    ResearchOptions::new()
        .with_max_pages(5)
        .with_synthesize(true),
).await?;

// `summary` is optional, so avoid panicking when no synthesis was produced
println!("Summary: {}", research.summary.unwrap_or_default());
```

## Feature Flags

| Feature | Description |
|---------|-------------|
| `openai` | OpenAI/OpenAI-compatible LLM provider |
| `chrome` | Browser automation via chromiumoxide |
| `webdriver` | Browser automation via thirtyfour |
| `search` | Base search functionality |
| `search_serper` | Serper.dev search provider |
| `search_brave` | Brave Search provider |
| `search_bing` | Bing Search provider |
| `search_tavily` | Tavily AI Search provider |
| `fs` | Tempfile-backed disk storage helpers |
| `skills` | Dynamic skill loading for web challenge solving |
| `memvid` | Long-term experience memory via semantic search |
| `scripting` | Embedded pure-Rust Python + JavaScript interpreters for LLM-callable `RunPython` / `RunJavaScript` actions |
| `full` | All features |

### Scripting

With `features = ["scripting"]`, the agent embeds pure-Rust [`rustpython-vm`] (with the frozen pure-Python stdlib — `json`, `re`, `urllib.parse`, etc.) and [`boa_engine`] JavaScript interpreters and exposes them as two LLM-callable actions: `RunPython` and `RunJavaScript`.

**Design constraints honored:**

- **No `spawn_blocking`**: workers run on dedicated `std::thread`s, fully isolated from tokio's blocking pool (no contention with reqwest / file I/O / DB drivers).
- **No mutexes in our code**: `OutputBuffer` uses a thread-local `RefCell<Vec<u8>>`; JS per-call state lives in a thread-local; channels are lock-free (`flume` MPMC + `tokio::sync::oneshot`).
- **No deadlocks**: the async caller only awaits lock-free primitives (semaphore acquire, flume `send_async`, oneshot, timeout); the worker's `Handle::block_on` parks an OS thread on a futex without consuming a runtime worker.
- **No panics escape**: every script invocation is wrapped in `catch_unwind`, so a bad script cannot tear down a worker thread. All `unwrap`/`expect` calls have been removed from the worker path.
- **Sandboxed filesystem**: `cap_std::fs::Dir` rooted at a per-call tmpdir; path escapes (`..`, absolute paths, symlinks) are structurally impossible.
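
The panic-isolation constraint above can be illustrated with the standard library alone. This is a minimal sketch, not the crate's implementation: `std::sync::mpsc` stands in for `flume`, and `Job` and `run_jobs` are hypothetical names chosen for the example.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::mpsc;
use std::thread;

/// A unit of work for the worker thread (hypothetical type for this sketch).
type Job = Box<dyn FnOnce() -> String + Send>;

/// Runs every job on one dedicated OS thread. A panicking job is reported
/// as `Err`, and the same thread goes on to run the jobs that follow it.
fn run_jobs(jobs: Vec<Job>) -> Vec<Result<String, String>> {
    let (job_tx, job_rx) = mpsc::channel::<(Job, mpsc::Sender<Result<String, String>>)>();

    let worker = thread::spawn(move || {
        for (job, reply) in job_rx {
            // Convert a panic into an error value instead of unwinding the thread.
            let outcome = catch_unwind(AssertUnwindSafe(job))
                .map_err(|_| "script panicked".to_string());
            let _ = reply.send(outcome);
        }
    });

    let mut results = Vec::new();
    for job in jobs {
        let (reply_tx, reply_rx) = mpsc::channel();
        job_tx.send((job, reply_tx)).expect("worker is alive");
        results.push(reply_rx.recv().expect("worker replied"));
    }

    drop(job_tx); // closing the channel ends the worker loop
    worker.join().expect("worker thread exited cleanly");
    results
}
```

Submitting a job that panics between two well-behaved jobs yields `Ok`, `Err`, `Ok`, which shows that the worker thread survived the bad job.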

**Scripts get an `agent` object:**

- `agent.url`, `agent.title`, `agent.html`, `agent.memory`, `agent.tmpdir` — read-only context
- `agent.log(...)`, `print(...)`, `console.log(...)` — captured to `stdout`
- `agent.fetch(url, opts?)` — reuses the shared reqwest client (gated by `allow_network`)
- `agent.read_file(rel)` / `agent.write_file(rel, content)` — sandboxed I/O
- `agent.check_interrupted()` — cooperative cancel poll (timeouts flip an `AtomicBool` the script polls)
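
Cooperative cancellation, as described for `agent.check_interrupted()`, can be sketched with a shared `AtomicBool`. The code below is an illustration under assumptions rather than the crate's code: `run_with_timeout` is a hypothetical helper, the loop body simulates script work with a short sleep, and a standard-library channel replaces the crate's async primitives.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;
use std::time::Duration;

/// Runs up to `max_steps` units of simulated work on a dedicated thread.
/// Returns (steps completed, whether the timeout interrupted the run).
fn run_with_timeout(max_steps: u64, timeout: Duration) -> (u64, bool) {
    let interrupted = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&interrupted);
    let (done_tx, done_rx) = mpsc::channel();

    let worker = thread::spawn(move || {
        let mut completed = 0;
        let mut was_interrupted = false;
        while completed < max_steps {
            // The equivalent of a script polling `agent.check_interrupted()`.
            if flag.load(Ordering::Relaxed) {
                was_interrupted = true;
                break;
            }
            thread::sleep(Duration::from_millis(1)); // stand-in for real work
            completed += 1;
        }
        let _ = done_tx.send((completed, was_interrupted));
    });

    let outcome = match done_rx.recv_timeout(timeout) {
        Ok(outcome) => outcome,
        Err(_) => {
            // Timeout: flip the flag, then wait for the worker to notice it.
            interrupted.store(true, Ordering::Relaxed);
            done_rx.recv().expect("worker always reports an outcome")
        }
    };
    worker.join().expect("worker thread should not panic");
    outcome
}
```

The design point is that the caller never kills the worker. It only sets a flag, so a script that never polls the flag cannot be stopped this way.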

```rust,ignore
use spider_agent::scripting::{ScriptConfig, ScriptContext, ScriptEngine};
use std::time::Duration;

let engine = ScriptEngine::new(ScriptConfig {
    enabled: true,
    allow_network: true,
    default_timeout: Duration::from_secs(5),
    ..ScriptConfig::default()
});

let result = engine.run_python(
    r#"
import json, re
nums = [int(n) for n in re.findall(r"\d+", agent.html)]
print(json.dumps({"url": agent.url, "found": nums}))
"#.to_string(),
    ScriptContext {
        url: Some("https://shop.example.com".into()),
        html: Some("<p>price 1999, stock 42</p>".into()),
        ..ScriptContext::default()
    },
    None,
).await;
println!("{}", result.stdout);
```

When attached via `RemoteMultimodalEngine::with_script_engine(...)`, the LLM can emit `{"RunPython": {"code": "..."}}` or `{"RunJavaScript": {"code": "..."}}` actions and the chrome dispatcher routes them through the worker pool, surfacing stdout to the next round.

Run the end-to-end demo: `cargo run --example scripting --features scripting`.

[`rustpython-vm`]: https://crates.io/crates/rustpython-vm
[`boa_engine`]: https://crates.io/crates/boa_engine

## Examples

```bash
# Basic search
SERPER_API_KEY=xxx cargo run --example basic_search --features search_serper

# Extract data
OPENAI_API_KEY=xxx cargo run --example extract --features openai

# Research
OPENAI_API_KEY=xxx SERPER_API_KEY=xxx cargo run --example research --features "openai search_serper"

# Concurrent execution
OPENAI_API_KEY=xxx SERPER_API_KEY=xxx cargo run --example concurrent --features "openai search_serper"

# Embedded scripting (Python + JS, no API key required)
cargo run --example scripting --features scripting
```

## Verification

From the repository root:

```bash
cargo check --workspace
cargo test -p spider_agent
cargo test -p spider_agent --features "openai search_serper"
RUN_LIVE_TESTS=1 cargo test -p spider_agent --features "openai search_serper" --test live_env_smoke -- --nocapture
```

## API Reference

### Agent

The main struct for all agent operations:

- `search(query)` - Search the web
- `search_with_options(query, options)` - Search with custom options
- `fetch(url)` - Fetch a URL
- `extract(html, prompt)` - Extract data from HTML using LLM
- `extract_structured(html, schema)` - Extract data matching a JSON schema
- `research(topic, options)` - Research a topic with synthesis
- `prompt(messages)` - Send a prompt to the LLM
- `memory_get/set/clear()` - Session memory operations

### AgentBuilder

Configure and build agents:

```rust
Agent::builder()
    .with_config(config)
    .with_system_prompt("You are a helpful assistant")
    .with_max_concurrent_llm_calls(10)
    .with_openai(api_key, model)
    .with_spider_cloud("spider-cloud-api-key")
    .with_search_serper(api_key)
    .build()
```

### Spider Cloud Tool Inheritance

You can register Spider Cloud routes as custom tools directly from the builder.

```rust
use spider_agent::Agent;

let agent = Agent::builder()
    .with_spider_cloud("spider-cloud-api-key")
    .build()?;

// Available tools:
// - spider_cloud_crawl
// - spider_cloud_scrape
// - spider_cloud_search
// - spider_cloud_links
// - spider_cloud_transform
// - spider_cloud_unblocker
```

For full control (custom API URL, toggles, AI subscription gating), use `SpiderCloudToolConfig`:

```rust
use spider_agent::{Agent, SpiderCloudToolConfig};

let spider_cloud = SpiderCloudToolConfig::new("spider-cloud-api-key")
    .with_api_url("https://api.spider.cloud")
    .with_tool_name_prefix("spider_cloud")
    .with_enable_ai_routes(true); // Only enable if your plan includes /ai/* routes

let agent = Agent::builder()
    .with_spider_cloud_config(spider_cloud)
    .build()?;
```

AI routes are disabled by default because they require a paid subscription:
https://spider.cloud/ai/pricing

Prompt-driven route orchestration example:

```bash
SPIDER_CLOUD_API_KEY=your-key cargo run -p spider_agent --example spider_cloud_prompt_flows \
  -- "run all flows for https://books.toscrape.com/ including search scrape crawl links transform unblocker"
```

To include AI routes (`/ai/crawl`, `/ai/scrape`, `/ai/search`, `/ai/browser`, `/ai/links`), enable both:
- config/env gate: `SPIDER_CLOUD_ENABLE_AI_ROUTES=1`
- prompt intent: include text like `include ai routes`
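
The two-gate rule can be pictured as a simple conjunction. This is a hypothetical sketch: `ai_routes_enabled` is not a crate function, and both the exact value check (`"1"`) and the case-insensitive phrase match are assumptions made for illustration.

```rust
/// Returns true only when both gates are open: the environment/config gate
/// and an explicit request in the prompt text.
fn ai_routes_enabled(env_gate: Option<&str>, prompt: &str) -> bool {
    let gate_open = env_gate == Some("1");
    let intent_present = prompt.to_lowercase().contains("include ai routes");
    gate_open && intent_present
}
```

A caller could pass `std::env::var("SPIDER_CLOUD_ENABLE_AI_ROUTES").ok().as_deref()` as the first argument.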

End-to-end release example (single-prompt pipeline + report):

```bash
SPIDER_CLOUD_API_KEY=your-key cargo run -p spider_agent --example spider_cloud_end_to_end \
  -- "Find top travel books on https://books.toscrape.com and return structured product fields"
```

Additional real-world examples:

```bash
# E-commerce competitor intelligence
SPIDER_CLOUD_API_KEY=your-key cargo run -p spider_agent --example spider_cloud_ecommerce_competitor \
  -- "https://books.toscrape.com/" "travel books"

# Job market intelligence pipeline
SPIDER_CLOUD_API_KEY=your-key cargo run -p spider_agent --example spider_cloud_jobs_pipeline \
  -- "rust engineer remote" "https://remoteok.com/remote-rust-jobs"
```

Notes:
- To get markdown, text, raw, commonmark, or bytes output, set `return_format` on the route itself; a separate transform step is optional.
- For binary assets (PDF/images/files), `return_format: "bytes"` preserves fidelity.
- Run transform only when you explicitly need post-processing (`SPIDER_CLOUD_INCLUDE_TRANSFORM=1` in the examples).

You can also point this at any compatible endpoint (not only `api.spider.cloud`) and
use your own naming convention:

```rust
let spider_cloud = SpiderCloudToolConfig::new("provider-key")
    .with_api_url("https://my-gateway.example.com/v1")
    .with_tool_name_prefix("web_api"); // tools become web_api_search, web_api_scrape, etc.
```
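
The naming convention in the comment above (prefix, underscore, route) can be sketched as follows. `tool_names` is a hypothetical helper for illustration only; how the crate builds the names internally is not shown in this README.

```rust
/// Builds tool names of the form `<prefix>_<route>`.
fn tool_names(prefix: &str, routes: &[&str]) -> Vec<String> {
    routes.iter().map(|route| format!("{prefix}_{route}")).collect()
}
```

With the prefix `web_api` and the routes `search` and `scrape`, this produces `web_api_search` and `web_api_scrape`.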

## Advanced Configuration

### RemoteMultimodalConfig

Configure automation features. Use preset configurations for optimal performance:

```rust
use spider_agent::RemoteMultimodalConfig;

// Fast mode: All performance-positive features enabled
// - Tool calling (Auto), HTML diff (Auto), Confidence retries, Concurrent execution
let config = RemoteMultimodalConfig::fast();

// Fast with planning: Adds multi-step planning and self-healing
// Best for complex multi-step automations
let config = RemoteMultimodalConfig::fast_with_planning();

// Manual configuration for fine-grained control:
use spider_agent::{
    ToolCallingMode, HtmlDiffMode, ReasoningEffort,
    PlanningModeConfig, SelfHealingConfig, ConfidenceRetryStrategy,
};

let config = RemoteMultimodalConfig::default()
    .with_tool_calling_mode(ToolCallingMode::Auto)
    .with_html_diff_mode(HtmlDiffMode::Auto)
    .with_reasoning_effort(Some(ReasoningEffort::Medium))
    .with_planning_mode(PlanningModeConfig::default())
    .with_self_healing(SelfHealingConfig::default())
    .with_confidence_strategy(ConfidenceRetryStrategy::default())
    .with_concurrent_execution(true);
```

`reasoning_effort` is optional and only sent when configured, so OpenAI-compatible providers that do not support reasoning controls remain unaffected.
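
The "only sent when configured" behavior amounts to omitting the field from the request body rather than sending a null. Here is a hand-rolled sketch of that idea; `request_body` is a hypothetical function, it does no JSON escaping, and the crate presumably serializes its real requests differently.

```rust
/// Builds a minimal JSON request body, adding `reasoning_effort`
/// only when a value was configured.
fn request_body(model: &str, reasoning_effort: Option<&str>) -> String {
    let mut fields = vec![format!("\"model\":\"{model}\"")];
    if let Some(effort) = reasoning_effort {
        fields.push(format!("\"reasoning_effort\":\"{effort}\""));
    }
    format!("{{{}}}", fields.join(","))
}
```

A provider that does not recognize `reasoning_effort` never sees the key when the option is `None`.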

### Concurrent Action Chains

Execute independent actions in parallel using dependency graphs:

```rust
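use serde_json::json;
// `StepResult` is assumed to be exported at the crate root alongside the other types.
use spider_agent::{DependentStep, DependencyGraph, ConcurrentChainConfig, StepResult, execute_graph};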

// Define steps with dependencies
let steps = vec![
    DependentStep::new("fetch_data", json!({"Navigate": "https://example.com"})),
    DependentStep::new("click_a", json!({"Click": "#btn-a"})).depends_on("fetch_data"),
    DependentStep::new("click_b", json!({"Click": "#btn-b"})).depends_on("fetch_data"),
    DependentStep::new("submit", json!({"Click": "#submit"}))
        .depends_on("click_a")
        .depends_on("click_b"),
];

// Create dependency graph
let mut graph = DependencyGraph::new(steps)?;

// Execute with parallel-safe actions running concurrently
let config = ConcurrentChainConfig::default();
let result = execute_graph(&mut graph, &config, |step| async move {
    // Your execution logic here
    StepResult::success()
}).await;
```
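
To see which steps of a graph like this can overlap, it helps to group them into "waves" where each wave depends only on earlier waves. The sketch below is a standard-library illustration of that layering, not the crate's scheduler: `execution_waves` is a hypothetical function, and `DependencyGraph` may order work differently.

```rust
use std::collections::HashMap;

/// Groups steps into waves. Every step in a wave depends only on steps from
/// earlier waves, so the steps inside one wave could run in parallel.
/// Returns `None` if there is a cycle or a dependency on an unknown step.
fn execution_waves(steps: &[(&str, &[&str])]) -> Option<Vec<Vec<String>>> {
    let mut remaining: HashMap<&str, Vec<&str>> = steps
        .iter()
        .map(|(id, deps)| (*id, deps.to_vec()))
        .collect();
    let mut done: Vec<&str> = Vec::new();
    let mut waves = Vec::new();

    while !remaining.is_empty() {
        // A step is ready once all of its dependencies have completed.
        let mut ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, deps)| deps.iter().all(|dep| done.contains(dep)))
            .map(|(id, _)| *id)
            .collect();
        if ready.is_empty() {
            return None; // nothing can make progress
        }
        ready.sort(); // deterministic order within a wave
        for id in &ready {
            remaining.remove(id);
        }
        done.extend(&ready);
        waves.push(ready.iter().map(|id| id.to_string()).collect());
    }
    Some(waves)
}
```

For the four steps defined above, this yields three waves: `fetch_data`, then `click_a` and `click_b` together, then `submit`.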

### Schema Generation

Auto-generate JSON schemas from examples:

```rust
use serde_json::json;
use spider_agent::{generate_schema, SchemaGenerationRequest};

let request = SchemaGenerationRequest {
    examples: vec![
        json!({"name": "Product A", "price": 19.99}),
        json!({"name": "Product B", "price": 29.99}),
    ],
    description: Some("Product listing data".to_string()),
    strict: false,
    name: Some("products".to_string()),
};

let schema = generate_schema(&request);
// Use schema.to_extraction_schema() for structured extraction
```
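
The general idea of inferring a schema from examples can be sketched without any dependencies. Everything here is hypothetical and much simpler than what `generate_schema` may do: `Value` is a tiny stand-in for a JSON value, only flat objects are handled, and a property counts as required only if every example contains it.

```rust
use std::collections::BTreeMap;

/// A deliberately small stand-in for a JSON value (flat objects only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
}

/// Maps a value to its JSON Schema type name.
fn json_type(value: &Value) -> &'static str {
    match value {
        Value::Bool(_) => "boolean",
        Value::Number(_) => "number",
        Value::Text(_) => "string",
    }
}

/// Returns (property name -> JSON type, sorted list of required properties).
/// The type of a property is taken from the first example that contains it.
fn infer_schema(
    examples: &[Vec<(&str, Value)>],
) -> (BTreeMap<String, &'static str>, Vec<String>) {
    let mut properties = BTreeMap::new();
    let mut counts: BTreeMap<String, usize> = BTreeMap::new();
    for example in examples {
        for (key, value) in example {
            properties.entry(key.to_string()).or_insert(json_type(value));
            *counts.entry(key.to_string()).or_insert(0) += 1;
        }
    }
    let required = counts
        .into_iter()
        .filter(|(_, seen)| *seen == examples.len())
        .map(|(key, _)| key)
        .collect();
    (properties, required)
}
```

Applied to the two product examples above, both `name` (string) and `price` (number) would be inferred and marked as required.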

### Performance Features

| Feature | Default | `fast()` | Impact |
|---------|---------|----------|--------|
| Tool Calling | `JsonObject` | `Auto` | ~30% reduction in parse errors |
| HTML Diff | `Disabled` | `Auto` | 50-70% token reduction |
| Planning Mode | `None` | `None` | Fewer LLM round-trips |
| Parallel Synthesis | `None` | `None` | N pages = 1 LLM call |
| Confidence | `None` | `Enabled` | Smarter retry decisions |
| Self-Healing | `None` | `None` | Higher success rate on failures |
| Concurrent Execution | `false` | `true` | Parallel action execution |

**Recommended**: Start from `RemoteMultimodalConfig::fast()`, which turns on tool calling, HTML diff, confidence-based retries, and concurrent execution.

All features are opt-in with zero overhead when disabled.
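
The token savings listed for HTML Diff come from resending only what changed between rounds. The following is a naive line-based sketch of that idea, assuming nothing about the crate's actual diff algorithm; `changed_lines` is a hypothetical function, and it ignores removed lines and line order.

```rust
use std::collections::HashSet;

/// Returns the non-empty lines of `current` that did not appear anywhere
/// in `previous` (lines are compared after trimming whitespace).
fn changed_lines<'a>(previous: &str, current: &'a str) -> Vec<&'a str> {
    let seen: HashSet<&str> = previous.lines().map(str::trim).collect();
    current
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !seen.contains(*line))
        .collect()
}
```

After the first round, only the returned lines would need to be sent to the LLM, while an unchanged page produces an empty list.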

## License

MIT