spider_skills
Skills and automation tactics for spider Rust projects.
Pre-built skill definitions for solving common web challenges and interacting with the spider.cloud API. Skills are markdown prompt fragments with trigger conditions that get dynamically injected into the LLM context when the page state matches.
Note: The Rust crate is optional; it provides a typed integration layer for the spider.rs ecosystem. The skill definitions in skills/ are standalone markdown files usable with any LLM-based automation system.
Skill Folders
skills/
  automation/   69 web challenge skills (CAPTCHAs, puzzles, forms, security, data extraction)
  api/          8 spider.cloud API reference skills (crawl, scrape, search, screenshot, etc.)
Automation Skills (skills/automation/)
Pre-built tactics for common web challenges encountered during crawling and browser automation. Each .md file contains YAML frontmatter (trigger conditions, priority) and prompt content for LLM-driven solving.
Categories:
| Category | Count | Examples |
|---|---|---|
| CAPTCHAs | 20 | reCAPTCHA v2/v3, hCaptcha, Turnstile, GeeTest, FunCaptcha, audio, math, puzzle piece |
| Interactive Puzzles | 19 | Image grids, tic-tac-toe, word search, sliding tiles, mazes, sudoku, crosswords, memory games |
| Access Barriers | 10 | Cookie consent, login walls, age verification, paywalls, popups, redirect chains, iframes |
| Form Automation | 8 | Multi-step forms, file uploads, OTP inputs, payment forms, address forms |
| Anti-Bot / Security | 6 | Bot detection, rate limiting, JS challenges, proof-of-work, fingerprinting, device verification |
| Data Extraction | 6 | Tables, product listings, contact info, pricing, search results, charts |
API Skills (skills/api/)
Reference skills for the spider.cloud API — endpoint documentation, parameters, and usage examples.
| Skill | Endpoint | Description |
|---|---|---|
| crawl | POST /crawl | Multi-page website crawling |
| scrape | POST /scrape | Single-page data extraction |
| search | POST /search | SERP queries with optional content fetch |
| links | POST /links | Link discovery and extraction |
| screenshot | POST /screenshot | Visual page capture |
| transform | POST /transform | HTML-to-markdown/text conversion |
| unblocker | POST /unblocker | Anti-bot bypass (10-40 extra credits) |
| ai | POST /ai/* | AI-powered crawl, scrape, search, browser (subscription required) |
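As a rough illustration of how the endpoint table can be used from code, the sketch below maps skill names to their POST paths and joins them onto a base URL. The `ENDPOINTS` table, `endpoint_path`, and `endpoint_url` are hypothetical helpers written for this example, not part of the crate, and the base URL is supplied by the caller rather than assumed here.

```rust
/// Endpoint paths from the table above, keyed by skill name.
/// The `ai` skill covers several sub-paths (`/ai/*`) and is left out.
const ENDPOINTS: &[(&str, &str)] = &[
    ("crawl", "/crawl"),
    ("scrape", "/scrape"),
    ("search", "/search"),
    ("links", "/links"),
    ("screenshot", "/screenshot"),
    ("transform", "/transform"),
    ("unblocker", "/unblocker"),
];

/// Look up the POST path for a skill name (hypothetical helper).
fn endpoint_path(skill: &str) -> Option<&'static str> {
    ENDPOINTS
        .iter()
        .find(|(name, _)| *name == skill)
        .map(|(_, path)| *path)
}

/// Join a caller-supplied base URL with the skill's path,
/// tolerating a trailing slash on the base.
fn endpoint_url(base_url: &str, skill: &str) -> Option<String> {
    let path = endpoint_path(skill)?;
    Some(format!("{}{}", base_url.trim_end_matches('/'), path))
}
```

Calling `endpoint_url` with an unknown skill name returns `None`, so a caller can fall back or report the error instead of sending a request to a path that does not exist.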
Install (Rust)
[dependencies]
spider_skills = "0.1"
Feature Flags
| Feature | Default | Description |
|---|---|---|
| web_challenges | Yes | 69 built-in web challenge skills |
| fetch | No | Load skills from remote URLs at runtime |
| s3 | No | Load skills from AWS S3 buckets |
# All features
spider_skills = { version = "0.1", features = ["web_challenges", "fetch", "s3"] }
# Minimal (just core types, no built-in skills)
spider_skills = { version = "0.1", default-features = false }
Usage
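use spider_skills::web_challenges;
// Get a registry with all 69 built-in skills
let registry = web_challenges::registry();
// Or pick specific skill categories
let mut registry = spider_skills::new_registry();
web_challenges::add_image_grid(&mut registry);
web_challenges::add_text_captcha(&mut registry);
web_challenges::add_tic_tac_toe(&mut registry);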
Matching Skills Against Page State
let registry = spider_skills::web_challenges::registry();
// Returns combined prompt context for matching skills
let context = registry.match_context(/* current page state */);
// context now contains the login-wall and recaptcha-v2 skill prompts
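To make the matching step concrete, here is a minimal, self-contained sketch of how trigger matching and prompt assembly could work. `SketchTrigger`, `SketchSkill`, and the free function `match_context` are hypothetical stand-ins written for illustration; they are not the crate's actual types or signatures. The argument order (URL, title, HTML), the "any trigger matches" rule, and the `[name]` header in the output are all assumptions.

```rust
/// Hypothetical trigger kinds mirroring the documented trigger types.
#[derive(Debug, Clone)]
enum SketchTrigger {
    TitleContains(String),
    UrlContains(String),
    HtmlContains(String),
}

impl SketchTrigger {
    /// Case-insensitive substring test against the relevant page field.
    fn matches(&self, url: &str, title: &str, html: &str) -> bool {
        let (haystack, needle) = match self {
            SketchTrigger::TitleContains(n) => (title, n),
            SketchTrigger::UrlContains(n) => (url, n),
            SketchTrigger::HtmlContains(n) => (html, n),
        };
        haystack.to_lowercase().contains(&needle.to_lowercase())
    }
}

/// Hypothetical skill: a prompt fragment plus the triggers that activate it.
#[derive(Debug, Clone)]
struct SketchSkill {
    name: String,
    priority: i32,
    triggers: Vec<SketchTrigger>,
    prompt: String,
}

/// Collect the prompts of every skill with at least one matching trigger
/// (an assumption), highest priority first, joined into one context string.
fn match_context(skills: &[SketchSkill], url: &str, title: &str, html: &str) -> String {
    let mut hits: Vec<&SketchSkill> = skills
        .iter()
        .filter(|s| s.triggers.iter().any(|t| t.matches(url, title, html)))
        .collect();
    // Stable sort: skills with equal priority keep their insertion order.
    hits.sort_by(|a, b| b.priority.cmp(&a.priority));
    hits.iter()
        .map(|s| format!("[{}]\n{}", s.name, s.prompt))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

With a login-wall skill triggered by the URL and a reCAPTCHA skill triggered by the HTML, a login page that embeds a reCAPTCHA widget would yield both prompts, with the higher-priority one first. A page that matches nothing yields an empty string.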
Custom Skills from Markdown
let mut registry = spider_skills::new_registry();
registry.load_markdown(/* ... */);
Loading Skills from URLs
# async
Loading Skills from S3
# async
Architecture
┌───────────────────────────┐
│       spider_skills       │  ← This crate: types + skill content
│ ┌──────────────────────┐  │
│ │ Skill, SkillTrigger, │  │  ← Core types (defined here)
│ │ SkillRegistry        │  │
│ ├──────────────────────┤  │
│ │ web_challenges       │  │  ← 69 built-in automation skills
│ │ fetch                │  │  ← Optional: fetch from URLs
│ │ s3                   │  │  ← Optional: load from S3
│ └──────────────────────┘  │
└────────────┬──────────────┘
             │ used by
   ┌─────────┼─────────┐
   ▼         ▼         ▼
 spider    spider     your
 _agent    _worker   project
Authoring Skills
Each skill is a markdown file with YAML frontmatter:
---
name: my-challenge-solver
description: Solves a specific type of web challenge
triggers:
  - title_contains: "..."
  - url_contains: "..."
  - html_contains: "..."
---
Step-by-step instructions for the LLM to follow when this
challenge type is detected...
Trigger types:
- title_contains: case-insensitive match on page title
- url_contains: case-insensitive match on page URL
- html_contains: case-insensitive match on page HTML
Priority: Higher values are injected first. Use 1-3 for low priority, 4-5 for medium, 6+ for high.
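For readers who want to consume skill files outside the crate, the following is a minimal sketch of reading one into a struct using only the standard library. `ParsedSkill`, `unquote`, and `parse_skill` are hypothetical names for this example, not the crate's loader. The sketch assumes `---` delimiters around the frontmatter, a top-level `priority` key, and only the flat `key: value` subset of YAML; a real loader would more likely use a YAML library.

```rust
/// Hypothetical parsed form of a skill file; this is not the crate's `Skill` type.
#[derive(Debug, Default, PartialEq)]
struct ParsedSkill {
    name: String,
    description: String,
    /// (trigger type, pattern) pairs, e.g. ("url_contains", "/login").
    triggers: Vec<(String, String)>,
    priority: i32,
    /// Markdown body after the closing `---`.
    prompt: String,
}

/// Strip surrounding whitespace and one pair of matching quotes.
fn unquote(s: &str) -> String {
    let s = s.trim();
    let s = s
        .strip_prefix('"')
        .and_then(|r| r.strip_suffix('"'))
        .or_else(|| s.strip_prefix('\'').and_then(|r| r.strip_suffix('\'')))
        .unwrap_or(s);
    s.to_string()
}

/// Parse `---`, flat `key: value` lines, `---`, then the prompt body.
/// Returns `None` when the delimiters are missing or `priority` is not a number.
fn parse_skill(src: &str) -> Option<ParsedSkill> {
    let rest = src.trim_start().strip_prefix("---")?;
    let (front, body) = rest.split_once("\n---")?;
    let mut skill = ParsedSkill {
        prompt: body.trim().to_string(),
        ..Default::default()
    };
    for line in front.lines() {
        let line = line.trim();
        // Trigger entries are list items such as `- url_contains: "/login"`.
        let (is_item, line) = match line.strip_prefix("- ") {
            Some(item) => (true, item),
            None => (false, line),
        };
        let Some((key, value)) = line.split_once(':') else { continue };
        let (key, value) = (key.trim(), unquote(value));
        match (is_item, key) {
            (true, _) => skill.triggers.push((key.to_string(), value)),
            (false, "name") => skill.name = value,
            (false, "description") => skill.description = value,
            (false, "priority") => skill.priority = value.parse().ok()?,
            _ => {} // the `triggers:` header and unknown keys are skipped
        }
    }
    Some(skill)
}
```

Splitting on the first colon keeps patterns that themselves contain colons (such as full URLs) intact, at the cost of not supporting nested YAML structures.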
License
MIT