spider_skills

Skills and automation tactics for Spider Rust projects.

Pre-built skill definitions for solving common web challenges and interacting with the spider.cloud API. Each skill is a markdown prompt fragment with trigger conditions; when the page state matches those conditions, the fragment is injected into the LLM context.

Note: The Rust crate is optional — it provides a typed integration layer for the spider.rs ecosystem. The skill definitions in skills/ are standalone markdown files usable with any LLM-based automation system.

Skill Folders

skills/
  automation/   69 web challenge skills (CAPTCHAs, puzzles, forms, security, data extraction)
  api/           8 spider.cloud API reference skills (crawl, scrape, search, screenshot, etc.)

Automation Skills (`skills/automation/`)

Pre-built tactics for common web challenges encountered during crawling and browser automation. Each .md file contains YAML frontmatter (trigger conditions, priority) and prompt content for LLM-driven solving.

Categories:

Category	Count	Examples
CAPTCHAs	20	reCAPTCHA v2/v3, hCaptcha, Turnstile, GeeTest, FunCaptcha, audio, math, puzzle piece
Interactive Puzzles	19	Image grids, tic-tac-toe, word search, sliding tiles, mazes, sudoku, crosswords, memory games
Access Barriers	10	Cookie consent, login walls, age verification, paywalls, popups, redirect chains, iframes
Form Automation	8	Multi-step forms, file uploads, OTP inputs, payment forms, address forms
Anti-Bot / Security	6	Bot detection, rate limiting, JS challenges, proof-of-work, fingerprinting, device verification
Data Extraction	6	Tables, product listings, contact info, pricing, search results, charts

API Skills (`skills/api/`)

Reference skills for the spider.cloud API — endpoint documentation, parameters, and usage examples.

Skill	Endpoint	Description
`crawl`	POST `/crawl`	Multi-page website crawling
`scrape`	POST `/scrape`	Single-page data extraction
`search`	POST `/search`	SERP queries with optional content fetch
`links`	POST `/links`	Link discovery and extraction
`screenshot`	POST `/screenshot`	Visual page capture
`transform`	POST `/transform`	HTML-to-markdown/text conversion
`unblocker`	POST `/unblocker`	Anti-bot bypass (10-40 extra credits)
`ai`	POST `/ai/*`	AI-powered crawl, scrape, search, browser (subscription required)
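
The skill-to-endpoint mapping in the table can be expressed as a small lookup. This is only an illustrative sketch: `API_BASE`, `endpoint_path`, and `endpoint_url` are hypothetical names that are not part of the crate, and the base URL is an assumption that should be checked against the spider.cloud documentation. The `ai` skill is left out because its sub-routes (`/ai/*`) are not listed above.

```rust
/// Assumed base URL for the API (not stated in this README).
pub const API_BASE: &str = "https://api.spider.cloud";

/// Map an API skill name from the table to its POST path.
/// Returns `None` for names that have no single fixed path here.
pub fn endpoint_path(skill: &str) -> Option<&'static str> {
    match skill {
        "crawl" => Some("/crawl"),
        "scrape" => Some("/scrape"),
        "search" => Some("/search"),
        "links" => Some("/links"),
        "screenshot" => Some("/screenshot"),
        "transform" => Some("/transform"),
        "unblocker" => Some("/unblocker"),
        // `ai` covers several sub-routes (`/ai/*`), so it is not mapped.
        _ => None,
    }
}

/// Join the assumed base URL with the path for a skill.
pub fn endpoint_url(skill: &str) -> Option<String> {
    endpoint_path(skill).map(|path| format!("{}{}", API_BASE, path))
}
```

Calling `endpoint_url("crawl")` yields the full URL for the crawl endpoint under the assumed base, while unknown names return `None` so a caller can fall back to the skill's own documentation.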

Install (Rust)

[dependencies]
spider_skills = "0.1"

Feature Flags

Feature	Default	Description
`web_challenges`	Yes	69 built-in web challenge skills
`fetch`	No	Load skills from remote URLs at runtime
`s3`	No	Load skills from AWS S3 buckets

# All features
spider_skills = { version = "0.1", features = ["web_challenges", "fetch", "s3"] }

# Minimal (just core types, no built-in skills)
spider_skills = { version = "0.1", default-features = false }

Usage

use spider_skills::web_challenges;

// Get a registry with all 69 built-in skills
let registry = web_challenges::registry();

// Or pick specific skill categories
let mut registry = spider_skills::new_registry();
web_challenges::add_image_grid(&mut registry);
web_challenges::add_text_captcha(&mut registry);
web_challenges::add_tic_tac_toe(&mut registry);

Matching Skills Against Page State

let registry = spider_skills::web_challenges::registry();

// Returns combined prompt context for matching skills
let context = registry.match_context(
    "https://example.com/login",  // url
    "Sign In",                     // title
    "<div class='g-recaptcha'>",   // html
);

// context now contains the login-wall and recaptcha-v2 skill prompts
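
How the matched context reaches the model is up to the caller. As a rough sketch, assuming the context is available as a string, it could be appended to a base system prompt as shown here. The function name `inject_skill_context` and the "Active skills" header are hypothetical and not part of the crate.

```rust
/// Append matched skill context to a base system prompt.
/// If nothing matched (empty or whitespace-only context), the base
/// prompt is returned unchanged so no empty section is added.
pub fn inject_skill_context(base_prompt: &str, skill_context: &str) -> String {
    let skill_context = skill_context.trim();
    if skill_context.is_empty() {
        return base_prompt.to_string();
    }
    // The section header is an arbitrary choice for this sketch.
    format!("{}\n\n## Active skills\n\n{}", base_prompt, skill_context)
}
```

Skipping the section when nothing matched keeps the prompt short on ordinary pages, which is the point of trigger-based injection.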

Custom Skills from Markdown

let mut registry = spider_skills::new_registry();
registry.load_markdown(r#"---
name: my-skill
description: Custom challenge solver
triggers:
  - title_contains: "my challenge"
  - html_contains: "challenge-widget"
---

Strategy for solving my custom challenge...
"#);

Loading Skills from URLs

# async fn example() {
// Requires the `fetch` feature.
let mut registry = spider_skills::new_registry();
spider_skills::fetch::fetch_skill(&mut registry, "https://example.com/skills/my-skill.md")
    .await
    .unwrap();
# }

Loading Skills from S3

# async fn example() -> Result<(), Box<dyn std::error::Error>> {
// Requires the `s3` feature.
use spider_skills::s3::S3SkillSource;

let source = S3SkillSource::new("my-skills-bucket").await;
let mut registry = spider_skills::new_registry();
// Load every skill file stored under the "skills/" key prefix.
source.load_into(&mut registry, "skills/").await?;
# Ok(())
# }

Architecture

┌───────────────────────────┐
│     spider_skills         │  ← This crate: types + skill content
│  ┌──────────────────────┐ │
│  │ Skill, SkillTrigger, │ │  ← Core types (defined here)
│  │ SkillRegistry        │ │
│  ├──────────────────────┤ │
│  │ web_challenges       │ │  ← 69 built-in automation skills
│  │ fetch                │ │  ← Optional: fetch from URLs
│  │ s3                   │ │  ← Optional: load from S3
│  └──────────────────────┘ │
└────────────┬──────────────┘
             │ used by
   ┌─────────┼─────────┐
   ▼         ▼         ▼
spider    spider     your
_agent   _worker    project

Authoring Skills

Each skill is a markdown file with YAML frontmatter:

---
name: my-challenge-solver
description: Solves a specific type of web challenge
triggers:
  - title_contains: "challenge keyword"
  - html_contains: "challenge-css-class"
  - url_contains: "/challenge/"
priority: 5
---

# Strategy

Step-by-step instructions for the LLM to follow when this
challenge type is detected...
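
To make the file layout concrete, here is a minimal sketch of splitting such a file into frontmatter fields and body using only the standard library. It is not the crate's parser and not a general YAML parser: it handles only the flat `key: value` lines and `- kind: "value"` trigger items shown above. `ParsedSkill` and `parse_skill` are hypothetical names, and the default priority of 0 is an assumption.

```rust
/// Hypothetical result type for this sketch.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedSkill {
    pub name: String,
    pub description: String,
    pub priority: i32,
    /// (trigger kind, needle), e.g. ("title_contains", "my challenge")
    pub triggers: Vec<(String, String)>,
    pub body: String,
}

/// Trim whitespace and surrounding double quotes from a value.
fn unquote(s: &str) -> &str {
    s.trim().trim_matches('"')
}

/// Parse a skill file; returns `None` if the frontmatter delimiters are missing.
pub fn parse_skill(markdown: &str) -> Option<ParsedSkill> {
    // The file must open with a `---` line.
    let rest = markdown.trim_start().strip_prefix("---")?;
    // The first later line starting with `---` closes the frontmatter.
    let (front, body) = rest.split_once("\n---")?;
    let mut skill = ParsedSkill {
        body: body.trim().to_string(),
        ..Default::default()
    };
    for line in front.lines() {
        let line = line.trim();
        if let Some(item) = line.strip_prefix("- ") {
            // List items are treated as triggers: `- kind: "needle"`.
            if let Some((kind, needle)) = item.split_once(':') {
                skill
                    .triggers
                    .push((kind.trim().to_string(), unquote(needle).to_string()));
            }
        } else if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "name" => skill.name = unquote(value).to_string(),
                "description" => skill.description = unquote(value).to_string(),
                // Assumption: an unparsable priority falls back to 0.
                "priority" => skill.priority = value.trim().parse().unwrap_or(0),
                _ => {} // e.g. the `triggers:` header line
            }
        }
    }
    Some(skill)
}
```

A real loader would normally use a proper YAML library; the sketch only shows which parts of the file carry matching metadata and which part becomes prompt content.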

Trigger types:

title_contains — case-insensitive match on page title
url_contains — case-insensitive match on page URL
html_contains — case-insensitive match on page HTML

Priority: Higher values are injected first. Use 1-3 for low priority, 4-5 for medium, 6+ for high.
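
The trigger and priority rules above can be sketched in a few lines of standard-library Rust. The types below (`Trigger`, `SketchSkill`) and the free function `match_context` are hypothetical stand-ins, not the crate's actual `SkillTrigger`, `Skill`, or `SkillRegistry`. The sketch assumes a skill matches when any one of its triggers fires, which this README does not state explicitly.

```rust
/// Hypothetical stand-in for the three trigger types.
#[derive(Debug, Clone)]
pub enum Trigger {
    TitleContains(String),
    UrlContains(String),
    HtmlContains(String),
}

/// Hypothetical stand-in for a loaded skill.
#[derive(Debug, Clone)]
pub struct SketchSkill {
    pub name: String,
    pub priority: i32,
    pub triggers: Vec<Trigger>,
    pub content: String,
}

/// Case-insensitive substring test.
fn contains_ci(haystack: &str, needle: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

impl Trigger {
    /// Check this trigger against the current page state.
    pub fn matches(&self, url: &str, title: &str, html: &str) -> bool {
        match self {
            Trigger::TitleContains(n) => contains_ci(title, n),
            Trigger::UrlContains(n) => contains_ci(url, n),
            Trigger::HtmlContains(n) => contains_ci(html, n),
        }
    }
}

/// Collect the content of every matching skill, highest priority first.
/// Assumption: a skill matches if ANY of its triggers fires.
pub fn match_context(skills: &[SketchSkill], url: &str, title: &str, html: &str) -> String {
    let mut matched: Vec<&SketchSkill> = skills
        .iter()
        .filter(|s| s.triggers.iter().any(|t| t.matches(url, title, html)))
        .collect();
    // Stable sort: skills with equal priority keep their original order.
    matched.sort_by(|a, b| b.priority.cmp(&a.priority));
    matched
        .iter()
        .map(|s| s.content.as_str())
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

With a priority-7 skill triggered by `html_contains: "g-recaptcha"` and a priority-3 skill triggered by `url_contains: "/login"`, a login page that embeds the reCAPTCHA widget would produce the reCAPTCHA content first, followed by the login content.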

License

MIT

spider_skills 0.1.6