spider_skills 0.1.6

Skills and automation tactics for spider rust projects

spider_skills

Pre-built skill definitions for solving common web challenges and interacting with the spider.cloud API. Each skill is a markdown prompt fragment with trigger conditions; when the page state matches, the fragment is injected into the LLM context at runtime.

Note: The Rust crate is optional — it provides a typed integration layer for the spider.rs ecosystem. The skill definitions in skills/ are standalone markdown files usable with any LLM-based automation system.

Skill Folders

skills/
  automation/   69 web challenge skills (CAPTCHAs, puzzles, forms, security, data extraction)
  api/           8 spider.cloud API reference skills (crawl, scrape, search, screenshot, etc.)

Automation Skills (skills/automation/)

Pre-built tactics for common web challenges encountered during crawling and browser automation. Each .md file contains YAML frontmatter (trigger conditions, priority) and prompt content for LLM-driven solving.

Categories:

| Category | Count | Examples |
|---|---|---|
| CAPTCHAs | 20 | reCAPTCHA v2/v3, hCaptcha, Turnstile, GeeTest, FunCaptcha, audio, math, puzzle piece |
| Interactive Puzzles | 19 | Image grids, tic-tac-toe, word search, sliding tiles, mazes, sudoku, crosswords, memory games |
| Access Barriers | 10 | Cookie consent, login walls, age verification, paywalls, popups, redirect chains, iframes |
| Form Automation | 8 | Multi-step forms, file uploads, OTP inputs, payment forms, address forms |
| Anti-Bot / Security | 6 | Bot detection, rate limiting, JS challenges, proof-of-work, fingerprinting, device verification |
| Data Extraction | 6 | Tables, product listings, contact info, pricing, search results, charts |

API Skills (skills/api/)

Reference skills for the spider.cloud API — endpoint documentation, parameters, and usage examples.

| Skill | Endpoint | Description |
|---|---|---|
| crawl | POST /crawl | Multi-page website crawling |
| scrape | POST /scrape | Single-page data extraction |
| search | POST /search | SERP queries with optional content fetch |
| links | POST /links | Link discovery and extraction |
| screenshot | POST /screenshot | Visual page capture |
| transform | POST /transform | HTML-to-markdown/text conversion |
| unblocker | POST /unblocker | Anti-bot bypass (10-40 extra credits) |
| ai | POST /ai/* | AI-powered crawl, scrape, search, browser (subscription required) |
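
As a rough illustration of how the endpoint table could be carried into typed code, here is a minimal sketch using only the standard library. The `Endpoint` enum, `endpoint_url` function, and the `BASE_URL` value are assumptions made for this example and are not part of spider_skills; the `ai` skill is left out because its sub-paths are not listed here.

```rust
/// Hypothetical enum mirroring the POST endpoints in the table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Endpoint {
    Crawl,
    Scrape,
    Search,
    Links,
    Screenshot,
    Transform,
    Unblocker,
}

impl Endpoint {
    /// Path exactly as written in the table.
    fn path(self) -> &'static str {
        match self {
            Endpoint::Crawl => "/crawl",
            Endpoint::Scrape => "/scrape",
            Endpoint::Search => "/search",
            Endpoint::Links => "/links",
            Endpoint::Screenshot => "/screenshot",
            Endpoint::Transform => "/transform",
            Endpoint::Unblocker => "/unblocker",
        }
    }
}

/// Assumed base URL; confirm it against the spider.cloud API docs.
const BASE_URL: &str = "https://api.spider.cloud";

/// Joins the assumed base URL with the endpoint path.
fn endpoint_url(endpoint: Endpoint) -> String {
    format!("{}{}", BASE_URL, endpoint.path())
}
```

Every endpoint in the table is a POST, so a caller would send a request body to the URL returned by `endpoint_url`; the body fields are described in the individual API skills rather than here.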

Install (Rust)

[dependencies]
spider_skills = "0.1"

Feature Flags

| Feature | Default | Description |
|---|---|---|
| web_challenges | Yes | 69 built-in web challenge skills |
| fetch | No | Load skills from remote URLs at runtime |
| s3 | No | Load skills from AWS S3 buckets |

# All features
spider_skills = { version = "0.1", features = ["web_challenges", "fetch", "s3"] }

# Minimal (just core types, no built-in skills)
spider_skills = { version = "0.1", default-features = false }

Usage

use spider_skills::web_challenges;

// Get a registry with all 69 built-in skills
let registry = web_challenges::registry();

// Or pick specific skill categories
let mut registry = spider_skills::new_registry();
web_challenges::add_image_grid(&mut registry);
web_challenges::add_text_captcha(&mut registry);
web_challenges::add_tic_tac_toe(&mut registry);

Matching Skills Against Page State

let registry = spider_skills::web_challenges::registry();

// Returns combined prompt context for matching skills
let context = registry.match_context(
    "https://example.com/login",  // url
    "Sign In",                     // title
    "<div class='g-recaptcha'>",   // html
);

// context now contains the login-wall and recaptcha-v2 skill prompts

Custom Skills from Markdown

let mut registry = spider_skills::new_registry();
registry.load_markdown(r#"---
name: my-skill
description: Custom challenge solver
triggers:
  - title_contains: "my challenge"
  - html_contains: "challenge-widget"
---

Strategy for solving my custom challenge...
"#);

Loading Skills from URLs

// Requires the `fetch` feature.
async fn example() {
    let mut registry = spider_skills::new_registry();
    spider_skills::fetch::fetch_skill(&mut registry, "https://example.com/skills/my-skill.md")
        .await
        .unwrap();
}

Loading Skills from S3

// Requires the `s3` feature.
use spider_skills::s3::S3SkillSource;

async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let source = S3SkillSource::new("my-skills-bucket").await;
    let mut registry = spider_skills::new_registry();
    source.load_into(&mut registry, "skills/").await?;
    Ok(())
}

Architecture

┌───────────────────────────┐
│     spider_skills         │  ← This crate: types + skill content
│  ┌──────────────────────┐ │
│  │ Skill, SkillTrigger, │ │  ← Core types (defined here)
│  │ SkillRegistry        │ │
│  ├──────────────────────┤ │
│  │ web_challenges       │ │  ← 69 built-in automation skills
│  │ fetch                │ │  ← Optional: fetch from URLs
│  │ s3                   │ │  ← Optional: load from S3
│  └──────────────────────┘ │
└────────────┬──────────────┘
             │ used by
   ┌─────────┼─────────┐
   ▼         ▼         ▼
spider    spider     your
_agent   _worker    project

Authoring Skills

Each skill is a markdown file with YAML frontmatter:

---
name: my-challenge-solver
description: Solves a specific type of web challenge
triggers:
  - title_contains: "challenge keyword"
  - html_contains: "challenge-css-class"
  - url_contains: "/challenge/"
priority: 5
---

# Strategy

Step-by-step instructions for the LLM to follow when this
challenge type is detected...
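
To make the file layout concrete, the following is a minimal sketch of separating the frontmatter from the prompt body using only the standard library. It is not the crate's parser: `split_frontmatter` and `scalar` are hypothetical helpers, they assume `\n` line endings and a closing `---` on its own line, and they read only simple top-level `key: value` pairs rather than full YAML.

```rust
/// Splits a skill document into (frontmatter, body).
/// Returns None when the text does not open with a `---` line
/// or the closing `---` line is missing.
fn split_frontmatter(doc: &str) -> Option<(&str, &str)> {
    const FENCE: &str = "\n---\n";
    let rest = doc.strip_prefix("---\n")?;
    let end = rest.find(FENCE)?;
    let frontmatter = &rest[..end];
    let body = &rest[end + FENCE.len()..];
    Some((frontmatter, body.trim_start()))
}

/// Reads a top-level `key: value` scalar from the frontmatter.
/// Indented entries (such as the items under `triggers:`) are skipped
/// because their key text includes the leading indentation.
fn scalar<'a>(frontmatter: &'a str, key: &str) -> Option<&'a str> {
    frontmatter.lines().find_map(|line| {
        let (k, v) = line.split_once(':')?;
        (k == key).then(|| v.trim())
    })
}
```

A real loader would hand the frontmatter to a YAML parser to read the `triggers` list; this sketch only shows where the metadata ends and the prompt content begins.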

Trigger types:

  • title_contains — case-insensitive match on page title
  • url_contains — case-insensitive match on page URL
  • html_contains — case-insensitive match on page HTML
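
The three trigger types can be sketched as a small enum with a case-insensitive substring check. This is an illustration under stated assumptions, not the crate's `SkillTrigger`: the `Trigger` and `Page` types are hypothetical, and treating a skill as matched when any one of its triggers matches is an assumption the text above does not confirm.

```rust
/// Hypothetical stand-in for the crate's trigger type.
#[derive(Debug, Clone, PartialEq)]
enum Trigger {
    TitleContains(String),
    UrlContains(String),
    HtmlContains(String),
}

/// Page state that triggers are evaluated against.
struct Page<'a> {
    url: &'a str,
    title: &'a str,
    html: &'a str,
}

/// Case-insensitive substring test.
fn contains_ci(haystack: &str, needle: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

impl Trigger {
    /// Checks this trigger against the matching page field.
    fn matches(&self, page: &Page<'_>) -> bool {
        match self {
            Trigger::TitleContains(s) => contains_ci(page.title, s),
            Trigger::UrlContains(s) => contains_ci(page.url, s),
            Trigger::HtmlContains(s) => contains_ci(page.html, s),
        }
    }
}

/// Assumption: a skill fires when ANY of its triggers matches.
fn any_match(triggers: &[Trigger], page: &Page<'_>) -> bool {
    triggers.iter().any(|t| t.matches(page))
}
```

Lowercasing both sides on every check is simple but allocates; a registry that matches many skills per page would likely lowercase the page fields once and reuse them.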

Priority: Higher values are injected first. Use 1-3 for low priority, 4-5 for medium, 6+ for high.
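
The ordering rule can be illustrated with a short sketch that sorts matched skills by descending priority before joining their prompts. The `MatchedSkill` struct, the `build_context` function, and the `## name` heading placed before each prompt are all hypothetical; the crate's actual context format is not shown in this document.

```rust
/// Hypothetical record for a skill that already matched the page.
struct MatchedSkill {
    name: &'static str,
    priority: i32,
    prompt: &'static str,
}

/// Orders skills so higher priority comes first, then joins the prompts.
fn build_context(mut skills: Vec<MatchedSkill>) -> String {
    // `sort_by` is stable, so skills with equal priority keep their input order.
    skills.sort_by(|a, b| b.priority.cmp(&a.priority));
    skills
        .iter()
        .map(|s| format!("## {}\n{}", s.name, s.prompt))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

With a login-wall skill at priority 2 and a recaptcha-v2 skill at priority 7 (illustrative values), the recaptcha-v2 prompt would appear first in the combined context.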

License

MIT