stygian-plugin
A Chrome browser plugin scraper for Stygian. It provides flexible, interactive visual data extraction as a fallback when stygian-graph and stygian-browser cannot scrape a page.
Features
- Template-based extraction: Define a schema once, apply to multiple elements
- Recording-based: User clicks/highlights → generates extraction pattern
- Query-driven: CSS and XPath selectors with fallback support
- Region-based: Multiple independent zones, each with custom rules
- Multi-instance extraction: Iterate over matching elements on a page
- Transformation pipeline: Trim, normalize, regex, type coercion, HTML stripping, etc.
- Idempotent operations: ULID-based deduplication for safe retries
- Integrated with stygian-graph: Implements the ScrapingService trait for pipeline integration
Architecture
Following Stygian's hexagonal architecture:
- Domain (src/domain/): Pure Rust, zero I/O dependencies
- Ports (src/ports.rs): Trait definitions (PluginTemplateStore, PluginExtractionPort, IdempotencyKeyStore)
- Adapters (src/adapters/): Concrete implementations
  - ExtractionEngine: CSS selector-based DOM extraction
  - PluginExtractionAdapter: Bridges to stygian-graph's ScrapingService
- Storage (src/storage/): Persistence adapters
  - FileTemplateStore: JSON file-based template storage
  - MemoryIdempotencyStore: In-memory result caching
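To make the port/adapter split concrete, here is a minimal sketch under stated assumptions. The `Template` struct, the `TemplateStore` trait, and `InMemoryTemplateStore` are hypothetical, simplified stand-ins and not the crate's actual definitions. The sketch is also synchronous for brevity, whereas the storage calls shown later in this README are awaited.

```rust
use std::collections::HashMap;

/// Domain type: plain data, no I/O. (Hypothetical, simplified shape.)
#[derive(Debug, Clone, PartialEq)]
pub struct Template {
    pub id: String,
    pub name: String,
}

/// Port: the only thing application code knows about persistence.
/// (Hypothetical trait; synchronous here to keep the sketch short.)
pub trait TemplateStore {
    fn save(&mut self, template: Template);
    fn get(&self, id: &str) -> Option<Template>;
    fn list(&self) -> Vec<Template>;
    fn delete(&mut self, id: &str) -> bool;
}

/// Adapter: one concrete implementation of the port, backed by a HashMap.
#[derive(Default)]
pub struct InMemoryTemplateStore {
    templates: HashMap<String, Template>,
}

impl TemplateStore for InMemoryTemplateStore {
    fn save(&mut self, template: Template) {
        // Saving under an existing id overwrites the previous template.
        self.templates.insert(template.id.clone(), template);
    }

    fn get(&self, id: &str) -> Option<Template> {
        self.templates.get(id).cloned()
    }

    fn list(&self) -> Vec<Template> {
        let mut all: Vec<Template> = self.templates.values().cloned().collect();
        // HashMap iteration order is unspecified, so sort for a deterministic result.
        all.sort_by(|a, b| a.id.cmp(&b.id));
        all
    }

    fn delete(&mut self, id: &str) -> bool {
        self.templates.remove(id).is_some()
    }
}

/// Application code depends on the trait, not on a concrete adapter.
pub fn template_names(store: &dyn TemplateStore) -> Vec<String> {
    store.list().into_iter().map(|t| t.name).collect()
}
```

Because `template_names` only sees the trait, a file-backed store could be swapped in without touching it, which is the property the layering above is after.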
Quick Start
Creating a Template
use ;
use json;
// Define a template
let template = new
.with_description
.with_region
.with_region;
Executing Extraction
use ExtractionEngine;
use ExtractionRequest;
let html = r#"<html><h2 class="product-title">Widget</h2><span class="product-price">$99.99</span></html>"#;
let request = new;
let result = execute?;
println!;
Using with stygian-graph
Register the adapter in your service registry:
use PluginExtractionAdapter;
use ;
use std::sync::Arc;
let adapter = new;
registry.register.await?;
Then use in a pipeline:
[[]]
= "extract-products"
= "plugin"
= { = "uuid-of-template", = 30000 }
Selectors
CSS Selectors
css
XPath Selectors
xpath
Dual Selectors (Recommended)
dual
The engine tries the CSS selector first (faster), then falls back to the XPath selector if CSS returns no matches.
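The fallback rule can be sketched as follows. This is an illustration only: the `Selector` enum and the `resolve` function are hypothetical names, and the two closures stand in for real CSS and XPath engines, which this sketch does not implement.

```rust
/// A selector with an optional fallback, mirroring the "dual" idea.
/// (Hypothetical type, not the crate's own.)
pub enum Selector {
    Css(String),
    XPath(String),
    Dual { css: String, xpath: String },
}

/// Resolve a selector using caller-supplied query functions.
/// `run_css` and `run_xpath` are placeholders for real query engines.
pub fn resolve<C, X>(selector: &Selector, run_css: C, run_xpath: X) -> Vec<String>
where
    C: Fn(&str) -> Vec<String>,
    X: Fn(&str) -> Vec<String>,
{
    match selector {
        Selector::Css(query) => run_css(query),
        Selector::XPath(query) => run_xpath(query),
        Selector::Dual { css, xpath } => {
            let hits = run_css(css);
            if hits.is_empty() {
                // CSS found nothing: only now pay for the XPath query.
                run_xpath(xpath)
            } else {
                hits
            }
        }
    }
}
```

With this shape the XPath query is never evaluated when the CSS query already matched, so the fallback costs nothing on the common path.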
Transformations
Transformations are applied in order:
- Trim: Remove leading/trailing whitespace
- Lowercase/Uppercase: Case conversion
- RemoveWhitespace: Strip all whitespace
- NormalizeWhitespace: Collapse multiple spaces to single space
- StripHtml: Remove HTML tags
- DecodeHtml: Decode HTML entities
- Regex { pattern, replacement }: Regex find-and-replace
- RegexExtract { pattern, group }: Extract specific capture group
- Coerce { target_type }: Convert to "string", "number", "boolean", "date"
- Filter { pattern }: Only include if matches regex
- ParseJson: Parse as JSON
Example:
new
.with_transformation
.with_transformation
.with_transformation
.with_transformation
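Since order matters, a small standalone sketch may help show how an ordered pipeline behaves. It covers only the string-level steps that need no external crates. The variant names follow the list above, but the enum, `apply`, and `apply_all` are hypothetical and not the crate's actual types.

```rust
/// A subset of the transformations listed above.
/// (Hypothetical enum written for illustration.)
#[derive(Debug, Clone, PartialEq)]
pub enum Transformation {
    Trim,
    Lowercase,
    Uppercase,
    RemoveWhitespace,
    NormalizeWhitespace,
}

impl Transformation {
    /// Apply a single step to a value.
    pub fn apply(&self, input: &str) -> String {
        match self {
            Transformation::Trim => input.trim().to_string(),
            Transformation::Lowercase => input.to_lowercase(),
            Transformation::Uppercase => input.to_uppercase(),
            Transformation::RemoveWhitespace => {
                input.chars().filter(|c| !c.is_whitespace()).collect()
            }
            // Collapses every whitespace run to one space. In this sketch,
            // split_whitespace also drops leading and trailing whitespace.
            Transformation::NormalizeWhitespace => {
                input.split_whitespace().collect::<Vec<_>>().join(" ")
            }
        }
    }
}

/// Run the steps left to right, feeding each output into the next step.
pub fn apply_all(steps: &[Transformation], input: &str) -> String {
    steps
        .iter()
        .fold(input.to_string(), |value, step| step.apply(&value))
}
```

For example, applying `Trim`, `NormalizeWhitespace`, and `Lowercase` in that order to `"  Blue   WIDGET \n"` gives `"blue widget"`.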
Idempotency
Each extraction request can include an idempotency key:
let request = new
.with_idempotency_key;
If the same key is used again, the cached result is returned (safe for retries).
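The retry behaviour can be illustrated with a small in-memory cache. This is a sketch under assumptions: `IdempotencyCache` and `get_or_run` are hypothetical names, results are plain strings, and the choice not to cache errors is made for this example rather than taken from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory idempotency cache keyed by the request's key.
#[derive(Default)]
pub struct IdempotencyCache {
    results: HashMap<String, String>,
}

impl IdempotencyCache {
    /// Return the cached result for `key`, or run `extract` and cache its output.
    /// Errors are not cached in this sketch, so a failed attempt can be retried.
    pub fn get_or_run<F>(&mut self, key: &str, extract: F) -> Result<String, String>
    where
        F: FnOnce() -> Result<String, String>,
    {
        if let Some(cached) = self.results.get(key) {
            // Same key seen before: skip the extraction entirely.
            return Ok(cached.clone());
        }
        let result = extract()?;
        self.results.insert(key.to_string(), result.clone());
        Ok(result)
    }
}
```

Calling `get_or_run` twice with the same key runs the extraction closure only once; the second call returns the stored value even if the closure passed in would have produced something different.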
Storage
Templates
let store = new;
store.save.await?;
let retrieved = store.get.await?;
let all = store.list.await?;
store.delete.await?;
Idempotency
let store = new;
store.store_result.await?;
if let Some = store.get_result.await?
Testing
Run tests:
Run examples:
Next Steps
- Phase 3: MCP tool integration (plugin_apply_template, plugin_record_*, etc.)
- Phase 4: Chrome extension (TypeScript, content script, service worker, UI)
- Phase 5: CircuitBreaker fallback routing from stygian-graph
- Phase 6: Full integration tests, CI/CD, documentation
License
AGPL-3.0-only OR LicenseRef-Commercial