Sciotte — Sport Activity Scraper
Sport activity scraper with headless Chrome, TOML-configurable providers, and in-memory caching. Logs into sport platforms via a browser (no API credentials needed), scrapes training data from activity pages, and exposes it through four integration surfaces: Rust trait, REST API, MCP server, and CLI.
Table of Contents
- Install
- Quick Start
- How It Works
- REST API Server
- MCP Server
- Library Usage
- Provider Configuration
- Activity Data Model
- Docker
- Architecture
- License
Install
Homebrew (macOS / Linux) — recommended
This installs two binaries:
- dravr-sciotte-server: REST API + MCP server + CLI (start with dravr-sciotte-server serve)
- dravr-sciotte-mcp: standalone MCP server for editor integration
Once installed, log in and scrape:
Docker
Cargo (library)
[dependencies]
dravr-sciotte = "0.1"
Quick Start
# Login (opens a browser — log in to your account, no API keys needed)
# List activities (fast, from the training page — paginated)
# List with full detail (navigates each activity page for HR, cadence, weather, device, etc.)
# Auto-login + fetch in one command
# Start REST + MCP server
# Start MCP stdio transport (for Claude integration)
How It Works
- Browser login: opens a visible Chrome window on the provider's login page. You log in normally. Session cookies are captured and encrypted at rest (AES-256-GCM).
- List page scraping: navigates to the training/activity list page in headless Chrome and extracts activity rows using CSS selectors defined in the provider TOML.
- Pagination: automatically clicks the "next page" button to load more than the initial 20 activities.
- Detail enrichment (opt-in via --detail): navigates into each activity page and extracts full metrics (HR, cadence, weather, device, gear) using a JS snippet from the provider TOML, including structured data from embedded JSON.
- Caching: results are cached in memory with a configurable TTL (default 15 min).
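The pagination step above can be pictured as a loop that keeps asking for the next page until the requested limit is reached or a page comes back empty. The sketch below is a minimal illustration in plain Rust, not the crate's code: `collect_paginated` and the `fetch_page` closure are hypothetical names standing in for "click next page and scrape the rows".

```rust
/// Collects up to `limit` items by calling `fetch_page` for page 1, 2, 3, ...
/// Stops early when a page comes back empty (no further pages).
/// `fetch_page` is a hypothetical stand-in for the real browser step.
pub fn collect_paginated<T, F>(limit: usize, mut fetch_page: F) -> Vec<T>
where
    F: FnMut(usize) -> Vec<T>,
{
    let mut items = Vec::new();
    let mut page = 1;
    while items.len() < limit {
        let rows = fetch_page(page);
        if rows.is_empty() {
            break; // nothing left to load
        }
        items.extend(rows);
        page += 1;
    }
    // The last page may overshoot the limit, so trim the surplus.
    items.truncate(limit);
    items
}
```

With pages of 20 rows and a limit of 50, this would request three pages and return the first 50 rows.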
REST API Server (dravr-sciotte-server)
A unified HTTP server with built-in MCP support that serves scraped activity data. Supports --transport stdio for MCP-only mode (editor integration).
Usage
# Start on localhost:3000
# Specify port and host
# MCP-only mode via stdio (for editor/client integration)
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /auth/login | Trigger browser login |
| GET | /auth/status | Check authentication |
| GET | /api/activities?limit=20 | List scraped activities |
| GET | /api/activities/{id} | Single activity detail |
| GET | /health | Health check with cache stats |
| POST | /mcp | MCP Streamable HTTP (JSON-RPC 2.0) |
MCP Streamable HTTP
The server also speaks MCP at POST /mcp, accepting JSON-RPC 2.0 requests. Any MCP-compatible client can connect over HTTP instead of stdio.
# MCP initialize handshake
# List available tools
Add Accept: text/event-stream to receive SSE-wrapped responses instead of plain JSON.
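For reference, the two requests mentioned above (the initialize handshake and the tool listing) are ordinary JSON-RPC 2.0 bodies. The sketch below builds them as strings using only the standard library. The helper names, the client name, and the protocol version string are assumptions chosen for illustration, and the version the server actually negotiates may differ.

```rust
/// Builds a JSON-RPC 2.0 request body.
/// `params` must already be valid JSON, and `method` must not contain quotes.
pub fn jsonrpc_request(id: u64, method: &str, params: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id, method, params
    )
}

/// MCP `initialize` request. The protocol version and client info are
/// illustrative assumptions, not values taken from this project.
pub fn initialize_request(id: u64) -> String {
    let params = r#"{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"example-client","version":"0.1.0"}}"#;
    jsonrpc_request(id, "initialize", params)
}

/// MCP `tools/list` request with empty params.
pub fn tools_list_request(id: u64) -> String {
    jsonrpc_request(id, "tools/list", "{}")
}
```

Either body would be sent as the payload of POST /mcp with Content-Type: application/json.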
Authentication
Optional. Set DRAVR_SCIOTTE_API_KEY to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode).
DRAVR_SCIOTTE_API_KEY=my-secret
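The rule described above (no key configured means every request passes; otherwise the request must carry a matching bearer token) can be sketched as a small pure function. This illustrates the behavior only and is not the server's actual middleware; `is_authorized` is a hypothetical name.

```rust
/// Returns true when a request should be let through.
/// `configured_key` is the value of DRAVR_SCIOTTE_API_KEY, if set.
/// `authorization` is the raw `Authorization` header value, if present.
pub fn is_authorized(configured_key: Option<&str>, authorization: Option<&str>) -> bool {
    match configured_key {
        // No key configured: development mode, everything passes.
        None => true,
        // Key configured: require "Bearer <key>" exactly.
        Some(key) => authorization
            .and_then(|header| header.strip_prefix("Bearer "))
            .map_or(false, |token| token == key),
    }
}
```

A production check would typically compare the tokens in constant time to avoid timing side channels.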
MCP Server (dravr-sciotte-mcp)
A library and standalone binary that exposes the activity scraper via the Model Context Protocol. Connect any MCP-compatible client (Claude Desktop, Claude Code, editors, custom agents) to scrape sport activities.
Usage
# Stdio transport (default — for editor/client integration)
# HTTP transport (for network-accessible deployments)
MCP Tools
| Tool | Description |
|---|---|
| auth_status | Check if the session is authenticated and valid |
| browser_login | Open a browser window for the user to log in (no API keys needed) |
| get_activities | Scrape activities from the training page |
| get_activity | Scrape detailed data for a single activity by ID |
| cache_status | Get cache hit/miss statistics and entry counts |
| cache_clear | Clear all cached activity data |
Client Configuration
Add to your MCP client config (e.g. Claude Desktop claude_desktop_config.json):
For Claude Code, add the same configuration to your MCP settings.
Library Usage (Rust Trait)
[dependencies]
dravr-sciotte = "0.1"
use ;
use CacheConfig;
use ActivityParams;
async
All scraping is driven by a TOML provider config. The ActivityScraper trait can be wrapped by platform crates (e.g. pierre-scraper) with error bridging, following the same pattern as embacle's LlmProvider.
Provider Configuration
Scraping rules are defined in TOML files under providers/. The default provider is Strava (providers/strava.toml), compiled into the binary.
[]
= "strava"
= "https://www.strava.com/login"
= ["/dashboard", "/athlete", "/feed"]
= ["/login", "/session"]
[]
= "https://www.strava.com/athlete/training"
= "tr.training-activity-row"
= 'a[data-field-name="name"]'
= '/\/activities\/(\d+)/'
[]
= 'a[data-field-name="name"]'
= 'td[data-field-name="sport_type"]'
= "td.col-date"
= 'td[data-field-name="time"]'
= "td.col-dist"
= "td.col-elev"
= "td.col-suffer-score"
[]
= "https://www.strava.com/activities/{id}"
= '''
(function() { /* JS that extracts all activity data and returns JSON */ })()
'''
To add a new provider, create a TOML file with the same structure and load it via ProviderConfig::from_file().
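The ID pattern in the list-page section, /activities/(\d+), appears to pull the numeric activity ID out of each row's link. A dependency-free sketch of the same extraction follows. `activity_id_from_href` is a hypothetical helper, and the real scraper presumably applies the regex from the TOML rather than hand-written parsing.

```rust
/// Returns the first run of ASCII digits that directly follows "/activities/",
/// mirroring the pattern /activities/(\d+). Returns None when no such run exists.
pub fn activity_id_from_href(href: &str) -> Option<&str> {
    const MARKER: &str = "/activities/";
    let mut search = href;
    while let Some(pos) = search.find(MARKER) {
        let rest = &search[pos + MARKER.len()..];
        // Length of the leading digit run (whole remainder if it is all digits).
        let end = rest
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(rest.len());
        if end > 0 {
            return Some(&rest[..end]);
        }
        // No digits after this marker: keep scanning the remainder.
        search = rest;
    }
    None
}
```

The extracted ID can then be substituted into a detail URL template such as https://www.strava.com/activities/{id}.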
Activity Data Model
Activities scraped from detail pages include:
| Category | Fields |
|---|---|
| Core | id, name, sport_type, start_date, duration_seconds |
| Distance | distance_meters, elevation_gain, pace, gap |
| Heart Rate | average_heart_rate, max_heart_rate |
| Power | average_power, max_power, normalized_power |
| Cadence | average_cadence |
| Speed | average_speed, max_speed |
| Training | suffer_score, calories, elapsed_time_seconds |
| Weather | temperature, feels_like, humidity, wind_speed, wind_direction, weather |
| Equipment | device_name, gear_name |
| Location | city, region, country |
| Other | perceived_exertion, sport_type_detail, workout_type |
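One way to picture this model in Rust is a struct whose detail-only fields are optional, since list-page rows carry fewer metrics than detail pages. The struct below is a hypothetical subset for illustration: the field names come from the table above, but the types, the `ActivitySketch` name, and the `derived_speed_mps` helper are assumptions, not the crate's definitions.

```rust
/// Hypothetical subset of the activity model; types are assumptions.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ActivitySketch {
    pub id: String,
    pub name: String,
    pub sport_type: String,
    pub duration_seconds: Option<u64>,
    pub distance_meters: Option<f64>,
    // Fields below are only expected after detail enrichment.
    pub average_heart_rate: Option<f64>,
    pub device_name: Option<String>,
}

impl ActivitySketch {
    /// Speed in metres per second derived from distance and duration,
    /// or None when either value is missing or the duration is zero.
    pub fn derived_speed_mps(&self) -> Option<f64> {
        match (self.distance_meters, self.duration_seconds) {
            (Some(distance), Some(seconds)) if seconds > 0 => Some(distance / seconds as f64),
            _ => None,
        }
    }
}
```

Modelling missing metrics as `Option` keeps "not scraped" distinct from a genuine zero value.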
Docker
Pull the image from GitHub Container Registry:
The image includes dravr-sciotte-server, dravr-sciotte-mcp, and Chromium for headless scraping.
# Start the REST + MCP server
# Mount session directory for persistent login
# Run the MCP server
Architecture
Your Application
└── dravr-sciotte (this library)
│
├── Provider Config (TOML-driven)
│ └── providers/strava.toml → login URLs, CSS selectors, JS extraction
│
├── Chrome Scraper (chromiumoxide CDP)
│ ├── browser_login() → visible Chrome, user logs in, cookies captured
│ ├── get_activities() → headless Chrome, list page + pagination
│ └── get_activity() → headless Chrome, detail page JS extraction
│
├── Cache Layer (moka TTL cache)
│ └── CachedScraper → wraps ActivityScraper with in-memory TTL cache
│
├── Auth Persistence (AES-256-GCM)
│ └── ~/.config/dravr-sciotte/session.enc
│
├── MCP Server (library + binary crate)
│ └── dravr-sciotte-mcp → JSON-RPC 2.0 over stdio or HTTP/SSE
│
└── Unified REST API + MCP + CLI (binary crate)
└── dravr-sciotte-server → REST endpoints, MCP HTTP, CLI commands
The core ActivityScraper trait:
- browser_login(): open browser, capture session
- get_activities(): scrape activity list with pagination
- get_activity(): scrape single activity detail
- is_authenticated(): check session validity
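The cache layer in the diagram wraps a scraper with a TTL cache. The general shape of that decorator pattern is sketched below using only the standard library. The real trait's signatures are not shown in this README, so `DetailSource` is a hypothetical, synchronous stand-in, and `TtlCached` and `CountingSource` are illustrative names; the crate itself uses moka rather than a hand-rolled map.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for a scraper trait.
pub trait DetailSource {
    fn get_activity(&mut self, id: &str) -> String;
}

/// Wraps any `DetailSource` and reuses results younger than `ttl`.
pub struct TtlCached<S> {
    inner: S,
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl<S: DetailSource> TtlCached<S> {
    pub fn new(inner: S, ttl: Duration) -> Self {
        Self { inner, ttl, entries: HashMap::new() }
    }

    pub fn inner(&self) -> &S {
        &self.inner
    }
}

impl<S: DetailSource> DetailSource for TtlCached<S> {
    fn get_activity(&mut self, id: &str) -> String {
        // Serve from cache while the entry is still fresh.
        if let Some((stored_at, value)) = self.entries.get(id) {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        // Missing or expired: ask the wrapped source and remember the result.
        let fresh = self.inner.get_activity(id);
        self.entries.insert(id.to_string(), (Instant::now(), fresh.clone()));
        fresh
    }
}

/// Fake source that counts how often it is asked to "scrape".
pub struct CountingSource {
    pub calls: u32,
}

impl DetailSource for CountingSource {
    fn get_activity(&mut self, id: &str) -> String {
        self.calls += 1;
        format!("activity {}", id)
    }
}
```

Because the wrapper implements the same trait as the thing it wraps, callers can switch between cached and uncached scrapers without changing their own code.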
For detailed API docs see docs.rs/dravr-sciotte.
License
Licensed under MIT OR Apache-2.0.