Sciotte — Sport Activity Scraper
Sport activity scraper with headless Chrome, TOML-configurable providers, and in-memory caching. Logs into sport platforms via a browser (no API credentials needed), scrapes training data from activity pages, and exposes it through four integration surfaces: Rust trait, REST API, MCP server, and CLI.
Table of Contents
- Install
- Quick Start
- How It Works
- Credential Login
- Vision Mode
- Fake Login Testing
- REST API Server
- MCP Server
- Library Usage
- Provider Configuration
- Activity Data Model
- Environment Variables
- Docker
- Architecture
- License
Install
Homebrew (macOS / Linux) — recommended
This installs two binaries:
- `dravr-sciotte-server`: REST API + MCP server + CLI (start with `dravr-sciotte-server serve`)
- `dravr-sciotte-mcp`: standalone MCP server for editor integration
Once installed, log in and scrape:
Docker
Cargo (library)
[dependencies]
dravr-sciotte = "0.4"
Quick Start
# Login (opens a browser — log in to your account, no API keys needed)
# List activities (fast, from the training page — paginated)
# List with full detail (navigates each activity page for HR, cadence, weather, device, etc.)
# Auto-login + fetch in one command
# Start REST + MCP server
# Start MCP stdio transport (for Claude integration)
How It Works
- Browser login: opens a visible Chrome window to the provider's login page. You log in normally. Session cookies are captured and encrypted at rest (AES-256-GCM).
- List page scraping: navigates to the training/activity list page in headless Chrome and extracts activity rows using CSS selectors defined in the provider TOML.
- Pagination: automatically clicks the "next page" button to load more than the initial 20 activities.
- Detail enrichment (opt-in via `--detail`): navigates into each activity page and extracts full metrics (HR, cadence, weather, device, gear) using a JS snippet from the provider TOML, including structured data from embedded JSON.
- Caching: results are cached in memory with a configurable TTL (default 15 min).
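The pagination step can be pictured as a loop that reads the rendered rows and clicks "next page" until the requested limit is reached or no further page exists. The sketch below is a minimal, synchronous illustration under stated assumptions: `ListPage`, `ActivityRow`, `collect_activities`, and the in-memory `FakePages` are hypothetical names standing in for the headless Chrome page and the provider's CSS selectors; the real scraper is async and drives chromiumoxide.

```rust
/// One row scraped from the training list page (hypothetical, trimmed to two fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ActivityRow {
    pub id: u64,
    pub name: String,
}

/// Hypothetical abstraction over the browser page that shows the activity list.
pub trait ListPage {
    /// Rows currently rendered on the page.
    fn rows(&self) -> Vec<ActivityRow>;
    /// Clicks the "next page" button; returns false when there is no further page.
    fn click_next(&mut self) -> bool;
}

/// Collects up to `limit` rows, paginating only while more rows are still needed.
pub fn collect_activities<P: ListPage>(page: &mut P, limit: usize) -> Vec<ActivityRow> {
    let mut out = Vec::new();
    loop {
        for row in page.rows() {
            if out.len() == limit {
                return out;
            }
            out.push(row);
        }
        // Stop when the limit was reached exactly at a page boundary or there is no next page.
        if out.len() >= limit || !page.click_next() {
            return out;
        }
    }
}

/// In-memory fake for illustration: `pages[i]` holds the rows of page `i`.
pub struct FakePages {
    pub pages: Vec<Vec<ActivityRow>>,
    pub current: usize,
}

impl ListPage for FakePages {
    fn rows(&self) -> Vec<ActivityRow> {
        self.pages.get(self.current).cloned().unwrap_or_default()
    }

    fn click_next(&mut self) -> bool {
        if self.current + 1 < self.pages.len() {
            self.current += 1;
            true
        } else {
            false
        }
    }
}
```

Checking the limit before clicking means a request for 20 activities never triggers a second page load.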
Credential Login
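In addition to the interactive browser login, the library supports fully programmatic login with email and password. This flow drives Chrome (headless only when DRAVR_SCIOTTE_CREDENTIAL_LOGIN_HEADLESS=true), fills the login form automatically, and handles multi-factor authentication through API calls, so the user never has to interact with the browser window; they only supply a 2FA code or approve a prompt on their phone when the provider asks for one.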
Login Methods
Three login flows are supported, selected via the method parameter:
| Method | Description |
|---|---|
| `email` | Fill the provider's native email/password form directly (default) |
| `google` | Click the Google OAuth button, then fill Google's email/password form |
| `apple` | Click the Apple OAuth button, then fill Apple's sign-in form |
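Because the OAuth button selectors in the provider TOML are keyed by the same `method` values, choosing a flow can be sketched as a lookup that returns the button to click first, or none for the default `email` form. This is a hypothetical helper (`oauth_button_for` is not a library function), assuming the button table has already been loaded into a `HashMap`.

```rust
use std::collections::HashMap;

/// Decides which OAuth button (if any) to click before filling credentials.
/// `oauth_buttons` mirrors the provider TOML's OAuth button table, whose keys
/// match the `method` parameter (e.g. "google" -> "text:Sign In With Google").
pub fn oauth_button_for<'a>(
    method: &str,
    oauth_buttons: &HashMap<&str, &'a str>,
) -> Result<Option<&'a str>, String> {
    match method {
        // Default flow: the provider's own email/password form, no button click.
        "email" => Ok(None),
        // OAuth flows: the provider must define a button for this method.
        other => oauth_buttons
            .get(other)
            .copied()
            .map(Some)
            .ok_or_else(|| format!("provider has no OAuth button for method `{other}`")),
    }
}
```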
2FA Handling
The credential_login call returns one of five outcomes:
| Status | Meaning | Next Step |
|---|---|---|
| `authenticated` | Login succeeded, session is ready | Use the returned `session_id` |
| `otp_required` | Provider requires a one-time password or 2FA code | Call `POST /auth/submit-otp` |
| `two_factor_choice` | Provider shows multiple 2FA options | Call `POST /auth/select-2fa` with an `option_id` |
| `number_match` | Google shows a number to tap on your phone | Display the number, then call `POST /auth/select-2fa` with `option_id: "poll"` |
| `failed` | Wrong credentials or account locked | Check the `reason` field |
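Client code usually turns these statuses into a single dispatch. A minimal sketch, assuming hypothetical `LoginOutcome` and `NextStep` enums that mirror the table rows; the library's actual `LoginResult` type may carry different fields.

```rust
/// Hypothetical mirror of the five `credential_login` outcomes.
#[derive(Debug, Clone, PartialEq)]
pub enum LoginOutcome {
    Authenticated { session_id: String },
    OtpRequired,
    /// (option_id, label) pairs, e.g. ("app", "Tap Yes on your phone").
    TwoFactorChoice { options: Vec<(String, String)> },
    NumberMatch { number: String },
    Failed { reason: String },
}

/// The client's next action, following the "Next Step" column.
#[derive(Debug, Clone, PartialEq)]
pub enum NextStep {
    /// Send this value as `X-Session-Id` on activity requests.
    UseSession(String),
    /// POST /auth/submit-otp with the code the user enters.
    SubmitOtp,
    /// POST /auth/select-2fa with one of these option ids.
    SelectTwoFactor(Vec<String>),
    /// Show the number to the user, then POST /auth/select-2fa with "poll".
    ShowNumberThenPoll(String),
    /// Terminal failure; surface the reason.
    GiveUp(String),
}

pub fn next_step(outcome: &LoginOutcome) -> NextStep {
    match outcome {
        LoginOutcome::Authenticated { session_id } => NextStep::UseSession(session_id.clone()),
        LoginOutcome::OtpRequired => NextStep::SubmitOtp,
        LoginOutcome::TwoFactorChoice { options } => {
            NextStep::SelectTwoFactor(options.iter().map(|(id, _label)| id.clone()).collect())
        }
        LoginOutcome::NumberMatch { number } => NextStep::ShowNumberThenPoll(number.clone()),
        LoginOutcome::Failed { reason } => NextStep::GiveUp(reason.clone()),
    }
}
```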
The browser session is kept alive between credential_login and submit_otp / select_two_factor calls, so the 2FA page remains open until you submit the code or select a method.
NumberMatch — Google Number Matching Challenge
Google sometimes shows a number matching challenge during 2FA: a number is displayed on screen and the user must tap the matching number on their phone. When this page is detected, the scraper returns LoginResult::NumberMatch(number) with the number to display to the user.
After showing the number to the user, call select_two_factor("poll"). This polls the browser until Google's page auto-redirects to the dashboard (the user tapped the correct number on their phone), then returns Success.
TwoFactorChoice → select_two_factor("app") → NumberMatch("78") → select_two_factor("poll") → Success
Full 2FA flow example with NumberMatch:
# Step 1 — attempt login
# Response if 2FA method selection is needed:
# {"status":"two_factor_choice","options":[{"id":"otp","label":"Google Authenticator"},{"id":"app","label":"Tap Yes on your phone"}]}
# Step 2 — select the app (phone) method
# Response when a number match challenge is shown:
# {"status":"number_match","number":"78"}
# Step 3 — tell your user to tap "78" on their phone, then poll for approval
# Response on success:
# {"status":"authenticated","session_id":"...","cookie_count":12}
Standard 2FA flow (authenticator app code):
# Step 1 — attempt login
# Step 2a — select a 2FA method (triggers the provider to send a code or prompt app approval)
# Response if a code is now required:
# {"status":"otp_required"}
# Step 2b — submit the OTP code
# Response on success:
# {"status":"authenticated","session_id":"...","cookie_count":12}
For the app method (phone tap without number matching), select_two_factor polls for up to DRAVR_SCIOTTE_PHONE_TAP_TIMEOUT seconds (default 60 s) waiting for the user to approve on their phone, then returns authenticated directly without a code.
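The phone-approval wait is essentially a poll loop bounded by a deadline. A blocking sketch for illustration only: `wait_for_approval` is a hypothetical name, the `approved` closure stands in for "has the page redirected to the dashboard?", and the timeout would come from `DRAVR_SCIOTTE_PHONE_TAP_TIMEOUT` (default 60 s). The real call is async and inspects the live browser page.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `approved` every `interval` until it returns true or `timeout` elapses.
/// Returns true if approval was observed before the deadline.
pub fn wait_for_approval<F: FnMut() -> bool>(
    mut approved: F,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if approved() {
            return true;
        }
        let now = Instant::now();
        if now >= deadline {
            return false;
        }
        // Never sleep past the deadline.
        sleep(interval.min(deadline - now));
    }
}
```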
WebSocket Browser Streaming
For cases where credential login is not suitable (CAPTCHA, unsupported provider, or user preference), the server also exposes a WebSocket endpoint that streams live Chrome screenshots to the client. The client can interact with the browser remotely — click, type, scroll — and the server detects login completion automatically.
GET /browser/login?method=direct
GET /browser/login?method=google
GET /browser/login?method=apple
If DRAVR_SCIOTTE_API_KEY is set, pass it as a query parameter since the browser WebSocket API cannot send custom headers:
GET /browser/login?token=my-secret
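Building the connection URL is mostly string assembly plus percent-encoding of the token. A sketch with a hypothetical `browser_login_url` helper; it assumes the server listens on the default `localhost:3000` and that `method` and `token` can be combined in one query string, which the examples above show only separately.

```rust
/// Builds the WebSocket URL for interactive browser login.
/// `base` is e.g. "ws://localhost:3000"; `method` is "direct", "google" or "apple".
pub fn browser_login_url(base: &str, method: &str, token: Option<&str>) -> String {
    let mut url = format!(
        "{}/browser/login?method={}",
        base.trim_end_matches('/'),
        percent_encode(method)
    );
    if let Some(t) = token {
        url.push_str("&token=");
        url.push_str(&percent_encode(t));
    }
    url
}

/// Percent-encodes every byte except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}
```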
The WebSocket sends:
- Binary frames: JPEG screenshots of the Chrome viewport (1280×1024, ~12 fps)
- JSON text frames with `{"type":"status","state":"...","message":"..."}` during setup
- A JSON text frame with `{"type":"login_success","session_id":"...","cookie_count":N}` on completion
- A JSON text frame with `{"type":"login_failed","reason":"timeout"}` if the 120 s deadline is exceeded
The client sends JSON text frames to dispatch input:
Vision Mode
By default, credential login uses CSS selectors and URL patterns to navigate login forms. Vision mode is an alternative that uses LLM screenshot analysis (via embacle's LlmProvider) to understand the page visually instead of relying on specific DOM elements.
Vision mode is designed for login flows only. Scraping (list pages and detail pages) always uses CSS selectors.
Why Vision Mode
Google's sign-in flow changes frequently, and each UI update can break the CSS selectors until they are updated. Vision mode is more resilient to these changes because it analyzes screenshots rather than depending on element attributes.
Login Mode Options
Set DRAVR_SCIOTTE_LOGIN_MODE to select the strategy:
| Value | Description |
|---|---|
| `selector` | CSS selectors and URL patterns (default; fast, free) |
| `vision` | LLM screenshot analysis via embacle (resilient, but incurs an LLM cost per login) |
| `hybrid` | Try selectors first, fall back to vision on failure |
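Reading the mode from the environment reduces to mapping three strings onto an enum, with `selector` as the fallback. A sketch with hypothetical names (`LoginMode`, `parse_login_mode`, `login_mode_from_env`); rejecting unknown values is an assumption, and the library may treat them differently.

```rust
/// Hypothetical typed form of DRAVR_SCIOTTE_LOGIN_MODE.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoginMode {
    Selector,
    Vision,
    Hybrid,
}

/// Interprets a raw variable value; unset or empty means the documented default, `selector`.
/// Unknown values are rejected here (an assumption, not documented behaviour).
pub fn parse_login_mode(raw: Option<&str>) -> Result<LoginMode, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") | Some("selector") => Ok(LoginMode::Selector),
        Some("vision") => Ok(LoginMode::Vision),
        Some("hybrid") => Ok(LoginMode::Hybrid),
        Some(other) => Err(format!("unsupported DRAVR_SCIOTTE_LOGIN_MODE: {other}")),
    }
}

/// Reads the variable from the process environment.
pub fn login_mode_from_env() -> Result<LoginMode, String> {
    parse_login_mode(std::env::var("DRAVR_SCIOTTE_LOGIN_MODE").ok().as_deref())
}
```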
Enabling Vision Mode
Vision mode requires the vision Cargo feature and an embacle LlmProvider (Copilot Headless):
[dependencies]
dravr-sciotte = { version = "0.4", features = ["vision"] }
use std::sync::Arc;
use ;
use ScraperConfig;
use ProviderConfig;
// Build the scraper with an LLM provider attached
let config = ScraperConfig ;
let llm: = /* your embacle provider */;
let scraper = new
.with_llm;
// credential_login uses vision when login_mode == Vision or Hybrid
let result = scraper.credential_login.await?;
Or set the environment variable without changing code:
DRAVR_SCIOTTE_LOGIN_MODE=vision
The hybrid mode is the safest option for production: it runs the selector-based flow first and only invokes the LLM if a step fails, keeping costs low while maintaining resilience.
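Per-step fallback keeps LLM calls limited to the steps that actually break. A minimal sketch of that control flow, assuming a hypothetical `run_hybrid` driver and an `attempt` callback that performs one named login step with the chosen strategy; it returns how many steps needed the vision path, a rough proxy for LLM cost.

```rust
/// Which automation path to use for a single login step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Selector,
    Vision,
}

/// Hybrid mode: each step tries CSS selectors first and only falls back to
/// vision when that step fails. Returns the number of vision (LLM) fallbacks.
pub fn run_hybrid<F>(steps: &[&str], mut attempt: F) -> Result<usize, String>
where
    F: FnMut(&str, Strategy) -> Result<(), String>,
{
    let mut vision_calls = 0;
    for &step in steps {
        if attempt(step, Strategy::Selector).is_ok() {
            continue; // cheap path succeeded, no LLM cost
        }
        vision_calls += 1;
        attempt(step, Strategy::Vision)
            .map_err(|e| format!("step `{step}` failed with selectors and vision: {e}"))?;
    }
    Ok(vision_calls)
}
```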
Fake Login Testing
When DRAVR_SCIOTTE_FAKE_LOGIN=true, the scraper replaces all provider login URLs with embedded static HTML fixtures served on a local port. No real network calls are made and no real Chrome session is established against an external service.
This is useful for integration testing and CI environments where real credentials are not available.
DRAVR_SCIOTTE_FAKE_LOGIN=true
Test Passwords
The fake fixtures recognize these passwords for the Strava and Garmin providers:
| Password | Behavior |
|---|---|
| `correct-password` | Login succeeds directly (no 2FA) |
| `2fa-password` | Google OAuth path: returns `TwoFactorChoice`, then `NumberMatch("78")` on app selection |
| `no-mfa-password` | Garmin path: login succeeds without triggering MFA |
| Any other value | Login fails (wrong password error or stays on login page) |
OTP Code
For the Garmin MFA fixture (OtpRequired flow), submit 123456 to succeed. Any other code returns an error. The OTP code 000000 is not special — it fails like any other wrong code.
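Tests running against fake mode can encode the documented fixture behaviour in one place and assert against it. The helpers here (`Expected`, `expected_login`, `expected_otp_accepted`) are hypothetical test utilities that restate the password table and OTP rule above; they deliberately ignore provider and login method, which the real fixtures do take into account.

```rust
/// Outcome a test should expect from the fake login fixtures (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub enum Expected {
    Success,
    TwoFactorChoice,
    Failure,
}

/// Restates the documented password table. Simplification: ignores provider and
/// method, so "2fa-password" assumes the Google OAuth path.
pub fn expected_login(password: &str) -> Expected {
    match password {
        "correct-password" | "no-mfa-password" => Expected::Success,
        "2fa-password" => Expected::TwoFactorChoice,
        _ => Expected::Failure,
    }
}

/// Garmin MFA fixture: only 123456 is accepted; 000000 is not special.
pub fn expected_otp_accepted(code: &str) -> bool {
    code == "123456"
}
```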
NumberMatch Flow (Fake)
# 1. credential_login with "2fa-password" via google method → TwoFactorChoice
# 2. select_two_factor("app") → NumberMatch("78")
# 3. The fake page auto-redirects to dashboard after 3 s
# 4. select_two_factor("poll") → Success
REST API Server (dravr-sciotte-server)
A unified HTTP server with built-in MCP support that serves scraped activity data. Supports --transport stdio for MCP-only mode (editor integration).
Usage
# Start on localhost:3000
# Specify port and host
# MCP-only mode via stdio (for editor/client integration)
Endpoints
| Method | Path | Description |
|---|---|---|
| `POST` | `/auth/login` | Trigger browser login |
| `GET` | `/auth/status` | Check authentication (supports `X-Session-Id` header) |
| `GET` | `/auth/sessions` | List all active session IDs |
| `DELETE` | `/auth/sessions/{id}` | Remove a specific session |
| `POST` | `/auth/login-with-credentials` | Programmatic login with email/password |
| `POST` | `/auth/submit-otp` | Submit OTP/2FA code after `otp_required` |
| `POST` | `/auth/select-2fa` | Select a 2FA method after `two_factor_choice`, or pass `"poll"` after `number_match` |
| `GET` | `/browser/login` | WebSocket: stream Chrome frames for interactive login |
| `GET` | `/api/activities?limit=20` | List scraped activities |
| `GET` | `/api/activities/{id}` | Single activity detail |
| `GET` | `/health` | Health check with cache stats |
| `POST` | `/mcp` | MCP Streamable HTTP (JSON-RPC 2.0) |
Activity endpoints accept an optional X-Session-Id header to target a specific session when multiple sessions are active. Without the header, the most recently created session is used.
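The session-selection rule can be sketched as a lookup with a most-recent fallback. This assumes a hypothetical `AuthSession` that records its creation time and a hypothetical `select_session` helper; returning `None` for an unknown header value is an assumption about how a missing session is surfaced, not documented behaviour.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical session record; cookies and other state omitted.
#[derive(Debug)]
pub struct AuthSession {
    pub created_at: Instant,
}

/// Picks the session for a request: the `X-Session-Id` value when present,
/// otherwise the most recently created session.
pub fn select_session<'a>(
    sessions: &'a HashMap<String, AuthSession>,
    header: Option<&str>,
) -> Option<(&'a str, &'a AuthSession)> {
    match header {
        Some(id) => sessions.get_key_value(id).map(|(k, v)| (k.as_str(), v)),
        None => sessions
            .iter()
            .max_by_key(|(_, s)| s.created_at)
            .map(|(k, v)| (k.as_str(), v)),
    }
}
```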
MCP Streamable HTTP
The server also speaks MCP at POST /mcp, accepting JSON-RPC 2.0 requests. Any MCP-compatible client can connect over HTTP instead of stdio.
# MCP initialize handshake
# List available tools
Add Accept: text/event-stream to receive SSE-wrapped responses instead of plain JSON.
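With `Accept: text/event-stream`, each JSON-RPC response arrives inside one or more `data:` lines of a server-sent event. A small, lenient SSE reader that recovers the JSON payloads, as a sketch; `sse_data_payloads` is a hypothetical helper, and the event names the server emits are not documented here.

```rust
/// Extracts the `data` payloads from an SSE-framed response body.
/// Multiple `data:` lines in one event are joined with '\n'; a blank line ends an event.
pub fn sse_data_payloads(body: &str) -> Vec<String> {
    let mut events = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in body.lines() {
        if line.is_empty() {
            if !current.is_empty() {
                events.push(current.join("\n"));
                current.clear();
            }
        } else if let Some(rest) = line.strip_prefix("data:") {
            // A single leading space after the colon is not part of the value.
            current.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
        // Other fields (event:, id:, retry:) and ":" comment lines are ignored.
    }
    // Lenient: keep a final event even if the stream ended without a blank line.
    if !current.is_empty() {
        events.push(current.join("\n"));
    }
    events
}
```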
Authentication
Optional. Set DRAVR_SCIOTTE_API_KEY to require bearer token auth on all REST endpoints. When unset, all requests are allowed through (localhost development mode).
DRAVR_SCIOTTE_API_KEY=my-secret
The /browser/login WebSocket endpoint accepts the token as a ?token= query parameter instead of a header.
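The authorization rule can be sketched as a pure function. Assumptions: `is_authorized` is a hypothetical name, the WebSocket route accepts only the query token, REST routes accept only an `Authorization: Bearer` header, and the comparison avoids short-circuiting on the first mismatched byte.

```rust
/// Decides whether a request may proceed, given the configured DRAVR_SCIOTTE_API_KEY.
/// `authorization` is the raw Authorization header; `query_token` is the ?token= value.
pub fn is_authorized(
    api_key: Option<&str>,
    path: &str,
    authorization: Option<&str>,
    query_token: Option<&str>,
) -> bool {
    // No key configured: localhost development mode, everything is allowed.
    let Some(expected) = api_key else { return true };
    let presented = if path == "/browser/login" {
        query_token // browsers cannot set custom headers on WebSocket upgrades
    } else {
        authorization.and_then(|h| h.strip_prefix("Bearer "))
    };
    presented.is_some_and(|p| constant_time_eq(p.as_bytes(), expected.as_bytes()))
}

/// Compares equal-length inputs without exiting early on the first differing byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```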
MCP Server (dravr-sciotte-mcp)
A library and standalone binary that exposes the activity scraper via the Model Context Protocol. Connect any MCP-compatible client (Claude Desktop, Claude Code, editors, custom agents) to scrape sport activities.
Usage
# Stdio transport (default — for editor/client integration)
# HTTP transport (for network-accessible deployments)
MCP Tools
| Tool | Description |
|---|---|
| `auth_status` | Check if the session is authenticated and valid |
| `browser_login` | Open a browser window for the user to log in (no API keys needed) |
| `get_activities` | Scrape activities from the training page |
| `get_activity` | Scrape detailed data for a single activity by ID |
| `cache_status` | Get cache hit/miss statistics and entry counts |
| `cache_clear` | Clear all cached activity data |
Client Configuration
Add to your MCP client config (e.g. Claude Desktop claude_desktop_config.json):
For Claude Code, add the same configuration to your MCP settings.
Library Usage (Rust Trait)
[dependencies]
dravr-sciotte = "0.4"
Browser login
use ;
use CacheConfig;
use ActivityParams;
async
Credential login with 2FA handling
use ;
use CacheConfig;
use LoginResult;
async
All scraping is driven by a TOML provider config. The ActivityScraper trait can be wrapped by platform crates (e.g. pierre-scraper) with error bridging, following the same pattern as embacle's LlmProvider.
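Error bridging usually means converting the library's error type into the platform crate's own error via `From`, so `?` works across the boundary. A sketch with hypothetical stand-ins: `SciotteError`, `PlatformError`, `scrape_count`, and `platform_activity_count` are not the real types or functions.

```rust
/// Hypothetical stand-in for the scraper library's error type.
#[derive(Debug)]
pub enum SciotteError {
    NotAuthenticated,
    Scrape(String),
}

/// Hypothetical error type of the wrapping platform crate.
#[derive(Debug, PartialEq)]
pub enum PlatformError {
    Unauthorized,
    Upstream(String),
}

impl From<SciotteError> for PlatformError {
    fn from(err: SciotteError) -> Self {
        match err {
            SciotteError::NotAuthenticated => PlatformError::Unauthorized,
            SciotteError::Scrape(msg) => PlatformError::Upstream(format!("scraper: {msg}")),
        }
    }
}

/// Stand-in for a library call that may fail.
fn scrape_count(authenticated: bool) -> Result<usize, SciotteError> {
    if authenticated { Ok(3) } else { Err(SciotteError::NotAuthenticated) }
}

/// Platform-side wrapper: `?` converts SciotteError into PlatformError via From.
pub fn platform_activity_count(authenticated: bool) -> Result<usize, PlatformError> {
    let count = scrape_count(authenticated)?;
    Ok(count)
}
```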
Provider Configuration
Scraping rules are defined in TOML files under providers/. The default provider is Strava (providers/strava.toml), compiled into the binary. A Garmin Connect provider (providers/garmin.toml) is also included.
[]
= "strava"
= "https://www.strava.com/login"
= ["/dashboard", "/athlete", "/feed"]
= ["/login", "/session"]
# CSS selectors for the native email/password form (required for credential_login)
= '#email, input[name="email"]'
= '#password, input[name="password"]'
= 'button[type="submit"], #login-button'
# CSS selector for the login error message (used to detect wrong password)
= '.alert-error, .alert-danger, [class*="error-message"]'
# CSS selector for the OTP/2FA code input (required for submit_otp)
= 'input[name="code"], input[type="tel"], input[autocomplete="one-time-code"]'
# OAuth button selectors — keys match the `method` parameter in credential_login
[]
= 'text:Sign In With Google'
= 'text:Sign In With Apple'
[]
= "https://www.strava.com/athlete/training"
= "tr.training-activity-row"
= 'a[data-field-name="name"]'
= '/\/activities\/(\d+)/'
[]
= 'a[data-field-name="name"]'
= 'td[data-field-name="sport_type"]'
= "td.col-date"
= 'td[data-field-name="time"]'
= "td.col-dist"
= "td.col-elev"
= "td.col-suffer-score"
[]
= "https://www.strava.com/activities/{id}"
= '''
(function() { /* JS that extracts all activity data and returns JSON */ })()
'''
OAuth button selectors support a text: prefix for matching by button text content in addition to standard CSS selectors. For example, text:Sign In With Google clicks the first button, anchor, or role=button element whose text contains that string.
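The `text:` prefix amounts to a two-way parse followed by a text-content match over clickable elements. A sketch with a hypothetical element model (`Element`, `ButtonSelector`, `parse_selector`, `find_by_text`); matching here is a case-sensitive substring search, which may differ from the library's browser-side implementation.

```rust
/// A parsed OAuth button selector from the provider TOML (hypothetical representation).
#[derive(Debug, PartialEq, Eq)]
pub enum ButtonSelector<'a> {
    /// Match by visible text content (the `text:` prefix).
    Text(&'a str),
    /// Any other value is treated as a plain CSS selector.
    Css(&'a str),
}

pub fn parse_selector(raw: &str) -> ButtonSelector<'_> {
    match raw.strip_prefix("text:") {
        Some(text) => ButtonSelector::Text(text.trim()),
        None => ButtonSelector::Css(raw),
    }
}

/// Minimal element model: tag name, optional `role` attribute, and text content.
pub struct Element<'a> {
    pub tag: &'a str,
    pub role: Option<&'a str>,
    pub text: &'a str,
}

/// Index of the first button, anchor, or role=button element whose text contains `needle`.
pub fn find_by_text(elements: &[Element], needle: &str) -> Option<usize> {
    elements.iter().position(|e| {
        let clickable = e.tag.eq_ignore_ascii_case("button")
            || e.tag.eq_ignore_ascii_case("a")
            || e.role == Some("button");
        clickable && e.text.contains(needle)
    })
}
```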
To add a new provider, create a TOML file with the same structure and load it via ProviderConfig::from_file().
Activity Data Model
Activities scraped from detail pages include:
| Category | Fields |
|---|---|
| Core | id, name, sport_type, start_date, duration_seconds |
| Distance | distance_meters, elevation_gain, pace, gap |
| Heart Rate | average_heart_rate, max_heart_rate |
| Power | average_power, max_power, normalized_power |
| Cadence | average_cadence |
| Speed | average_speed, max_speed |
| Training | suffer_score, calories, elapsed_time_seconds |
| Weather | temperature, feels_like, humidity, wind_speed, wind_direction, weather |
| Equipment | device_name, gear_name |
| Location | city, region, country |
| Other | perceived_exertion, sport_type_detail, workout_type |
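The field names suggest metric units (`distance_meters`, `duration_seconds`), which makes derived values easy to compute client-side. A sketch of average speed and pace from those two fields; the helpers are hypothetical and say nothing about how the scraper itself fills `pace` or `average_speed`.

```rust
/// Average speed in m/s; None when the duration is not positive.
pub fn average_speed_mps(distance_meters: f64, duration_seconds: f64) -> Option<f64> {
    (duration_seconds > 0.0).then(|| distance_meters / duration_seconds)
}

/// Pace formatted as "m:ss /km"; None when the distance is not positive.
pub fn pace_per_km(distance_meters: f64, duration_seconds: f64) -> Option<String> {
    if distance_meters <= 0.0 {
        return None;
    }
    // Seconds needed per kilometre, rounded to whole seconds.
    let secs = (duration_seconds / (distance_meters / 1000.0)).round() as u64;
    Some(format!("{}:{:02} /km", secs / 60, secs % 60))
}
```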
Environment Variables
All variables are optional. Unset variables use the defaults shown below.
Server
| Variable | Default | Description |
|---|---|---|
| `DRAVR_SCIOTTE_API_KEY` | (unset) | Bearer token required on all REST endpoints. When unset, no authentication is enforced. |
| `CHROME_PATH` | (auto-detected) | Path to a Chrome or Chromium binary. |
Login Behavior
| Variable | Default | Description |
|---|---|---|
| `DRAVR_SCIOTTE_LOGIN_MODE` | `selector` | Login automation strategy: `selector` (CSS selectors), `vision` (LLM screenshot analysis via embacle), or `hybrid` (selectors with vision fallback). Vision requires the `vision` feature. |
| `DRAVR_SCIOTTE_CREDENTIAL_LOGIN_HEADLESS` | `false` | Run credential login in headless Chrome. Defaults to `false` because Google and some providers block headless browsers. Set to `true` for CI environments using fake login. |
| `DRAVR_SCIOTTE_FAKE_LOGIN` | `false` | Replace provider login URLs with embedded static HTML fixtures. Useful for testing without real credentials. |
Timing (credential login and scraping)
| Variable | Default | Description |
|---|---|---|
| `DRAVR_SCIOTTE_PAGE_TIMEOUT` | `30` | Page load timeout in seconds. |
| `DRAVR_SCIOTTE_INTERACTION_DELAY_MS` | `500` | Delay between page interactions in milliseconds. |
| `DRAVR_SCIOTTE_LOGIN_POLL_INTERVAL_MS` | `500` | Interval between URL polls during login detection in milliseconds. |
| `DRAVR_SCIOTTE_LOGIN_TIMEOUT` | `120` | Overall browser login timeout in seconds (interactive mode). |
| `DRAVR_SCIOTTE_PAGE_LOAD_WAIT` | `3` | Wait time after navigation for JS to render, in seconds. |
| `DRAVR_SCIOTTE_FORM_DELAY_MS` | `300` | Delay between form field interactions in milliseconds. |
| `DRAVR_SCIOTTE_EMAIL_STEP_TIMEOUT` | `10` | Timeout waiting for the password field to appear after email submit, in seconds. |
| `DRAVR_SCIOTTE_PASSWORD_STEP_TIMEOUT` | `30` | Timeout waiting for login result after password submit, in seconds. |
| `DRAVR_SCIOTTE_PHONE_TAP_TIMEOUT` | `60` | Timeout waiting for phone tap / app approval during 2FA, in seconds. |
Cache and Session
| Variable | Default | Description |
|---|---|---|
| `DRAVR_SCIOTTE_CACHE_TTL` | `900` | Activity cache TTL in seconds (15 minutes). |
| `DRAVR_SCIOTTE_CACHE_MAX` | `100` | Maximum number of cached activity entries. |
| `DRAVR_SCIOTTE_SESSION_DIR` | `~/.config/dravr-sciotte` | Directory where encrypted session files are stored. |
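A common way to honour these defaults is to parse each variable and fall back when it is unset. A sketch with hypothetical helpers (`parse_u64_or`, `env_secs`, `env_millis`); falling back on unparsable values, rather than failing at startup, is an assumption.

```rust
use std::time::Duration;

/// Parses an optional raw value as u64, falling back to `default` when unset or invalid.
pub fn parse_u64_or(raw: Option<&str>, default: u64) -> u64 {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

/// Reads a seconds-valued variable such as DRAVR_SCIOTTE_PAGE_TIMEOUT (default 30).
pub fn env_secs(name: &str, default: u64) -> Duration {
    Duration::from_secs(parse_u64_or(std::env::var(name).ok().as_deref(), default))
}

/// Reads a millisecond-valued variable such as DRAVR_SCIOTTE_INTERACTION_DELAY_MS (default 500).
pub fn env_millis(name: &str, default: u64) -> Duration {
    Duration::from_millis(parse_u64_or(std::env::var(name).ok().as_deref(), default))
}
```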
Docker
Pull the image from GitHub Container Registry:
The image includes dravr-sciotte-server, dravr-sciotte-mcp, and Chromium for headless scraping.
# Start the REST + MCP server
# Mount session directory for persistent login
# Run the MCP server
Architecture
Your Application
└── dravr-sciotte (this library)
│
├── Provider Config (TOML-driven)
│ ├── providers/strava.toml → login URLs, CSS selectors, JS extraction
│ └── providers/garmin.toml → Garmin Connect variant
│
├── Chrome Scraper (chromiumoxide CDP)
│ ├── browser_login() → visible Chrome, user logs in, cookies captured
│ ├── credential_login() → headless Chrome, fills form, handles OAuth
│ ├── submit_otp() → submits OTP/2FA code on pending login page
│ ├── select_two_factor() → selects 2FA method, or polls after NumberMatch
│ ├── get_activities() → headless Chrome, list page + pagination
│ └── get_activity() → headless Chrome, detail page JS extraction
│
├── Vision Scraper (optional, requires `vision` feature)
│ └── VisionScraper → LLM screenshot analysis via embacle LlmProvider
│
├── Cache Layer (moka TTL cache)
│ └── CachedScraper → wraps ActivityScraper with in-memory TTL cache
│
├── Auth Persistence (AES-256-GCM)
│ └── ~/.config/dravr-sciotte/session.enc
│
├── Multi-Session Store
│ └── ServerState → HashMap<session_id, AuthSession>, X-Session-Id routing
│
├── MCP Server (library + binary crate, powered by dravr-tronc)
│ └── dravr-sciotte-mcp → JSON-RPC 2.0 over stdio or HTTP/SSE
│
└── Unified REST API + MCP + CLI (binary crate, powered by dravr-tronc)
└── dravr-sciotte-server → REST endpoints, MCP HTTP, WebSocket streaming, CLI
The core ActivityScraper trait:
- `browser_login()`: open browser, capture session
- `credential_login()`: programmatic login with email/password and OAuth support
- `submit_otp()`: submit OTP/2FA code after `credential_login` returned `OtpRequired`
- `select_two_factor()`: select a 2FA method, or pass `"poll"` to wait after `NumberMatch`
- `get_activities()`: scrape activity list with pagination
- `get_activity()`: scrape single activity detail
- `is_authenticated()`: check session validity
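The cache layer in the architecture is a decorator: `CachedScraper` implements the same trait as the scraper it wraps. A deliberately simplified, single-threaded sketch: the one-method `ActivityScraper` trait here is a hypothetical synchronous stand-in for the real async trait, and a `RefCell<HashMap>` replaces the moka cache (no size bound such as `DRAVR_SCIOTTE_CACHE_MAX`).

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical, synchronous, one-method stand-in for the scraper trait.
pub trait ActivityScraper {
    fn get_activity(&self, id: &str) -> Result<String, String>;
}

/// Decorator: wraps any scraper and serves repeat lookups from a TTL cache.
pub struct CachedScraper<S> {
    inner: S,
    ttl: Duration,
    entries: RefCell<HashMap<String, (Instant, String)>>,
    hits: Cell<u64>,
    misses: Cell<u64>,
}

impl<S> CachedScraper<S> {
    pub fn new(inner: S, ttl: Duration) -> Self {
        Self { inner, ttl, entries: RefCell::new(HashMap::new()), hits: Cell::new(0), misses: Cell::new(0) }
    }

    /// (hits, misses), similar in spirit to the statistics behind `cache_status`.
    pub fn stats(&self) -> (u64, u64) {
        (self.hits.get(), self.misses.get())
    }
}

impl<S: ActivityScraper> ActivityScraper for CachedScraper<S> {
    fn get_activity(&self, id: &str) -> Result<String, String> {
        if let Some((stored_at, value)) = self.entries.borrow().get(id) {
            if stored_at.elapsed() < self.ttl {
                self.hits.set(self.hits.get() + 1);
                return Ok(value.clone());
            }
        }
        // Missing or expired: fetch from the wrapped scraper. Errors are not cached.
        self.misses.set(self.misses.get() + 1);
        let value = self.inner.get_activity(id)?;
        self.entries
            .borrow_mut()
            .insert(id.to_string(), (Instant::now(), value.clone()));
        Ok(value)
    }
}
```

Because the wrapper implements the same trait, callers cannot tell whether they hold a cached or an uncached scraper.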
For detailed API docs see docs.rs/dravr-sciotte.
License
Licensed under MIT OR Apache-2.0.