# Sciotte — Sport Activity Scraper
Sport activity scraper with headless Chrome, TOML-configurable providers, and in-memory caching. Logs into sport platforms via a browser (no API credentials needed), scrapes training data from activity pages, and exposes it through four integration surfaces: Rust trait, REST API, MCP server, and CLI.
## Table of Contents
- [Install](#install)
- [Quick Start](#quick-start)
- [How It Works](#how-it-works)
- [REST API Server](#rest-api-server-dravr-sciotte-server)
- [MCP Server](#mcp-server-dravr-sciotte-mcp)
- [Library Usage](#library-usage-rust-trait)
- [Provider Configuration](#provider-configuration)
- [Activity Data Model](#activity-data-model)
- [Docker](#docker)
- [Architecture](#architecture)
- [License](#license)
## Install
### Homebrew (macOS / Linux) — recommended
```bash
brew tap dravr-ai/tap
brew install dravr-sciotte
```
This installs two binaries:
- **`dravr-sciotte-server`** — REST API + MCP server + CLI (start with `dravr-sciotte-server serve`)
- **`dravr-sciotte-mcp`** — standalone MCP server for editor integration
Once installed, log in and scrape:
```bash
dravr-sciotte-server login
dravr-sciotte-server activities --limit 20
```
### Docker
```bash
docker pull ghcr.io/dravr-ai/dravr-sciotte:latest
docker run -p 3000:3000 ghcr.io/dravr-ai/dravr-sciotte
```
### Cargo (library)
```toml
[dependencies]
dravr-sciotte = "0.1"
```
## Quick Start
```bash
# Login (opens a browser — log in to your account, no API keys needed)
dravr-sciotte-server login
# List activities (fast, from the training page — paginated)
dravr-sciotte-server activities --limit 50
# List with full detail (navigates each activity page for HR, cadence, weather, device, etc.)
dravr-sciotte-server activities --limit 5 --detail --format json
# Auto-login + fetch in one command
dravr-sciotte-server activities --login --limit 10
# Start REST + MCP server
dravr-sciotte-server serve --port 3000
# Start MCP stdio transport (for Claude integration)
dravr-sciotte-server --transport stdio
```
## How It Works
1. **Browser login** — opens a visible Chrome window to the provider's login page. You log in normally. Session cookies are captured and encrypted at rest (AES-256-GCM).
2. **List page scraping** — navigates to the training/activity list page in headless Chrome, extracts activity rows using CSS selectors defined in the provider TOML.
3. **Pagination** — automatically clicks the "next page" button to load more than the initial 20 activities.
4. **Detail enrichment** (opt-in via `--detail`) — navigates into each activity page and extracts full metrics (HR, cadence, weather, device, gear) using a JS snippet from the provider TOML, including structured data from embedded JSON.
5. **Caching** — results are cached in-memory with configurable TTL (default 15 min).
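The pagination step can be pictured as a loop that keeps requesting pages until the requested limit is met or a page comes back empty. The following is a minimal sketch of that idea only, not the crate's implementation: `collect_paginated`, `fetch_page`, and `fake_page` are hypothetical names, and the closure stands in for "click next page and scrape the rows".

```rust
/// Collects items page by page until `limit` is reached or a page is empty.
/// `fetch_page` is a hypothetical stand-in for clicking "next page" and
/// scraping the rows that appear.
fn collect_paginated<T, F>(limit: usize, mut fetch_page: F) -> Vec<T>
where
    F: FnMut(usize) -> Vec<T>,
{
    let mut items = Vec::new();
    let mut page = 0;
    while items.len() < limit {
        let rows = fetch_page(page);
        if rows.is_empty() {
            break; // no further pages available
        }
        items.extend(rows);
        page += 1;
    }
    // The last page may overshoot the requested limit.
    items.truncate(limit);
    items
}

/// Fake list page for illustration: `total` activities, `page_size` rows per page.
fn fake_page(total: usize, page_size: usize, page: usize) -> Vec<usize> {
    let start = page * page_size;
    let end = (start + page_size).min(total);
    (start..end).collect()
}
```

For instance, `collect_paginated(50, |p| fake_page(45, 20, p))` walks three pages and stops when the fourth comes back empty.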
## REST API Server (`dravr-sciotte-server`)
A unified HTTP server with built-in MCP support that serves scraped activity data. Supports `--transport stdio` for MCP-only mode (editor integration).
### Usage
```bash
# Start on localhost:3000
dravr-sciotte-server serve
# Specify port and host
dravr-sciotte-server serve --port 8080 --host 0.0.0.0
# MCP-only mode via stdio (for editor/client integration)
dravr-sciotte-server --transport stdio
```
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/auth/login` | Trigger browser login |
| `GET` | `/auth/status` | Check authentication |
| `GET` | `/api/activities?limit=20` | List scraped activities |
| `GET` | `/api/activities/{id}` | Single activity detail |
| `GET` | `/health` | Health check with cache stats |
| `POST` | `/mcp` | MCP Streamable HTTP (JSON-RPC 2.0) |
### MCP Streamable HTTP
The server also speaks [MCP](https://modelcontextprotocol.io/) at `POST /mcp`, accepting JSON-RPC 2.0 requests. Any MCP-compatible client can connect over HTTP instead of stdio.
```bash
# MCP initialize handshake
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl"}}}'

# List available tools
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
```
Add `Accept: text/event-stream` to receive SSE-wrapped responses instead of plain JSON.
### Authentication
Optional. Set `DRAVR_SCIOTTE_API_KEY` to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode).
```bash
DRAVR_SCIOTTE_API_KEY=my-secret dravr-sciotte-server serve
curl http://localhost:3000/api/activities -H "Authorization: Bearer my-secret"
```
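The optional bearer check described above can be sketched as a small pure function. This is an illustration under assumptions, not the server's actual middleware: `is_authorized` and `constant_time_eq` are hypothetical names, and the constant-time comparison is a design choice of this sketch rather than something the README states.

```rust
/// Returns true when the request may proceed.
/// `configured_key` mirrors DRAVR_SCIOTTE_API_KEY: `None` means the variable
/// is unset, in which case every request is allowed through.
fn is_authorized(configured_key: Option<&str>, authorization_header: Option<&str>) -> bool {
    let Some(expected) = configured_key else {
        return true; // no key configured: localhost development mode
    };
    match authorization_header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => constant_time_eq(token.as_bytes(), expected.as_bytes()),
        None => false, // header missing or not a bearer token
    }
}

/// Compares two byte slices without returning early on the first mismatch.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

In a real handler the first argument would typically come from `std::env::var("DRAVR_SCIOTTE_API_KEY").ok()` read once at startup.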
## MCP Server (`dravr-sciotte-mcp`)
A library and standalone binary that exposes the activity scraper via the [Model Context Protocol](https://modelcontextprotocol.io/). Connect any MCP-compatible client (Claude Desktop, Claude Code, editors, custom agents) to scrape sport activities.
### Usage
```bash
# Stdio transport (default — for editor/client integration)
dravr-sciotte-mcp
# HTTP transport (for network-accessible deployments)
dravr-sciotte-mcp --transport http --host 0.0.0.0 --port 3000
```
### MCP Tools
| Tool | Description |
|------|-------------|
| `auth_status` | Check if the session is authenticated and valid |
| `browser_login` | Open a browser window for the user to log in (no API keys needed) |
| `get_activities` | Scrape activities from the training page |
| `get_activity` | Scrape detailed data for a single activity by ID |
| `cache_status` | Get cache hit/miss statistics and entry counts |
| `cache_clear` | Clear all cached activity data |
### Client Configuration
Add to your MCP client config (e.g. Claude Desktop `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "dravr-sciotte": {
      "command": "dravr-sciotte-mcp"
    }
  }
}
```
For Claude Code, add the same configuration to your MCP settings.
## Library Usage (Rust Trait)
```toml
[dependencies]
dravr-sciotte = "0.1"
```
```rust
use dravr_sciotte::{ChromeScraper, CachedScraper, ActivityScraper};
use dravr_sciotte::config::CacheConfig;
use dravr_sciotte::models::ActivityParams;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let scraper = ChromeScraper::default_config();
    let cached = CachedScraper::new(scraper, &CacheConfig::default());

    // Browser login (opens Chrome, user logs in, cookies captured)
    let session = cached.browser_login().await?;

    // Scrape activities
    let params = ActivityParams { limit: Some(20), ..Default::default() };
    let activities = cached.get_activities(&session, &params).await?;

    for activity in &activities {
        println!("{}: {} ({})", activity.id, activity.name, activity.sport_type);
    }
    Ok(())
}
```
All scraping is driven by a TOML provider config. The `ActivityScraper` trait can be wrapped by platform crates (e.g. `pierre-scraper`) with error bridging, following the same pattern as embacle's `LlmProvider`.
## Provider Configuration
Scraping rules are defined in TOML files under `providers/`. The default provider is Strava (`providers/strava.toml`), compiled into the binary.
```toml
[provider]
name = "strava"
login_url = "https://www.strava.com/login"
login_success_patterns = ["/dashboard", "/athlete", "/feed"]
login_failure_patterns = ["/login", "/session"]
[list_page]
url = "https://www.strava.com/athlete/training"
row_selector = "tr.training-activity-row"
link_selector = 'a[data-field-name="name"]'
id_regex = '/\/activities\/(\d+)/'
[list_page.fields]
name = 'a[data-field-name="name"]'
sport_type = 'td[data-field-name="sport_type"]'
date = "td.col-date"
time = 'td[data-field-name="time"]'
distance = "td.col-dist"
elevation = "td.col-elev"
suffer_score = "td.col-suffer-score"
[detail_page]
url_template = "https://www.strava.com/activities/{id}"
js_extract = '''
(function() { /* JS that extracts all activity data and returns JSON */ })()
'''
```
To add a new provider, create a TOML file with the same structure and load it via `ProviderConfig::from_file()`.
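To make the roles of `id_regex` and `url_template` concrete, here is a dependency-free sketch of what those two settings amount to for the Strava example: pull the numeric ID out of an activity link, then substitute it into the detail URL. The function names are hypothetical, and the hand-written parsing only approximates the regex; the crate itself presumably evaluates the pattern from the TOML.

```rust
/// Extracts the run of digits following "/activities/" in a link,
/// approximating the `id_regex` pattern from the provider TOML.
fn extract_activity_id(href: &str) -> Option<&str> {
    let marker = "/activities/";
    let start = href.find(marker)? + marker.len();
    let rest = &href[start..];
    // The ID ends at the first non-digit character (or at the end of the string).
    let end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some(&rest[..end])
    }
}

/// Fills the `{id}` placeholder of a `url_template` value.
fn detail_url(url_template: &str, id: &str) -> String {
    url_template.replace("{id}", id)
}
```

Chaining the two, a list-page link such as `/activities/12345` maps to `https://www.strava.com/activities/12345` when the template from the example config is used.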
## Activity Data Model
Activities scraped from detail pages include:
| Category | Fields |
|----------|--------|
| Core | id, name, sport_type, start_date, duration_seconds |
| Distance | distance_meters, elevation_gain, pace, gap |
| Heart Rate | average_heart_rate, max_heart_rate |
| Power | average_power, max_power, normalized_power |
| Cadence | average_cadence |
| Speed | average_speed, max_speed |
| Training | suffer_score, calories, elapsed_time_seconds |
| Weather | temperature, feels_like, humidity, wind_speed, wind_direction, weather |
| Equipment | device_name, gear_name |
| Location | city, region, country |
| Other | perceived_exertion, sport_type_detail, workout_type |
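As a rough picture of how such a record might look in Rust, the sketch below models a handful of the fields listed above. `ActivitySketch` is a hypothetical type, not the crate's actual struct: the field types, and the choice to make metrics optional, are assumptions of this example. See docs.rs for the real definition.

```rust
/// Hypothetical subset of the activity record. Metrics are optional here on
/// the assumption that not every scrape populates them.
#[derive(Debug, Clone, Default)]
struct ActivitySketch {
    id: String,
    name: String,
    sport_type: String,
    duration_seconds: Option<u64>,
    distance_meters: Option<f64>,
    average_heart_rate: Option<f64>,
}

impl ActivitySketch {
    /// Average pace in minutes per kilometre, when duration and a positive
    /// distance are both present.
    fn pace_min_per_km(&self) -> Option<f64> {
        let secs = self.duration_seconds? as f64;
        let meters = self.distance_meters?;
        if meters <= 0.0 {
            return None;
        }
        Some((secs / 60.0) / (meters / 1000.0))
    }
}
```

A 10 km run completed in 3000 seconds yields a pace of 5.0 minutes per kilometre, while a record without distance yields `None`.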
## Docker
Pull the image from GitHub Container Registry:
```bash
docker pull ghcr.io/dravr-ai/dravr-sciotte:latest
```
The image includes `dravr-sciotte-server`, `dravr-sciotte-mcp`, and Chromium for headless scraping.
```bash
# Start the REST + MCP server
docker run -p 3000:3000 ghcr.io/dravr-ai/dravr-sciotte
# Mount session directory for persistent login
docker run -p 3000:3000 \
  -v ~/.config/dravr-sciotte:/home/dravr/.config/dravr-sciotte \
  ghcr.io/dravr-ai/dravr-sciotte
# Run the MCP server
docker run --entrypoint dravr-sciotte-mcp ghcr.io/dravr-ai/dravr-sciotte
```
## Architecture
```
Your Application
    └── dravr-sciotte (this library)
            │
            ├── Provider Config (TOML-driven)
            │     └── providers/strava.toml → login URLs, CSS selectors, JS extraction
            │
            ├── Chrome Scraper (chromiumoxide CDP)
            │     ├── browser_login()  → visible Chrome, user logs in, cookies captured
            │     ├── get_activities() → headless Chrome, list page + pagination
            │     └── get_activity()   → headless Chrome, detail page JS extraction
            │
            ├── Cache Layer (moka TTL cache)
            │     └── CachedScraper → wraps ActivityScraper with in-memory TTL cache
            │
            ├── Auth Persistence (AES-256-GCM)
            │     └── ~/.config/dravr-sciotte/session.enc
            │
            ├── MCP Server (library + binary crate)
            │     └── dravr-sciotte-mcp → JSON-RPC 2.0 over stdio or HTTP/SSE
            │
            └── Unified REST API + MCP + CLI (binary crate)
                  └── dravr-sciotte-server → REST endpoints, MCP HTTP, CLI commands
```
The core `ActivityScraper` trait:
- **`browser_login()`** — open browser, capture session
- **`get_activities()`** — scrape activity list with pagination
- **`get_activity()`** — scrape single activity detail
- **`is_authenticated()`** — check session validity
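The way a caching layer can wrap a scraper behind the same trait is sketched below with the standard library only. Everything here is a simplified assumption: the `Scraper` trait is synchronous and has a single method, `Cached` uses a plain `HashMap` rather than moka, and none of the names or signatures are taken from the crate.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Simplified, synchronous stand-in for the scraper trait (hypothetical).
trait Scraper {
    fn get_activities(&mut self, limit: usize) -> Vec<String>;
}

/// Wraps any `Scraper` and reuses results younger than `ttl`.
struct Cached<S: Scraper> {
    inner: S,
    ttl: Duration,
    entries: HashMap<usize, (Instant, Vec<String>)>,
}

impl<S: Scraper> Cached<S> {
    fn new(inner: S, ttl: Duration) -> Self {
        Self { inner, ttl, entries: HashMap::new() }
    }
}

impl<S: Scraper> Scraper for Cached<S> {
    fn get_activities(&mut self, limit: usize) -> Vec<String> {
        // Serve from the cache while the stored entry is still fresh.
        if let Some((stored_at, value)) = self.entries.get(&limit) {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        // Otherwise delegate to the wrapped scraper and remember the result.
        let fresh = self.inner.get_activities(limit);
        self.entries.insert(limit, (Instant::now(), fresh.clone()));
        fresh
    }
}

/// Test double that counts how often the underlying scrape runs.
struct CountingScraper {
    calls: usize,
}

impl Scraper for CountingScraper {
    fn get_activities(&mut self, limit: usize) -> Vec<String> {
        self.calls += 1;
        (0..limit).map(|i| format!("activity-{i}")).collect()
    }
}
```

Because the wrapper implements the same trait as the thing it wraps, callers do not need to know whether they hold a cached or an uncached scraper.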
For detailed API docs see [docs.rs/dravr-sciotte](https://docs.rs/dravr-sciotte).
## License
Licensed under MIT OR Apache-2.0.