dravr-sciotte-server 0.2.0

Unified REST API + MCP server + CLI for sport activity scraping
dravr-sciotte-server-0.2.0 has been yanked.

Sciotte — Sport Activity Scraper

Sport activity scraper with headless Chrome, TOML-configurable providers, and in-memory caching. Logs into sport platforms via a browser (no API credentials needed), scrapes training data from activity pages, and exposes it through four integration surfaces: Rust trait, REST API, MCP server, and CLI.

Install

Homebrew (macOS / Linux) — recommended

brew tap dravr-ai/tap
brew install dravr-sciotte

This installs two binaries:

  • dravr-sciotte-server — REST API + MCP server + CLI (start with dravr-sciotte-server serve)
  • dravr-sciotte-mcp — standalone MCP server for editor integration

Once installed, log in and scrape:

dravr-sciotte-server login
dravr-sciotte-server activities --limit 20

Docker

docker pull ghcr.io/dravr-ai/dravr-sciotte:latest
docker run -p 3000:3000 ghcr.io/dravr-ai/dravr-sciotte

Cargo (library)

[dependencies]
dravr-sciotte = "0.1"

Quick Start

# Login (opens a browser — log in to your account, no API keys needed)
dravr-sciotte-server login

# List activities (fast, from the training page — paginated)
dravr-sciotte-server activities --limit 50

# List with full detail (navigates each activity page for HR, cadence, weather, device, etc.)
dravr-sciotte-server activities --limit 5 --detail --format json

# Auto-login + fetch in one command
dravr-sciotte-server activities --login --limit 10

# Start REST + MCP server
dravr-sciotte-server serve --port 3000

# Start MCP stdio transport (for Claude integration)
dravr-sciotte-server --transport stdio

How It Works

  1. Browser login — opens a visible Chrome window to the provider's login page. You log in normally. Session cookies are captured and encrypted at rest (AES-256-GCM).
  2. List page scraping — navigates to the training/activity list page in headless Chrome, extracts activity rows using CSS selectors defined in the provider TOML.
  3. Pagination — automatically clicks the "next page" button to load more than the initial 20 activities.
  4. Detail enrichment (opt-in via --detail) — navigates into each activity page and extracts full metrics (HR, cadence, weather, device, gear) using a JS snippet from the provider TOML, including structured data from embedded JSON.
  5. Caching — results are cached in-memory with configurable TTL (default 15 min).
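
To make step 5 concrete, here is a minimal sketch of a TTL cache built on the standard library only. It is not the crate's implementation (the Architecture section names moka as the cache backend). `TtlCache`, `insert_at`, and `get_at` are hypothetical names, and the current time is passed in explicitly so that expiry can be exercised without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory cache whose entries expire after a fixed TTL.
pub struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store `value` under `key`, stamped with the caller-supplied time.
    pub fn insert_at(&mut self, key: &str, value: V, now: Instant) {
        self.entries.insert(key.to_string(), (now, value));
    }

    /// Return the value if it was stored less than `ttl` ago; evict it otherwise.
    pub fn get_at(&mut self, key: &str, now: Instant) -> Option<V> {
        let fresh = match self.entries.get(key) {
            Some((stored, _)) => now.duration_since(*stored) < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(key).map(|(_, v)| v.clone())
        } else {
            self.entries.remove(key);
            None
        }
    }
}
```

With the default 15-minute TTL (`Duration::from_secs(900)`), a lookup one minute after insertion would hit the cache, while a lookup after 901 seconds would miss and trigger a fresh scrape.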

REST API Server (dravr-sciotte-server)

A unified HTTP server with built-in MCP support that serves scraped activity data. Supports --transport stdio for MCP-only mode (editor integration).

Usage

# Start on localhost:3000
dravr-sciotte-server serve

# Specify port and host
dravr-sciotte-server serve --port 8080 --host 0.0.0.0

# MCP-only mode via stdio (for editor/client integration)
dravr-sciotte-server --transport stdio

Endpoints

Method  Path                      Description
POST    /auth/login               Trigger browser login
GET     /auth/status              Check authentication
GET     /api/activities?limit=20  List scraped activities
GET     /api/activities/{id}      Single activity detail
GET     /health                   Health check with cache stats
POST    /mcp                      MCP Streamable HTTP (JSON-RPC 2.0)

MCP Streamable HTTP

The server also speaks MCP at POST /mcp, accepting JSON-RPC 2.0 requests. Any MCP-compatible client can connect over HTTP instead of stdio.

# MCP initialize handshake
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"curl"}}}'

# List available tools
curl http://localhost:3000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'

Add Accept: text/event-stream to receive SSE-wrapped responses instead of plain JSON.
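
For clients that build these bodies in code, the JSON-RPC 2.0 envelope can be sketched as below. This is an illustrative helper, not part of the crate: `jsonrpc_request` is a hypothetical name, it uses string formatting instead of a JSON library, and it does not escape `method`, so it only suits fixed method names such as `initialize` or `tools/list`.

```rust
/// Build a JSON-RPC 2.0 request body (hypothetical helper).
/// `params_json` must already be valid JSON; `method` is inserted unescaped.
pub fn jsonrpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params_json}}}"#)
}
```

Calling `jsonrpc_request(2, "tools/list", "{}")` yields the same body as the second curl command above. A tool invocation would use the standard MCP `tools/call` method with `{"name": ..., "arguments": ...}` as params; the argument names each tool accepts are not listed here and should be taken from the `tools/list` response.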

Authentication

Optional. Set DRAVR_SCIOTTE_API_KEY to require bearer token auth on all endpoints. When unset, all requests are allowed through (localhost development mode).

DRAVR_SCIOTTE_API_KEY=my-secret dravr-sciotte-server serve
curl http://localhost:3000/api/activities -H "Authorization: Bearer my-secret"
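
The rule described above (no key configured means open access, otherwise a matching bearer token is required) can be sketched as a small pure function. This is an assumption about the logic, not the server's actual code; `is_authorized` is a hypothetical name, and a production check would normally use a constant-time comparison.

```rust
/// Decide whether a request may proceed (illustrative sketch).
/// `configured_key` is the value of DRAVR_SCIOTTE_API_KEY, if set.
/// `authorization` is the raw Authorization header, if present.
pub fn is_authorized(configured_key: Option<&str>, authorization: Option<&str>) -> bool {
    match configured_key {
        // No key configured: every request is allowed through.
        None => true,
        // Key configured: the header must be "Bearer <key>".
        Some(key) => authorization
            .and_then(|header| header.strip_prefix("Bearer "))
            .map_or(false, |token| token == key),
    }
}
```

Under this sketch, a request carrying `Authorization: Bearer my-secret` passes when the key is `my-secret`, while a missing header or a bare `my-secret` without the `Bearer ` prefix is rejected.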

MCP Server (dravr-sciotte-mcp)

A library and standalone binary that exposes the activity scraper via the Model Context Protocol. Connect any MCP-compatible client (Claude Desktop, Claude Code, editors, custom agents) to scrape sport activities.

Usage

# Stdio transport (default — for editor/client integration)
dravr-sciotte-mcp

# HTTP transport (for network-accessible deployments)
dravr-sciotte-mcp --transport http --host 0.0.0.0 --port 3000

MCP Tools

Tool            Description
auth_status     Check if the session is authenticated and valid
browser_login   Open a browser window for the user to log in (no API keys needed)
get_activities  Scrape activities from the training page
get_activity    Scrape detailed data for a single activity by ID
cache_status    Get cache hit/miss statistics and entry counts
cache_clear     Clear all cached activity data

Client Configuration

Add to your MCP client config (e.g. Claude Desktop claude_desktop_config.json):

{
  "mcpServers": {
    "dravr-sciotte": {
      "command": "dravr-sciotte-mcp"
    }
  }
}

For Claude Code, add the same configuration to your MCP settings.

Library Usage (Rust Trait)

[dependencies]
dravr-sciotte = "0.1"

use dravr_sciotte::{ChromeScraper, CachedScraper, ActivityScraper};
use dravr_sciotte::config::CacheConfig;
use dravr_sciotte::models::ActivityParams;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let scraper = ChromeScraper::default_config();
    let cached = CachedScraper::new(scraper, &CacheConfig::default());

    // Browser login (opens Chrome, user logs in, cookies captured)
    let session = cached.browser_login().await?;

    // Scrape activities
    let params = ActivityParams { limit: Some(20), ..Default::default() };
    let activities = cached.get_activities(&session, &params).await?;

    for activity in &activities {
        println!("{}: {} ({})", activity.id, activity.name, activity.sport_type);
    }
    Ok(())
}

All scraping is driven by a TOML provider config. The ActivityScraper trait can be wrapped by platform crates (e.g. pierre-scraper) with error bridging, following the same pattern as embacle's LlmProvider.

Provider Configuration

Scraping rules are defined in TOML files under providers/. The default provider is Strava (providers/strava.toml), compiled into the binary.

[provider]
name = "strava"
login_url = "https://www.strava.com/login"
login_success_patterns = ["/dashboard", "/athlete", "/feed"]
login_failure_patterns = ["/login", "/session"]

[list_page]
url = "https://www.strava.com/athlete/training"
row_selector = "tr.training-activity-row"
link_selector = 'a[data-field-name="name"]'
id_regex = '/\/activities\/(\d+)/'

[list_page.fields]
name = 'a[data-field-name="name"]'
sport_type = 'td[data-field-name="sport_type"]'
date = "td.col-date"
time = 'td[data-field-name="time"]'
distance = "td.col-dist"
elevation = "td.col-elev"
suffer_score = "td.col-suffer-score"

[detail_page]
url_template = "https://www.strava.com/activities/{id}"
js_extract = '''
(function() { /* JS that extracts all activity data and returns JSON */ })()
'''

To add a new provider, create a TOML file with the same structure and load it via ProviderConfig::from_file().
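
To illustrate how the config fields above might be consumed, here is a std-only sketch. It is not the crate's code: `LoginState`, `classify_login`, `detail_url`, and `activity_id` are hypothetical names, the patterns are assumed to be plain substring matches with success checked before failure, and the ID extraction mimics the intent of `id_regex` without a regex engine.

```rust
/// Outcome of comparing the browser's current URL with the provider patterns.
#[derive(Debug, PartialEq)]
pub enum LoginState {
    Success,
    Failure,
    Pending,
}

/// Classify a URL by substring match: success patterns first, then failure patterns.
pub fn classify_login(url: &str, success: &[&str], failure: &[&str]) -> LoginState {
    if success.iter().any(|p| url.contains(*p)) {
        LoginState::Success
    } else if failure.iter().any(|p| url.contains(*p)) {
        LoginState::Failure
    } else {
        LoginState::Pending
    }
}

/// Fill the `{id}` placeholder of a detail-page URL template.
pub fn detail_url(url_template: &str, id: &str) -> String {
    url_template.replace("{id}", id)
}

/// Extract the run of digits that follows "/activities/" in a link, if any.
pub fn activity_id(href: &str) -> Option<&str> {
    let rest = href.split("/activities/").nth(1)?;
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some(&rest[..end])
    }
}
```

With the Strava values shown above, a URL ending in `/dashboard` would classify as `Success`, one ending in `/login` as `Failure`, and `detail_url` would turn the template plus an ID taken from a row link into the page that `js_extract` runs against.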

Activity Data Model

Activities scraped from detail pages include:

Category    Fields
Core        id, name, sport_type, start_date, duration_seconds
Distance    distance_meters, elevation_gain, pace, gap
Heart Rate  average_heart_rate, max_heart_rate
Power       average_power, max_power, normalized_power
Cadence     average_cadence
Speed       average_speed, max_speed
Training    suffer_score, calories, elapsed_time_seconds
Weather     temperature, feels_like, humidity, wind_speed, wind_direction, weather
Equipment   device_name, gear_name
Location    city, region, country
Other       perceived_exertion, sport_type_detail, workout_type
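
As a rough picture of how a consumer might model a few of these fields, the sketch below defines a hypothetical `ActivitySummary` struct. The field names come from the table; the types, the use of `Option` for values a scrape may not find, and the `pace_seconds_per_km` helper are assumptions and may differ from the crate's actual model (see docs.rs for the real definitions).

```rust
/// Hypothetical subset of an activity record; optional fields may be absent after scraping.
#[derive(Debug, Clone, Default)]
pub struct ActivitySummary {
    pub id: String,
    pub name: String,
    pub sport_type: String,
    pub duration_seconds: Option<u64>,
    pub distance_meters: Option<f64>,
    pub average_heart_rate: Option<f64>,
}

impl ActivitySummary {
    /// Average pace in seconds per kilometre, when duration and a non-zero distance are known.
    pub fn pace_seconds_per_km(&self) -> Option<f64> {
        let secs = self.duration_seconds? as f64;
        let meters = self.distance_meters?;
        if meters > 0.0 {
            Some(secs / (meters / 1000.0))
        } else {
            None
        }
    }
}
```

For example, a 10 km run recorded as 3000 seconds gives 300 seconds per kilometre (5:00 min/km), and an activity with no distance yields `None` instead of a division by zero.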

Docker

Pull the image from GitHub Container Registry:

docker pull ghcr.io/dravr-ai/dravr-sciotte:latest

The image includes dravr-sciotte-server, dravr-sciotte-mcp, and Chromium for headless scraping.

# Start the REST + MCP server
docker run -p 3000:3000 ghcr.io/dravr-ai/dravr-sciotte

# Mount session directory for persistent login
docker run -p 3000:3000 \
  -v ~/.config/dravr-sciotte:/home/dravr/.config/dravr-sciotte \
  ghcr.io/dravr-ai/dravr-sciotte

# Run the MCP server
docker run --entrypoint dravr-sciotte-mcp ghcr.io/dravr-ai/dravr-sciotte

Architecture

Your Application
    └── dravr-sciotte (this library)
            │
            ├── Provider Config (TOML-driven)
            │   └── providers/strava.toml → login URLs, CSS selectors, JS extraction
            │
            ├── Chrome Scraper (chromiumoxide CDP)
            │   ├── browser_login()     → visible Chrome, user logs in, cookies captured
            │   ├── get_activities()    → headless Chrome, list page + pagination
            │   └── get_activity()      → headless Chrome, detail page JS extraction
            │
            ├── Cache Layer (moka TTL cache)
            │   └── CachedScraper      → wraps ActivityScraper with in-memory TTL cache
            │
            ├── Auth Persistence (AES-256-GCM)
            │   └── ~/.config/dravr-sciotte/session.enc
            │
            ├── MCP Server (library + binary crate)
            │   └── dravr-sciotte-mcp  → JSON-RPC 2.0 over stdio or HTTP/SSE
            │
            └── Unified REST API + MCP + CLI (binary crate)
                └── dravr-sciotte-server → REST endpoints, MCP HTTP, CLI commands

The core ActivityScraper trait defines four methods:

  • browser_login() — open browser, capture session
  • get_activities() — scrape activity list with pagination
  • get_activity() — scrape single activity detail
  • is_authenticated() — check session validity

For detailed API docs see docs.rs/dravr-sciotte.

License

Licensed under MIT OR Apache-2.0.