dravr-sciotte-server-0.1.0 has been yanked.

dravr-sciotte

Sport activity scraper with headless Chrome, TOML-configurable providers, and in-memory caching.

Logs into sport platforms via a browser (no API credentials needed), scrapes training data from activity pages, and exposes it through four integration surfaces: Rust trait, REST API, MCP server, and CLI.

Quick Start

# Login (opens a browser — log in to your account)
dravr-sciotte-server login

# List activities (fast, from the training page)
dravr-sciotte-server activities --limit 20

# List with full detail (navigates each activity page for HR, cadence, weather, device, etc.)
dravr-sciotte-server activities --limit 5 --detail --format json

# Start REST + MCP server
dravr-sciotte-server serve --port 3000

# Start MCP stdio transport (for Claude integration)
dravr-sciotte-server --transport stdio

How It Works

Browser login — opens a visible Chrome window to the provider's login page. You log in normally. Session cookies are captured and encrypted at rest.
List page scraping — navigates to the training/activity list page in headless Chrome, extracts activity rows using CSS selectors defined in the provider TOML.
Detail enrichment (opt-in via --detail) — navigates into each activity page and extracts full metrics using a JS snippet from the provider TOML.
Caching — results are cached in-memory with configurable TTL (default 15 min).

Provider Configuration

Scraping rules are defined in TOML files under providers/. The default provider is Strava (providers/strava.toml), compiled into the binary.

Example: Strava (`providers/strava.toml`)

[provider]
name = "strava"
login_url = "https://www.strava.com/login"
login_success_patterns = ["/dashboard", "/athlete", "/feed"]
login_failure_patterns = ["/login", "/session"]

[list_page]
url = "https://www.strava.com/athlete/training"
row_selector = "tr.training-activity-row"
link_selector = 'a[data-field-name="name"]'
id_regex = '/\/activities\/(\d+)/'

[list_page.fields]
name = 'a[data-field-name="name"]'
sport_type = 'td[data-field-name="sport_type"]'
date = "td.col-date"
time = 'td[data-field-name="time"]'
distance = "td.col-dist"
elevation = "td.col-elev"
suffer_score = "td.col-suffer-score"

[detail_page]
url_template = "https://www.strava.com/activities/{id}"
js_extract = '''
(function() {
    // ... JS that extracts activity data and returns JSON ...
})()
'''

To add a new provider, create a TOML file with the same structure and load it via ProviderConfig::from_file().

Integration Modes

CLI

dravr-sciotte-server login                          # Browser login
dravr-sciotte-server activities --limit 10          # List activities
dravr-sciotte-server activities --detail --format json  # Full detail as JSON
dravr-sciotte-server auth-status                    # Check session
dravr-sciotte-server serve --port 3000              # Start REST server

REST API

Method	Path	Description
POST	`/auth/login`	Trigger browser login
GET	`/auth/status`	Check authentication
GET	`/api/activities?limit=20`	List activities
GET	`/api/activities/{id}`	Activity detail
GET	`/health`	Health check
POST	`/mcp`	MCP HTTP transport

MCP (Model Context Protocol)

6 tools available via stdio or HTTP transport:

Tool	Description
`auth_status`	Check session status
`browser_login`	Open browser for login
`get_activities`	Scrape activity list
`get_activity`	Scrape single activity detail
`cache_status`	Cache hit/miss stats
`cache_clear`	Clear cached data

Rust Trait

use dravr_sciotte::{ChromeScraper, CachedScraper, ActivityScraper};
use dravr_sciotte::config::CacheConfig;

let scraper = ChromeScraper::default_config();
let cached = CachedScraper::new(scraper, &CacheConfig::default());

let session = cached.browser_login().await?;
let activities = cached.get_activities(&session, &params).await?;

Activity Data Model

Activities scraped from detail pages include:

Category	Fields
Core	id, name, sport_type, start_date, duration_seconds
Distance	distance_meters, elevation_gain, pace, gap
Heart Rate	average_heart_rate, max_heart_rate
Power	average_power, max_power, normalized_power
Cadence	average_cadence
Speed	average_speed, max_speed
Training	suffer_score, calories, elapsed_time_seconds
Weather	temperature, feels_like, humidity, wind_speed, wind_direction, weather
Equipment	device_name, gear_name
Location	city, region, country
Other	perceived_exertion, sport_type_detail, workout_type

Architecture

dravr-sciotte/
├── providers/strava.toml          # Provider config (selectors, JS, URLs)
├── src/                           # Core library
│   ├── provider.rs                # TOML config loading and JS generation
│   ├── scraper.rs                 # Chrome-based scraping engine
│   ├── models.rs                  # Activity data model
│   ├── cache.rs                   # In-memory TTL cache
│   ├── auth.rs                    # Session encryption/persistence
│   └── types.rs                   # ActivityScraper trait
├── crates/dravr-sciotte-mcp/      # MCP server (stdio + HTTP)
└── crates/dravr-sciotte-server/   # REST API + CLI

Environment Variables

Variable	Description	Default
`CHROME_PATH`	Path to Chrome/Chromium binary	auto-detect
`DRAVR_SCIOTTE_API_KEY`	Bearer token for REST auth	none (open)
`DRAVR_SCIOTTE_CACHE_TTL`	Cache TTL in seconds	900 (15 min)
`DRAVR_SCIOTTE_CACHE_MAX`	Max cache entries	100
`DRAVR_SCIOTTE_SESSION_DIR`	Session storage directory	`~/.config/dravr-sciotte/`

License

MIT OR Apache-2.0

dravr-sciotte-server 0.1.0