wme-client 0.1.2

HTTP client for the Wikimedia Enterprise API

A robust, production-ready HTTP client for the Wikimedia Enterprise API.

Features

  • Complete API Coverage: Access all Wikimedia Enterprise endpoints including metadata, on-demand, snapshots, and realtime streaming
  • Authentication Management: Automatic token refresh and secure credential handling
  • Resilient by Design: Built-in retry logic with exponential backoff, jitter, and circuit breaker patterns
  • Streaming Support: Efficiently handle large snapshot downloads and realtime SSE streams
  • Type-Safe: Full Rust type definitions for all API responses
  • Async/Await: Built on tokio for high-performance asynchronous operations

Installation

Add this to your Cargo.toml:

[dependencies]
wme-client = "0.1.2"
tokio = { version = "1", features = ["full"] }

Quick Start

use wme_client::WmeClient;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a client with authentication
    let client = WmeClient::builder()
        .credentials("username", "password")
        .build()
        .await?;

    // List available projects
    let projects = client.metadata().list_projects().await?;
    println!("Available projects: {:?}", projects);

    Ok(())
}

Authentication

The client supports username/password authentication with automatic token management:

let client = WmeClient::builder()
    .credentials("your_username", "your_password")
    .build()
    .await?;

Tokens are automatically refreshed before expiration. You can also manually revoke tokens:

// Get the token manager
if let Some(token_manager) = client.token_manager() {
    token_manager.revoke_token().await?;
}
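
The "refreshed before expiration" behaviour can be pictured as a check against a safety margin. The sketch below is only an illustration: `CachedToken`, `needs_refresh`, and the margin value are hypothetical names chosen here, not part of the wme-client API, and the crate's internal logic may differ.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cached access token; not a wme-client type.
struct CachedToken {
    value: String,
    expires_at: Instant,
}

impl CachedToken {
    /// Returns true when the token has expired or will expire within
    /// `margin` of `now`, meaning it should be refreshed before use.
    fn needs_refresh(&self, now: Instant, margin: Duration) -> bool {
        // `checked_add` avoids a panic if `now + margin` overflows.
        match now.checked_add(margin) {
            Some(deadline) => deadline >= self.expires_at,
            None => true,
        }
    }
}
```

Refreshing slightly early, rather than exactly at expiry, leaves room for clock skew and for requests that are already in flight.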

API Clients

Metadata Client

Discover available projects, languages, and namespaces:

let metadata = client.metadata();

// List all projects
let projects = metadata.list_projects().await?;

// Get specific project info
let wikipedia = metadata.get_project("en.wikipedia").await?;

// List languages
let languages = metadata.list_languages().await?;

// List namespaces
let namespaces = metadata.list_namespaces().await?;

On-Demand Client

Fetch individual articles:

let on_demand = client.on_demand();

// Get a single article
let articles = on_demand.get_article("Rust (programming language)", None).await?;

// Get multiple articles efficiently
let articles = on_demand.get_articles(&["NASA", "SpaceX"], None).await?;

// Get structured article data (BETA)
let structured = on_demand.get_structured_article("Python (programming language)", None).await?;

Snapshot Client

Download bulk data snapshots:

use futures::StreamExt;

let snapshot = client.snapshot();

// List available snapshots
let snapshots = snapshot.list_snapshots().await?;

// Get snapshot metadata (`snapshot_id` is a placeholder for an identifier from the listing above)
let info = snapshot.get_snapshot_info(&snapshot_id).await?;

// Download a snapshot as a stream
let mut stream = snapshot.download_snapshot(&snapshot_id, None).await?;
let mut data = Vec::new();
while let Some(chunk) = stream.next().await {
    let chunk = chunk?;
    data.extend_from_slice(&chunk);
}

// Download specific chunks (`chunk_id` is a placeholder for an identifier taken from `chunks`)
let chunks = snapshot.list_chunks(&snapshot_id).await?;
let mut stream = snapshot.download_chunk(&snapshot_id, &chunk_id, None).await?;
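
Collecting a whole snapshot into a `Vec`, as in the download loop above, works for small files, but a large snapshot may not fit in memory. A minimal sketch of the alternative, writing each chunk out as it arrives, follows. To stay self-contained it takes a plain iterator of `io::Result<Vec<u8>>` in place of the async stream, and `write_chunks` is a hypothetical helper, not something wme-client provides.

```rust
use std::io::{self, Write};

/// Writes every chunk to `writer` as it arrives and returns the total
/// number of bytes written. Stops at the first failed chunk or write.
fn write_chunks<I, W>(chunks: I, writer: &mut W) -> io::Result<u64>
where
    I: IntoIterator<Item = io::Result<Vec<u8>>>,
    W: Write,
{
    let mut total = 0u64;
    for chunk in chunks {
        let chunk = chunk?; // propagate a failed chunk to the caller
        writer.write_all(&chunk)?;
        total += chunk.len() as u64;
    }
    writer.flush()?;
    Ok(total)
}
```

In the async loop, the same idea amounts to replacing `data.extend_from_slice(&chunk)` with a write to a file, for example a `std::fs::File` wrapped in a `std::io::BufWriter`.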

Realtime Client

Stream article updates in real-time:

use wme_client::RealtimeConnectOptions;
use chrono::{Duration, Utc};
use futures::StreamExt;

let realtime = client.realtime();

// Connect to the stream, starting from updates made one hour ago
let options = RealtimeConnectOptions::since(Utc::now() - Duration::hours(1));
let mut stream = realtime.connect(&options, None).await?;

while let Some(result) = stream.next().await {
    match result {
        Ok(update) => println!("Updated: {}", update.article.name),
        Err(e) => eprintln!("Error: {}", e),
    }
}

Realtime Batches

For historical realtime data, use batches:

// List available batches
let batches = realtime.list_batches("2024-01-15", "12").await?;

// Stream a batch as parsed articles
let mut stream = realtime.stream_batch("2024-01-15", "12", "batch_001").await?;
while let Some(result) = stream.next().await {
    match result {
        Ok(article) => println!("Article: {}", article.name),
        Err(e) => eprintln!("Parse error: {}", e),
    }
}

Configuration

Retry Configuration

Customize retry behavior for your use case:

use wme_client::RetryConfig;
use std::time::Duration;

// Production-grade configuration
let retry = RetryConfig::production();

// Development configuration (faster retries)
let retry = RetryConfig::development();

// Batch processing (more retries, longer delays)
let retry = RetryConfig::batch_processing();

// Custom configuration
let retry = RetryConfig::new()
    .with_max_retries(5)
    .with_base_delay(Duration::from_secs(1))
    .with_max_delay(Duration::from_secs(60))
    .with_jitter(0.25);

let client = WmeClient::builder()
    .credentials("username", "password")
    .retry(retry)
    .build()
    .await?;
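
To make the parameters above concrete, here is a minimal sketch of how exponential backoff with jitter is commonly computed. It is not the crate's implementation: `backoff_delay` is a hypothetical function, and it assumes that a jitter of 0.25 means the delay is spread by up to 25% in either direction. The random component is passed in as `random_unit` so the function stays deterministic.

```rust
use std::time::Duration;

/// Computes the delay before retry number `attempt` (0-based).
///
/// `jitter` is a fraction such as 0.25. `random_unit` is a value in
/// [-1.0, 1.0] supplied by the caller, for example from a random
/// number generator.
fn backoff_delay(
    attempt: u32,
    base: Duration,
    max: Duration,
    jitter: f64,
    random_unit: f64,
) -> Duration {
    // Exponential growth: base * 2^attempt, saturating on overflow.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let exponential = base.checked_mul(factor).unwrap_or(max);
    let capped = exponential.min(max);
    // Spread the delay by up to +/- `jitter` of its value.
    let scale = 1.0 + jitter * random_unit.clamp(-1.0, 1.0);
    capped.mul_f64(scale.max(0.0)).min(max)
}
```

With a 1-second base and a 60-second cap, the un-jittered delays under this sketch would be 1s, 2s, 4s, 8s, and so on until they reach the cap. Jitter keeps many clients from retrying at the same instant after a shared outage.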

Custom Base URLs

For testing or private deployments:

let client = WmeClient::builder()
    .api_url("https://api.example.com")
    .auth_url("https://auth.example.com")
    .realtime_url("https://realtime.example.com")
    .credentials("username", "password")
    .build()
    .await?;

Timeout Configuration

Set a custom timeout (two minutes in this example):

use std::time::Duration;

let client = WmeClient::builder()
    .credentials("username", "password")
    .timeout(Duration::from_secs(120))
    .build()
    .await?;

Disable Retry Logic

For debugging or when you want full control:

let client = WmeClient::builder()
    .credentials("username", "password")
    .disable_retry()
    .build()
    .await?;

Error Handling

The client uses a comprehensive error type:

use wme_client::ClientError;

match result {
    Ok(data) => println!("Success: {:?}", data),
    Err(ClientError::Auth(msg)) => eprintln!("Authentication failed: {}", msg),
    Err(ClientError::RateLimited { retry_after }) => {
        eprintln!("Rate limited! Retry after: {:?} seconds", retry_after);
    }
    Err(ClientError::SnapshotNotFound { id }) => {
        eprintln!("Snapshot not found: {}", id);
    }
    Err(ClientError::ArticleNotFound { name }) => {
        eprintln!("Article not found: {}", name);
    }
    Err(e) => eprintln!("Error: {}", e),
}
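
When a rate-limit error is returned, the caller usually has to decide how long to pause. The sketch below shows one way to turn the hint into a wait time. It assumes, based only on the `{:?}` formatting above, that `retry_after` is an optional number of seconds. `rate_limit_wait`, the fallback, and the cap are hypothetical and not part of wme-client.

```rust
use std::time::Duration;

/// Picks how long to wait after a rate-limit response.
///
/// Uses the server's hint when present, falls back to `fallback`
/// otherwise, and never waits longer than `cap`.
fn rate_limit_wait(
    retry_after_secs: Option<u64>,
    fallback: Duration,
    cap: Duration,
) -> Duration {
    retry_after_secs
        .map(Duration::from_secs)
        .unwrap_or(fallback)
        .min(cap)
}
```

Capping the wait guards against an unexpectedly large hint stalling a worker for a long time.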

Advanced Usage

Request Parameters

Most endpoints support filtering and field selection:

use wme_models::RequestParams;

let params = RequestParams {
    filters: Some(vec![
        ("project".to_string(), "en.wikipedia".to_string()),
    ]),
    fields: Some(vec![
        "name".to_string(),
        "url".to_string(),
    ]),
    limit: Some(100),
    offset: Some(0),
};

let projects = client.metadata().list_projects_with_params(Some(&params)).await?;

Resume Realtime Stream

Resume from where you left off using timestamps:

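use std::collections::HashMap;
use wme_client::RealtimeConnectOptions;

// Per-partition resume (recommended for production).
// `last_seen_timestamp` is a placeholder for the last timestamp you processed.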
let mut since_per_partition = HashMap::new();
since_per_partition.insert("0".to_string(), last_seen_timestamp);
since_per_partition.insert("1".to_string(), last_seen_timestamp);

let options = RealtimeConnectOptions::since_per_partition(since_per_partition);
let stream = client.realtime().connect(&options, None).await?;
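
Per-partition resume requires remembering the newest timestamp seen on each partition while the stream is being consumed. A minimal sketch of that bookkeeping follows. `ResumeTracker` is a hypothetical helper, and timestamps are plain `u64` milliseconds to keep the example dependency-free; the real options may expect a date-time type instead.

```rust
use std::collections::HashMap;

/// Tracks the newest timestamp seen on each partition.
#[derive(Default)]
struct ResumeTracker {
    last_seen: HashMap<String, u64>,
}

impl ResumeTracker {
    /// Records an update, keeping only the newest timestamp per partition
    /// so that out-of-order events never move the checkpoint backwards.
    fn record(&mut self, partition: &str, timestamp_ms: u64) {
        self.last_seen
            .entry(partition.to_string())
            .and_modify(|ts| *ts = (*ts).max(timestamp_ms))
            .or_insert(timestamp_ms);
    }

    /// Returns a copy of the map, ready to be persisted or converted
    /// into resume options after a restart.
    fn checkpoint(&self) -> HashMap<String, u64> {
        self.last_seen.clone()
    }
}
```

Persisting the checkpoint periodically (for example to disk) lets a restarted consumer pick up each partition close to where it stopped.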

Circuit Breaker

The retry transport includes a circuit breaker that opens after consecutive failures:

  • Closed: Normal operation
  • Open: Requests fail fast to prevent cascading failures
  • HalfOpen: Probing whether the service has recovered

Configure thresholds:

let retry = RetryConfig::new()
    .with_circuit_threshold(10)  // Open after 10 consecutive failures
    .with_circuit_timeout(Duration::from_secs(120));  // Try again after 2 minutes
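
The three states and the two settings above can be sketched as a small state machine. This is a generic illustration of the circuit breaker pattern, not the crate's implementation: `CircuitBreaker`, `State`, and the method names are hypothetical, and the current time is passed in explicitly to keep the logic deterministic.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Closed,
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: State,
    failures: u32,
    threshold: u32,
    timeout: Duration,
}

impl CircuitBreaker {
    fn new(threshold: u32, timeout: Duration) -> Self {
        Self { state: State::Closed, failures: 0, threshold, timeout }
    }

    /// Returns whether a request may be sent at `now`.
    /// An open circuit moves to half-open once `timeout` has elapsed.
    fn allow(&mut self, now: Instant) -> bool {
        if let State::Open { since } = self.state {
            if now.saturating_duration_since(since) >= self.timeout {
                self.state = State::HalfOpen;
            } else {
                return false; // fail fast while the circuit is open
            }
        }
        true
    }

    /// A success closes the circuit and resets the failure count.
    fn on_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
    }

    /// A failure reopens a half-open circuit immediately; otherwise the
    /// circuit opens once `threshold` consecutive failures are reached.
    fn on_failure(&mut self, now: Instant) {
        self.failures = self.failures.saturating_add(1);
        if self.state == State::HalfOpen || self.failures >= self.threshold {
            self.state = State::Open { since: now };
        }
    }
}
```

This sketch lets every request through while half-open; a stricter variant would admit only a single probe request until it succeeds or fails.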

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.