scrapebadger 0.2.0

Async Rust SDK and CLI for the ScrapeBadger web-scraping API (Amazon, Google, Twitter/X, Reddit, Vinted, Web Scraping).

scrapebadger

Async Rust SDK and CLI for the ScrapeBadger web-scraping API — 137 endpoints across Amazon, Google (16 product APIs), Twitter/X, Reddit, Vinted, Web Scraping, and Account, plus real-time Twitter Streams (WebSocket + HMAC webhooks).

One crate ships a library and a binary, both named scrapebadger.

Install

[dependencies]
scrapebadger = "0.2"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }

Library

use scrapebadger::ScrapeBadger;

#[tokio::main]
async fn main() -> scrapebadger::Result<()> {
    // Reads SCRAPEBADGER_API_KEY (or use ScrapeBadger::new("sb_live_…")).
    let client = ScrapeBadger::from_env()?;

    let me = client.account().get_account_info(Default::default()).await?;
    println!("account: {:?}", me);

    let flights = client
        .google()
        .flights_search(scrapebadger::google::FlightsSearchParams {
            departure_id: Some("DEL".into()),
            arrival_id: Some("BOM".into()),
            outbound_date: Some("2026-07-01".into()),
            ..Default::default()
        })
        .await?;

    let product = client
        .amazon()
        .get_product("B08N5WRWNW", Default::default())
        .await?;

    let _ = (flights, product);
    Ok(())
}

Every endpoint follows the same call shape: client.<platform>().<method>(<path args…>, params). Inputs are *Params structs that implement Default: set the fields you need and fill the rest with ..Default::default().
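The params pattern above relies on Rust's struct update syntax. A minimal std-only sketch of that pattern follows; `SearchParams` and `example_params` are hypothetical names made up for illustration and are not types or functions from this crate.

```rust
/// Hypothetical params struct for illustration only; not part of scrapebadger.
/// Every field is optional, so `Default` yields "nothing set".
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SearchParams {
    pub query: Option<String>,
    pub page: Option<u32>,
    pub sort: Option<String>,
}

/// Sets two fields explicitly; `..Default::default()` leaves the rest as `None`.
pub fn example_params() -> SearchParams {
    SearchParams {
        query: Some("sneakers".into()),
        page: Some(2),
        ..Default::default()
    }
}
```

Because unset fields stay `None`, a struct shaped like this can grow new optional fields without breaking callers that use the update syntax.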

Namespaces

account() · amazon() · google() · reddit() · twitter() · vinted() · web()

Pagination

Generic helpers in core::pagination turn any "fetch one page" closure into a flat Stream. Cursor- and page-paginated endpoints also have ready-made *_stream adapters that follow the pagination for you — Twitter (next_cursor), Reddit (after), and Amazon/Vinted (page numbers):

use futures_util::StreamExt;

# async fn demo(client: scrapebadger::ScrapeBadger) -> scrapebadger::Result<()> {
let stream = client.twitter().get_user_followers_stream("elonmusk", Default::default());
futures_util::pin_mut!(stream);
while let Some(user) = stream.next().await {
    let _user = user?; // one UserData per follower, across all pages
}
# Ok(()) }
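To illustrate the idea of turning a "fetch one page" closure into a flat sequence, here is a synchronous, std-only analogue. It is a sketch of the general technique, not the crate's core::pagination code; `Page` and `paginate` are hypothetical names, and the real helpers are async and return a Stream.

```rust
use std::collections::VecDeque;

/// Hypothetical page type: some items plus the cursor for the next page, if any.
pub struct Page<T> {
    pub items: Vec<T>,
    pub next_cursor: Option<String>,
}

/// Flattens repeated calls to `fetch` into one iterator over items.
/// `fetch` receives `None` for the first page, then the previous `next_cursor`.
pub fn paginate<T, F>(mut fetch: F) -> impl Iterator<Item = T>
where
    F: FnMut(Option<&str>) -> Page<T>,
{
    let mut cursor: Option<String> = None;
    let mut buffer: VecDeque<T> = VecDeque::new();
    let mut done = false;
    std::iter::from_fn(move || loop {
        // Drain buffered items before asking for another page.
        if let Some(item) = buffer.pop_front() {
            return Some(item);
        }
        if done {
            return None;
        }
        let page = fetch(cursor.as_deref());
        buffer.extend(page.items);
        cursor = page.next_cursor;
        // A missing cursor means the page just fetched was the last one.
        done = cursor.is_none();
    })
}
```

Pages are fetched lazily, so a caller that stops iterating early never requests the remaining pages. An empty page that still carries a cursor is skipped rather than treated as the end.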

Real-time Twitter Streams (feature = "stream", on by default)

use futures_util::StreamExt;

# async fn demo(client: scrapebadger::ScrapeBadger) -> scrapebadger::Result<()> {
let mut events = Box::pin(client.twitter().stream_events().await?);
while let Some(event) = events.next().await {
    let event = event?;
    println!("@{:?}: {:?}", event.author_username, event.tweet_url);
}
# Ok(()) }

For long-lived consumers, client.twitter().stream_events_reconnecting() returns an endless stream that reconnects with exponential backoff on drop/error.
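As a rough picture of what exponential backoff means for reconnects, the sketch below computes a capped delay per attempt. The function name, the doubling factor, and the cap are assumptions chosen for illustration; the crate's actual schedule (and whether it adds jitter) is not documented here.

```rust
use std::time::Duration;

/// Hypothetical helper: delay before reconnect attempt `attempt` (0-based).
/// Computes base * 2^attempt and never returns more than `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}
```

With a 500 ms base and a 30 s cap, attempts 0, 1, and 3 wait 500 ms, 1 s, and 4 s, and later attempts settle at the cap.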

Verify webhook callbacks with scrapebadger::twitter::stream::verify_webhook_signature(secret, body, header).
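HMAC signature checks generally end by comparing the expected digest with the received one in a way that does not leak timing information. The std-only sketch below shows only that comparison step; it is not the crate's implementation, `constant_time_eq` is a hypothetical name, and computing the HMAC itself (which needs a hashing crate) is left out.

```rust
/// Hypothetical helper: compares two byte strings without returning early
/// at the first mismatch, so timing does not reveal how many bytes matched.
/// Note: the length check itself is not constant-time, and std gives no
/// hard guarantee against compiler optimizations; audited crates exist for
/// production use.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        // Accumulate differences instead of branching on each byte.
        diff |= x ^ y;
    }
    diff == 0
}
```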

CLI

Every one of the 137 endpoints is a nested subcommand (scrapebadger <platform> <group> <action> [<ids>] [--flags]), generated from the same specs as the SDK. The full tree is in docs/CLI.md and on docs.rs under cli_reference.

# Store the key once in the global config (~/.config/scrapebadger/config.json, chmod 600):
scrapebadger config set-key sb_live_xxx
# (or `export SCRAPEBADGER_API_KEY=sb_live_xxx`, or pass `--api-key`)

scrapebadger account me
scrapebadger web scrape --url https://example.com --format markdown --render-js
scrapebadger google flights search --departure-id DEL --arrival-id BOM --outbound-date 2026-07-01
scrapebadger amazon products get B08N5WRWNW
scrapebadger reddit subreddits posts sneakers --sort new --limit 10 --select '.posts[].title'

Discovering commands

scrapebadger reddit --help        # lists every reddit command at once (no drilling)
scrapebadger --help-all           # the entire tree, all platforms
scrapebadger commands | grep wiki # grep-able flat listing
scrapebadger completions zsh      # shell completion script

Output, inspection & escape hatch

-o json|jsonl|raw                 # output format (pretty JSON default)
--select '.posts[].title'         # project a field path
--all                             # auto-follow pagination cursors (best-effort)
--explain                         # print the resolved HTTP request, don't send
--curl                            # print an equivalent curl command

# raw reaches any endpoint directly:
scrapebadger raw /v1/amazon/products/B08N5WRWNW
scrapebadger raw --method POST /v1/web/scrape -d '{"url":"https://example.com"}'

How it works

Typed models and per-endpoint methods are generated from the vendored OpenAPI specs (specs/*.json) by cargo run -p xtask -- gen; the ergonomic namespace layer and transport core are hand-written. See ARCHITECTURE.md for the full design, codegen notes, and the complete endpoint reference, and TASKS.md for build status.

Configuration

use std::time::Duration;
let client = scrapebadger::ScrapeBadger::builder()
    .api_key("sb_live_xxx")
    .timeout(Duration::from_secs(120))
    .max_retries(5)
    .build()?;
# Ok::<(), scrapebadger::Error>(())

Features

  • cli (default) — the scrapebadger binary.
  • stream (default) — Twitter Streams WebSocket + webhook verification.

License

MIT