Adler
Named for Irene Adler — "the Woman", the one who outwitted Sherlock Holmes. Where Sherlock searched, Adler outsmarts.
OSINT username search across hundreds of sites. A modern successor to Sherlock — multi-signal detection, self-healing site signatures, optional enrichment and cross-site correlation, written in Rust.
Status: in development. See PLAN.md for the full roadmap.
Crates
| Crate | Kind | Purpose |
|---|---|---|
| adler-core | lib | Detection engine, site registry, executor. |
| adler-cli | bin | adler command-line interface. |
Install
From crates.io (compiles locally, ~1–2 min):
Pre-built binary from the GitHub release (instant, no compile):
From source:
Requires Rust ≥ 1.85. The installed binary is adler. The library
(adler-core) is published separately
for embedding the engine in your own tools — see the
Library section below.
Build & run
Logging is controlled by the ADLER_LOG env var (defaults to adler=info):
ADLER_LOG=adler=debug
Detection rate
Recall depends on where you scan from. A --doctor pass on 2026-05-26
against the bundled registry (411 sites):
| Scan source | Sites where a known-existing account is found | Recall |
|---|---|---|
| Datacenter IP (Hetzner / Leaseweb DE) | 282 / 411 | 68.6% |
| US residential proxy pool (DECODO) | 305 / 411 | 74.2% |
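
The recall column is simply found sites divided by total sites. A minimal sketch of that arithmetic, using the counts from the table above (the function name `recall_percent` is illustrative, not part of Adler):

```rust
/// Recall as a percentage: sites where the known-existing account was
/// found, divided by the total number of sites probed.
fn recall_percent(found: u32, total: u32) -> f64 {
    f64::from(found) / f64::from(total) * 100.0
}

fn main() {
    // Counts taken from the table above.
    let datacenter = recall_percent(282, 411);
    let residential = recall_percent(305, 411);

    println!("datacenter:   {datacenter:.1}%");
    println!("residential:  {residential:.1}%");
    println!("lift:         {:.1} points", residential - datacenter);
    println!("still missed: {:.1}%", 100.0 - residential);
}
```
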
The residential lift is real: ~40 sites swap their verdict between
Uncertain (datacenter) and Found (residential) — most are
Cloudflare-walled or geo-restricted (RU-segment, plus platforms like
Reddit, Imgur, Patreon). The remaining ~26% breaks down roughly as:
- Bot-protected sites tagged bot-protected (Instagram and X/Twitter today): these serve a JS login wall to a plain HTTP request; a clean IP doesn't help, you need a browser backend. Exclude them with --exclude-tag bot-protected.
- Stale Sherlock-imported known_present accounts that no longer exist on the live site. The --doctor --suggest-known-present tool (new in v0.4.0) probes a small candidate pool (the site's brand name, plus torvalds / octocat / admin / …) and prints a paste-ready snippet for any site where it finds a live account. Discovery surfaced 19 healable entries on the most recent sweep; the remaining placeholders need either a contributor-found candidate or a deeper repair via --doctor --fix.
- Sites whose detection rule fires for every username: signal repair territory, not username repair. --doctor --fix diffs the responses and proposes a tighter signal.
- Sites that don't reliably distinguish found from not-found for unauthenticated requests at all. These were investigated and not added rather than ship false-positive entries: Reddit, TikTok, Pinterest, and Threads. See issues #11–#14 for the specific failure modes and what would unblock each.
Run the same check yourself: adler --doctor (uses your current IP)
or adler --doctor --proxy <url> (via your own proxy). With
--browser-backend browserbase the doctor's --fix mode routes
bot-protected sites through a real Chrome session, so the diff sees
real profile pages rather than two identical login walls. With
--suggest-known-present you get an OVERRIDES block per healable
site.
Browser backend (optional)
A small subset of sites, currently Instagram and Twitter, serve a
JavaScript login wall or a Cloudflare challenge to a plain HTTP request.
They're tagged bot-protected and, on the raw HTTP path, will always return
Uncertain because the response looks identical for an existing
account and a missing one. adler --list-tags shows the live count. The
tag is kept narrow because every additional candidate we investigated
either detects fine without a browser or is structurally unscrapable
even with one (see Detection rate above).
With --browser-backend Adler routes those sites (and only those —
everything else stays on the fast HTTP path) through a real headless
Chrome that runs JS, accepts cookies, and returns the final post-render
DOM. The same detection signals then apply, and a verdict becomes
possible.
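
The routing rule described above can be sketched in a few lines. This is an illustration under assumptions, not Adler's code: the `Route` type and `route_for` function are hypothetical names, and tags are modelled as plain strings.

```rust
/// Which transport a probe uses. Hypothetical type, not adler-core's API.
#[derive(Debug, PartialEq)]
enum Route {
    Http,
    Browser,
}

/// Send a site through the browser only when a backend was requested
/// AND the site carries the `bot-protected` tag.
fn route_for(tags: &[&str], browser_enabled: bool) -> Route {
    if browser_enabled && tags.contains(&"bot-protected") {
        Route::Browser
    } else {
        Route::Http
    }
}

fn main() {
    // Tagged site with a backend enabled: browser path.
    println!("{:?}", route_for(&["social", "bot-protected"], true));
    // Untagged site stays on the fast HTTP path even with a backend.
    println!("{:?}", route_for(&["coding"], true));
    // Tagged site without a backend: raw HTTP.
    println!("{:?}", route_for(&["bot-protected"], false));
}
```
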
Two backends are supported, picked at the CLI:
| Flag | What it does | Cost | Requirements |
|---|---|---|---|
| --browser-backend local | Launches headless Chrome on your machine via chromiumoxide | Free | Chrome / Chromium installed locally |
| --browser-backend browserbase | Opens a remote session on Browserbase and connects over the CDP WebSocket | Pay per session-minute (≈ $0.05/min) | ADLER_BROWSERBASE_API_KEY and ADLER_BROWSERBASE_PROJECT_ID env vars. Drives CDP through a small in-tree async client (adler-core/src/browser/cdp.rs): neither chromiumoxide nor headless_chrome could attach to Browserbase's remote browser cleanly (issue #5), so we wrote our own. |
Both reuse a single browser instance across all routed fetches for the scan, so cost / setup overhead is one-time.
Examples
# Use local Chrome — pairs cleanly with --proxy (passed through as
# --proxy-server to the child process).
# Cloud session with residential / mobile IP and anti-fingerprint baked in.
# Cap the number of browser-routed probes (default 50). Once exceeded,
# remaining bot-protected sites return Uncertain(browser_budget_exceeded).
# Disable for one run even if the env / a shell alias has it on.
Guardrails
- Per-scan budget: --browser-budget N caps how many browser fetches a single scan may consume. Default is 50, ≈ 5× the bot-protected subset of the registry, so the cap only ever fires if a flag is misconfigured.
- No surprise routing: only sites tagged bot-protected are sent through the browser. Everything else is unaffected. Use adler --list-tags to see what's tagged.
- Privacy: the browserbase backend sends the URLs you scan to a third-party US-based service. The local backend doesn't leave your machine (modulo whatever proxy you've configured Chrome to use).
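
One way a per-scan budget like this could be enforced is a shared atomic counter that each browser-routed probe must decrement before fetching. The sketch below is an assumption about the mechanism, not Adler's implementation; `BrowserBudget`, `Outcome`, and `probe` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical budget shared by all browser-routed probes in one scan.
struct BrowserBudget {
    remaining: AtomicUsize,
}

impl BrowserBudget {
    fn new(limit: usize) -> Self {
        Self { remaining: AtomicUsize::new(limit) }
    }

    /// Reserve one browser fetch. Returns false once the budget is spent;
    /// `checked_sub` keeps the counter from wrapping below zero.
    fn try_acquire(&self) -> bool {
        self.remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Routed,
    BudgetExceeded,
}

/// Stand-in for a browser-routed probe: it only decides whether the
/// fetch is allowed to happen.
fn probe(budget: &BrowserBudget) -> Outcome {
    if budget.try_acquire() {
        Outcome::Routed
    } else {
        Outcome::BudgetExceeded
    }
}

fn main() {
    let budget = BrowserBudget::new(2);
    let outcomes: Vec<Outcome> = (0..3).map(|_| probe(&budget)).collect();
    println!("{outcomes:?}");
}
```

Because the counter is atomic, concurrent probes can share one budget without a lock.
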
Trade-offs vs. raw HTTP
Browser fetches are inherently 5–10× slower than raw HTTP and (for
browserbase) cost real money. They're the only way to detect
accounts on the bot-protected subset, but on the rest of the registry
they would add latency for no recall gain — which is why routing is
opt-in and tag-driven, not blanket.
Usage
# output formats
# deeper analysis (these fetch fresh data, bypassing the cache)
# throughput & network hygiene
# shell completions
By default the text output shows Found and Uncertain results and hides the
(usually many) NotFound rows — pass --all for the full list. On an
interactive terminal, results stream in as they resolve; piped output is
collected and ordered. --tui opens a live results browser that updates as the
scan runs. Keys: / search, f filter by verdict, g/G/PageUp/PageDown to
navigate, o open the selected URL, y/Y copy one/all URLs, Enter for
details, ? for the full key list. Wide terminals show a persistent
list+detail split.
Results are cached between runs (~/.cache/adler/, 1 h TTL); use
--no-cache, --cache-ttl, or --cache-clear to control it. Exit codes:
0 something found, 1 nothing found, 2 error.
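
The exit-code contract makes Adler easy to script against. A minimal sketch of the mapping follows; the `Verdict` enum and `exit_code` function are illustrative, and treating only Found (not Uncertain) as "something found" is an assumption here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Found,
    NotFound,
    Uncertain,
}

/// Map a scan result to the documented exit codes:
/// 0 something found, 1 nothing found, 2 error.
/// Assumption: only `Found` counts as "something found".
fn exit_code(scan: Result<&[Verdict], &str>) -> i32 {
    match scan {
        Ok(verdicts) if verdicts.contains(&Verdict::Found) => 0,
        Ok(_) => 1,
        Err(_) => 2,
    }
}

fn main() {
    let hits = [Verdict::NotFound, Verdict::Found, Verdict::Uncertain];
    let misses = [Verdict::NotFound, Verdict::Uncertain];

    println!("hits   -> {}", exit_code(Ok(&hits)));
    println!("misses -> {}", exit_code(Ok(&misses)));
    println!("error  -> {}", exit_code(Err("network unreachable")));
}
```
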
Performance
A scan is network-bound: the engine itself is negligible. The executor::run
benchmark (cargo bench -p adler-core) fans out 50 probes against a local
mock server in ~1.6 ms total — roughly 32 µs per site of framework
overhead (~30K sites/s), while a real HTTP request takes 100–1000 ms. So
wall-clock time is set almost entirely by how many requests are in flight.
The lever that matters is therefore concurrency, not micro-optimisation:
- --concurrency (default 32) bounds in-flight probes. Most sites are distinct hosts, so the per-host throttle rarely serialises; raising it (e.g. --concurrency 64) shortens large scans, with diminishing returns past your network's limits.
- The result cache (~/.cache/adler/) skips re-probing unchanged sites between runs entirely.
- --max-rps trades throughput for politeness when you need a global cap.
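
To see why concurrency dominates, here is a back-of-the-envelope model. It assumes probes run in waves of `concurrency`, each wave costing one request latency. That is a deliberate simplification (real requests finish at different times), and the 500 ms latency is an assumed value inside the 100–1000 ms range quoted above.

```rust
/// Rough wall-clock estimate under a simple wave model: probes run in
/// batches of `concurrency`, and each batch takes one request latency.
fn estimate_wall_clock_ms(sites: u64, concurrency: u64, latency_ms: u64) -> u64 {
    // Guard against a zero concurrency so the division cannot panic.
    let waves = sites.div_ceil(concurrency.max(1));
    waves * latency_ms
}

fn main() {
    let sites = 411; // registry size quoted in the Detection rate section
    let latency_ms = 500; // assumed typical request latency

    for concurrency in [32, 64] {
        let ms = estimate_wall_clock_ms(sites, concurrency, latency_ms);
        println!("concurrency {concurrency}: ~{ms} ms");
    }

    // Framework overhead at ~32 µs per site, for comparison.
    let overhead_us = sites * 32;
    println!("engine overhead: ~{overhead_us} µs");
}
```

Under these assumptions, doubling concurrency roughly halves the estimate, while the engine's own overhead stays in the low milliseconds.
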
Library
adler-core is the runtime-agnostic engine that powers the CLI;
it's published separately on
crates.io so you can embed
username detection in your own Rust tools. Add to your Cargo.toml:
[dependencies]
adler-core = "0.4"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
Minimal worked example — load the embedded registry, scan one username, print the hits:
use ;
async
See docs.rs/adler-core for the
full API. Notable knobs:
| API | Purpose |
|---|---|
| Client::builder() | timeout, redirect policy, user-agent rotation, proxy, retry, throttle, cache, browser backend, NSFW gate. |
| Registry::filter | include/exclude by name substring, tag, nsfw opt-in (the 5th include_nsfw: bool parameter; pass true to scan adult sites). |
| Site::request_headers | per-site HTTP headers (e.g. Instagram's X-IG-App-ID); browser backends apply via Network.setExtraHTTPHeaders. |
| Site::regex_check | per-site username-validity regex. Mismatched usernames short-circuit to Uncertain(UsernameNotAllowed) without a network request. |
| Site::known_present | KnownPresent::Single(String) or KnownPresent::Multiple(Vec<String>); --doctor passes if any declared username resolves to Found. |
| BrowserBackend trait | route bot-protected sites through real Chrome. Built-in: LocalBackend (chromiumoxide) and BrowserbaseBackend (cloud CDP). |
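
The username-validity short-circuit can be illustrated without the real crate. The sketch below is not adler-core's API: `Site`, `Precheck`, and `precheck` are hypothetical, and a plain function pointer stands in for the compiled regex so the example needs only the standard library.

```rust
#[derive(Debug, PartialEq)]
enum Precheck {
    /// Username is acceptable here; go on to the network probe.
    Probe,
    /// Username can never exist on this site; skip the request.
    UsernameNotAllowed,
}

/// Hypothetical site record. The function pointer stands in for a regex.
struct Site {
    name: &'static str,
    is_valid_username: Option<fn(&str) -> bool>,
}

/// Decide, before any network traffic, whether probing makes sense.
fn precheck(site: &Site, username: &str) -> Precheck {
    match site.is_valid_username {
        Some(is_valid) if !is_valid(username) => Precheck::UsernameNotAllowed,
        _ => Precheck::Probe,
    }
}

/// Example rule: 1 to 15 ASCII alphanumerics or underscores.
fn short_handle(username: &str) -> bool {
    (1..=15).contains(&username.len())
        && username.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    let site = Site {
        name: "example",
        is_valid_username: Some(short_handle as fn(&str) -> bool),
    };
    for username in ["octocat", "has space", "name.with.dots"] {
        println!("{} / {username:?}: {:?}", site.name, precheck(&site, username));
    }
}
```
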
Breaking changes since 0.1: the Registry::filter signature grew an
include_nsfw: bool parameter (v0.4.0); Site::known_present now takes a
KnownPresent enum instead of Option<String> (v0.3.0); and
Site::request_headers and Site::regex_check are new fields (v0.2.0 and
v0.4.0 respectively). The
CHANGELOG has the migration notes for each.
Site registry
The default registry (adler-core/data/sites.json, ~2.5k sites) is generated
from MIT-licensed upstream data — the
Sherlock project (base) plus
the Maigret project (engine-inherited
forum platforms and additional sites) — via scripts/import_sherlock.py
and scripts/import_maigret.py. Detections are imported unverified —
upstream signatures rot over time. Validate them with the built-in health
check:
--doctor probes each site's known-present user (must be Found) and a random
nonsense user (must not be Found), reporting any site whose detection no
longer holds. --doctor --fix additionally suggests a corrected signature
for failing sites by diffing the present/absent responses. A nightly GitHub
Actions workflow (.github/workflows/doctor.yml) runs the check across the
whole registry and flags structural rot.
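
The health check's pass condition can be written down compactly. The sketch below mirrors the two KnownPresent shapes named in the Library section, but everything else (the `Verdict` enum, the `doctor_passes` function, the probe closure) is a hypothetical stand-in rather than adler-core's real API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Found,
    NotFound,
    Uncertain,
}

/// Mirrors the two shapes described for Site::known_present.
enum KnownPresent {
    Single(String),
    Multiple(Vec<String>),
}

impl KnownPresent {
    fn usernames(&self) -> Vec<&str> {
        match self {
            KnownPresent::Single(name) => vec![name.as_str()],
            KnownPresent::Multiple(names) => names.iter().map(String::as_str).collect(),
        }
    }
}

/// A site is healthy when at least one known-present username is Found
/// and the random nonsense username is not Found.
fn doctor_passes(
    known: &KnownPresent,
    nonsense: &str,
    probe: impl Fn(&str) -> Verdict,
) -> bool {
    let present_ok = known
        .usernames()
        .into_iter()
        .any(|user| probe(user) == Verdict::Found);
    let absent_ok = probe(nonsense) != Verdict::Found;
    present_ok && absent_ok
}

fn main() {
    let known = KnownPresent::Multiple(vec!["gone".into(), "torvalds".into()]);
    let single = KnownPresent::Single("torvalds".into());

    // Healthy signature: finds the real account, rejects nonsense.
    let healthy = |u: &str| if u == "torvalds" { Verdict::Found } else { Verdict::NotFound };
    // Rotten signature: fires for every username.
    let catch_all = |_: &str| Verdict::Found;
    // Login wall: nothing can be decided.
    let walled = |_: &str| Verdict::Uncertain;

    println!("healthy:   {}", doctor_passes(&known, "zzqx81", healthy));
    println!("catch-all: {}", doctor_passes(&known, "zzqx81", catch_all));
    println!("walled:    {}", doctor_passes(&single, "zzqx81", walled));
}
```
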
A supplementary registry derived from
WhatsMyName is shipped in
adler-core/data/sites_wmn.json and is included by default for
maximum coverage — it adds ~675 sites with two-sided body+status
detection signatures. The file is licensed CC BY-SA 4.0; if you
redistribute Adler scan output and need an MIT-only data lineage,
pass --no-wmn to drop the tranche.
Quality bar
CI must pass on every push:
Ethics & responsible use
Adler aggregates publicly reachable profile URLs, but aggregation makes intrusion easy — please use it responsibly.
Intended uses: checking your own accounts; authorized penetration tests and bug-bounty engagements; security research; and OSINT investigations with a lawful basis. Do not use Adler to stalk, harass, dox, or surveil people without authorization, or to mass-target individuals.
Detect, never circumvent. Adler reports anti-bot responses (rate limits,
Cloudflare challenges, captchas) as Uncertain — it does not solve captchas
or bypass access controls. It rate-limits per host, supports --max-rps and
--respect-robots, and writes an optional --audit-log of every request.
See SECURITY.md and CODE_OF_CONDUCT.md.
License
The Adler code is licensed under the MIT License.
The default site registry (adler-core/data/sites.json) is also under MIT
— it is derived from the Sherlock project (MIT) and the Maigret project
(MIT). See the file's _comment header and the corresponding importer
scripts in scripts/ for attribution.
The supplementary registry (adler-core/data/sites_wmn.json, included
by default; opt-out with adler --no-wmn) is derived from WhatsMyName
and licensed CC BY-SA 4.0. Adler's MIT licence
does not cover this file; downstream redistribution must preserve
attribution and the ShareAlike obligation on derivative data.