adler-core 0.10.0

Core engine for the Adler OSINT username-search tool.
<p align="center">
  <img src="banner.png" alt="Adler" />
</p>

<p align="center">
  <a href="https://github.com/commit3296/adler/actions/workflows/ci.yml"><img src="https://github.com/commit3296/adler/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://crates.io/crates/adler-cli"><img src="https://img.shields.io/crates/v/adler-cli.svg" alt="crates.io"></a>
  <a href="https://docs.rs/adler-core"><img src="https://docs.rs/adler-core/badge.svg" alt="docs.rs"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
</p>

# Adler

> *Named for Irene Adler — "the Woman", the one who outwitted Sherlock Holmes.
> Where Sherlock searched, Adler outsmarts.*

OSINT username search across ~3,000 sites, written in Rust. Adler gives honest
verdicts and is built to reach the hard sites: Cloudflare-walled,
TLS-fingerprinted, geo-restricted, and login-walled. See [PLAN.md](PLAN.md)
for the roadmap.

## How Adler compares

Open-source username-search tools that OSINT operators actually consider, on
the dimensions that matter when sites push back:

|                              | [Sherlock][cmp-s] | [Maigret][cmp-m] | [Blackbird][cmp-b] | [Snoop][cmp-sn] | **Adler** |
| ---------------------------- | :---: | :---: | :---: | :---: | :---: |
| Approx. sites                | 400 | 3,000 | 600 | 5,400 | 3,000 [^cmp-1] |
| Verdict model                | Found / NotFound | Found / NotFound | Found / NotFound | Found / NotFound | **Found / NotFound / Uncertain(reason)** |
| Bot-protected sites (Instagram, X, …) ||||| **headless Chrome via `--browser-backend`** |
| TLS-fingerprint blocking     ||||| **Chrome 134 handshake via `--features impersonate`** |
| Proxy routing                | one global | one global + Tor + I2P ||| one global **or** per-site policy via `--proxy-pool` |
| Cookies / sessions           || global `cookies.txt` ||| **per-site named sessions** via `--sessions` |
| Registry self-heal           ||||| **`--doctor --fix` diffs responses, proposes new signatures** |
| Web UI                       || yes (results graph, reports) ||| `--web` — live SSE-streaming SolidJS SPA + JSON API |
| Output formats               | text / CSV / XLSX / JSON | text / JSON / CSV / HTML / PDF / XMind / D3 | text / CSV / PDF | text / CSV / HTML | text / JSON / NDJSON / CSV / HTML |
| Embeddable library           || yes (Python async) ||| `adler-core` on crates.io (Rust) |
| Runtime / packaging          | Python | Python | Python | Python | **Rust — single static binary, `cargo binstall`** |

[^cmp-1]: Sherlock + Maigret + WhatsMyName lineages combined; see [*Site registry*](#site-registry).

[cmp-s]: https://github.com/sherlock-project/sherlock
[cmp-m]: https://github.com/soxoj/maigret
[cmp-b]: https://github.com/p1ngul1n0/blackbird
[cmp-sn]: https://github.com/snooppr/snoop

**Adler's thesis: honest verdicts plus access for the sites that matter.** A
`NotFound` from a Python-HTTP-only tool on a Cloudflare-walled,
TLS-fingerprinted, geo-restricted, or login-walled site often just means "I
gave up at the first wall." Adler reports `Uncertain(reason)` when it can't
verify, and ships the transports you need to break the wall yourself: a
headless browser, Chrome handshake emulation, per-site geo / IP-type egress,
and operator-supplied sessions. We do not solve CAPTCHAs or evade human
verification (see
[*Ethics & responsible use*](#ethics--responsible-use)).
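
To make the difference concrete, here is a minimal Rust sketch of a tri-state verdict. The `Verdict`, `UncertainReason` and `classify` names, and the page markers it looks for, are hypothetical and not `adler-core`'s API (the library's own type is `MatchKind`). The only point is that a wall or an unexpected status gets its own `Uncertain` slot instead of being folded into `NotFound`.

```rust
/// Why a probe could not be verified (illustrative subset only).
#[derive(Debug, Clone, PartialEq, Eq)]
enum UncertainReason {
    CloudflareChallenge,
    SessionRequired,
    UnexpectedStatus(u16),
}

/// Tri-state verdict: "couldn't tell" is its own outcome, not a NotFound.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Verdict {
    Found,
    NotFound,
    Uncertain(UncertainReason),
}

/// Hypothetical rule for a site that answers 200 for existing profiles and
/// 404 for missing ones. The body markers below are made up for the sketch.
fn classify(status: u16, body: &str) -> Verdict {
    // A challenge page says nothing about the account.
    if body.contains("Just a moment...") {
        return Verdict::Uncertain(UncertainReason::CloudflareChallenge);
    }
    // A login wall looks identical for present and absent accounts.
    if body.contains("Log in to continue") {
        return Verdict::Uncertain(UncertainReason::SessionRequired);
    }
    match status {
        200 => Verdict::Found,
        404 => Verdict::NotFound,
        // Rate limits, 5xx and anything unexpected: inconclusive, not absent.
        other => Verdict::Uncertain(UncertainReason::UnexpectedStatus(other)),
    }
}

fn main() {
    let samples = [
        (200, "<h1>alice</h1>"),
        (404, "no such user"),
        (403, "<title>Just a moment...</title>"),
        (429, "slow down"),
    ];
    for (status, body) in samples {
        // A Found/NotFound-only model has no honest slot for the last two rows.
        println!("{status} -> {:?}", classify(status, body));
    }
}
```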

## Detection rate

Recall depends on where you scan from. A `--doctor` pass on 2026-05-26
against the bundled registry (411 sites):

| Scan source | Sites where a known-existing account is found | Recall |
| --- | ---: | ---: |
| Datacenter IP (Hetzner / Leaseweb DE) | 282 / 411 | 68.6% |
| US residential proxy pool (DECODO) | **305 / 411** | **74.2%** |

The residential lift is real: ~40 sites swap their verdict between
`Uncertain` (datacenter) and `Found` (residential) — most are
Cloudflare-walled or geo-restricted (RU-segment, plus platforms like
Reddit, Imgur, Patreon). The remaining ~26% breaks down roughly as:

- **Bot-protected sites** tagged `bot-protected` (Instagram and
  X/Twitter today) — these serve a JS login wall to a plain HTTP
  request; a clean IP doesn't help, you need a browser backend.
  Exclude them with `--exclude-tag bot-protected`.
- **Stale Sherlock-imported `known_present` accounts** that no
  longer exist on the live site. The `--doctor --suggest-known-present`
  tool (new in v0.4.0) probes a small candidate pool (the site's
  brand name, plus `torvalds` / `octocat` / `admin` / …) and prints
  a paste-ready snippet for any site where it finds a live account.
  Discovery surfaced 19 healable entries on the most recent sweep;
  the remaining placeholders need either a contributor-found
  candidate or a deeper repair via `--doctor --fix`.
- **Sites whose detection rule fires for *every* username**: these need a
  signature repair, not a new username. `--doctor --fix` diffs the
  responses and proposes a tighter signal.
- **Sites that don't reliably distinguish found from not-found** for
  unauthenticated requests at all. We investigated these and left them
  out rather than ship false-positive entries: Reddit, TikTok,
  Pinterest, and Threads. See issues
  [#11–#14](https://github.com/commit3296/adler/issues?q=is%3Aissue+label%3A%22help+wanted%22)
  for the specific failure modes and what would unblock each.

Run the same check yourself: `adler --doctor` (uses your current IP)
or `adler --doctor --proxy <url>` (via your own proxy). With
`--browser-backend browserbase` the doctor's `--fix` mode routes
bot-protected sites through a real Chrome session, so the diff sees
real profile pages rather than two identical login walls. With
`--suggest-known-present` you get an OVERRIDES block per healable
site.

## Crates

| Crate         | Kind | Purpose                                              |
| ------------- | ---- | ---------------------------------------------------- |
| `adler-core`  | lib  | Detection engine, site registry, executor.          |
| `adler-server`| lib  | HTTP API + SSE streaming + scan persistence; embeds the SolidJS web UI via `rust-embed`. |
| `adler-cli`   | bin  | `adler` command-line interface; `--web` launches the embedded server + UI in-process. |

## Install

From crates.io (compiles locally, ~1–2 min):

```bash
cargo install adler-cli
```

Pre-built binary from the GitHub release (instant, no compile):

```bash
cargo binstall adler-cli            # https://github.com/cargo-bins/cargo-binstall
```

From source:

```bash
git clone https://github.com/commit3296/adler.git
cd adler
cargo install --path adler-cli
```

Requires Rust ≥ 1.85. The installed binary is `adler`. The library
([`adler-core`](https://crates.io/crates/adler-core)) is published separately
for embedding the engine in your own tools — see the
[*Library*](#library) section below.

## Build & run

```bash
cargo build --workspace
cargo run -p adler-cli -- alice
```

Logging is controlled by the `ADLER_LOG` env var (defaults to `adler=info`):

```bash
ADLER_LOG=adler=debug cargo run -p adler-cli -- alice
```

## Usage

`adler <username>` scans the embedded registry; everything else is a knob.
Text output shows Found and Uncertain rows by default and hides NotFound —
pass `--all` for the full list. Results stream into a terminal as they
resolve; piped output is collected and ordered. For a browser view, pass
`--web` (see [*Web UI*](#web-ui) below). Exit codes: `0` found, `1` nothing
found, `2` error.
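
If you drive `adler` from another program, the exit code is enough to branch on. A sketch in Rust using only `std::process::Command`, assuming `adler` is on `PATH`. The `ScanStatus` and `interpret` names are hypothetical, and treating undocumented codes and signal kills as errors is this sketch's choice, not documented behaviour.

```rust
use std::process::Command;

/// What `adler`'s exit status means, per the Usage section.
#[derive(Debug, PartialEq, Eq)]
enum ScanStatus {
    Found,        // 0
    NothingFound, // 1
    Error,        // 2
}

/// `None` means the process was killed by a signal (Unix). This sketch
/// treats that, and any undocumented code, as an error.
fn interpret(code: Option<i32>) -> ScanStatus {
    match code {
        Some(0) => ScanStatus::Found,
        Some(1) => ScanStatus::NothingFound,
        _ => ScanStatus::Error,
    }
}

fn main() {
    // `-q` prints only Found URLs.
    match Command::new("adler").args(["-q", "alice"]).output() {
        Ok(out) => {
            println!("{:?}", interpret(out.status.code()));
            print!("{}", String::from_utf8_lossy(&out.stdout));
        }
        Err(err) => eprintln!("could not run adler: {err}"),
    }
}
```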

`adler --help` has the complete flag reference; the buckets below cover the
common ones by intent.

### Filtering

```bash
adler --only github,gitlab alice         # restrict to matching site names
adler --exclude reddit alice             # drop matching site names
adler --tag social,dev alice             # filter by tag(s)
adler --tag region:ru alice              # by region tag
adler --exclude-tag bot-protected alice  # skip login-walled sites
adler --list-sites --only git            # discover filter terms (no scan)
adler --list-tags                        # show all tags + counts
```
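
The filters compose as a conjunction. The sketch below models that with a hypothetical `Site` struct and `keep` function whose parameter order mirrors `Registry::filter(include, exclude, tags, exclude_tags, include_nsfw)` from the *Library* section, where empty slices mean "no filter". Case-insensitive substring matching on names, and "matches *any* listed tag" for `--tag`, are assumptions.

```rust
/// Minimal stand-in for a registry entry (hypothetical; not adler-core's `Site`).
struct Site {
    name: &'static str,
    tags: &'static [&'static str],
    nsfw: bool,
}

/// Case-insensitive substring match (the case handling is an assumption).
fn name_matches(name: &str, term: &str) -> bool {
    name.to_lowercase().contains(&term.to_lowercase())
}

fn keep(
    site: &Site,
    only: &[&str],
    exclude: &[&str],
    tags: &[&str],
    exclude_tags: &[&str],
    include_nsfw: bool,
) -> bool {
    // Empty include lists mean "no restriction".
    (only.is_empty() || only.iter().any(|t| name_matches(site.name, t)))
        && !exclude.iter().any(|t| name_matches(site.name, t))
        && (tags.is_empty() || tags.iter().any(|t| site.tags.contains(t)))
        && !exclude_tags.iter().any(|t| site.tags.contains(t))
        // NSFW sites are dropped unless explicitly opted in.
        && (include_nsfw || !site.nsfw)
}

fn main() {
    let registry = [
        Site { name: "GitHub", tags: &["dev"], nsfw: false },
        Site { name: "GitLab", tags: &["dev"], nsfw: false },
        Site { name: "Instagram", tags: &["social", "bot-protected"], nsfw: false },
    ];
    // Roughly `adler --only github,gitlab`: comma-separated terms.
    let only: Vec<&str> = "github,gitlab".split(',').collect();
    for site in registry.iter().filter(|s| keep(s, &only, &[], &[], &[], false)) {
        println!("{}", site.name);
    }
}
```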

### Output

```bash
adler --format json alice > out.json     # JSON array
adler --format ndjson alice              # one JSON object per line (jq-friendly)
adler --format csv alice > out.csv       # spreadsheet table
adler --format html alice > out.html     # self-contained HTML report
adler --all alice                        # include NotFound rows
adler -q alice                           # quiet: only Found URLs
adler --explain alice                    # show which signal produced each verdict
adler --color never alice                # disable colors (also honors NO_COLOR)
```

### Network & sessions

```bash
adler --concurrency 64 alice             # in-flight probes (default 32)
adler --max-rps 5 alice                  # cap total request rate
adler --proxy socks5://host:1080 alice   # single proxy for everything
adler --proxy-pool pool.toml alice       # per-site geo/IP-type routing — see § Egress pool
adler --sessions sessions.toml alice     # operator-supplied sessions — see § Sessions
adler --tor alice                        # local Tor SOCKS proxy
adler --rotate-ua alice                  # rotate User-Agent per request
```

For TLS-fingerprint-blocked sites, build with `--features impersonate` (see
[*TLS-fingerprint impersonation*](#tls-fingerprint-impersonation-optional-build-feature)).
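
`--max-rps` is a global cap on request rate. One simple way to enforce such a cap is to hand out send slots spaced `1 / max_rps` apart; the hypothetical `RateCap` type below does that with `std::time` only. It sketches the idea and is not Adler's limiter, which also throttles per host.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hands out send slots at least `1 / max_rps` apart (a global cap).
struct RateCap {
    interval: Duration,
    next_free: Option<Instant>,
}

impl RateCap {
    fn new(max_rps: u32) -> Self {
        assert!(max_rps > 0, "max_rps must be positive");
        Self { interval: Duration::from_secs(1) / max_rps, next_free: None }
    }

    /// Reserve the earliest free slot at or after `now` and return it.
    fn reserve(&mut self, now: Instant) -> Instant {
        let slot = match self.next_free {
            Some(t) if t > now => t,
            _ => now,
        };
        self.next_free = Some(slot + self.interval);
        slot
    }
}

fn main() {
    let mut cap = RateCap::new(5); // like `--max-rps 5`
    let start = Instant::now();
    for i in 0..3 {
        let slot = cap.reserve(Instant::now());
        // Wait for our slot, then "send" the request.
        thread::sleep(slot.saturating_duration_since(Instant::now()));
        println!("request {i} sent at {:?}", start.elapsed());
    }
}
```

With concurrent probes the limiter would sit behind a `Mutex` (or an async equivalent) so that every task reserves from the same schedule.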

### Browser & cache

```bash
adler --browser-backend local alice          # headless Chrome for bot-protected
adler --browser-backend browserbase alice    # Browserbase cloud session
adler --browser-budget 20 alice              # cap browser-routed probes (default 50)
adler --no-browser alice                     # off for this run

adler --no-cache alice                       # bypass the result cache
adler --cache-ttl 86400 alice                # custom TTL (default 3600 s)
adler --cache-clear                          # drop the cache
```

Cache lives at `~/.cache/adler/`. See [*Browser backend*](#browser-backend-optional)
for the cost / setup trade-offs.
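
Conceptually, a cached result is reused only while it is younger than the TTL. A minimal in-memory sketch, assuming results are keyed by (site, username); the hypothetical `ResultCache` type stands in for the on-disk cache, whose actual layout under `~/.cache/adler/` isn't documented here.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical in-memory model of a TTL result cache.
struct ResultCache {
    ttl: Duration,
    entries: HashMap<(String, String), (SystemTime, String)>,
}

impl ResultCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn put(&mut self, site: &str, user: &str, verdict: &str, at: SystemTime) {
        self.entries
            .insert((site.to_string(), user.to_string()), (at, verdict.to_string()));
    }

    /// Return the cached verdict only while it is younger than the TTL.
    /// A timestamp in the future (clock skew) is treated as stale.
    fn get(&self, site: &str, user: &str, now: SystemTime) -> Option<&str> {
        let (stored_at, verdict) = self.entries.get(&(site.to_string(), user.to_string()))?;
        match now.duration_since(*stored_at) {
            Ok(age) if age < self.ttl => Some(verdict.as_str()),
            _ => None,
        }
    }
}

fn main() {
    // 3600 s mirrors the documented default `--cache-ttl`.
    let mut cache = ResultCache::new(Duration::from_secs(3600));
    let t0 = SystemTime::now();
    cache.put("github", "alice", "Found", t0);
    println!("{:?}", cache.get("github", "alice", t0 + Duration::from_secs(600)));
    println!("{:?}", cache.get("github", "alice", t0 + Duration::from_secs(7200)));
}
```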

### Batch & enrichment

```bash
adler --input users.txt                      # batch many usernames, grouped output
adler --watch alice                          # diff vs last run; new/removed
adler --watch --interval 3600 alice          # keep watching
adler --enrich alice                         # extract name/bio/avatar
adler --correlate alice                      # group accounts by signal overlap
adler --permute aggressive alice             # search spelling variants
adler --completions zsh > _adler             # shell completions
```

## Web UI

`adler --web` boots a small in-process HTTP server and serves a SolidJS
SPA from the same binary — no separate frontend deployment, no extra
process to manage. Once the server is up, kick off scans, watch outcomes
stream in over SSE, persist them to disk, and diff them against earlier
runs.

```bash
adler --web                          # http://127.0.0.1:8080
adler --web --web-bind 0.0.0.0:9000  # listen on all interfaces, custom port
```

What you get in the browser:

- **Live scan view** — outcomes stream in as they resolve (SSE), grouped
  by category, with per-row evidence (verdict reason, response snippet,
  URL) and a one-click retry.
- **History modal** — every finished scan is persisted to
  `~/.cache/adler/scans/` (capped at 200 scans, atomic writes). Reopen
  any past scan via `#/scan/<id>` deep-links.
- **Compare with previous** — pick any two persisted scans and diff
  them side-by-side (`#/diff/<a>/<b>`); shows accounts gained / lost /
  flipped between the two runs. Esc / back-button exits.
- **Filters & sort** — by verdict, category, presence of evidence,
  hidden NotFound rows. Preferences persist to localStorage.
- **NSFW gate** — off by default; the toggle is hidden behind a
  confirmation, matching the CLI's `--nsfw` opt-in.
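
The *Compare with previous* view (and `--watch`) boils down to a per-site diff of two scans. A sketch under stated assumptions: scans are modelled as `BTreeMap<site, Verdict>`, entering `Found` counts as *gained*, leaving it as *lost*, and any other verdict change as *flipped*. The `diff` and `ScanDiff` names are hypothetical and Adler's exact definitions may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Found,
    NotFound,
    Uncertain,
}

#[derive(Debug, Default)]
struct ScanDiff {
    gained: Vec<String>,
    lost: Vec<String>,
    flipped: Vec<(String, Verdict, Verdict)>,
}

fn diff(old: &BTreeMap<String, Verdict>, new: &BTreeMap<String, Verdict>) -> ScanDiff {
    let mut out = ScanDiff::default();
    // Union of site names, sorted for stable output.
    let sites: BTreeSet<&String> = old.keys().chain(new.keys()).collect();
    for site in sites {
        let (before, after) = (old.get(site).copied(), new.get(site).copied());
        let was_found = before == Some(Verdict::Found);
        let is_found = after == Some(Verdict::Found);
        match (was_found, is_found, before, after) {
            (false, true, _, _) => out.gained.push(site.clone()),
            (true, false, _, _) => out.lost.push(site.clone()),
            // e.g. Uncertain -> NotFound; sites present in only one scan
            // without a Found on either side are ignored.
            (_, _, Some(b), Some(a)) if b != a => out.flipped.push((site.clone(), b, a)),
            _ => {}
        }
    }
    out
}

fn scan(pairs: &[(&str, Verdict)]) -> BTreeMap<String, Verdict> {
    pairs.iter().map(|(site, v)| (site.to_string(), *v)).collect()
}

fn main() {
    let old = scan(&[("github", Verdict::Found), ("gitlab", Verdict::NotFound), ("imgur", Verdict::Uncertain)]);
    let new = scan(&[("github", Verdict::NotFound), ("gitlab", Verdict::Found), ("imgur", Verdict::NotFound)]);
    println!("{:#?}", diff(&old, &new));
}
```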

The server exposes a small JSON API at `/api/*` (`/health`, `/sites`,
`/scans`, `POST /scan`, `GET /scan/:id`, `GET /scan/:id/stream`,
`POST /scan/:id/retry`) — useful if you want to drive Adler from a
different frontend or a script. SSE consumers should subscribe to the
`/stream` endpoint and treat each event as one outcome.
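
A consumer of the stream endpoint (presumably `GET /api/scan/<id>/stream`, given the `/api/*` prefix) only needs the basic server-sent-events framing rules: `data:` lines accumulate, lines starting with `:` are comments, and a blank line ends an event. The hypothetical `parse_sse` below applies them to a complete body. The outcome payload is treated as opaque text because its schema isn't documented here, and `id:` / `retry:` fields are ignored.

```rust
/// One dispatched server-sent event.
#[derive(Debug, PartialEq)]
struct SseEvent {
    event: Option<String>,
    data: String,
}

/// Parse a complete SSE body (LF or CRLF line endings) into events.
fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut event: Option<String> = None;
    let mut data: Vec<String> = Vec::new();
    for line in body.lines() {
        if line.is_empty() {
            // Blank line: dispatch if any data was collected, then reset.
            if !data.is_empty() {
                events.push(SseEvent { event: event.take(), data: data.join("\n") });
                data.clear();
            }
            event = None;
            continue;
        }
        if line.starts_with(':') {
            continue; // comment / keep-alive
        }
        // "field: value": drop one optional space after the colon.
        let (field, value) = match line.split_once(':') {
            Some((f, v)) => (f, v.strip_prefix(' ').unwrap_or(v)),
            None => (line, ""),
        };
        match field {
            "data" => data.push(value.to_string()),
            "event" => event = Some(value.to_string()),
            _ => {} // id, retry and unknown fields are ignored in this sketch
        }
    }
    // A trailing event without its terminating blank line is discarded.
    events
}

fn main() {
    let body = ": keep-alive\n\ndata: first outcome\n\nevent: outcome\ndata: part one\ndata: part two\n\n";
    for ev in parse_sse(body) {
        println!("{:?}: {}", ev.event, ev.data);
    }
}
```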

The bundled SPA is baked into the binary at compile time
(`rust-embed`), so the deployed unit is just the `adler` executable
plus whatever scan-cache directory you point it at. The SolidJS
project lives at `adler-server/web/`; if you build from source, run
`npm ci && npm run build` there before `cargo build` — Vite emits
`web/dist/`, which `rust-embed` reads directly.

## Browser backend (optional)

A small subset of sites, currently **Instagram and Twitter**, serve a
JavaScript login wall or a Cloudflare challenge to a plain HTTP request.
They're tagged `bot-protected` (`adler --list-tags` shows the live count)
and, on the raw HTTP path, will *always* return `Uncertain`, because the
response looks identical for an existing account and a missing one. The
tag is kept narrow on purpose: every additional candidate we investigated
either detects fine without a browser or is structurally unscrapable even
*with* one (see [*Detection rate*](#detection-rate) above).

With `--browser-backend` Adler routes those sites (and *only* those —
everything else stays on the fast HTTP path) through a real headless
Chrome that runs JS, accepts cookies, and returns the final post-render
DOM. The same detection signals then apply, and a verdict becomes
possible.

Two backends are supported, picked at the CLI:

| Flag | What it does | Cost | Requirements |
|---|---|---|---|
| `--browser-backend local` | Launches headless Chrome on your machine via [`chromiumoxide`](https://crates.io/crates/chromiumoxide) | Free | Chrome / Chromium installed locally |
| `--browser-backend browserbase` | Opens a remote session on [Browserbase](https://browserbase.com) and connects over the CDP WebSocket | Pay per session-minute (≈ $0.05/min) | `ADLER_BROWSERBASE_API_KEY` and `ADLER_BROWSERBASE_PROJECT_ID` env vars. Drives CDP through a small in-tree async client (`adler-core/src/browser/cdp.rs`): neither `chromiumoxide` nor `headless_chrome` could attach to Browserbase's remote browser cleanly (issue #5), so we wrote our own. |

Both reuse a single browser instance across all routed fetches in a
scan, so the cost / setup overhead is paid once per scan.

### Examples

```bash
# Use local Chrome — pairs cleanly with --proxy (passed through as
# --proxy-server to the child process).
adler --browser-backend local --proxy socks5h://USER:PASS@HOST:PORT alice

# Cloud session with residential / mobile IP and anti-fingerprint baked in.
export ADLER_BROWSERBASE_API_KEY=bb_live_...
export ADLER_BROWSERBASE_PROJECT_ID=...
adler --browser-backend browserbase alice

# Cap the number of browser-routed probes (default 50). Once exceeded,
# remaining bot-protected sites return Uncertain(browser_budget_exceeded).
adler --browser-backend browserbase --browser-budget 10 alice

# Disable for one run even if the env / a shell alias has it on.
adler --no-browser alice
```

### Guardrails

- **Per-scan budget**: `--browser-budget N` caps how many browser
  fetches a single scan may consume. The default is 50, ≈ 5× the
  `bot-protected` subset of the registry, so the cap only ever fires if
  a flag is misconfigured.
- **No surprise routing**: only sites tagged `bot-protected` are sent
  through the browser. Everything else is unaffected. Use
  `adler --list-tags` to see what's tagged.
- **Privacy**: the `browserbase` backend sends the URLs you scan to a
  third-party US-based service. The `local` backend sends them nowhere
  but the scanned sites themselves (plus whatever proxy you've
  configured Chrome to use).
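
The two routing guardrails above amount to a small decision per probe. A sketch with a hypothetical `BrowserRouter`: only `bot-protected` sites are considered, and once the budget is spent the result stands in for `Uncertain(browser_budget_exceeded)`. How Adler counts retried fetches against the budget is not covered here.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Route {
    /// Plain HTTP path (a bot-protected site here can only come back Uncertain).
    Http,
    /// Real headless Chrome.
    Browser,
    /// Stands in for `Uncertain(browser_budget_exceeded)`.
    BudgetExceeded,
}

struct BrowserRouter {
    enabled: bool, // a `--browser-backend` was chosen and `--no-browser` not set
    budget: u32,   // `--browser-budget`, default 50
    used: u32,
}

impl BrowserRouter {
    fn new(enabled: bool, budget: u32) -> Self {
        Self { enabled, budget, used: 0 }
    }

    fn route(&mut self, site_tags: &[&str]) -> Route {
        // No surprise routing: untagged sites never touch the browser.
        if !self.enabled || !site_tags.contains(&"bot-protected") {
            return Route::Http;
        }
        if self.used >= self.budget {
            return Route::BudgetExceeded;
        }
        self.used += 1;
        Route::Browser
    }
}

fn main() {
    let mut router = BrowserRouter::new(true, 1);
    let cases: [&[&str]; 3] = [&["dev"], &["social", "bot-protected"], &["bot-protected"]];
    for tags in cases {
        println!("{tags:?} -> {:?}", router.route(tags));
    }
}
```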

### Trade-offs vs. raw HTTP

Browser fetches are inherently 5–10× slower than raw HTTP and (for
`browserbase`) cost real money. They're the only way to detect
accounts on the bot-protected subset, but on the rest of the registry
they would add latency for no recall gain — which is why routing is
opt-in and tag-driven, not blanket.

## Egress pool (geo routing)

Some sites only answer from a particular country, or block datacenter
IP ranges. A site can declare what egress it needs via its `access`
policy in the registry (a country and/or an IP type); `--proxy-pool`
supplies the proxies that satisfy those requirements.

`--proxy` still routes *everything* through one proxy (the default
egress). `--proxy-pool` is additive and **only** kicks in for sites
whose `access` policy requires a specific egress — everything else
keeps using the default. If a site needs an egress the pool can't
provide, it's reported `Uncertain(geo_unavailable)` rather than fetched
from the wrong place — a location you can't reach is not evidence the
account is absent.

The pool is a TOML file of `[[egress]]` entries:

```toml
# pool.toml
[[egress]]
url = "socks5://user:pass@pl.example.com:1080"
country = "pl"          # ISO-3166-1 alpha-2 (lowercased)
kind = "residential"    # datacenter (default) | residential | mobile | tor

[[egress]]
url = "http://de.example.com:8080"
country = "de"
# kind omitted → datacenter
```

```bash
adler --proxy-pool pool.toml alice
```

Bring your own proxies — Adler ships the routing, not the egress. The
browser backend keeps its own egress (e.g. Browserbase's residential
IPs); `--proxy-pool` routes the raw-HTTP path.
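
Pool selection can be read as three cases: no `access` constraint means the default egress, a matching entry means that proxy, and no match means `Uncertain(geo_unavailable)`. A sketch with hypothetical `Egress` / `Plan` types over an already-parsed pool (TOML parsing omitted); first-match selection and case-insensitive country comparison are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Datacenter,
    Residential,
    Mobile,
    Tor,
}

/// One `[[egress]]` entry, already parsed.
#[derive(Debug, PartialEq)]
struct Egress {
    url: String,
    country: String, // ISO-3166-1 alpha-2, lowercased
    kind: Kind,
}

#[derive(Debug, PartialEq)]
enum Plan<'a> {
    /// No `access` constraint: use `--proxy` or a direct connection.
    Default,
    Pool(&'a Egress),
    /// Stands in for `Uncertain(geo_unavailable)`; the site is not fetched.
    GeoUnavailable,
}

fn plan<'a>(pool: &'a [Egress], country: Option<&str>, kind: Option<Kind>) -> Plan<'a> {
    if country.is_none() && kind.is_none() {
        return Plan::Default;
    }
    // First match wins here; a real selector might rotate across matches.
    let hit = pool.iter().find(|e| {
        country.map_or(true, |c| e.country.eq_ignore_ascii_case(c))
            && kind.map_or(true, |k| e.kind == k)
    });
    match hit {
        Some(e) => Plan::Pool(e),
        None => Plan::GeoUnavailable,
    }
}

fn main() {
    // The two entries from the pool.toml example above (credentials dropped).
    let pool = vec![
        Egress { url: "socks5://pl.example.com:1080".into(), country: "pl".into(), kind: Kind::Residential },
        Egress { url: "http://de.example.com:8080".into(), country: "de".into(), kind: Kind::Datacenter },
    ];
    for (country, kind) in [(None, None), (Some("pl"), Some(Kind::Residential)), (Some("us"), None)] {
        match plan(&pool, country, kind) {
            Plan::Pool(e) => println!("{country:?}/{kind:?} -> {}", e.url),
            other => println!("{country:?}/{kind:?} -> {other:?}"),
        }
    }
}
```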

## Sessions (reach login-walled sites)

Some sites only show a profile to a logged-in user (Instagram, Threads,
Reddit's JSON). A site can declare `access.session = "<name>"` in the
registry; `--sessions <file>` supplies that named session's headers —
your own (or a sock-puppet) account's — applied to the site's probe so
it sees a real session instead of a login wall.

This is "use a real account", not evasion: Adler doesn't solve
challenges or forge anything; you bring a session you're entitled to.
If a site names a session you didn't supply, it's reported
`Uncertain(session_required)` rather than a login-wall false negative.

The file is TOML; each `[name]` table is a set of HTTP headers (copy
them from your browser's devtools):

```toml
# sessions.toml
[ig]
Cookie = "sessionid=...; csrftoken=..."
X-IG-App-ID = "936619743392459"

[reddit]
Cookie = "reddit_session=..."
```

```bash
adler --sessions sessions.toml alice
```

Header values are secrets — redacted from logs, never written to scan
output. Using a sock-puppet account may breach a site's ToS; that's an
operator decision within your engagement's scope.
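
Resolving a site's `access.session` against the supplied file has three outcomes, and the header values should never reach a log line. A sketch with hypothetical types: `Secret` implements `Debug` as `<redacted>`, and a missing session stands in for `Uncertain(session_required)`. It starts from an already-parsed map; reading `sessions.toml` itself would need a TOML crate.

```rust
use std::collections::HashMap;
use std::fmt;

/// Header value wrapper whose Debug output never shows the secret.
struct Secret(String);

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("<redacted>")
    }
}

/// One named session: (header name, secret value) pairs.
type Session = Vec<(String, Secret)>;

#[derive(Debug)]
enum SessionLookup<'a> {
    /// The site declares no `access.session`.
    NotNeeded,
    /// Headers to attach to this site's probe.
    Headers(&'a Session),
    /// Stands in for `Uncertain(session_required)`, reported instead of
    /// a login-wall false negative.
    Missing,
}

fn resolve<'a>(sessions: &'a HashMap<String, Session>, wanted: Option<&str>) -> SessionLookup<'a> {
    match wanted {
        None => SessionLookup::NotNeeded,
        Some(name) => sessions.get(name).map_or(SessionLookup::Missing, SessionLookup::Headers),
    }
}

fn main() {
    let mut sessions: HashMap<String, Session> = HashMap::new();
    sessions.insert(
        "ig".to_string(),
        vec![("Cookie".to_string(), Secret("sessionid=...".to_string()))],
    );
    // Safe to log: header values print as <redacted>.
    println!("{:?}", resolve(&sessions, Some("ig")));
    println!("{:?}", resolve(&sessions, Some("reddit")));
}
```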

## TLS-fingerprint impersonation (optional build feature)

Some sites read the TLS handshake's JA3 / JA4 fingerprint and serve a
block page to anything that doesn't look like a real browser — `rustls`
or `reqwest`'s default fingerprints are well-known and easy to
filter. Sites tagged `protection: tls-fingerprint` in the registry
declare this.

Build Adler with the `impersonate` feature to enable an in-process
`wreq` HTTP client emulating Chrome 134 (BoringSSL handshake matches
real Chrome's JA3 / JA4 / HTTP-2 fingerprint). Sites whose protection
is *only* TLS fingerprint then route through it — much cheaper than
spinning up a real browser:

```bash
cargo install adler-cli --features impersonate
```

The feature pulls in BoringSSL and needs `cmake`, a C++ compiler, and
`libclang` at build time (on Fedora: `dnf install cmake gcc-c++
clang`; on Debian/Ubuntu: `apt install cmake clang libclang-dev`).
`cargo binstall adler-cli` ships impersonate-enabled binaries for
x86_64-linux, both macOS targets, and Windows. The
`aarch64-unknown-linux-gnu` binary is built without the feature (the
cross-compiled BoringSSL toolchain isn't wired up), so on aarch64 Linux
use `cargo install adler-cli --features impersonate` instead. Sites with
mixed protections (e.g. `tls-fingerprint` + `cloudflare`) stay on the
browser-backend path.
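
The routing rule in this section (TLS fingerprint as the *only* protection goes through `wreq`; mixed protections go to the browser backend) can be sketched as a pure function. The `Protection` / `Transport` names and `transport` function are hypothetical, and what happens to a TLS-only site in a build *without* the feature (here: the default client) is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protection {
    TlsFingerprint,
    Cloudflare,
}

#[derive(Debug, PartialEq, Eq)]
enum Transport {
    /// Whatever the rest of the routing rules pick (plain HTTP, proxies, ...).
    Default,
    /// In-process client emulating Chrome's handshake.
    Impersonate,
    /// Browser-backend path.
    Browser,
}

fn transport(protections: &[Protection], impersonate_built: bool) -> Transport {
    let has_tls = protections.contains(&Protection::TlsFingerprint);
    let only_tls = has_tls && protections.iter().all(|p| *p == Protection::TlsFingerprint);
    if only_tls && impersonate_built {
        Transport::Impersonate
    } else if has_tls && !only_tls {
        // e.g. tls-fingerprint + cloudflare: the handshake alone isn't enough.
        Transport::Browser
    } else {
        Transport::Default
    }
}

fn main() {
    // A real crate would typically derive this from its Cargo feature.
    let built = cfg!(feature = "impersonate");
    println!("{:?}", transport(&[Protection::TlsFingerprint], built));
    println!("{:?}", transport(&[Protection::TlsFingerprint, Protection::Cloudflare], built));
}
```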

## Performance

A scan is network-bound: the engine itself is negligible. The `executor::run`
benchmark (`cargo bench -p adler-core`) fans out 50 probes against a local
mock server in **~1.6 ms total — roughly 32 µs per site** of framework
overhead (~30K sites/s), while a real HTTP request takes 100–1000 ms. So
wall-clock time is set almost entirely by how many requests are in flight.

The lever that matters is therefore concurrency, not micro-optimisation:

- `--concurrency` (default **32**) bounds in-flight probes. Most sites are
  distinct hosts, so the per-host throttle rarely serialises; raising it
  (e.g. `--concurrency 64`) shortens large scans, with diminishing returns
  past your network's limits.
- The result cache (`~/.cache/adler/`) skips re-probing unchanged sites
  between runs entirely.
- `--max-rps` trades throughput for politeness when you need a global cap.
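
Since the engine overhead is negligible, a back-of-envelope estimate is just "waves of `--concurrency` requests times per-request latency". The hypothetical `estimate` function below assumes uniform latency and ignores per-host throttling, retries, the cache and `--max-rps`, so treat it as a rough figure, not a prediction.

```rust
use std::time::Duration;

/// Rough wall-clock time for `sites` probes when at most `concurrency` are
/// in flight and every probe takes `latency` (modelled as sequential waves).
fn estimate(sites: u32, concurrency: u32, latency: Duration) -> Duration {
    assert!(concurrency > 0, "concurrency must be at least 1");
    latency * sites.div_ceil(concurrency)
}

fn main() {
    let latency = Duration::from_millis(500);
    for concurrency in [32, 64] {
        println!("3000 sites @ {concurrency}: ~{:?}", estimate(3000, concurrency, latency));
    }
    // Engine overhead from the benchmark above: 1.6 ms across 50 probes.
    println!("overhead per site: {:?}", Duration::from_micros(1600) / 50);
}
```

With 3,000 sites at 500 ms each this gives 47 s at the default concurrency of 32 and 23.5 s at 64, while the 32 µs of per-site engine overhead adds under 0.1 s in total.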

## Library

`adler-core` is the runtime-agnostic engine that powers the CLI;
it's published separately on
[crates.io](https://crates.io/crates/adler-core) so you can embed
username detection in your own Rust tools. Add to your `Cargo.toml`:

```toml
[dependencies]
adler-core = "0.10"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```

Minimal worked example — load the embedded registry, scan one
username, print the hits:

```rust
use adler_core::{Client, ExecutorOptions, MatchKind, Registry, Username, executor};

#[tokio::main]
async fn main() -> adler_core::Result<()> {
    let registry = Registry::default_embedded()?;
    // filter(include, exclude, tags, exclude_tags, include_nsfw)
    // — empty slices = no name/tag filter; `false` keeps the
    // default NSFW auto-exclusion.
    let sites = registry.filter(&[], &[], &[], &[], false);
    let username = Username::new("torvalds")?;
    let client = Client::builder().build()?;

    let outcomes = executor::run(
        &client, &sites, &username, ExecutorOptions::default(),
    ).await;

    for outcome in outcomes.iter().filter(|o| o.kind == MatchKind::Found) {
        println!("found: {} → {}", outcome.site, outcome.url);
    }
    Ok(())
}
```

See [`docs.rs/adler-core`](https://docs.rs/adler-core) for the
full API. Notable knobs:

| Item | What it covers |
|---|---|
| `Client::builder()` | timeout, redirect policy, proxy, retry, user-agent rotation, throttle, cache, browser backend, NSFW gate. |
| `Registry::filter` | include/exclude by name substring, tag, `nsfw` opt-in (the 5th `include_nsfw: bool` parameter — pass `true` to scan adult sites). |
| `Site::request_headers` | per-site HTTP headers (e.g. Instagram's `X-IG-App-ID`); browser backends apply via `Network.setExtraHTTPHeaders`. |
| `Site::regex_check` | per-site username-validity regex. Mismatched usernames short-circuit to `Uncertain(UsernameNotAllowed)` without a network request. |
| `Site::known_present` | `KnownPresent::Single(String)` or `KnownPresent::Multiple(Vec<String>)`; `--doctor` passes if **any** declared username resolves to `Found`. |
| `BrowserBackend` trait | route bot-protected sites through real Chrome. Built-in: `LocalBackend` (chromiumoxide) and `BrowserbaseBackend` (cloud CDP). |

**Breaking changes since 0.1:** the `Registry::filter` signature
grew an `include_nsfw: bool` (v0.4.0), `Site::known_present` now
accepts a `KnownPresent` enum instead of `Option<String>` (v0.3.0),
`Site::request_headers` and `Site::regex_check` are new fields
(v0.2.0 / v0.4.0 respectively). The
[CHANGELOG](CHANGELOG.md) has the migration notes for each.

## Site registry

The default registry (`adler-core/data/sites.json`, ~2.5k sites) is generated
from MIT-licensed upstream data — the
[Sherlock project](https://github.com/sherlock-project/sherlock) (base) plus
the [Maigret project](https://github.com/soxoj/maigret) (engine-inherited
forum platforms and additional sites) — via `scripts/import_sherlock.py`
and `scripts/import_maigret.py`. Detections are imported **unverified** —
upstream signatures rot over time. Validate them with the built-in health
check:

```bash
adler --doctor                 # check every site's signature
adler --doctor --only github   # check a subset
```

`--doctor` probes each site's known-present user (must be Found) and a random
nonsense user (must not be Found), reporting any site whose detection no
longer holds. `--doctor --fix` additionally suggests a corrected signature
for failing sites by diffing the present/absent responses. A nightly GitHub
Actions workflow (`.github/workflows/doctor.yml`) runs the check across the
whole registry and flags structural rot.
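
The pass/fail rule can be written down directly. A sketch that reuses the `KnownPresent::Single` / `KnownPresent::Multiple` shape from the *Library* section; the `doctor_passes` function and the `probe` callback standing in for a real network check are hypothetical.

```rust
/// Mirrors the `KnownPresent` shape described in the Library table.
enum KnownPresent {
    Single(String),
    Multiple(Vec<String>),
}

impl KnownPresent {
    fn usernames(&self) -> Vec<&str> {
        match self {
            KnownPresent::Single(u) => vec![u.as_str()],
            KnownPresent::Multiple(us) => us.iter().map(String::as_str).collect(),
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Found,
    NotFound,
    Uncertain,
}

fn doctor_passes(known: &KnownPresent, nonsense: &str, mut probe: impl FnMut(&str) -> Verdict) -> bool {
    // Any declared username resolving to Found is enough.
    let present_ok = known.usernames().into_iter().any(|u| probe(u) == Verdict::Found);
    // The nonsense user may be NotFound or Uncertain, just never Found.
    present_ok && probe(nonsense) != Verdict::Found
}

fn main() {
    let known = KnownPresent::Multiple(vec!["stale_user".to_string(), "torvalds".to_string()]);
    // Fake probe: only `torvalds` exists.
    let probe = |u: &str| if u == "torvalds" { Verdict::Found } else { Verdict::NotFound };
    println!("passes: {}", doctor_passes(&known, "zq9xv_nonexistent", probe));
    // A rule that fires for every username fails the nonsense check.
    println!("passes: {}", doctor_passes(&known, "zq9xv_nonexistent", |_: &str| Verdict::Found));
}
```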

A supplementary registry derived from
[WhatsMyName](https://github.com/WebBreacher/WhatsMyName) is shipped in
`adler-core/data/sites_wmn.json` and is **included by default** for
maximum coverage — it adds ~675 sites with two-sided body+status
detection signatures. The file is licensed CC BY-SA 4.0; if you
redistribute Adler scan output and need an MIT-only data lineage,
pass `--no-wmn` to drop the tranche.

## Quality bar

CI must pass on every push:

```bash
cargo fmt --all --check
cargo clippy --all-targets --workspace -- -D warnings
cargo test --workspace
```

## FAQ / Troubleshooting

**Why is everything coming back as `Uncertain`?** Almost always a datacenter
IP that's been mass-banned at the CDN edge. Try `--proxy socks5://...` with a
residential proxy, or `--browser-backend local` for sites tagged
`bot-protected`. `adler --explain alice` prints the signal that flagged each
verdict, so you can tell *why* it was inconclusive (`cloudflare_challenge`,
`geo_unavailable`, `session_required`, …).

**Why does Adler report fewer Found accounts than Sherlock or Maigret?**
Adler's `NotFound` means "verified absent from a working response." Sherlock
and Maigret return `NotFound` even when the response was a Cloudflare wall,
login page, or anti-bot challenge — those are false negatives. Check Adler's
`Uncertain` bucket: most of the apparent "missing" hits are there, with a
*reason*. Resolve the wall (browser, residential IP, sessions) and they
flip to `Found`.

**How do I scan Instagram / X (Twitter)?** Both are tagged
`bot-protected`: plain HTTP gets a login wall. Use `--browser-backend local`
(free, local Chrome) or `--browser-backend browserbase` (paid, residential
cloud). For Instagram specifically, supplying a session via `--sessions` lets
you reach the authenticated profile (see [*Sessions*](#sessions-reach-login-walled-sites)).
Threads is not in the bundled registry (see [*Detection rate*](#detection-rate)).

**`--proxy` vs `--proxy-pool` — which do I want?** `--proxy` routes
*everything* through one proxy. `--proxy-pool` is per-site: the registry
declares "this site needs a UK residential IP", Adler picks a matching
egress from the pool; sites without a constraint use the default. Mix them
freely.

**A site's signature is stale — how do I fix it?** `adler --doctor --only
<site>` reproduces the failure; `adler --doctor --fix --only <site>` diffs
present/absent responses and proposes a corrected signature. Paste it into a
local override or open a PR.

**Is it legal to use sock-puppet accounts for `--sessions`?** Adler ships
nothing here — you bring the session. Whether your engagement authorises
operating under a pseudonymous account against a site's ToS is an operator
decision; see [*Ethics & responsible use*](#ethics--responsible-use) for our
line.

## Ethics & responsible use

Adler aggregates publicly reachable profile URLs, but aggregation makes
intrusion easy — please use it responsibly.

**Intended uses:** checking your own accounts; authorized penetration tests
and bug-bounty engagements; security research; and OSINT investigations with
a lawful basis. **Do not** use Adler to stalk, harass, dox, or surveil
people without authorization, or to mass-target individuals.

**Detect, never circumvent.** Adler reports anti-bot responses (rate limits,
Cloudflare challenges, captchas) as `Uncertain` — it does not solve captchas
or bypass access controls. It rate-limits per host, supports `--max-rps` and
`--respect-robots`, and writes an optional `--audit-log` of every request.
See [SECURITY.md](SECURITY.md) and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md).

## License

The Adler **code** is licensed under the [MIT License](LICENSE).

The default site registry (`adler-core/data/sites.json`) is also under MIT
— it is derived from the Sherlock project (MIT) and the Maigret project
(MIT). See the file's `_comment` header and the corresponding importer
scripts in `scripts/` for attribution.

The supplementary registry (`adler-core/data/sites_wmn.json`, included
by default; opt-out with `adler --no-wmn`) is derived from WhatsMyName
and licensed [CC BY-SA 4.0](LICENSE-CC-BY-SA-4.0). Adler's MIT licence
does not cover this file; downstream redistribution must preserve
attribution and the `ShareAlike` obligation on derivative data.