<p align="center">
<img src="banner.png" alt="Adler" />
</p>
<p align="center">
<a href="https://github.com/commit3296/adler/actions/workflows/ci.yml"><img src="https://github.com/commit3296/adler/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<a href="https://crates.io/crates/adler-cli"><img src="https://img.shields.io/crates/v/adler-cli.svg" alt="crates.io"></a>
<a href="https://docs.rs/adler-core"><img src="https://docs.rs/adler-core/badge.svg" alt="docs.rs"></a>
<a href="https://adler-docs.pages.dev/"><img src="https://img.shields.io/badge/docs-adler--docs.pages.dev-orange?logo=cloudflarepages&logoColor=white" alt="Docs"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License: MIT"></a>
</p>
<p align="center">
<a href="https://github.com/commit3296/adler/actions/workflows/audit.yml"><img src="https://github.com/commit3296/adler/actions/workflows/audit.yml/badge.svg" alt="cargo-audit"></a>
<a href="https://github.com/commit3296/adler/actions/workflows/codeql.yml"><img src="https://github.com/commit3296/adler/actions/workflows/codeql.yml/badge.svg" alt="CodeQL"></a>
<a href="https://scorecard.dev/viewer/?uri=github.com/commit3296/adler"><img src="https://api.scorecard.dev/projects/github.com/commit3296/adler/badge" alt="OpenSSF Scorecard"></a>
<a href="https://www.bestpractices.dev/projects/13082"><img src="https://www.bestpractices.dev/projects/13082/badge" alt="OpenSSF Best Practices"></a>
</p>
# Adler
> *Named for Irene Adler — "the Woman", the one who outwitted Sherlock Holmes.
> Where Sherlock searched, Adler outsmarts.*
OSINT username search across ~2,600 bundled site entries, in Rust. Honest verdicts and built
to reach the hard ones — Cloudflare-walled, TLS-fingerprinted, geo-restricted,
login-walled.
## Documentation
- 📖 [**adler-docs**](https://adler-docs.pages.dev/) — the user
manual: install, the access engine, the web UI, embedding, FAQ.
- 🦀 [docs.rs/adler-core](https://docs.rs/adler-core) — Rust API reference.
- 🗺️ [PLAN.md](PLAN.md) — roadmap and the design behind the
access-engine phases.
This README covers the elevator pitch (compare table, detection rate
data) plus install / quickstart; deeper material lives on the docs
site so it can evolve independently of crate releases.
## How Adler compares
Open-source username-search tools that OSINT operators actually consider, on
the dimensions that matter when sites push back:
| Approx. sites | 400 | 3,000 | 600 | 5,400 | 2,600 |
| Verdict model | Found / NotFound | Found / NotFound | Found / NotFound | Found / NotFound | **Found / NotFound / Uncertain(reason)** |
| Bot-protected sites (Instagram, X, …) | — | — | — | — | **headless Chrome via `--browser-backend`** |
| TLS-fingerprint blocking | — | — | — | — | **Chrome 134 handshake via `--features impersonate`** |
| Proxy routing | one global | one global + Tor + I2P | — | — | one global **or** per-site policy via `--proxy-pool` |
| Cookies / sessions | — | global `cookies.txt` | — | — | **per-site named sessions** via `--sessions` |
| Registry self-heal | — | — | — | — | **`--doctor --fix` diffs responses, proposes new signatures** |
| Web UI | — | yes (results graph, reports) | — | — | `--web` — live SSE-streaming SolidJS SPA + JSON API |
| Output formats | text / CSV / XLSX / JSON | text / JSON / CSV / HTML / PDF / XMind / D3 | text / CSV / PDF | text / CSV / HTML | text / JSON / NDJSON / CSV / HTML |
| Embeddable library | — | yes (Python async) | — | — | `adler-core` on crates.io (Rust) |
| Runtime / packaging | Python | Python | Python | Python | **Rust — single static binary, `cargo binstall`** |
[cmp-s]: https://github.com/sherlock-project/sherlock
[cmp-m]: https://github.com/soxoj/maigret
[cmp-b]: https://github.com/p1ngul1n0/blackbird
[cmp-sn]: https://github.com/snooppr/snoop
**Adler's thesis: honest verdicts plus access for the sites that matter.** A
`NotFound` from a Python-HTTP-only tool on a Cloudflare-walled, TLS-
fingerprinted, geo-restricted, or login-walled site is often just "I gave up
at the first wall." Adler reports `Uncertain(reason)` when it couldn't verify,
and ships the transports you need to break the wall yourself — headless
browser, Chrome handshake emulation, per-site geo / IP-type egress, operator-
supplied sessions. We do not solve CAPTCHAs or evade human-verification (see
[*Ethics & responsible use*](#ethics--responsible-use)).
## Evidence, confidence, and reports
Recent Adler builds carry more than a Found/NotFound verdict. JSON,
Web/API, MCP, and investigation-report surfaces can expose the detection
evidence behind a result, normalized profile evidence such as display
names or external links, confidence scores with machine-readable
reasons, and deterministic identity-cluster candidates when multiple
found profiles share strong structured evidence.
Use `adler --report-scan <SCAN_ID>` for a Markdown investigation report,
`adler --report-scan <SCAN_ID> --report-format json` when another tool
should consume the report model directly, or
`adler --report-scan <SCAN_ID> --report-format html > report.html` for a
self-contained local case file. These reports are derived from local scan
artifacts; see [Privacy and retention](docs/privacy-retention.md) for what
gets stored and how to handle scan ids, evidence, clusters, and report files.
## Detection rate
Recall depends on where you scan from. The last apples-to-apples published
measurement is from a `--doctor` pass on 2026-05-26 against the v0.3.x
registry (411 sites):
| Datacenter IP (Hetzner / Leaseweb DE) | 282 / 411 | 68.6% |
| US residential proxy pool (DECODO) | **305 / 411** | **74.2%** |
The residential lift is real: ~40 sites swap their verdict between
`Uncertain` (datacenter) and `Found` (residential) — most are
Cloudflare-walled or geo-restricted (RU-segment, plus platforms like
Reddit, Imgur, Patreon). The remaining ~26% breaks down roughly as:
- **Bot-protected sites** tagged `bot-protected` (Instagram and
X/Twitter today) — these serve a JS login wall to a plain HTTP
request; a clean IP doesn't help, you need a browser backend.
Exclude them with `--exclude-tag bot-protected`.
- **Stale Sherlock-imported `known_present` accounts** that no
longer exist on the live site. The `--doctor --suggest-known-present`
tool (new in v0.4.0) probes a small candidate pool (the site's
brand name, plus `torvalds` / `octocat` / `admin` / …) and prints
a paste-ready snippet for any site where it finds a live account.
Discovery surfaced 19 healable entries on the most recent sweep;
the remaining placeholders need either a contributor-found
candidate or a deeper repair via `--doctor --fix`.
- **Sites whose detection rule fires for *every* username** —
signal repair territory, not username repair. `--doctor --fix`
diffs the responses and proposes a tighter signal.
- **Sites that don't reliably distinguish found from not-found** for
unauthenticated requests at all — investigated rather than shipped
as false-positive entries: Reddit now requires an opt-in OAuth
session, Pinterest uses its public oEmbed endpoint, while TikTok and
Threads remain parked. See issues
[#11–#14](https://github.com/commit3296/adler/issues?q=is%3Aissue+label%3A%22help+wanted%22)
for the specific failure modes and what would unblock each.
Run the same check yourself: `adler --doctor` (uses your current IP)
or `adler --doctor --proxy <url>` (via your own proxy). With
`--browser-backend browserbase` the doctor's `--fix` mode routes
bot-protected sites through a real Chrome session, so the diff sees
real profile pages rather than two identical login walls. With
`--suggest-known-present` you get an OVERRIDES block per healable
site.
## Crates
| `adler-core` | lib | Detection engine, site registry, executor. |
| `adler-server`| lib | HTTP API + SSE streaming + scan persistence; embeds the SolidJS web UI via `rust-embed`. |
| `adler-mcp` | lib | Model Context Protocol server (`rmcp 1.7`); exposes the OSINT surface to AI agents over stdio + Streamable HTTP+SSE. |
| `adler-cli` | bin | `adler` command-line interface; `--web` launches the embedded server + UI in-process; `--mcp` / `--mcp-http` launch the MCP server. |
## Install
From crates.io (compiles locally, ~1–2 min):
```bash
cargo install adler-cli
```
Pre-built binary from the GitHub release (instant, no compile):
```bash
cargo binstall adler-cli # https://github.com/cargo-bins/cargo-binstall
```
From source:
```bash
git clone https://github.com/commit3296/adler.git
cd adler
cargo install --path adler-cli
```
Requires Rust ≥ 1.85. The installed binary is `adler`. The library
([`adler-core`](https://crates.io/crates/adler-core)) is published separately
for embedding the engine in your own tools — see the
[*Library*](#library) section below.
### Verify release artifacts
Every platform archive attached to a GitHub Release is signed with
[Sigstore cosign](https://github.com/sigstore/cosign) using the GitHub
Actions OIDC identity — no long-lived keys are kept, the signing
certificate is short-lived and bound to the exact workflow that
produced it (visible in the [Rekor](https://search.sigstore.dev/)
transparency log). The signature (`.sig`) and certificate (`.pem`) are
uploaded alongside each archive on the release page.
```bash
TAG=v0.11.3 # or whichever release
ARCHIVE=adler-x86_64-unknown-linux-gnu.tar.gz
# Pull the archive + its signature + certificate from the release.
gh release download "$TAG" --repo commit3296/adler \
--pattern "$ARCHIVE" --pattern "$ARCHIVE.sig" --pattern "$ARCHIVE.pem"
# Verify the signature is bound to this repo's release.yml workflow.
cosign verify-blob \
--certificate "$ARCHIVE.pem" \
--signature "$ARCHIVE.sig" \
--certificate-identity-regexp '^https://github\.com/commit3296/adler/\.github/workflows/release\.yml@refs/tags/v[0-9]+\.[0-9]+\.[0-9]+' \
--certificate-oidc-issuer 'https://token.actions.githubusercontent.com' \
"$ARCHIVE"
```
A successful verification prints `Verified OK`. The identity-regex
pins the signer to *this* repository's `release.yml` at a SemVer tag —
a forged archive uploaded under a different workflow won't satisfy it.
## Build & run
```bash
cargo build --workspace
cargo run -p adler-cli -- alice
```
Logging is controlled by the `ADLER_LOG` env var (defaults to `adler=info`):
```bash
ADLER_LOG=adler=debug cargo run -p adler-cli -- alice
```
## Usage
`adler <username>` scans the embedded registry; everything else is a
knob. Text output shows Found and Uncertain by default and hides
NotFound — pass `--all` for the full list. Results stream into a
terminal as they resolve; piped output is collected and ordered. Exit
codes: `0` found, `1` nothing found, `2` error.
A few of the most common knobs:
```bash
adler --tag dev,social alice # filter by tags
adler --top 50 alice # popularity rank <= 50
adler --format ndjson alice # one JSON object per line
adler --proxy socks5://host:1080 alice # single proxy for everything
adler --browser-backend local alice # bot-protected sites via Chrome
adler --input users.txt # batch many usernames
adler --watch alice # diff vs last run
```
→ Complete flag reference, grouped by intent (filtering / output /
network & sessions / browser & cache / batch & enrichment), is on the
[**Usage**](https://adler-docs.pages.dev/usage/) page.
`adler --help` lists every flag with its short doc; the docs page adds
the bigger picture.
### Filter contract
All scan surfaces use the same registry filter semantics. The names
differ slightly because CLI flags are kebab-case, HTTP JSON is
snake_case, and MCP exposes agent-friendly JSON arguments:
| Include site names containing any term, case-insensitive | `--only NAME` | `only: string[]` | `only: string[]` |
| Exclude site names containing any term, case-insensitive | `--exclude NAME` | `exclude: string[]` | `exclude: string[]` |
| Include sites carrying any tag, case-insensitive exact match | `--tag TAG` | `tag: string[]` | `tag: string[]` |
| Exclude sites carrying any tag | `--exclude-tag TAG` | `exclude_tag: string[]` | `exclude_tag: string[]` |
| Include adult-content sites | `--nsfw` | `nsfw: boolean` | `include_nsfw: boolean` |
| Keep ranked sites with `popularity <= N`; drop unranked sites | `--top N` | `top: number` | `top: number` |
| Restrict a web/server scan to named egresses from `--proxy-pool` | n/a | `egress_names: string[]` | n/a |
Empty arrays / omitted values mean "no filter" for that dimension.
Disabled sites are never scanned. `nsfw` sites are hidden by default,
but asking for `--tag nsfw` / `tag: ["nsfw"]` is also an explicit
opt-in. Multiple include dimensions compose with AND; within a single
dimension, terms compose with OR.
## Web UI
`adler --web` boots a small in-process HTTP server and serves a SolidJS
SPA from the same binary — live SSE-streamed scans, persisted history,
side-by-side diff against an earlier run with a picker for *which*
historical scan to diff against, a read-only access-engine panel,
per-scan egress subset selection when a `--proxy-pool` is loaded, and
a single/batch tab pair so you can paste a list of usernames into the
hero and watch them queue through one at a time. The server persistence
layer also builds timeline events for first-seen, disappeared,
reappeared, and evidence-changed profiles through
`/api/scans/timeline/{username}`; UI/MCP presentation can land in later
slices.
```bash
adler --web # http://127.0.0.1:8080
adler --web --web-bind 0.0.0.0:9000 # listen on all interfaces, custom port
```
> **Warning** — the default bind is loopback. Switching to `0.0.0.0`
> exposes the JSON API to your network. Adler is not built to face the
> open internet; put auth in front of any non-loopback bind.
→ The
[**Web UI**](https://adler-docs.pages.dev/web-ui/) page
covers the full feature set, the `/api/*` surface, and the deployment
notes (the SPA is `rust-embed`'d into the binary; rebuild from source
with `npm ci && npm run build` in `adler-server/web/`).
## MCP server
Adler exposes its OSINT surface to AI assistants over the
[Model Context Protocol](https://modelcontextprotocol.io/). Seven
**tools** the agent can call (`list_sites`, `scan_username` with
streamed progress, `scan_batch`, `doctor_check`, `get_scan_history`,
`diff_scans`, `get_investigation_report`),
nine **resources** it can browse (`adler://registry/{sites,tags,
disabled}`, `adler://scans/recent`, `adler://watchlists/default`,
`adler://scans/{id}` / `adler://scans/{from}/diff/{to}` /
`adler://timelines/{username}` / `adler://reports/{id}` templates),
and three **prompts** with templated OSINT workflows
(`investigate_username`, `audit_registry_health`,
`correlate_accounts`). Two transports — pick whichever fits how the
agent runs.
```bash
adler --mcp # stdio: Claude Desktop / Cursor / local agents
adler --mcp-http 127.0.0.1:8766 # HTTP+SSE: remote agents, mounted at /mcp
```
The HTTP transport inherits `rmcp`'s loopback `allowed_hosts`
DNS-rebind guard out of the box; non-loopback binds expose the API
without authentication, so only do it on a trusted network. The
`top` filter on MCP scan tools is a popularity-rank ceiling
(`popularity <= top`), not "return at most N results"; unranked sites
are excluded when it is set. The
`instructions` block sent on `initialize` restates the project's
ethical bound (authorised security testing / OSINT research /
defensive work only; no harassment, doxxing, or unauthorised
surveillance) so the agent's first peek at the server names what's
in scope.
→ The [**Usage**](https://adler-docs.pages.dev/usage/#mcp-server)
page lists every tool / resource / prompt with its arguments and
return shape. `adler-mcp/examples/` ships two hand-runnable probes
(stdio + HTTP) that double as reference implementations of a
minimal MCP client.
## Privacy and retention
Adler stores local artifacts only for surfaces that need reuse:
one-shot scans may use the result cache, `--watch` keeps a previous
found-account snapshot, and `adler --web` writes finished scan JSON
under `$XDG_CACHE_HOME/adler/scans/` (falling back to
`$HOME/.cache/adler/scans/`). MCP history, diff, timeline, and report
surfaces read that same web history directory but do not create scan
history by themselves.
Use `--no-cache` for one-shot scans that should not touch the result
cache, `adler --cache-clear` to remove the cache file, a temporary
`XDG_CACHE_HOME` for throwaway Web/MCP sessions, and `--scans-dir` when
history needs an explicit location. Non-loopback `--web-bind` and
`--mcp-http` expose history and scan APIs without Adler-provided
authentication; put your own access control in front of them.
→ Details live in [docs/privacy-retention.md](docs/privacy-retention.md).
## Access engine
Adler ships a transport ladder for sites a plain HTTP client can't see —
that's the whole reason it scores ahead of Sherlock / Maigret on the
hard subset of the registry:
- **Browser backend** (`--browser-backend local` / `browserbase`) — real
headless Chrome for sites tagged `bot-protected` (Instagram, X /
Twitter today). Bounded by `--browser-budget` so a misconfigured flag
can't burn a quota.
- **TLS-fingerprint impersonation** (`cargo install --features
impersonate`) — in-process Chrome 134 BoringSSL handshake for sites
tagged `protection: tls-fingerprint`. Much cheaper than a real
browser.
- **Egress pool** (`--proxy-pool <file>`) — per-site geo / IP-type
routing. Sites with an `access` policy pick a matching proxy; sites
without stay on the default egress. `region:XX` tags
auto-populate a *soft* `prefer_geo` (since v0.12) so 685 region-
tagged sites get a recall lift when a matching egress is configured,
and fall back to the default when one isn't — no hard
`Uncertain(GeoUnavailable)`.
- **Sessions** (`--sessions <file>`) — operator-supplied cookies /
tokens for login-walled sites. Per-site `[name]` tables; values
redacted from logs. Reddit is session-gated: either provide
`[reddit] Authorization = "Bearer <token>"` manually, or set
`REDDIT_CLIENT_ID` / `REDDIT_CLIENT_SECRET` and Adler will fetch an
app-only OAuth bearer token for the `reddit` session. Without one of
those opt-in credentials Adler reports `session_required` instead of
probing anonymously.
- **Automatic escalation** (`--escalation-budget N` / `--no-escalation`)
— when the cheap path returns `Uncertain(cloudflare_challenge |
rate_limited)`, the router automatically retries through the browser
backend. Bounded by its own budget. Outcomes carry `transport` and
`escalations` telemetry so it's clear which path produced each
verdict. `adler --doctor --suggest-protection` (since v0.13) reads
that telemetry across runs and flags sites that consistently
escalate as candidates for adding `protection: cloudflare` up front.
→ Full guide with the TOML formats, guardrails, and trade-offs lives at
[**Access engine**](https://adler-docs.pages.dev/access-engine/).
## Library
`adler-core` is the runtime-agnostic engine that powers the CLI,
published separately on [crates.io](https://crates.io/crates/adler-core)
for embedding in your own Rust tools — a Discord bot that checks
usernames, a security tool that flags exposed identities across a
watchlist, a CI gate that asserts a name isn't claimed elsewhere.
The core crate also exposes `WatchlistConfig` for modelling watched
usernames, aliases, and site/tag scopes without coupling that model to
CLI loops, web state, or a scheduler. Add `ScanSchedule` when a caller
needs a repeated-scan policy while keeping timers and task queues in its
own runtime.
```toml
[dependencies]
adler-core = "0.10"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```
→ Minimal worked example, the notable `ClientBuilder` knobs, and the
per-version breaking-change log are on the
[**Embedding**](https://adler-docs.pages.dev/embedding/)
page. The complete API reference is on
[docs.rs/adler-core](https://docs.rs/adler-core).
## Site registry
The default registry (`adler-core/data/sites.json`, ~2.5k sites) is
generated from MIT-licensed upstream data — Sherlock + Maigret + an
opt-in WhatsMyName tranche (CC BY-SA 4.0; pass `--no-wmn` to drop it
when redistributing scan output under MIT only). Detections are imported
**unverified** — `adler --doctor` validates every signal, `--doctor
--fix` proposes corrected ones, and `--doctor --fix --apply --sites
<path>` (since v0.12) patches them straight into the JSON file with
an atomic sibling-`*.tmp` rewrite.
→ Detailed lineage, schema, signal model, and doctor workflow live in
[**Site registry**](https://adler-docs.pages.dev/site-registry/).
## Troubleshooting
Common questions ("Why is everything Uncertain?", "Why does Adler find
fewer accounts than Sherlock?", "How do I scan Instagram?", …) are
covered in the [**FAQ**](https://adler-docs.pages.dev/faq/) on
the docs site.
For CI / contributor-facing commands (`cargo fmt`, `cargo clippy`,
`cargo test`), see [CONTRIBUTING.md](CONTRIBUTING.md).
## Ethics & responsible use
Adler aggregates publicly reachable profile URLs, but aggregation makes
intrusion easy — please use it responsibly.
**Intended uses:** checking your own accounts; authorized penetration tests
and bug-bounty engagements; security research; and OSINT investigations with
a lawful basis. **Do not** use Adler to stalk, harass, dox, or surveil
people without authorization, or to mass-target individuals.
**Detect, never circumvent.** Adler reports anti-bot responses (rate limits,
Cloudflare challenges, captchas) as `Uncertain` — it does not solve captchas
or bypass access controls. It rate-limits per host, supports `--max-rps` and
`--respect-robots`, and writes an optional `--audit-log` of every request.
See [SECURITY.md](SECURITY.md) and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md).
## License
The Adler **code** is licensed under the [MIT License](LICENSE).
The default site registry (`adler-core/data/sites.json`) is also under MIT
— it is derived from the Sherlock project (MIT) and the Maigret project
(MIT). See the file's `_comment` header and the corresponding importer
scripts in `scripts/` for attribution.
The supplementary registry (`adler-core/data/sites_wmn.json`, included
by default; opt-out with `adler --no-wmn`) is derived from WhatsMyName
and licensed [CC BY-SA 4.0](LICENSE-CC-BY-SA-4.0). Adler's MIT licence
does not cover this file; downstream redistribution must preserve
attribution and the `ShareAlike` obligation on derivative data.