//! Core engine for the [Adler](https://github.com/commit3296/adler)
//! OSINT username-search tool — runtime-agnostic, embed-friendly.
//!
//! The CLI lives in `adler-cli`; this crate is what you reach for to
//! drive username detection from your own Rust code (a Discord bot
//! that checks usernames, a security tool that flags exposed
//! identities across a watchlist, a CI gate that asserts a name
//! isn't claimed elsewhere, …).
//!
//! ## Quick start
//!
//! Scan the embedded ~439-site registry for one username and print
//! the hits:
//!
//! ```no_run
//! use adler_core::{Client, ExecutorOptions, MatchKind, Registry, Username, executor};
//!
//! # async fn run() -> adler_core::Result<()> {
//! let registry = Registry::default_embedded()?;
//!
//! // filter(include, exclude, tags, exclude_tags, include_nsfw)
//! // — empty slices = no name/tag filter; `false` keeps the
//! // default NSFW auto-exclusion (matches Sherlock's `--nsfw`
//! // opt-in). Pass `true` (or `&["nsfw".into()]` as tags) to
//! // scan adult-content sites.
//! let sites = registry.filter(&[], &[], &[], &[], false);
//!
//! let username = Username::new("torvalds")?;
//! let client = Client::builder().build()?;
//!
//! let outcomes =
//!     executor::run(&client, &sites, &username, ExecutorOptions::default()).await;
//!
//! for outcome in outcomes.iter().filter(|o| o.kind == MatchKind::Found) {
//!     println!("{} → {}", outcome.site, outcome.url);
//! }
//! # Ok(())
//! # }
//! ```
//!
//! ## Map of the public API
//!
//! Detection plumbing:
//!
//! - [`Registry`] — loaded, validated collection of sites. Build from
//!   the embedded [`default_embedded`](Registry::default_embedded),
//!   from a JSON string ([`from_json_str`](Registry::from_json_str)),
//!   or from disk ([`load_from_path`](Registry::load_from_path)).
//! - [`Site`], [`Signal`], [`UrlTemplate`], [`Extractor`],
//!   [`KnownPresent`] — site-registry value types. `Site` is
//!   serde-(de)serialisable; the JSON Schema lives in `docs/sites.schema.json`.
//! - [`Username`] — validated search target. Constructed via
//!   [`Username::new`](Username::new); invalid characters / overlong
//!   names are rejected at construction time.
//! - [`Client`], [`ClientBuilder`] — `reqwest`-backed probe issuer.
//!   Knobs the builder exposes: timeout, redirect limit, per-host /
//!   global throttle, retry policy, user-agent rotation pool, proxy,
//!   `robots.txt` cache, browser backend, browser budget.
//! - [`CheckOutcome`], [`MatchKind`], [`UncertainReason`] — verdict
//!   types. The signal pipeline is *negative-priority*: any
//!   `NotFound` vote wins over `Found`; no votes → `Uncertain`. A
//!   per-site `regex_check` mismatch short-circuits with
//!   [`UncertainReason::UsernameNotAllowed`] before any HTTP request.
//! - [`executor`] — bounded-concurrency fan-out runner. Pass an
//!   [`ExecutorOptions`] to control concurrency, deadline, and
//!   progress callback.
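//!
//! The negative-priority rule described above can be sketched in
//! isolation. This is a minimal illustration, not the crate's
//! implementation: `Vote`, `Verdict`, and `resolve` are hypothetical
//! stand-ins for the real signal and verdict types.
//!
//! ```rust
//! // Hypothetical stand-ins; NOT the crate's `Signal` / `MatchKind` types.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Vote {
//!     Found,
//!     NotFound,
//! }
//!
//! #[derive(Debug, Clone, Copy, PartialEq, Eq)]
//! enum Verdict {
//!     Found,
//!     NotFound,
//!     Uncertain,
//! }
//!
//! /// Negative-priority resolution: a single `NotFound` vote outweighs
//! /// any number of `Found` votes; an empty vote list is `Uncertain`.
//! fn resolve(votes: &[Vote]) -> Verdict {
//!     if votes.contains(&Vote::NotFound) {
//!         Verdict::NotFound
//!     } else if votes.contains(&Vote::Found) {
//!         Verdict::Found
//!     } else {
//!         Verdict::Uncertain
//!     }
//! }
//!
//! fn main() {
//!     assert_eq!(resolve(&[Vote::Found, Vote::NotFound]), Verdict::NotFound);
//!     assert_eq!(resolve(&[Vote::Found]), Verdict::Found);
//!     assert_eq!(resolve(&[]), Verdict::Uncertain);
//! }
//! ```
//!
//! Letting the negative vote win biases the result toward fewer false
//! positives, at the cost of occasionally missing a real account.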
//!
//! Optional analysis:
//!
//! - [`correlate`] — group accounts that look like the same person
//!   across sites via [`enriched`](crate::correlate::correlate)
//!   profile fields.
//! - [`permute`] — generate username variants
//!   (alice → alice1, alice.dev, …) via [`MAX_VARIANTS`] /
//!   [`PermuteLevel`].
//! - [`doctor`] — registry health check
//!   ([`check_site`](crate::doctor::check_site)), signature
//!   derivation ([`suggest_fix`](crate::doctor::suggest_fix)),
//!   known-present discovery
//!   ([`discover_known_present`](crate::doctor::discover_known_present)),
//!   site scaffolding ([`scaffold_site`](crate::doctor::scaffold_site)).
//!
//! Bot-protected sites (Instagram, X/Twitter today):
//!
//! - [`BrowserBackend`] trait — abstract real-Chrome driver.
//!   Configurable on the [`Client`] via
//!   [`ClientBuilder::browser`](ClientBuilder::browser). Built-in
//!   implementations: [`browser::local::LocalBackend`] (free, via
//!   `chromiumoxide`) and
//!   [`browser::browserbase::BrowserbaseBackend`] (cloud, residential
//!   IPs, in-tree raw async CDP client). [`BrowserBudget`] caps
//!   browser-routed fetches per scan to keep cost predictable.
//!
//! ## Cache
//!
//! [`Cache`] persists per-(site, username, signal-signature) verdicts
//! between runs. Compose it with [`Client`] via the builder, or skip
//! it entirely for one-shot scans.
//!
//! ## Error model
//!
//! [`Result`] is a `Result<T, Error>` alias; [`Error`] is a single
//! crate-level `thiserror` enum. The probe path *never* surfaces
//! errors — transient network failures become
//! [`MatchKind::Uncertain`] with a typed [`UncertainReason`], so
//! you get a partial result for every site even when the network is
//! flaky. Loader errors (malformed registry JSON, invalid CSS
//! selectors, regex compile failures) come back as `Err`.
//!
//! ## Version history
//!
//! Pre-1.0 `SemVer`. Breaking changes since 0.1:
//!
//! - **0.2.0** — added [`Site::request_headers`] (`BTreeMap<String,
//!   String>`); [`BrowserBackend::fetch`] gained the `headers`
//!   parameter; [`browser`] module became `pub`.
//! - **0.3.0** — [`Site::known_present`] changed from
//!   `Option<String>` to `Option<KnownPresent>` (the new enum
//!   accepts string-or-array via untagged serde);
//!   [`DoctorReport::Healthy::present`] and
//!   `Unhealthy::present` changed from `Option<CheckOutcome>` to
//!   `Vec<(String, CheckOutcome)>` (one entry per probed candidate).
//! - **0.4.0** — [`Registry::filter`] gained a fifth
//!   `include_nsfw: bool` parameter (default-exclude adult sites);
//!   [`UncertainReason`] gained `UsernameNotAllowed`;
//!   [`Site::regex_check`] field added (per-site username regex).
//!
//! Each change has a migration block in [the
//! CHANGELOG](https://github.com/commit3296/adler/blob/main/CHANGELOG.md).

mod ban;
mod cache;
mod check;
mod client;
mod correlate;
pub mod doctor;
mod enrich;
mod error;
pub mod executor;
mod permute;
mod registry;
mod retry;
mod robots;
mod site;
mod throttle;
mod username;

pub mod browser;

pub use browser::{BrowserBackend, BrowserBudget, RenderedPage};
pub use cache::Cache;
pub use check::{CheckOutcome, MatchKind, UncertainReason};
pub use client::{Client, ClientBuilder, DEFAULT_BROWSER_BUDGET, RawResponse};
pub use correlate::{Cluster, CorrelationReport, LINK_THRESHOLD, correlate};
pub use doctor::{DoctorReport, FixSuggestion};
pub use error::{Error, Result};
pub use executor::ExecutorOptions;
pub use permute::{MAX_VARIANTS, PermuteLevel, permute};
pub use registry::Registry;
pub use site::{Engine, Extractor, KnownPresent, Signal, Site, UrlTemplate};
pub use username::Username;