adler_core/lib.rs
//! Core engine for the [Adler](https://github.com/commit3296/adler)
//! OSINT username-search tool: runtime-agnostic, embed-friendly.
//!
//! The CLI lives in `adler-cli`; this crate is what you reach for to
//! drive username detection from your own Rust code (a Discord bot
//! that checks usernames, a security tool that flags exposed
//! identities across a watchlist, a CI gate that asserts a name
//! isn't claimed elsewhere, …).
//!
//! ## Quick start
//!
//! Scan the embedded ~439-site registry for one username and print
//! the hits:
//!
//! ```no_run
//! use adler_core::{Client, ExecutorOptions, MatchKind, Registry, Username, executor};
//!
//! # async fn run() -> adler_core::Result<()> {
//! let registry = Registry::default_embedded()?;
//!
//! // filter(include, exclude, tags, exclude_tags, include_nsfw):
//! // empty slices = no name/tag filter; `false` keeps the
//! // default NSFW auto-exclusion (matches Sherlock's `--nsfw`
//! // opt-in). Pass `true` (or `&["nsfw".into()]` as tags) to
//! // scan adult-content sites.
//! let sites = registry.filter(&[], &[], &[], &[], false);
//!
//! let username = Username::new("torvalds")?;
//! let client = Client::builder().build()?;
//!
//! let outcomes =
//!     executor::run(&client, &sites, &username, ExecutorOptions::default()).await;
//!
//! for outcome in outcomes.iter().filter(|o| o.kind == MatchKind::Found) {
//!     println!("{} → {}", outcome.site, outcome.url);
//! }
//! # Ok(())
//! # }
//! ```
//!
//! ## Map of the public API
//!
//! Detection plumbing:
//!
//! - [`Registry`]: loaded, validated collection of sites. Build from
//!   the embedded registry
//!   ([`default_embedded`](Registry::default_embedded)),
//!   from a JSON string ([`from_json_str`](Registry::from_json_str)),
//!   or from disk ([`load_from_path`](Registry::load_from_path)).
//! - [`Site`], [`Signal`], [`UrlTemplate`], [`Extractor`],
//!   [`KnownPresent`]: site-registry value types. `Site` is
//!   serde-(de)serialisable; the JSON Schema lives in `docs/sites.schema.json`.
//! - [`Username`]: validated search target. Constructed via
//!   [`Username::new`](Username::new); invalid characters / overlong
//!   names are rejected at construction time.
//! - [`Client`], [`ClientBuilder`]: `reqwest`-backed probe issuer.
//!   Knobs the builder exposes: timeout, redirect limit, per-host /
//!   global throttle, retry policy, user-agent rotation pool, proxy,
//!   `robots.txt` cache, browser backend, browser budget.
//! - [`CheckOutcome`], [`MatchKind`], [`UncertainReason`]: verdict
//!   types. The signal pipeline is *negative-priority*: any
//!   `NotFound` vote wins over `Found`; no votes → `Uncertain`. A
//!   per-site `regex_check` mismatch short-circuits with
//!   [`UncertainReason::UsernameNotAllowed`] before any HTTP request.
//! - [`executor`]: bounded-concurrency fan-out runner. Pass an
//!   [`ExecutorOptions`] to control concurrency, deadline, and
//!   progress callback.
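//!
//! The negative-priority rule can be pictured with a small standalone
//! sketch. `Vote`, `Verdict`, and `resolve` are hypothetical names
//! used only for illustration; they are not part of this crate, and
//! the real pipeline may differ in detail.
//!
//! ```rust
//! /// Hypothetical per-signal vote (illustration only).
//! #[derive(Clone, Copy, Debug, PartialEq)]
//! enum Vote {
//!     Found,
//!     NotFound,
//! }
//!
//! /// Hypothetical verdict mirroring the three outcomes described above.
//! #[derive(Debug, PartialEq)]
//! enum Verdict {
//!     Found,
//!     NotFound,
//!     Uncertain,
//! }
//!
//! /// Any `NotFound` vote wins; otherwise any `Found` vote wins;
//! /// with no votes at all the verdict is `Uncertain`.
//! fn resolve(votes: &[Vote]) -> Verdict {
//!     if votes.contains(&Vote::NotFound) {
//!         Verdict::NotFound
//!     } else if votes.contains(&Vote::Found) {
//!         Verdict::Found
//!     } else {
//!         Verdict::Uncertain
//!     }
//! }
//! ```
//!
//! Under this sketch, `resolve(&[Vote::Found, Vote::NotFound])` yields
//! `Verdict::NotFound`, and an empty slice yields `Verdict::Uncertain`.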
//!
//! Optional analysis:
//!
//! - [`correlate`]: group accounts that look like the same person
//!   across sites via [`enriched`](crate::correlate::correlate)
//!   profile fields.
//! - [`permute`]: generate username variants
//!   (alice → alice1, alice.dev, …); see [`MAX_VARIANTS`] and
//!   [`PermuteLevel`].
//! - [`doctor`]: registry health check
//!   ([`check_site`](crate::doctor::check_site)), signature
//!   derivation ([`suggest_fix`](crate::doctor::suggest_fix)),
//!   known-present discovery
//!   ([`discover_known_present`](crate::doctor::discover_known_present)),
//!   site scaffolding ([`scaffold_site`](crate::doctor::scaffold_site)).
//!
//! Bot-protected sites (Instagram, X/Twitter today):
//!
//! - [`BrowserBackend`] trait: abstract real-Chrome driver.
//!   Configurable on the [`Client`] via
//!   [`ClientBuilder::browser`](ClientBuilder::browser). Built-in
//!   implementations: [`browser::local::LocalBackend`] (free, via
//!   `chromiumoxide`) and
//!   [`browser::browserbase::BrowserbaseBackend`] (cloud, residential
//!   IPs, in-tree raw async CDP client). [`BrowserBudget`] caps
//!   browser-routed fetches per scan to keep cost predictable.
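//!
//! As a rough picture of what a per-scan cap means, here is a minimal
//! standalone sketch. The `Budget` type and its `try_acquire` method
//! are hypothetical and are not the crate's [`BrowserBudget`] API;
//! they only illustrate the idea of a shared counter that refuses
//! further browser fetches once the limit is spent.
//!
//! ```rust
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! /// Hypothetical per-scan budget (illustration only).
//! struct Budget {
//!     remaining: AtomicUsize,
//! }
//!
//! impl Budget {
//!     fn new(limit: usize) -> Self {
//!         Self { remaining: AtomicUsize::new(limit) }
//!     }
//!
//!     /// Claims one fetch; returns `false` once the budget is spent.
//!     /// `checked_sub` returns `None` at zero, which aborts the update.
//!     fn try_acquire(&self) -> bool {
//!         self.remaining
//!             .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
//!             .is_ok()
//!     }
//! }
//! ```
//!
//! With `Budget::new(2)`, the first two `try_acquire` calls succeed
//! and every later call returns `false`, so concurrent probes cannot
//! overspend the cap.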
//!
//! ## Cache
//!
//! [`Cache`] persists per-(site, username, signal-signature) verdicts
//! between runs. Compose it with [`Client`] via the builder, or skip
//! it entirely for one-shot scans.
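//!
//! The three-part key can be illustrated with a standalone in-memory
//! sketch. `VerdictCache`, `Key`, and the `u64` signature are
//! assumptions made for this example; they do not describe how
//! [`Cache`] is implemented or stored.
//!
//! ```rust
//! use std::collections::HashMap;
//!
//! /// Hypothetical cache key: (site, username, signal signature).
//! type Key = (String, String, u64);
//!
//! /// Hypothetical in-memory stand-in for a persisted verdict cache.
//! #[derive(Default)]
//! struct VerdictCache {
//!     entries: HashMap<Key, bool>,
//! }
//!
//! impl VerdictCache {
//!     /// Stores whether `user` was found on `site` under `signature`.
//!     fn put(&mut self, site: &str, user: &str, signature: u64, found: bool) {
//!         self.entries
//!             .insert((site.to_owned(), user.to_owned(), signature), found);
//!     }
//!
//!     /// Looks up a verdict; a different signature is a different key.
//!     fn get(&self, site: &str, user: &str, signature: u64) -> Option<bool> {
//!         self.entries
//!             .get(&(site.to_owned(), user.to_owned(), signature))
//!             .copied()
//!     }
//! }
//! ```
//!
//! In this sketch, including the signature in the key means a verdict
//! stored under one signature is simply not found under another, so a
//! changed site definition would miss instead of returning a stale hit.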
//!
//! ## Error model
//!
//! [`Result`] is a `Result<T, Error>` alias; [`Error`] is a single
//! crate-level `thiserror` enum. The probe path *never* surfaces
//! errors: transient network failures become
//! [`MatchKind::Uncertain`] with a typed [`UncertainReason`], so
//! you get a partial result for every site even when the network is
//! flaky. Loader errors (malformed registry JSON, invalid CSS
//! selectors, regex compile failures) come back as `Err`.
//!
//! ## Version history
//!
//! Pre-1.0 `SemVer`. Breaking changes since 0.1:
//!
//! - **0.2.0**: added [`Site::request_headers`] (`BTreeMap<String,
//!   String>`); [`BrowserBackend::fetch`] gained the `headers`
//!   parameter; [`browser`] module became `pub`.
//! - **0.3.0**: [`Site::known_present`] changed from
//!   `Option<String>` to `Option<KnownPresent>` (the new enum
//!   accepts string-or-array via untagged serde);
//!   [`DoctorReport::Healthy::present`] and
//!   `Unhealthy::present` changed from `Option<CheckOutcome>` to
//!   `Vec<(String, CheckOutcome)>` (one entry per probed candidate).
//! - **0.4.0**: [`Registry::filter`] gained a fifth
//!   `include_nsfw: bool` parameter (default-exclude adult sites);
//!   [`UncertainReason`] gained `UsernameNotAllowed`;
//!   [`Site::regex_check`] field added (per-site username regex).
//!
//! Each change has a migration block in [the
//! CHANGELOG](https://github.com/commit3296/adler/blob/main/CHANGELOG.md).

mod ban;
mod cache;
mod check;
mod client;
mod correlate;
pub mod doctor;
mod enrich;
mod error;
pub mod executor;
mod permute;
mod registry;
mod retry;
mod robots;
mod site;
mod throttle;
mod username;

pub mod browser;

pub use browser::{BrowserBackend, BrowserBudget, RenderedPage};
pub use cache::Cache;
pub use check::{CheckOutcome, MatchKind, UncertainReason};
pub use client::{Client, ClientBuilder, DEFAULT_BROWSER_BUDGET, RawResponse};
pub use correlate::{Cluster, CorrelationReport, LINK_THRESHOLD, correlate};
pub use doctor::{DoctorReport, FixSuggestion};
pub use error::{Error, Result};
pub use executor::ExecutorOptions;
pub use permute::{MAX_VARIANTS, PermuteLevel, permute};
pub use registry::Registry;
pub use site::{Engine, Extractor, KnownPresent, Signal, Site, UrlTemplate};
pub use username::Username;