Structured search-provider capture (issue #130).
Turns a query + provider into a normalized, machine-readable result set so
that browser, CLI, and server callers all consume one consistent contract
instead of each reimplementing provider-specific scraping. Server-side and
CLI callers fetch provider pages directly (no CORS restriction), so this
module defaults to the fetch capture mode. Providers that expose a native
CORS/JSON API (Wikipedia) are preferred; HTML search engines are parsed
best-effort and report CAPTCHA/blocking through diagnostics.
Normalized result shape (camelCase JSON):
{
  "query": "...", "provider": "...", "captureMode": "fetch",
  "capturedAt": "2026-05-18T20:30:00Z",
  "results": [{ "rank": 1, "title": "...", "url": "...", "snippet": "..." }],
  "diagnostics": { "status": 200, "blockedByCors": false,
                   "blockedByCaptcha": false, "sourceUrl": "..." }
}
Structs
- SearchDiagnostics - Structured diagnostics describing how the capture went.
- SearchResult - The full normalized search capture result.
- SearchResultItem - A single normalized search result.
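To make the relationship between the three structs and the JSON shape concrete, here is a minimal sketch of how they might be laid out. The struct names come from this page, but every field name and type below is an assumption inferred from the JSON example (snake_case counterparts of the camelCase keys); the crate's real definitions may differ. The camelCase mapping would normally be handled by a serialization layer, which is left out here, and `is_usable` is a hypothetical helper added only for illustration.

```rust
/// One normalized result row (assumed to mirror an entry of the JSON `results` array).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchResultItem {
    pub rank: u32,
    pub title: String,
    pub url: String,
    pub snippet: String,
}

/// How the capture went (assumed to mirror the JSON `diagnostics` object).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchDiagnostics {
    pub status: u16,
    pub blocked_by_cors: bool,
    pub blocked_by_captcha: bool,
    pub source_url: String,
}

/// The full capture (assumed to mirror the top-level JSON object).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchResult {
    pub query: String,
    pub provider: String,
    /// "fetch" in the JSON example; kept as a plain string in this sketch.
    pub capture_mode: String,
    /// Timestamp string as shown in the JSON example; not parsed here.
    pub captured_at: String,
    pub results: Vec<SearchResultItem>,
    pub diagnostics: SearchDiagnostics,
}

/// Hypothetical helper (not part of the documented API): a capture is treated
/// as usable when it was not blocked and contains at least one result row.
pub fn is_usable(result: &SearchResult) -> bool {
    !result.diagnostics.blocked_by_cors
        && !result.diagnostics.blocked_by_captcha
        && !result.results.is_empty()
}
```

A caller could check `is_usable(&capture)` before rendering, and fall back to inspecting `diagnostics` when it returns false.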
Constants
- DEFAULT_LIMIT - Default number of results requested/returned.
- DEFAULT_PROVIDER - Default provider when none is supplied.
- SEARCH_PROVIDERS - Providers understood by the search contract.
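The constants suggest a simple defaulting scheme: fall back to a default provider and limit when the caller supplies none, and reject providers outside the supported list. The sketch below illustrates that idea only. The constant values are placeholders (this page does not state them; only Wikipedia is named as a provider, and "example-engine" is invented), and `resolve_provider` and `resolve_limit` are hypothetical helpers, not documented functions.

```rust
// Placeholder values: the real constants are not shown on this page.
pub const DEFAULT_LIMIT: usize = 10;
pub const DEFAULT_PROVIDER: &str = "wikipedia";
pub const SEARCH_PROVIDERS: &[&str] = &["wikipedia", "example-engine"];

/// Returns true if `provider` appears in the supported list.
/// (Signature assumed; the page only gives the one-line description.)
pub fn is_supported_provider(provider: &str) -> bool {
    SEARCH_PROVIDERS.iter().any(|p| *p == provider)
}

/// Hypothetical helper: pick the default provider when none is supplied,
/// and reject anything outside the supported list.
pub fn resolve_provider(requested: Option<&str>) -> Result<&str, String> {
    let provider = requested.unwrap_or(DEFAULT_PROVIDER);
    if is_supported_provider(provider) {
        Ok(provider)
    } else {
        Err(format!("unsupported provider: {}", provider))
    }
}

/// Hypothetical helper: use the default limit when none is supplied.
pub fn resolve_limit(requested: Option<usize>) -> usize {
    requested.unwrap_or(DEFAULT_LIMIT)
}
```

Validating the provider up front keeps an unsupported name from ever reaching URL construction or parsing.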
Functions
- build_search_url - Build the provider-native source URL for a query.
- format_search_as_markdown - Render a normalized search result as Markdown.
- is_supported_provider - Returns true if provider is one of the supported providers.
- looks_like_captcha - Detect provider CAPTCHA / bot-block interstitials in an HTML body.
- parse_search_results - Parse a provider response body into normalized result rows.
- search - Capture structured search results for a query from a provider.
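Two of the functions listed above are easy to picture with a small sketch: CAPTCHA detection and Markdown rendering. Nothing below is taken from the crate's source. The signatures, the marker phrases, the `Row` type, and the Markdown layout are all assumptions chosen for illustration, and the real functions may use different heuristics and output.

```rust
/// Illustrative row type (hypothetical; stands in for a normalized result item).
pub struct Row {
    pub rank: u32,
    pub title: String,
    pub url: String,
    pub snippet: String,
}

/// Assumed heuristic: a case-insensitive search of the HTML body for a few
/// phrases that commonly appear on bot-block interstitials.
pub fn looks_like_captcha(html: &str) -> bool {
    const MARKERS: [&str; 3] = ["captcha", "unusual traffic", "are you a robot"];
    let lower = html.to_lowercase();
    MARKERS.iter().any(|marker| lower.contains(*marker))
}

/// Assumed layout: a heading with the query, then one numbered link per row,
/// with the snippet (if any) indented on the following line.
pub fn format_rows_as_markdown(query: &str, rows: &[Row]) -> String {
    let mut out = format!("# Search results for \"{}\"\n\n", query);
    for row in rows {
        out.push_str(&format!("{}. [{}]({})\n", row.rank, row.title, row.url));
        if !row.snippet.is_empty() {
            out.push_str(&format!("   {}\n", row.snippet));
        }
    }
    out
}
```

In a flow like the one this page describes, a positive `looks_like_captcha` check would presumably set the `blockedByCaptcha` diagnostic rather than return an error, so callers still receive a well-formed (if empty) result set.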