Module search

Structured search-provider capture (issue #130).

Turns a query and a provider into a normalized, machine-readable result set, so that browser, CLI, and server callers all consume one consistent contract instead of each reimplementing provider-specific scraping. Server-side and CLI callers fetch provider pages directly and are not subject to CORS restrictions, so this module defaults to the fetch capture mode. Providers that expose a native CORS/JSON API (Wikipedia) are preferred; HTML search engines are parsed on a best-effort basis, and any CAPTCHA or blocking they return is reported through diagnostics.

Normalized result shape (camelCase JSON):

{
  "query": "...", "provider": "...", "captureMode": "fetch",
  "capturedAt": "2026-05-18T20:30:00Z",
  "results": [{ "rank": 1, "title": "...", "url": "...", "snippet": "..." }],
  "diagnostics": { "status": 200, "blockedByCors": false,
                   "blockedByCaptcha": false, "sourceUrl": "..." }
}
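
As a rough illustration of how the shape above could map onto Rust types, the sketch below mirrors the sample JSON field by field. The field names and types are assumptions inferred from the sample document only; this page does not show the real definitions of SearchResult, SearchResultItem, or SearchDiagnostics. The camel_case_key helper is hypothetical and exists only to show the snake_case to camelCase key mapping.

```rust
// Hypothetical mirror of the JSON shape above. Field names and types are
// assumptions inferred from the sample document, not the crate's definitions.

/// One row of the `results` array.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchResultItem {
    pub rank: u32,       // "rank"
    pub title: String,   // "title"
    pub url: String,     // "url"
    pub snippet: String, // "snippet"
}

/// The `diagnostics` object.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchDiagnostics {
    pub status: u16,              // "status"
    pub blocked_by_cors: bool,    // "blockedByCors"
    pub blocked_by_captcha: bool, // "blockedByCaptcha"
    pub source_url: String,       // "sourceUrl"
}

/// The top-level capture document.
#[derive(Debug, Clone, PartialEq)]
pub struct SearchResult {
    pub query: String,
    pub provider: String,
    pub capture_mode: String, // "captureMode"
    pub captured_at: String,  // "capturedAt", a timestamp string
    pub results: Vec<SearchResultItem>,
    pub diagnostics: SearchDiagnostics,
}

/// Hypothetical helper: convert a snake_case field name to its camelCase key.
pub fn camel_case_key(field: &str) -> String {
    let mut out = String::with_capacity(field.len());
    let mut upper_next = false;
    for ch in field.chars() {
        if ch == '_' {
            // Drop the underscore and capitalize the character that follows.
            upper_next = true;
        } else if upper_next {
            out.extend(ch.to_uppercase());
            upper_next = false;
        } else {
            out.push(ch);
        }
    }
    out
}
```

In a real crate this renaming is usually delegated to a serialization library rather than done by hand; whether this module does so is not stated here.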

Structs

SearchDiagnostics
Structured diagnostics describing how the capture went.
SearchResult
The full normalized search capture result.
SearchResultItem
A single normalized search result.

Constants

DEFAULT_LIMIT
Default number of results requested/returned.
DEFAULT_PROVIDER
Default provider when none is supplied.
SEARCH_PROVIDERS
Providers understood by the search contract.
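
The constants above suggest a common pattern: fall back to a default when the caller supplies nothing, and validate against a fixed provider list. The sketch below shows that pattern under stated assumptions. The actual values of DEFAULT_LIMIT, DEFAULT_PROVIDER, and SEARCH_PROVIDERS are not shown on this page, so every name prefixed with EXAMPLE_ and every value below is a placeholder, and the helper functions are hypothetical.

```rust
// Placeholder values only: the real constants' values are not documented here.
pub const EXAMPLE_DEFAULT_LIMIT: usize = 10;
pub const EXAMPLE_DEFAULT_PROVIDER: &str = "wikipedia";
pub const EXAMPLE_SEARCH_PROVIDERS: &[&str] = &["wikipedia", "duckduckgo"];

/// Hypothetical check: is `provider` in the placeholder list (case-insensitive)?
pub fn example_is_supported(provider: &str) -> bool {
    EXAMPLE_SEARCH_PROVIDERS
        .iter()
        .any(|p| p.eq_ignore_ascii_case(provider))
}

/// Use the requested provider if given, else the default; reject unknown names.
pub fn resolve_provider<'a>(requested: Option<&'a str>) -> Result<&'a str, String> {
    let provider = requested.unwrap_or(EXAMPLE_DEFAULT_PROVIDER);
    if example_is_supported(provider) {
        Ok(provider)
    } else {
        Err(format!("unsupported provider: {provider}"))
    }
}

/// Use the requested limit if it is positive, else the default.
pub fn resolve_limit(requested: Option<usize>) -> usize {
    match requested {
        Some(n) if n > 0 => n,
        _ => EXAMPLE_DEFAULT_LIMIT,
    }
}
```

Treating a limit of zero as "use the default" is a design choice of this sketch, not documented behavior of the module.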

Functions

build_search_url
Build the provider-native source URL for a query.
format_search_as_markdown
Render a normalized search result as Markdown.
is_supported_provider
Returns true if provider is one of the supported providers.
looks_like_captcha
Detect provider CAPTCHA / bot-block interstitials in an HTML body.
parse_search_results
Parse a provider response body into normalized result rows.
search
Capture structured search results for a query from a provider.
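
The function list above does not include signatures, so the following is only a minimal sketch of what three of these steps might look like: building a source URL, spotting a CAPTCHA interstitial, and rendering rows as Markdown. All function names prefixed with example_, the Wikipedia API URL and its parameters, the CAPTCHA marker strings, and the Markdown layout are assumptions made for illustration, not the crate's actual behavior.

```rust
/// Percent-encode a string for a URL query value. RFC 3986 unreserved
/// characters pass through; every other UTF-8 byte becomes %XX.
pub fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for b in input.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical URL builder targeting the MediaWiki search API.
/// The crate's real URL format is not shown on this page.
pub fn example_search_url(query: &str, limit: usize) -> String {
    format!(
        "https://en.wikipedia.org/w/api.php?action=query&list=search&format=json&origin=*&srlimit={}&srsearch={}",
        limit,
        percent_encode(query)
    )
}

/// Hypothetical CAPTCHA heuristic: case-insensitive search for marker phrases.
/// The marker list is illustrative and will produce false positives on pages
/// that merely mention these words.
pub fn example_looks_like_captcha(html: &str) -> bool {
    const MARKERS: &[&str] = &["captcha", "unusual traffic", "are you a robot"];
    let lower = html.to_lowercase();
    MARKERS.iter().any(|m| lower.contains(*m))
}

/// Hypothetical Markdown renderer. Each row is (rank, title, url, snippet).
pub fn example_markdown(query: &str, rows: &[(u32, &str, &str, &str)]) -> String {
    let mut out = format!("# Search results for \"{}\"\n\n", query);
    for (rank, title, url, snippet) in rows {
        // One numbered link per result, with the snippet indented beneath it.
        out.push_str(&format!("{}. [{}]({})\n   {}\n", rank, title, url, snippet));
    }
    out
}
```

A caller might check example_looks_like_captcha on the fetched body before attempting to parse it, and set the blockedByCaptcha diagnostic instead of returning an empty result list.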