pub struct AppState {
    pub config: Arc<AppConfig>,
    pub renderer: Arc<FallbackRenderer>,
    pub crawl_jobs: Arc<RwLock<HashMap<Uuid, CrawlJob>>>,
    pub extract_jobs: Arc<RwLock<HashMap<Uuid, ExtractRecord>>>,
    pub crawl_semaphore: Arc<Semaphore>,
    pub searxng: Option<Arc<SearxngClient>>,
    pub url_filter: Option<Arc<UrlFilterCfg>>,
}
Shared application state.
Fields
config: Arc<AppConfig>
renderer: Arc<FallbackRenderer>
crawl_jobs: Arc<RwLock<HashMap<Uuid, CrawlJob>>>
extract_jobs: Arc<RwLock<HashMap<Uuid, ExtractRecord>>>
    /v2/extract jobs. Separate from crawl_jobs because an extract result
    is a single merged JSON object, not a Vec<ScrapeData>.
crawl_semaphore: Arc<Semaphore>
searxng: Option<Arc<SearxngClient>>
    SearXNG client. None when [search].searxng_url is unset, in which
    case /v1/search returns a clear search_disabled error.
url_filter: Option<Arc<UrlFilterCfg>>
    Server-wide default /map URL filter. None disables the filter
    entirely (legacy behaviour). Per-request overrides may swap or
    extend this at handler time.
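The optional searxng field suggests a handler pattern along these lines. This is a minimal sketch, not the crate's actual handler code: the trimmed-down AppState, the SearxngClient stand-in with its base_url field, the SearchError type, and the search_client helper are all hypothetical. Only the search_disabled code comes from the field description above.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the real `SearxngClient`.
struct SearxngClient {
    base_url: String,
}

/// Hypothetical, trimmed-down view of `AppState` with only the optional field.
struct AppState {
    searxng: Option<Arc<SearxngClient>>,
}

/// Hypothetical error type; only the `search_disabled` code is from the docs.
#[derive(Debug, PartialEq)]
enum SearchError {
    Disabled { code: &'static str },
}

/// Resolve the client, or fail fast when search is not configured.
fn search_client(state: &AppState) -> Result<Arc<SearxngClient>, SearchError> {
    state
        .searxng
        .clone() // cloning an Arc only bumps the reference count
        .ok_or(SearchError::Disabled { code: "search_disabled" })
}
```

Resolving the Option once at the top of the handler keeps the "search not configured" case out of the rest of the request path.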
Implementations
impl AppState
pub fn new(config: AppConfig) -> CrwResult<Self>
pub async fn start_crawl_job(&self, req: CrawlRequest) -> Uuid
Start a new crawl job and return its UUID. Spawns a background task that acquires the crawl semaphore before running.
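The "return the ID now, wait for a slot in the background" behaviour described above can be sketched as follows. To stay dependency-free, this sketch swaps in OS threads and a hand-rolled counting semaphore instead of whatever async runtime and Semaphore type the crate actually uses. The State type, the JobStatus enum, the u64 job ID (in place of Uuid), and the returned JoinHandle (there only so a caller can wait in a test) are all assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Condvar, Mutex, RwLock};
use std::thread::{self, JoinHandle};

/// Minimal counting semaphore (stand-in for the crate's `Semaphore`).
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }

    /// Return a permit and wake one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Hypothetical job status; the real `CrawlJob` is richer.
#[derive(Clone, Copy, Debug, PartialEq)]
enum JobStatus {
    Queued,
    Completed,
}

/// Hypothetical, trimmed-down state: job table plus concurrency limit.
struct State {
    jobs: Arc<RwLock<HashMap<u64, JobStatus>>>,
    semaphore: Arc<Semaphore>,
    next_id: AtomicU64,
}

impl State {
    fn new(max_concurrent: usize) -> Self {
        Self {
            jobs: Arc::new(RwLock::new(HashMap::new())),
            semaphore: Arc::new(Semaphore::new(max_concurrent)),
            next_id: AtomicU64::new(0),
        }
    }

    /// Register the job, spawn the worker, and return the ID right away.
    fn start_job(&self) -> (u64, JoinHandle<()>) {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.jobs.write().unwrap().insert(id, JobStatus::Queued);

        let jobs = Arc::clone(&self.jobs);
        let sem = Arc::clone(&self.semaphore);
        let handle = thread::spawn(move || {
            sem.acquire(); // wait for a free slot before doing any work
            // ... the crawl itself would run here ...
            jobs.write().unwrap().insert(id, JobStatus::Completed);
            sem.release();
        });
        (id, handle)
    }
}
```

Because the permit is acquired inside the spawned worker rather than in start_job, the caller gets its ID immediately even when every slot is busy; the job simply stays Queued until a permit frees up.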
pub async fn start_batch_job(
    &self,
    urls: Vec<String>,
    template: ScrapeRequest,
) -> Uuid
Start a /v2/batch/scrape job over an explicit URL list and return its
UUID. Reuses the crawl-job machinery (crawl_jobs + CrawlState) but
scrapes the given URLs directly — no link discovery, no same-origin
filtering, no dedup; input order is recoverable via metadata.sourceURL.
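Since results may not come back in input order, a client can rebuild that order from metadata.sourceURL. The sketch below is one way to do it, under the assumption that sourceURL matches the submitted URL string exactly (redirects or normalization could break that). The ScrapeResult struct and the in_input_order helper are hypothetical and not part of this crate.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down result: only the source URL and some content.
#[derive(Debug, Clone, PartialEq)]
struct ScrapeResult {
    source_url: String, // stands in for metadata.sourceURL
    markdown: String,
}

/// Reorder `results` to follow `urls`.
///
/// Duplicate input URLs are preserved (the job does no dedup), each one
/// consuming one matching result. URLs with no result map to `None`.
fn in_input_order(urls: &[String], results: Vec<ScrapeResult>) -> Vec<Option<ScrapeResult>> {
    // Bucket results by URL so duplicates are not lost.
    let mut by_url: HashMap<String, Vec<ScrapeResult>> = HashMap::new();
    for r in results {
        by_url.entry(r.source_url.clone()).or_default().push(r);
    }
    // Walk the input list and take one result per occurrence.
    urls.iter()
        .map(|u| by_url.get_mut(u).and_then(|bucket| bucket.pop()))
        .collect()
}
```

Bucketing into a Vec per URL, rather than a plain URL-to-result map, keeps the sketch consistent with the "no dedup" behaviour: a URL submitted twice yields two slots in the output.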
pub async fn start_extract_job(
    &self,
    urls: Vec<String>,
    template: ScrapeRequest,
) -> Uuid
Start a /v2/extract job. Scrapes each URL with formats:[json] plus the
shared schema (already set on template), then merges the per-URL json
objects into one, matching the live API's single-object data shape.
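The merge step can be pictured roughly as below. The actual merge rules (key conflicts, nested objects, arrays) are not stated here, so this sketch makes two loud assumptions: objects are flat, and the first URL to supply a key wins. The JsonObject alias and the merge_objects function are hypothetical, and a real implementation would presumably work on a proper JSON value type rather than strings.

```rust
use std::collections::BTreeMap;

/// Flat stand-in for a JSON object: field name -> already-serialized value.
type JsonObject = BTreeMap<String, String>;

/// Fold per-URL objects into one. ASSUMPTION: on a key conflict the
/// earliest object in the list wins; later values for that key are dropped.
fn merge_objects(per_url: Vec<JsonObject>) -> JsonObject {
    let mut merged = JsonObject::new();
    for obj in per_url {
        for (key, value) in obj {
            // `or_insert` only writes when the key is not present yet.
            merged.entry(key).or_insert(value);
        }
    }
    merged
}
```

Whatever the real policy is, the result is one object rather than one entry per URL, which is why extract results are stored separately from crawl_jobs.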