Module utils

Application utils.

Modules

abs
Absolute path domain handling.
connect
Connect layer for reqwest.
css_selectors
Generic CSS selectors.
detect_system
CPU and memory detection to balance limitations.
header_utils
Utils to modify the HTTP header.
interner
String interner.
trie
A trie struct.
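
The `trie` module is described only as "A trie struct." The sketch below shows one common use of a trie in a crawler, matching URL paths by segment prefix. It is hypothetical: the names `PathTrie`, `insert`, and `matches_prefix`, and the choice to key nodes by `/`-separated path segments, are assumptions and not the module's actual API.

```rust
use std::collections::HashMap;

/// Hypothetical trie keyed by URL path segments (not the crate's real type).
#[derive(Default, Debug)]
pub struct PathTrie {
    children: HashMap<String, PathTrie>,
    /// True when an inserted path ends at this node.
    terminal: bool,
}

impl PathTrie {
    pub fn new() -> Self {
        Self::default()
    }

    /// Insert a path such as "/blog/posts", creating one node per non-empty segment.
    pub fn insert(&mut self, path: &str) {
        let mut node = self;
        for seg in path.split('/').filter(|s| !s.is_empty()) {
            node = node.children.entry(seg.to_string()).or_default();
        }
        node.terminal = true;
    }

    /// Return true if some inserted path is a segment-wise prefix of `path`.
    pub fn matches_prefix(&self, path: &str) -> bool {
        let mut node = self;
        if node.terminal {
            return true;
        }
        for seg in path.split('/').filter(|s| !s.is_empty()) {
            match node.children.get(seg) {
                Some(next) => {
                    node = next;
                    if node.terminal {
                        return true;
                    }
                }
                None => return false,
            }
        }
        false
    }
}
```

Matching whole segments rather than raw characters means "/blog" matches "/blog/post-1" but not "/blogroll".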

Structs

APACHE_FORBIDDEN
Apache server forbidden.
AllowedDomainTypes
Allow subdomains or TLDs.
HttpResponse
A basic generic type that represents an HTTP response.
OPEN_RESTY_FORBIDDEN
OpenResty forbidden.
PageResponse
The response of a web page.
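
`AllowedDomainTypes` is described as allowing subdomains or TLDs. A minimal sketch of how two such flags could gate a candidate host against the starting host follows. The struct `DomainRules`, its fields, and `host_allowed` are hypothetical names, and TLD handling is deliberately naive (it drops only the final label, so suffixes such as "co.uk" are not handled).

```rust
/// Hypothetical flags mirroring "allow subdomains or TLDs".
#[derive(Clone, Copy, Debug, Default)]
pub struct DomainRules {
    pub subdomains: bool,
    pub tld: bool,
}

/// Drop the final label ("example.com" -> "example"). Naive: ignores "co.uk"-style suffixes.
fn without_tld(host: &str) -> &str {
    match host.rfind('.') {
        Some(i) => &host[..i],
        None => host,
    }
}

/// Decide whether `candidate` may be crawled when starting from `base`.
pub fn host_allowed(base: &str, candidate: &str, rules: DomainRules) -> bool {
    // When TLDs are allowed, compare hosts with their final label removed.
    let (base, candidate) = if rules.tld {
        (without_tld(base), without_tld(candidate))
    } else {
        (base, candidate)
    };
    if candidate == base {
        return true;
    }
    // A subdomain must end with ".<base>", not merely with "<base>".
    rules.subdomains
        && candidate.len() > base.len()
        && candidate.ends_with(base)
        && candidate.as_bytes()[candidate.len() - base.len() - 1] == b'.'
}
```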

Enums

HeaderSource
Accepts different header types (for flexibility).
HttpVersion
Represents an HTTP version.
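
`HeaderSource` accepts different header types for flexibility. One way such an enum can work is to normalize every variant into a single shape before building a header map. The sketch below is std-only and hypothetical: `HeaderInput`, its three variants, and `into_pairs` are illustrative names, and the real type presumably targets an HTTP client's header map rather than a `Vec` of pairs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a flexible header source; variant names are illustrative.
pub enum HeaderInput {
    /// Already-split name/value pairs.
    Pairs(Vec<(String, String)>),
    /// A map keyed by header name (BTreeMap keeps output order deterministic).
    Map(BTreeMap<String, String>),
    /// Raw "Name: value" lines separated by newlines.
    Raw(String),
}

impl HeaderInput {
    /// Normalize every variant into (lowercase name, value) pairs.
    pub fn into_pairs(self) -> Vec<(String, String)> {
        let pairs: Vec<(String, String)> = match self {
            HeaderInput::Pairs(p) => p,
            HeaderInput::Map(m) => m.into_iter().collect(),
            HeaderInput::Raw(s) => s
                .lines()
                // Split at the first ':' only, so values like "a:80" stay intact.
                .filter_map(|line| line.split_once(':'))
                .map(|(k, v)| (k.trim().to_string(), v.trim().to_string()))
                .collect(),
        };
        pairs
            .into_iter()
            .map(|(k, v)| (k.to_ascii_lowercase(), v))
            .collect()
    }
}
```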

Statics

IGNORE_CONTENT_TYPES
Content types to ignore.
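
A static list of ignored content types is typically checked against a response's `Content-Type` header. The sketch below assumes exact media-type matching after stripping parameters and lowercasing; the entries in `IGNORED_TYPES` are illustrative placeholders, not the contents of `IGNORE_CONTENT_TYPES`, and `should_ignore` is a hypothetical helper.

```rust
/// Illustrative entries only; the real static's contents are not listed here.
pub static IGNORED_TYPES: &[&str] = &["application/pdf", "image/png", "video/mp4"];

/// Return true if a Content-Type header value names an ignored media type.
/// Parameters such as "; charset=utf-8" are stripped and case is ignored.
pub fn should_ignore(content_type: &str) -> bool {
    let media = content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase();
    IGNORED_TYPES.iter().any(|t| media.as_str() == *t)
}
```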

Functions

clean_html
Clean the HTML, removing CSS and JS.
clean_html_raw
Clean the HTML, removing CSS and JS, by default using the scraper crate.
clean_html_slim
Clean and remove all base64 images from the prompt.
convert_headers
Convert headers to a header map.
crawl_duration_expired
Check if the crawl duration has expired.
detect_anti_bot_from_body
Detect the anti-bot technology.
detect_anti_bot_from_headers
Detect the anti-bot technology from headers.
detect_anti_bot_tech_response
Detect the anti-bot technology used, based on the request.
detect_antibot_from_url
Detect the anti-bot technology from the URL.
detect_hard_forbidden_content
Detect if a page is forbidden and should not be retried.
emit_log
Emit a log info event.
emit_log_shutdown
Emit a log info event.
fetch_page_html
Perform a network request to a resource, extracting all content as text while streaming.
fetch_page_html_raw
Perform a network request to a resource, extracting all content while streaming.
fetch_page_html_raw_only_html
Perform a network request to a resource, extracting all content while streaming.
get_cookies
Map the response cookies. This does nothing unless the ‘cookies’ feature flag is enabled.
get_last_segment
Get the last path segment.
get_semaphore
Return the semaphore that should be used.
handle_openai_credits
Handle the OpenAI credits used. This does nothing without the ‘openai’ feature flag.
handle_response_bytes
Handle the response bytes.
handle_response_bytes_writer
Handle the response bytes, writing links while crawling.
is_html_content_check
Check if the content is HTML.
log
Log to the console if the configuration is verbose.
networking_capable
Determine whether networking is available for a URL.
openai_request
Perform a request to OpenAI Chat. This does nothing without the ‘openai’ flag enabled.
prepare_url
Prepare the URL for parsing if it fails to parse. Use this method if the URL does not start with http or https.
put_hybrid_cache
Store the page in the cache to be reused across HTTP requests.
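
Two of the simpler helpers above, `prepare_url` and `get_last_segment`, describe string-level URL handling. The sketch below shows one plausible behavior for each using only std. The names `ensure_scheme` and `last_segment`, their signatures, and the choice of `https://` as the default scheme are assumptions, not the crate's actual functions.

```rust
/// Hypothetical take on prepare_url: prepend "https://" (assumed default)
/// when the input does not already start with http:// or https://.
pub fn ensure_scheme(input: &str) -> String {
    let trimmed = input.trim();
    let lower = trimmed.to_ascii_lowercase();
    if lower.starts_with("http://") || lower.starts_with("https://") {
        trimmed.to_string()
    } else {
        // Also handles protocol-relative input such as "//example.com".
        format!("https://{}", trimmed.trim_start_matches('/'))
    }
}

/// Hypothetical take on get_last_segment: return the last non-empty path
/// segment, ignoring any query string or fragment.
pub fn last_segment(path: &str) -> &str {
    let end = path.find(|c: char| c == '?' || c == '#').unwrap_or(path.len());
    path[..end]
        .split('/')
        .filter(|s| !s.is_empty())
        .last()
        .unwrap_or("")
}
```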