Crate knee_scraper

Structs

Comm
Command structure to hold options for executing the AI binary.
ScraperConfig

Functions

ai
Asynchronously executes the AI command to solve a CAPTCHA.
ai_scrape
Recursively scrapes web pages starting from the given URL, handling CAPTCHA if encountered.
cap_solver
CAPTCHA solving function that can be used independently in any project.
check_open_directories
Checks for common open directories on the server.
download_media
Downloads a media file (image or video) and saves it to the local directory.
extract_domain
Extracts the domain from a URL for folder naming purposes.
extract_links
Extracts all links from an HTML page, normalizing them to absolute URLs.
fetch_robots_txt
Fetches and parses the robots.txt file.
fetch_with_cookies
Fetches a web page and prints the response status, demonstrating cookie handling.
normalize_link
Normalizes a link to an absolute URL based on the base URL.
random_delay
Sleeps for a random duration within a given range, mimicking human browsing behavior.
random_user_agent
Generates a random user-agent string from a predefined list.
rec_ai_scrape
Recursively scrapes web pages starting from the given URL, handling CAPTCHA when encountered. If the target phrase is not found in the HTML content of a page, it stops scraping in that direction.
rec_scrape
Recursively scrapes web pages starting from the given URL, looking for the target phrase. If the target phrase is not found in the HTML content of a page, it stops scraping in that direction.
recursive_scrape
Recursively scrapes web pages starting from the given URL.
run
Executes the entire scraping workflow for the provided URL.
scrape_content
Scrapes all meaningful content from an HTML page, including text, images, videos, meta tags, and forms.
scrape_for_emails
Scrapes for emails and saves them to a file.
scrape_for_errors
Scrapes for errors and stack traces in the HTML content.
scrape_js
Scrapes JavaScript content for API keys or tokens.
scrape_js_content
should_scrape_content
Checks if the given content contains the target phrase.
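
The pruning behaviour described for rec_scrape and should_scrape_content (stop following a branch once a page lacks the target phrase) can be illustrated with a minimal sketch. This is not the crate's implementation: `Page`, `contains_target`, `crawl`, and `demo_site` are hypothetical names, the "site" is an in-memory map rather than real HTTP fetches, and the phrase check is assumed to be a plain case-sensitive substring match.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a fetched page: its HTML text and outgoing links.
struct Page {
    html: &'static str,
    links: Vec<&'static str>,
}

/// Assumed phrase check: a case-sensitive substring match.
fn contains_target(content: &str, target: &str) -> bool {
    content.contains(target)
}

/// Depth-first crawl over an in-memory site.
/// A page without the target phrase is recorded as visited, but its links
/// are not followed, so scraping stops in that direction.
fn crawl(
    site: &HashMap<&'static str, Page>,
    url: &'static str,
    target: &str,
    visited: &mut HashSet<&'static str>,
    matched: &mut Vec<&'static str>,
) {
    // `insert` returns false if the URL was already seen; this guards against cycles.
    if !visited.insert(url) {
        return;
    }
    // Unknown URLs are skipped (a real scraper would treat this as a fetch failure).
    let Some(page) = site.get(url) else {
        return;
    };
    // Prune: no target phrase means we do not descend into this page's links.
    if !contains_target(page.html, target) {
        return;
    }
    matched.push(url);
    for &link in &page.links {
        crawl(site, link, target, visited, matched);
    }
}

/// Small illustrative site: /a -> /b, /c; /b -> /a (cycle); /c -> /d.
fn demo_site() -> HashMap<&'static str, Page> {
    let mut site = HashMap::new();
    site.insert("/a", Page { html: "rust intro", links: vec!["/b", "/c"] });
    site.insert("/b", Page { html: "rust async", links: vec!["/a"] });
    site.insert("/c", Page { html: "cooking", links: vec!["/d"] });
    site.insert("/d", Page { html: "rust hidden", links: vec![] });
    site
}
```

In this sketch, crawling from "/a" with the target "rust" never reaches "/d": it is only linked from "/c", which lacks the phrase, so that branch is cut off even though "/d" itself would have matched.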