Crate knee_scraper

Structs

- Comm: Command structure that holds the options for executing the AI binary.
- ScraperConfig

Functions

- ai: Asynchronously executes the AI command to solve a CAPTCHA.
- ai_scrape: Recursively scrapes web pages starting from the given URL, handling a CAPTCHA if one is encountered.
- cap_solver: CAPTCHA-solving function that can be used independently in any project.
- check_open_directories: Checks the server for commonly exposed open directories.
- download_media: Downloads a media file (image or video) and saves it to the local directory.
- extract_domain: Extracts the domain from a URL for folder-naming purposes.
- extract_links: Extracts all links from an HTML page, normalizing them to absolute URLs.
- fetch_robots_txt: Fetches and parses the robots.txt file.
- fetch_with_cookies: Fetches a web page and prints the response status, demonstrating cookie handling.
- normalize_link: Normalizes a link to an absolute URL based on the base URL.
- random_delay: Sleeps for a random duration within a given range, mimicking human browsing behavior.
- random_user_agent: Picks a random user-agent string from a predefined list.
- rec_ai_scrape: Recursively scrapes web pages starting from the given URL, handling a CAPTCHA if one is encountered. If the target phrase is not found in a page's HTML content, it stops scraping in that direction.
- rec_scrape: Recursively scrapes web pages starting from the given URL, looking for the target phrase. If the target phrase is not found in a page's HTML content, it stops scraping in that direction.
- recursive_scrape: Recursively scrapes web pages starting from the given URL.
- run: Executes the entire scraping workflow for the provided URL, including:
- scrape_content: Scrapes all meaningful content from an HTML page, including text, images, videos, meta tags, and forms.
- scrape_for_emails: Scrapes for email addresses and saves them to a file.
- scrape_for_errors: Scrapes the HTML content for error messages and stack traces.
- scrape_js: Scrapes JavaScript content for API keys or tokens.
- scrape_js_content
- should_scrape_content: Checks whether the given content contains the target phrase.
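The summary of `check_open_directories` does not say which paths it probes or how it recognizes a listing. The sketch below shows one common approach. It builds candidate URLs from a hypothetical list of directory names, then flags a response as a directory listing when its title starts with "Index of /", which is how Apache and nginx auto-index pages title themselves. The directory list and both function names are assumptions, not the crate's code, and the HTTP request itself is left out.

```rust
/// Hypothetical directory names to probe; not the crate's actual list.
const COMMON_DIRS: &[&str] = &["admin/", "backup/", "uploads/", "images/", "files/"];

/// Builds one candidate URL per common directory under `base`.
fn candidate_urls(base: &str) -> Vec<String> {
    let base = base.trim_end_matches('/');
    COMMON_DIRS.iter().map(|d| format!("{}/{}", base, d)).collect()
}

/// Heuristic: Apache and nginx auto-index pages use a title like "Index of /uploads".
fn looks_like_directory_listing(html: &str) -> bool {
    html.to_ascii_lowercase().contains("<title>index of /")
}
```

A crawler would fetch each candidate URL and keep those whose 200 response passes `looks_like_directory_listing`.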
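`extract_domain` turns a URL into a folder name. A dependency-free sketch of that step follows. It assumes the input is an absolute URL with a `scheme://` prefix, strips any credentials and port, and lowercases the host. `domain_from_url` is a hypothetical name, and IPv6 literals such as `[::1]` are not handled.

```rust
/// Returns the lowercase host of an absolute URL such as "https://Example.com:8080/a",
/// or None if there is no "scheme://" prefix or the host is empty.
fn domain_from_url(url: &str) -> Option<String> {
    // Everything after "scheme://".
    let (_, rest) = url.split_once("://")?;
    // The authority section ends at the first '/', '?' or '#'.
    let authority = rest.split(&['/', '?', '#'][..]).next()?;
    // Drop an optional "user:pass@" prefix, then an optional ":port" suffix.
    let host = authority.rsplit('@').next()?;
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}
```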
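`extract_links` first has to pull raw link targets out of the HTML. The sketch below does this with a plain string scan rather than an HTML parser. It collects quoted `href` values case-insensitively, in document order, without duplicates. `extract_hrefs` is a hypothetical name. Unquoted attributes and HTML entities are ignored, and the values are returned as written, before any normalization to absolute URLs.

```rust
/// Collects quoted href attribute values from `html`, in order, without duplicates.
fn extract_hrefs(html: &str) -> Vec<String> {
    // ASCII lowercasing keeps byte offsets identical, so indices work on both strings.
    let lower = html.to_ascii_lowercase();
    let mut out: Vec<String> = Vec::new();
    let mut pos = 0;
    while let Some(idx) = lower[pos..].find("href=") {
        let start = pos + idx + "href=".len();
        pos = start;
        // Only handle values wrapped in double or single quotes.
        let quote = match html[start..].chars().next() {
            Some(q @ ('"' | '\'')) => q,
            _ => continue,
        };
        let value_start = start + 1;
        if let Some(len) = html[value_start..].find(quote) {
            let value = html[value_start..value_start + len].trim().to_string();
            if !value.is_empty() && !out.contains(&value) {
                out.push(value);
            }
            pos = value_start + len + 1;
        }
    }
    out
}
```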
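`fetch_robots_txt` says only that it fetches and parses robots.txt. The sketch below shows a deliberately simplified parse. It collects the `Disallow` prefixes in the `User-agent: *` group and checks paths by prefix match. It ignores `Allow` lines, wildcards, and longest-match precedence from the Robots Exclusion Protocol (RFC 9309). Both function names are hypothetical.

```rust
/// Collects the Disallow prefixes that apply to every crawler (`User-agent: *`).
fn disallowed_for_all(robots_txt: &str) -> Vec<String> {
    let mut rules = Vec::new();
    let mut in_star_group = false;
    let mut last_was_agent = false;
    for raw in robots_txt.lines() {
        // Strip comments and surrounding whitespace; skip lines without "key: value".
        let line = raw.split('#').next().unwrap_or("").trim();
        let Some((key, value)) = line.split_once(':') else { continue };
        let (key, value) = (key.trim().to_ascii_lowercase(), value.trim());
        match key.as_str() {
            "user-agent" => {
                // Consecutive User-agent lines belong to the same group.
                if !last_was_agent {
                    in_star_group = false;
                }
                if value == "*" {
                    in_star_group = true;
                }
                last_was_agent = true;
            }
            "disallow" => {
                // An empty Disallow value means "allow everything", so it adds no rule.
                if in_star_group && !value.is_empty() {
                    rules.push(value.to_string());
                }
                last_was_agent = false;
            }
            _ => last_was_agent = false,
        }
    }
    rules
}

/// A path is allowed unless it starts with one of the disallowed prefixes.
fn is_allowed(path: &str, disallowed: &[String]) -> bool {
    !disallowed.iter().any(|prefix| path.starts_with(prefix.as_str()))
}
```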
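`normalize_link` resolves a link against the base URL. The sketch below handles the common cases without dependencies:

- absolute links are kept as they are;
- scheme-relative (`//host/...`) and root-relative (`/path`) links are attached to the base;
- document-relative links are attached to the base page's directory;
- fragments are dropped, and non-HTTP schemes such as `mailto:` are rejected.

`resolve_link` is a hypothetical name, and `../` segments are not collapsed. For full RFC 3986 resolution, the `url` crate's `Url::join` is the usual choice.

```rust
/// Resolves `link` against the absolute http(s) URL `base`.
/// Returns None for links that should not be crawled (empty, fragment-only, mailto:, ...).
fn resolve_link(base: &str, link: &str) -> Option<String> {
    // Fragments never change which document is fetched, so drop them.
    let link = link.trim().split('#').next().unwrap_or("");
    if link.is_empty() {
        return None;
    }
    if link.starts_with("http://") || link.starts_with("https://") {
        return Some(link.to_string());
    }
    // Any other explicit scheme (mailto:, javascript:, tel:, ...) is not crawlable.
    if let Some(colon) = link.find(':') {
        let scheme = &link[..colon];
        if !scheme.is_empty()
            && scheme.chars().all(|c| c.is_ascii_alphanumeric() || c == '+' || c == '-' || c == '.')
        {
            return None;
        }
    }
    let (scheme, rest) = base.split_once("://")?;
    let host_end = rest.find(&['/', '?', '#'][..]).unwrap_or(rest.len());
    let origin = format!("{}://{}", scheme, &rest[..host_end]);
    if let Some(without_slashes) = link.strip_prefix("//") {
        // Scheme-relative: reuse the base scheme.
        return Some(format!("{}://{}", scheme, without_slashes));
    }
    if link.starts_with('/') {
        // Root-relative: attach to the base origin.
        return Some(format!("{}{}", origin, link));
    }
    // Document-relative: attach to the directory of the base path ("../" is not collapsed).
    let path = rest[host_end..].split(&['?', '#'][..]).next().unwrap_or("");
    let dir = match path.rfind('/') {
        Some(i) => &path[..=i],
        None => "/",
    };
    Some(format!("{}{}{}", origin, dir, link))
}
```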
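`random_user_agent` and `random_delay` can be sketched with the standard library alone. The sketch below draws randomness from `RandomState`, whose SipHash keys are seeded randomly per thread and change on every call. That is fine for jitter but not for cryptography. The user-agent strings are illustrative examples, not the crate's predefined list, and `pick_user_agent` and `jittered_sleep` are hypothetical names. The sleep blocks the thread; an async crawler would await a timer instead.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::thread;
use std::time::Duration;

// Illustrative strings only; not the crate's predefined list.
const USER_AGENTS: &[&str] = &[
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14.5; rv:128.0) Gecko/20100101 Firefox/128.0",
];

/// Non-cryptographic random u64: each RandomState gets fresh SipHash keys.
fn random_u64() -> u64 {
    RandomState::new().build_hasher().finish()
}

/// Returns one entry of USER_AGENTS chosen at random.
fn pick_user_agent() -> &'static str {
    USER_AGENTS[(random_u64() % USER_AGENTS.len() as u64) as usize]
}

/// Blocks for a random number of milliseconds in [min_ms, max_ms] and returns the delay.
fn jittered_sleep(min_ms: u64, max_ms: u64) -> Duration {
    let (lo, hi) = if min_ms <= max_ms { (min_ms, max_ms) } else { (max_ms, min_ms) };
    // Modulo bias is negligible for small ranges.
    let ms = lo + random_u64() % (hi - lo).saturating_add(1);
    let delay = Duration::from_millis(ms);
    thread::sleep(delay);
    delay
}
```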
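The pruning rule in `rec_scrape` ("if the target phrase is not found … it stops scraping in that direction") can be shown on an in-memory site. In the sketch below, a `HashMap` of URL to (HTML, outgoing links) stands in for HTTP fetching. A visited set makes cycles terminate. Phrase matching is assumed to be case-insensitive; the crate's `should_scrape_content` may not be. `contains_target` and `crawl_while_matching` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Case-insensitive check that `content` mentions `target` (an assumption).
fn contains_target(content: &str, target: &str) -> bool {
    content.to_lowercase().contains(&target.to_lowercase())
}

/// Depth-first crawl over `pages` that stops descending from any page lacking `target`.
fn crawl_while_matching(
    url: &str,
    pages: &HashMap<&str, (&str, Vec<&str>)>,
    target: &str,
    visited: &mut HashSet<String>,
    matches: &mut Vec<String>,
) {
    // Skip pages already seen so link cycles terminate.
    if !visited.insert(url.to_string()) {
        return;
    }
    let Some((html, links)) = pages.get(url) else { return };
    // Prune: a page without the phrase is neither recorded nor expanded.
    if !contains_target(html, target) {
        return;
    }
    matches.push(url.to_string());
    for link in links {
        crawl_while_matching(link, pages, target, visited, matches);
    }
}
```

In the test data, page `/d` mentions the phrase but is never reached, because its only parent `/b` does not.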
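`scrape_for_emails` finds addresses and writes them to a file. The sketch below does both with the standard library. It splits the text on characters that cannot appear in a plain address, keeps tokens shaped like `local@domain.tld`, and writes them sorted and deduplicated, one per line. This is a heuristic rather than a full RFC 5322 matcher. `find_emails` and `save_emails` are hypothetical names, and the output path is the caller's choice.

```rust
use std::collections::BTreeSet;
use std::path::Path;

/// Heuristic email finder; returns lowercase addresses, sorted and deduplicated.
fn find_emails(text: &str) -> BTreeSet<String> {
    let is_email_char = |c: char| c.is_ascii_alphanumeric() || "._%+-@".contains(c);
    text.split(|c: char| !is_email_char(c))
        // Drop sentence punctuation such as a trailing period.
        .map(|t| t.trim_matches('.'))
        .filter(|t| {
            // Exactly one '@', a non-empty local part, and a dotted domain.
            let mut parts = t.split('@');
            match (parts.next(), parts.next(), parts.next()) {
                (Some(local), Some(domain), None) => {
                    !local.is_empty() && domain.contains('.') && !domain.starts_with('.')
                }
                _ => false,
            }
        })
        .map(|t| t.to_lowercase())
        .collect()
}

/// Writes one address per line to `path`, replacing any existing file.
fn save_emails(emails: &BTreeSet<String>, path: &Path) -> std::io::Result<()> {
    let lines: Vec<&str> = emails.iter().map(String::as_str).collect();
    std::fs::write(path, lines.join("\n"))
}
```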
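The list does not say which patterns `scrape_for_errors` looks for. One simple approach is to scan each line of the page for well-known signatures of leaked error output, case-insensitively. The marker list below is a hypothetical starting point covering Python tracebacks, PHP fatal errors, PHP/.NET stack traces, uncaught Java exceptions, and PDO `SQLSTATE` codes. `find_error_lines` is also a hypothetical name.

```rust
/// Hypothetical lowercase signatures of leaked error output.
const ERROR_MARKERS: &[&str] = &[
    "traceback (most recent call last)", // Python
    "fatal error",                       // PHP (often rendered as <b>Fatal error</b>)
    "stack trace:",                      // PHP, ASP.NET
    "exception in thread",               // Java
    "sqlstate[",                         // PDO database errors
];

/// Returns the trimmed lines of `html` that contain any marker.
fn find_error_lines(html: &str) -> Vec<&str> {
    html.lines()
        .map(str::trim)
        .filter(|line| {
            let lower = line.to_ascii_lowercase();
            ERROR_MARKERS.iter().any(|marker| lower.contains(*marker))
        })
        .collect()
}
```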
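`scrape_js` searches JavaScript for API keys or tokens. The sketch below uses a line-based heuristic, not a JavaScript parser. On each line, the first `=` or `:` is taken as the separator. The value must be a quoted string literal, and the name must contain a hint such as `api_key` or `token`. The hint list and `find_js_secrets` are hypothetical. Minified code that puts many assignments on one line would need a real tokenizer.

```rust
/// Hypothetical lowercase substrings that mark a name as sensitive.
const KEY_HINTS: &[&str] = &["api_key", "apikey", "api-key", "token", "secret"];

/// Returns (name, value) pairs for lines like `name = "value"` or `"name": 'value'`.
fn find_js_secrets(js: &str) -> Vec<(String, String)> {
    let mut found = Vec::new();
    for line in js.lines() {
        // The first '=' or ':' is taken to separate a name from its value.
        let Some(sep) = line.find(|c: char| c == '=' || c == ':') else { continue };
        // Strip quotes (object keys) and keep the last identifier ("const apiKey" -> "apiKey").
        let name = line[..sep].trim().trim_matches(|c: char| c == '"' || c == '\'');
        let name = name
            .rsplit(|c: char| c.is_whitespace() || c == '.')
            .next()
            .unwrap_or("");
        // Only string literals count as values.
        let rest = line[sep + 1..].trim_start();
        let quote = match rest.chars().next() {
            Some(q @ ('"' | '\'' | '`')) => q,
            _ => continue,
        };
        let body = &rest[1..];
        let Some(end) = body.find(quote) else { continue };
        let value = &body[..end];
        let lower = name.to_ascii_lowercase();
        if !value.is_empty() && KEY_HINTS.iter().any(|hint| lower.contains(*hint)) {
            found.push((name.to_string(), value.to_string()));
        }
    }
    found
}
```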