Spider Cloud Rust SDK
The Spider Cloud Rust SDK is a toolkit for scraping websites, crawling at scale, and related tasks such as extracting links and taking screenshots. It returns data in formats suited to large language models (LLMs) and wraps the Spider Cloud API in a simple interface.
-- Current WIP
Installation
To use the Spider Cloud Rust SDK, include the following in your Cargo.toml:
[dependencies]
spider-client = "0.1"
Usage
- Get an API key from spider.cloud
- Set the API key as an environment variable named SPIDER_API_KEY, or pass it as an argument when creating an instance of the Spider struct.
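The two ways of supplying the key can be combined in a single lookup. The sketch below is independent of the SDK and uses only the standard library. The name resolve_api_key is a hypothetical helper, and giving the explicit argument precedence over SPIDER_API_KEY is an assumption, not documented SDK behavior.

```rust
use std::env;

/// Hypothetical helper, not part of the SDK.
/// Assumption: an explicitly passed key wins over the environment variable.
fn resolve_api_key(explicit: Option<String>) -> Result<String, String> {
    explicit
        // Fall back to the environment only when no key was passed in.
        .or_else(|| env::var("SPIDER_API_KEY").ok())
        // Treat blank keys as missing so the failure shows up early.
        .filter(|key| !key.trim().is_empty())
        .ok_or_else(|| "no API key: pass one explicitly or set SPIDER_API_KEY".to_string())
}
```

Calling resolve_api_key(None) reads the environment variable, while resolve_api_key(Some(key)) skips it. Either way, a missing or blank key is reported as an error before any request is made.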
Here's an example of how to use the SDK:
use serde_json::json;
use std::env;
async
Scraping a URL
To scrape data from a single URL:
let url = "https://example.com";
let scraped_data = spider.scrape_url.await.expect;
Crawling a Website
To automate crawling a website:
let url = "https://example.com";
let crawl_params = RequestParams ;
let crawl_result = spider.crawl_url.await.expect;
Crawl Streaming
Crawl the website as a stream, handling results in chunks through a callback so that large crawls scale:
let url = "https://example.com";
let crawl_params = RequestParams ;
spider.crawl_url.await.expect;
Search
Perform a search for websites to crawl or gather search results:
let query = "a sports website";
let crawl_params = RequestParams ;
let crawl_result = spider.search.await.expect;
Retrieving Links from URL(s)
Extract all links from a specified URL:
let url = "https://example.com";
let links = spider.links.await.expect;
Transform
Transform HTML to Markdown or plain text quickly:
let data = vec!;
let params = RequestParams ;
let result = spider.transform.await.expect;
println!;
Taking Screenshots of URL(s)
Capture a screenshot of a given URL:
let url = "https://example.com";
let screenshot = spider.screenshot.await.expect;
Extracting Contact Information
Extract contact details from a specified URL:
let url = "https://example.com";
let contacts = spider.extract_contacts.await.expect;
println!;
Labeling Data from URL(s)
Label the data extracted from a particular URL:
let url = "https://example.com";
let labeled_data = spider.label.await.expect;
println!;
Checking Crawl State
You can check the crawl state of a specific URL:
let url = "https://example.com";
let state = spider.get_crawl_state.await.expect;
println!;
Downloading Files
You can download the stored results for a website through a signed URL:
let url = "https://example.com";
let options = hashmap!;
let response = spider.create_signed_url.await.expect;
println!;
Checking Available Credits
You can check the remaining credits on your account:
let credits = spider.get_credits.await.expect;
println!;
Data Operations
The Spider client can now interact with specific data tables to create, retrieve, and delete data.
Retrieve Data from a Table
To fetch data from a specified table by applying query parameters:
let table_name = "pages";
let query_params = RequestParams ;
let response = spider.data_get.await.expect;
println!;
Delete Data from a Table
To delete data from a specified table based on certain conditions:
let table_name = "websites";
let delete_params = RequestParams ;
let response = spider.data_delete.await.expect;
println!;
Streaming
If you need to use streaming, set the stream parameter to true and provide a callback function:
let url = "https://example.com";
let crawler_params = RequestParams ;
spider.links.await.expect;
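How the SDK hands chunks to the callback is not shown here, so the following is only a sketch of the general pattern, written against the standard library rather than the SDK. It assumes the stream arrives as byte chunks carrying newline-delimited records that may be split across chunk boundaries. The name for_each_line is hypothetical.

```rust
/// Hypothetical helper, not part of the SDK: feeds each complete line
/// from a sequence of byte chunks to `on_line`, buffering partial lines
/// that straddle a chunk boundary.
fn for_each_line<I, F>(chunks: I, mut on_line: F)
where
    I: IntoIterator<Item = Vec<u8>>,
    F: FnMut(&str),
{
    let mut buffer: Vec<u8> = Vec::new();
    for chunk in chunks {
        buffer.extend_from_slice(&chunk);
        // Emit every complete line currently held in the buffer.
        while let Some(pos) = buffer.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = buffer.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..line.len() - 1]);
            let text = text.trim_end_matches('\r');
            if !text.is_empty() {
                on_line(text);
            }
        }
    }
    // Flush a final record that was not newline-terminated.
    let rest = String::from_utf8_lossy(&buffer);
    let rest = rest.trim();
    if !rest.is_empty() {
        on_line(rest);
    }
}
```

The callback sees one record at a time, so memory use depends on the longest single record rather than the size of the whole crawl.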
Content-Type
The following Content-Type headers are supported using the content_type parameter:
- application/json
- text/csv
- application/xml
- application/jsonl
let url = "https://example.com";
let crawler_params = RequestParams ;
// Stream JSON lines back to the client
spider.crawl_url.await.expect;
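Passing the content type as a raw string makes typos easy to miss. One way to guard against that, sketched here with a hypothetical ContentType enum that is not part of the SDK, is to model the four supported values as a type and convert to the header string only at the call site.

```rust
/// Hypothetical enum covering the four supported values; not an SDK type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentType {
    Json,
    Csv,
    Xml,
    Jsonl,
}

impl ContentType {
    /// The header value to pass as the content_type parameter.
    fn as_str(self) -> &'static str {
        match self {
            ContentType::Json => "application/json",
            ContentType::Csv => "text/csv",
            ContentType::Xml => "application/xml",
            ContentType::Jsonl => "application/jsonl",
        }
    }

    /// Parses a header value, ignoring case and surrounding whitespace.
    /// Returns None for anything outside the supported set.
    fn parse(value: &str) -> Option<Self> {
        match value.trim().to_ascii_lowercase().as_str() {
            "application/json" => Some(ContentType::Json),
            "text/csv" => Some(ContentType::Csv),
            "application/xml" => Some(ContentType::Xml),
            "application/jsonl" => Some(ContentType::Jsonl),
            _ => None,
        }
    }
}
```

With this in place, ContentType::Jsonl.as_str() yields the string used for streaming JSON lines, and unsupported values are rejected before a request is built.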
Error Handling
The SDK handles errors returned by the Spider Cloud API and propagates them to the caller with a descriptive error message. Rust has no exceptions, so failures surface as error values; the examples above call expect, which panics on failure, so production code should handle the error instead.
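As an illustration of propagating errors instead of panicking, the sketch below uses only the standard library. ApiError, check_response, and page_length are hypothetical names, and the mapping from status codes to variants is an assumption for the example, not the SDK's actual error type.

```rust
use std::fmt;

/// Hypothetical error type; the SDK's real error type may differ.
#[derive(Debug, PartialEq)]
enum ApiError {
    Unauthorized,
    RateLimited,
    Other { status: u16, message: String },
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ApiError::Unauthorized => write!(f, "unauthorized: check the API key"),
            ApiError::RateLimited => write!(f, "rate limited: retry later"),
            ApiError::Other { status, message } => {
                write!(f, "request failed with status {status}: {message}")
            }
        }
    }
}

impl std::error::Error for ApiError {}

/// Assumed mapping from an HTTP status to a result, for illustration only.
fn check_response(status: u16, body: &str) -> Result<String, ApiError> {
    match status {
        200..=299 => Ok(body.to_string()),
        401 => Err(ApiError::Unauthorized),
        429 => Err(ApiError::RateLimited),
        _ => Err(ApiError::Other { status, message: body.to_string() }),
    }
}

/// The `?` operator hands any error to the caller instead of panicking.
fn page_length(status: u16, body: &str) -> Result<usize, ApiError> {
    let page = check_response(status, body)?;
    Ok(page.len())
}
```

A caller can then match on the returned error, for example retrying on RateLimited and aborting on Unauthorized, instead of relying on expect.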
Contributing
Contributions to the Spider Cloud Rust SDK are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
License
The Spider Cloud Rust SDK is open-source and released under the MIT License.