ChadSelect
One query. Any format. Every selector.
ChadSelect is a Rust data extraction library that unifies Regex, XPath 1.0, CSS Selectors, and JMESPath behind a single, dead-simple query interface. Load your content, prefix your query string, get results. That's it.
Why ChadSelect?
Scraping and data extraction is messy. You're juggling regex for raw text, scraper for CSS, sxd_xpath for XPath, and jmespath for JSON — all with different APIs, different error handling, and different mental models.
ChadSelect collapses all of that into one struct and one query pattern:
use chadselect::ChadSelect;
let mut cs = ChadSelect::new();
cs.add_html;
cs.add_json;
// CSS selector
cs.select;
// XPath
cs.select;
// Regex (works on everything)
cs.select;
// JMESPath
cs.select;
No separate parsers to manage. No error-handling boilerplate. A query that fails returns an empty string; it never panics.
Features
- Unified query API: `regex:`, `xpath:`, `css:`, `json:` prefixes route to the right engine
- Multi-content: load multiple documents (HTML, JSON, plain text) and query across all of them
- Post-processing functions: pipe results through `normalize-space()`, `uppercase()`, `substring-after()`, and more
- CSS text pseudo-selectors: `:has-text()`, `:text-equals()`, `:text-starts()`, `:text-ends()`, `:contains-text()`
- Index selection: grab all results, the first, the Nth, or fall back through a priority list
- Lazy caching: parsed documents are cached on first query; subsequent queries reuse them
- Zero panics: every code path returns empty results on failure, never crashes
Installation
Add to your Cargo.toml:
[dependencies]
chadselect = "0.1.1"
Quick Start
use chadselect::ChadSelect;
let mut cs = ChadSelect::new();
// Load content
cs.add_html;
// Query it
let price = cs.select;
assert_eq!;
Query Prefixes
Every query string starts with a prefix that tells ChadSelect which engine to use:
| Prefix | Engine | Content Types | Example |
|---|---|---|---|
| `regex:` | Regex | All | `regex:price:\s*\$(\d+)` |
| `xpath:` | XPath 1.0 | HTML, Text | `xpath://span[@class='vin']/text()` |
| `css:` | CSS | HTML | `css:div.product > .price` |
| `json:` | JMESPath | JSON | `json:store.inventory[0].name` |
If no prefix is provided, the query defaults to regex.
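As a rough illustration of prefix routing, the sketch below splits a query string into an engine and an expression using only the standard library. The `Engine` enum and the `route` function are hypothetical names for this example, not ChadSelect's internals; only the four prefixes and the regex default come from the table above.

```rust
/// Engines named in the prefix table (enum name and variants are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    Regex,
    XPath,
    Css,
    Json,
}

/// Split a query string into (engine, expression).
/// Hypothetical helper: the real crate may parse queries differently.
pub fn route(query: &str) -> (Engine, &str) {
    const PREFIXES: [(&str, Engine); 4] = [
        ("regex:", Engine::Regex),
        ("xpath:", Engine::XPath),
        ("css:", Engine::Css),
        ("json:", Engine::Json),
    ];
    for &(prefix, engine) in PREFIXES.iter() {
        // Only the leading prefix is removed; later colons stay in the expression.
        if let Some(expr) = query.strip_prefix(prefix) {
            return (engine, expr);
        }
    }
    // No recognised prefix: the whole string is treated as a regex.
    (Engine::Regex, query)
}
```

With this sketch, `route("css:div.product > .price")` yields the CSS engine and the bare selector, while an unprefixed string such as `\d+` falls through to regex.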
API
Loading Content
let mut cs = ChadSelect::new();
cs.add_html; // HTML content (works with CSS, XPath, Regex)
cs.add_json; // JSON content (works with JMESPath, Regex)
cs.add_text; // Plain text (works with Regex, XPath)
cs.content_count; // Number of loaded documents
cs.clear; // Remove all content
Querying
select(index, query) -> String
Returns a single result string. Empty string if nothing matches.
// First match
cs.select;
// Second match
cs.select;
query(index, query) -> Vec<String>
Returns matches as a vector. Use index -1 for all results, or any other index to select a specific match.
// All matches
let all = cs.query;
// Specific index
let first = cs.query;
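The index behaviour can be pictured with a small standalone sketch. It assumes 0-based indices and an `i32` index type, neither of which is confirmed here; `-1` meaning "all" and empty results for anything out of range come from the text. `pick` and `pick_one` are illustrative names only.

```rust
/// Select results by index.
/// Assumptions: indices are 0-based and `-1` means "all results".
/// Any other out-of-range index yields an empty vector instead of panicking.
pub fn pick(results: Vec<String>, index: i32) -> Vec<String> {
    match index {
        -1 => results,
        i if i < 0 => Vec::new(),
        i => results.into_iter().nth(i as usize).into_iter().collect(),
    }
}

/// Single-value variant: an empty string when nothing sits at `index`.
pub fn pick_one(results: Vec<String>, index: usize) -> String {
    results.into_iter().nth(index).unwrap_or_default()
}
```

The point of the sketch is that no index can cause a panic: a missing position simply produces an empty vector or an empty string.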
select_first(queries) -> Vec<String>
Tries multiple queries in order and returns the results of the first one that matches. Useful for fallback chains.
let result = cs.select_first;
select_many(queries) -> Vec<String>
Runs multiple queries and combines all unique results.
let results = cs.select_many;
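To make the difference between the two concrete, here is a minimal sketch of "first query that matches" versus "all unique results". The `run` closure stands in for an engine call and is not part of ChadSelect; keeping results in first-seen order is also an assumption, since the text only says the results are unique.

```rust
use std::collections::HashSet;

/// Return the results of the first query that yields anything.
/// `run` is a stand-in for executing a query against loaded content.
pub fn first_match<F>(queries: &[&str], run: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    queries
        .iter()
        .map(|&q| run(q)) // lazy: later queries are not run after a hit
        .find(|results| !results.is_empty())
        .unwrap_or_default()
}

/// Run every query and merge the results, keeping the first occurrence
/// of each value (ordering is an assumption of this sketch).
pub fn all_unique<F>(queries: &[&str], run: F) -> Vec<String>
where
    F: Fn(&str) -> Vec<String>,
{
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &q in queries {
        for value in run(q) {
            if seen.insert(value.clone()) {
                out.push(value);
            }
        }
    }
    out
}
```

A fallback chain stops at the first productive query, so cheap, precise selectors can be listed first and broad regexes last.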
Custom Validators (_where variants)
By default, select, select_first, and select_many treat a result as valid when it is non-empty and non-whitespace. This is exposed as chadselect::default_valid.
Sometimes that's not enough — you might want to reject "0", require a minimum length, or validate that a result parses as a number in a certain range. The _where variants accept a closure Fn(&str) -> bool that defines what "valid" means for your use case.
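The validator idea can be sketched without the crate. Below, `default_valid` is a stand-in written from the description above (non-empty and non-whitespace), not the actual `chadselect::default_valid`; `price_above` and `keep_valid` are hypothetical helpers that show how a `Fn(&str) -> bool` closure composes with it.

```rust
/// Stand-in for the default rule described above:
/// a value is valid when it is non-empty and not just whitespace.
pub fn default_valid(s: &str) -> bool {
    !s.trim().is_empty()
}

/// Hypothetical custom validator: passes the default check and parses
/// (after an optional leading `$`) as a number greater than `min`.
pub fn price_above(min: f64) -> impl Fn(&str) -> bool {
    move |s: &str| {
        default_valid(s)
            && s.trim()
                .trim_start_matches('$')
                .parse::<f64>()
                .map(|v| v > min)
                .unwrap_or(false)
    }
}

/// Keep only the values that `valid` accepts.
pub fn keep_valid<F: Fn(&str) -> bool>(results: Vec<String>, valid: F) -> Vec<String> {
    results.into_iter().filter(|s| valid(s.as_str())).collect()
}
```

Values such as `"0"` or `"N/A"` pass the default rule but are rejected by `price_above(10.0)`, which is the kind of stricter check the `_where` variants are meant for.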
select_where(index, query, valid) -> String
// Reject "0" as a price — fall back to empty string
let price = cs.select_where;
select_first_where(queries, valid) -> Vec<String>
Falls through queries until one produces results that all pass the validator.
// Skip queries that return "0", "N/A", or empty-ish values
let result = cs.select_first_where;
select_many_where(queries, valid) -> Vec<String>
Only includes results that pass the validator.
// Collect all prices, but only keep values > $10
let prices = cs.select_many_where;
Real-world examples
// Minimum length — reject short/garbage extractions
cs.select_where;
// Must be numeric
cs.select_where;
// Compose with the default validator + extra checks
cs.select_first_where;
Post-Processing Functions
Pipe query results through text functions using >>. Works with both CSS and XPath queries.
We use >> instead of | because | is the union operator in XPath 1.0 and a pipe in JMESPath. The >> delimiter is unambiguous across all selector engines.
css:.selector >> function1() >> function2()
xpath://path/text() >> function1() >> function2()
Available Functions
| Function | Description | Example |
|---|---|---|
| `normalize-space()` | Trim + collapse internal whitespace | `css:.desc >> normalize-space()` |
| `trim()` | Trim leading/trailing whitespace | `css:.title >> trim()` |
| `uppercase()` | Convert to uppercase | `css:.vin >> uppercase()` |
| `lowercase()` | Convert to lowercase | `css:.name >> lowercase()` |
| `substring(start, len)` | Extract substring (0-indexed) | `css:.code >> substring(0, 3)` |
| `substring-after('delim')` | Text after delimiter | `css:.vin >> substring-after('VIN: ')` |
| `substring-before('delim')` | Text before delimiter | `css:.info >> substring-before(': ')` |
| `replace('find', 'repl')` | String replacement | `css:.price >> replace('$', 'USD ')` |
| `get-attr('name')` | Extract element attribute (CSS only) | `css:a >> get-attr('href')` |
Chaining Example
// Extract VIN from "VIN: 1HGCM82633A123456", take first 3 chars, lowercase
let result = cs.select;
// => "1hg"
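The chain above can be traced by hand with a small standalone pipeline. This is a sketch of the `>>` idea, not the crate's parser. It splits naively on `>>` and `,`, so it assumes neither appears inside a quoted argument. It also assumes that a missing delimiter or an unknown function yields an empty string. All function names here are hypothetical.

```rust
/// Split `selector >> f1() >> f2()` into the selector and its function calls.
pub fn split_pipeline(query: &str) -> (&str, Vec<&str>) {
    let mut parts = query.split(">>").map(str::trim);
    let selector = parts.next().unwrap_or("");
    (selector, parts.collect())
}

/// Strip one pair of surrounding single quotes, if present.
fn unquote(arg: &str) -> &str {
    arg.strip_prefix('\'')
        .and_then(|s| s.strip_suffix('\''))
        .unwrap_or(arg)
}

/// Apply one call such as `substring(0, 3)` to `input`.
pub fn apply(call: &str, input: &str) -> String {
    let (name, rest) = match call.split_once('(') {
        Some(pair) => pair,
        None => return String::new(),
    };
    let args = match rest.trim_end().strip_suffix(')') {
        Some(a) => a,
        None => return String::new(),
    };
    let argv: Vec<&str> = args.split(',').map(|a| unquote(a.trim())).collect();
    match (name.trim(), argv.as_slice()) {
        ("trim", _) => input.trim().to_string(),
        ("normalize-space", _) => input.split_whitespace().collect::<Vec<_>>().join(" "),
        ("uppercase", _) => input.to_uppercase(),
        ("lowercase", _) => input.to_lowercase(),
        ("substring-after", [d]) => input
            .split_once(*d)
            .map(|(_, after)| after.to_string())
            .unwrap_or_default(),
        ("substring-before", [d]) => input
            .split_once(*d)
            .map(|(before, _)| before.to_string())
            .unwrap_or_default(),
        ("replace", [from, to]) => input.replace(*from, *to),
        ("substring", [start, len]) => match (start.parse::<usize>(), len.parse::<usize>()) {
            // Character-based, 0-indexed, as the function table describes.
            (Ok(s), Ok(l)) => input.chars().skip(s).take(l).collect::<String>(),
            _ => String::new(),
        },
        _ => String::new(),
    }
}

/// Feed each function's output into the next one, left to right.
pub fn run_pipeline(calls: &[&str], input: &str) -> String {
    calls.iter().fold(input.to_string(), |acc, &call| apply(call, &acc))
}
```

Applied to the text `VIN: 1HGCM82633A123456`, the calls `substring-after('VIN: ')`, `substring(0, 3)`, and `lowercase()` produce `1HGCM82633A123456`, then `1HG`, then `1hg`.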
CSS Text Pseudo-Selectors
Custom pseudo-selectors for filtering elements by their text content. These go beyond standard CSS to let you match elements based on what they contain.
| Pseudo-Selector | Behavior |
|---|---|
| `:has-text('x')` | Element or its descendants contain the text |
| `:contains-text('x')` | Element's own text content contains the text |
| `:text-equals('x')` | Element's text content exactly equals the text |
| `:text-starts('x')` | Element's text content starts with the text |
| `:text-ends('x')` | Element's text content ends with the text |
Usage
cs.add_html;
// Find .value inside the .item that contains "Exterior:"
let color = cs.select;
// => "Blue Metallic"
// Find values ending with a specific word
let result = cs.select;
// => "Black Leather"
// Combine with post-processing
let result = cs.select;
// => "BLACK LEATHER"
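The distinction between the pseudo-selectors, in particular `:has-text()` versus `:contains-text()`, can be illustrated on a toy element tree. The `Node` type below is invented for this sketch and is not a real DOM. Trimming whitespace before the equals/starts/ends comparisons and matching case-sensitively are assumptions; the crate may behave differently.

```rust
/// A toy element: its own text plus child elements.
pub struct Node {
    pub own_text: String,
    pub children: Vec<Node>,
}

impl Node {
    pub fn leaf(text: &str) -> Node {
        Node { own_text: text.to_string(), children: Vec::new() }
    }

    /// Own text followed by the text of all descendants.
    pub fn text_content(&self) -> String {
        let mut out = self.own_text.clone();
        for child in &self.children {
            out.push_str(&child.text_content());
        }
        out
    }

    /// `:has-text('x')`: this element or any descendant contains the text.
    pub fn has_text(&self, needle: &str) -> bool {
        self.own_text.contains(needle) || self.children.iter().any(|c| c.has_text(needle))
    }

    /// `:contains-text('x')`: only the element's own text is checked.
    pub fn contains_text(&self, needle: &str) -> bool {
        self.own_text.contains(needle)
    }

    /// `:text-equals('x')`, compared after trimming (assumption).
    pub fn text_equals(&self, needle: &str) -> bool {
        self.text_content().trim() == needle
    }

    /// `:text-starts('x')`, compared after trimming (assumption).
    pub fn text_starts(&self, needle: &str) -> bool {
        self.text_content().trim().starts_with(needle)
    }

    /// `:text-ends('x')`, compared after trimming (assumption).
    pub fn text_ends(&self, needle: &str) -> bool {
        self.text_content().trim().ends_with(needle)
    }
}
```

In this model, a wrapper element whose children hold the label and the value matches `:has-text('Exterior:')` but not `:contains-text('Exterior:')`, because the label lives in a descendant rather than in the wrapper's own text.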
Regex
Regex queries work on all content types. If the pattern contains a capture group, the captured value is returned; otherwise the full match is returned.
cs.add_text;
// With capture group — returns the captured value
let lat = cs.select;
// => "40.7128"
// Without capture group — returns full match
let prices = cs.query;
// => ["$100", "$200", "$300"]
XPath 1.0
Full XPath 1.0 support for HTML documents, powered by sxd_html + sxd_xpath. The | union operator works natively since function pipes use >>.
cs.add_html;
// Text extraction
cs.select;
// Attribute-based selection
cs.select;
// XPath functions
cs.select;
cs.select;
// XPath union operator (|) works without conflict
cs.query;
// With post-processing via >>
cs.select;
JMESPath
Query JSON documents using JMESPath expressions.
cs.add_json;
// Simple path
let name = cs.select;
// => "Widget"
// Array projection
let names = cs.query;
// => ["Widget", "Gadget"]
Multi-Content Queries
Load multiple documents and query across all of them simultaneously.
let mut cs = ChadSelect::new();
cs.add_text;
cs.add_text;
cs.add_html;
// Regex searches across all loaded content
let prices = cs.query;
// => ["100", "200", "300"]
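Which documents a query touches follows from the engine and content-type pairs listed under Query Prefixes. The sketch below encodes that table; the enum names and the `targets` helper are assumptions for illustration, and how the crate actually filters documents is not shown in this README.

```rust
/// Content kinds that can be loaded (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    Html,
    Json,
    Text,
}

/// Engines selected by the query prefix (variant names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    Regex,
    XPath,
    Css,
    Json,
}

/// Engine/content compatibility as listed in the prefix table.
pub fn compatible(engine: Engine, content: ContentType) -> bool {
    match (engine, content) {
        (Engine::Regex, _) => true,
        (Engine::XPath, ContentType::Html) | (Engine::XPath, ContentType::Text) => true,
        (Engine::Css, ContentType::Html) => true,
        (Engine::Json, ContentType::Json) => true,
        _ => false,
    }
}

/// Positions of the loaded documents that a query for `engine` would visit.
pub fn targets(engine: Engine, docs: &[ContentType]) -> Vec<usize> {
    docs.iter()
        .enumerate()
        .filter(|(_, c)| compatible(engine, **c))
        .map(|(i, _)| i)
        .collect()
}
```

With one text, one JSON, and one HTML document loaded, a regex query visits all three, while a `css:` query only visits the HTML document.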
Project Structure
src/
├── lib.rs # Public API — ChadSelect struct and re-exports
├── content.rs # ContentItem, ContentType, lazy caching
├── query.rs # Query parsing, prefix routing, compatibility
├── functions.rs # Post-processing text functions (>> pipeline)
└── engine/
    ├── mod.rs
    ├── regex.rs # Regex extraction engine
    ├── xpath.rs # XPath 1.0 extraction engine
    ├── css.rs # CSS selector engine + text pseudo-selectors
    └── json.rs # JMESPath extraction engine
tests/
├── regex_tests.rs # Regex engine tests
├── xpath_tests.rs # XPath engine tests
├── css_tests.rs # CSS engine tests
├── json_tests.rs # JMESPath engine tests
├── functions_tests.rs # Post-processing function tests
└── integration_tests.rs # Cross-engine and multi-content tests
Design Principles
- Never panic: Invalid queries, malformed content, out-of-bounds indices: everything returns empty results.
- Prefix routing: The query string itself declares the engine. No mode switching, no builder patterns.
- `>>` function pipe: Unambiguous across all engines. XPath `|` union and JMESPath `|` pipe work natively.
- Lazy & cached: Documents are parsed once on first access, then reused. XPath factories and contexts are cached.
- Batteries included: Post-processing, text pseudo-selectors, and index selection are built in. No external pipeline needed.
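The "parse once, then reuse" principle can be sketched with `std::cell::OnceCell`. This is only an illustration of the pattern: `LazyDoc`, its fields, and the whitespace "parser" are invented for the example, and the crate's own cache layout is not documented here.

```rust
use std::cell::{Cell, OnceCell};

/// Toy document that "parses" its source on first access and reuses the result.
pub struct LazyDoc {
    source: String,
    parsed: OnceCell<Vec<String>>,
    parse_count: Cell<u32>,
}

impl LazyDoc {
    pub fn new(source: &str) -> Self {
        LazyDoc {
            source: source.to_string(),
            parsed: OnceCell::new(),
            parse_count: Cell::new(0),
        }
    }

    /// Stand-in for an expensive parse: split the source into tokens.
    /// The closure runs only on the first call; later calls hit the cache.
    pub fn tokens(&self) -> &[String] {
        self.parsed.get_or_init(|| {
            self.parse_count.set(self.parse_count.get() + 1);
            self.source.split_whitespace().map(String::from).collect()
        })
    }

    /// How many times the parse step actually ran.
    pub fn parse_count(&self) -> u32 {
        self.parse_count.get()
    }
}
```

Nothing is parsed when the document is created, and repeated calls to `tokens()` leave the parse counter at one.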
Crate Dependencies
| Crate | Purpose |
|---|---|
| `regex` | Regular expressions |
| `sxd-document` + `sxd-xpath` | XPath 1.0 evaluation |
| `sxd-html` | HTML → XPath document parsing |
| `scraper` | CSS selector engine |
| `serde_json` | JSON parsing |
| `jmespath` | JMESPath evaluation |
| `log` | Logging facade (no output unless a logger is installed) |
License
MIT OR Apache-2.0