ChadSelect
One query. Any format. Every selector.
Unified data extraction — Regex, XPath 1.0, CSS Selectors, and JMESPath behind one query interface. Load your content, prefix your query, get results. Never panics.
use ChadSelect;
let mut cs = new;
cs.add_html;
let price = cs.select;
assert_eq!;
Install
[]
= "0.2.1"
Query Syntax
Every query takes the form engine:expression. A query with no prefix is treated as a regex.
| Prefix | Engine | Content Types | Backed By |
|---|---|---|---|
| css: | CSS Selectors | HTML | scraper |
| xpath: | XPath 1.0 | HTML, Text | sxd-xpath |
| regex: | Regular Expressions | All | regex |
| json: | JMESPath | JSON | jmespath |
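To make the routing rule concrete, here is a minimal sketch of how a query string could be split into an engine and an expression. It is not ChadSelect's actual implementation: the `Engine` enum and the `route` function are hypothetical names, and the choice to recognize only the four documented prefixes (so that a bare regex containing a colon still falls through to regex) is an assumption.

```rust
/// The four engines named in the prefix table (hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Engine {
    Css,
    XPath,
    Regex,
    Json,
}

/// Split a query into (engine, expression).
/// Only the four known prefixes are recognized; anything else is treated
/// as a bare regex, so a pattern such as `VIN:\s*(\w+)` is not mistaken
/// for a query with an unknown engine.
fn route(query: &str) -> (Engine, &str) {
    const PREFIXES: [(&str, Engine); 4] = [
        ("css:", Engine::Css),
        ("xpath:", Engine::XPath),
        ("regex:", Engine::Regex),
        ("json:", Engine::Json),
    ];
    for &(prefix, engine) in PREFIXES.iter() {
        if let Some(expr) = query.strip_prefix(prefix) {
            return (engine, expr);
        }
    }
    // No recognized prefix: default to regex, expression unchanged.
    (Engine::Regex, query)
}
```

Under these assumptions, `route("css:.price")` yields `(Engine::Css, ".price")`, while `route("\\d+")` yields `(Engine::Regex, "\\d+")`.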
The index Parameter
Every query method takes an index argument that controls which match to return:
| Value | Behavior |
|---|---|
| -1 | Return all matches across every loaded document |
| 0 | Return only the first match |
| N | Return only the Nth match (0-based) |
let mut cs = new;
cs.add_html;
let all = cs.query; // vec!["A", "B", "C"]
let first = cs.query; // vec!["A"]
let third = cs.query; // vec!["C"]
let oob = cs.query; // vec![] (out of bounds — never panics)
// select() wraps query() — returns a single String
let s = cs.select; // "A"
let s = cs.select; // "A" (first of all matches)
When multiple documents are loaded, -1 aggregates results from all compatible documents before indexing.
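The index rules can be sketched as a plain function over an already aggregated match list. This is an illustration, not the library's code: `apply_index` and `select_one` are hypothetical names, the `i32` index type is an assumption, and returning an empty result for negative values other than -1 is also an assumption.

```rust
/// Apply the index rules to a list of matches that has already been
/// aggregated across all compatible documents.
fn apply_index(matches: Vec<String>, index: i32) -> Vec<String> {
    if index == -1 {
        // -1 keeps every match.
        return matches;
    }
    if index < 0 {
        // Assumption: other negative values behave like out-of-bounds.
        return Vec::new();
    }
    // nth() returns None when out of bounds, so this never panics.
    matches
        .into_iter()
        .nth(index as usize)
        .into_iter()
        .collect()
}

/// A select()-style wrapper: the first element of the indexed result,
/// or an empty string when there is none.
fn select_one(matches: Vec<String>, index: i32) -> String {
    apply_index(matches, index)
        .into_iter()
        .next()
        .unwrap_or_default()
}
```

With matches `["A", "B", "C"]`, index 2 gives `["C"]`, index 99 gives an empty vector, and `select_one` with -1 gives `"A"`.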
Content Management
Load one or more documents. Each document is tagged by type and only queried by compatible engines.
use ChadSelect;
let mut cs = new;
// HTML — compatible with css:, xpath:, regex:
cs.add_html;
// JSON — compatible with json:, regex:
cs.add_json;
// Plain text — compatible with regex:, xpath:
cs.add_text;
assert_eq!;
cs.clear; // remove all content
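The compatibility rules in the comments above (and in the prefix table) amount to a small lookup. The sketch below encodes them directly; `ContentType`, `Engine`, `is_compatible`, and `compatible_docs` are hypothetical names for illustration and are not taken from the crate.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ContentType {
    Html,
    Json,
    Text,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Engine {
    Css,
    XPath,
    Regex,
    Json,
}

/// Which engines may search which content types, per the README:
/// css -> HTML; xpath -> HTML, Text; regex -> all; json -> JSON.
fn is_compatible(engine: Engine, content: ContentType) -> bool {
    match (engine, content) {
        (Engine::Regex, _) => true,
        (Engine::Css, ContentType::Html) => true,
        (Engine::XPath, ContentType::Html) => true,
        (Engine::XPath, ContentType::Text) => true,
        (Engine::Json, ContentType::Json) => true,
        _ => false,
    }
}

/// Keep only the documents an engine may search, preserving load order.
fn compatible_docs<'a>(engine: Engine, docs: &'a [(ContentType, String)]) -> Vec<&'a str> {
    docs.iter()
        .filter(|doc| is_compatible(engine, doc.0))
        .map(|doc| doc.1.as_str())
        .collect()
}
```

Incompatible documents are skipped rather than treated as errors, which matches the "never panics" behavior described in this README.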
CSS Selectors
Standard CSS selectors, plus custom text pseudo-selectors for scraping.
let mut cs = new;
cs.add_html;
// Basic selectors
let first_name = cs.select;
assert_eq!;
// All matches — index -1
let all_prices = cs.query;
assert_eq!;
// Nth match — index 2 (0-based)
let third = cs.query;
assert_eq!;
// Attribute extraction via get-attr()
let id = cs.select;
assert_eq!;
Text Pseudo-Selectors
These work like Playwright's pseudo-selectors — match elements by text content.
| Pseudo-Selector | Behavior |
|---|---|
| :has-text('x') | Element or its descendants contain the text |
| :contains-text('x') | Element's own text contains the text |
| :text-equals('x') | Element's text exactly equals |
| :text-starts('x') | Element's text starts with |
| :text-ends('x') | Element's text ends with |
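The difference between :has-text and :contains-text is easiest to see on a toy element tree. The sketch below is a rough model only: `Node` and the five predicate functions are hypothetical, and details such as case sensitivity, whitespace trimming, and whether the last three selectors compare own text or full subtree text are assumptions (trimmed full text is used here).

```rust
/// A toy element: its own direct text plus child elements.
struct Node {
    own_text: String,
    children: Vec<Node>,
}

impl Node {
    fn leaf(text: &str) -> Node {
        Node { own_text: text.to_string(), children: Vec::new() }
    }

    /// Own text followed by all descendant text, in document order.
    fn full_text(&self) -> String {
        let mut out = self.own_text.clone();
        for child in &self.children {
            out.push_str(&child.full_text());
        }
        out
    }
}

/// :has-text — the element or any descendant contains the text.
fn has_text(node: &Node, needle: &str) -> bool {
    node.full_text().contains(needle)
}

/// :contains-text — only the element's own direct text is checked.
fn contains_text(node: &Node, needle: &str) -> bool {
    node.own_text.contains(needle)
}

/// :text-equals — assumption: compares the trimmed full text.
fn text_equals(node: &Node, expected: &str) -> bool {
    node.full_text().trim() == expected
}

/// :text-starts — assumption: leading whitespace is ignored.
fn text_starts(node: &Node, prefix: &str) -> bool {
    node.full_text().trim_start().starts_with(prefix)
}

/// :text-ends — assumption: trailing whitespace is ignored.
fn text_ends(node: &Node, suffix: &str) -> bool {
    node.full_text().trim_end().ends_with(suffix)
}
```

For a row element with no text of its own and two children, "Exterior" and "Blue", `has_text(&row, "Exterior")` is true while `contains_text(&row, "Exterior")` is false.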
let mut cs = new;
cs.add_html;
// :has-text — matches the .row whose subtree contains "Exterior"
let color = cs.select;
assert_eq!;
// :text-equals — exact match on element text
let engine_label = cs.select;
assert_eq!;
// :text-starts — prefix match
let starts_e = cs.select;
assert_eq!;
// :text-ends — suffix match
let ends_or = cs.select;
assert_eq!;
// Combine with function piping
let upper_interior = cs.select;
assert_eq!;
XPath 1.0
Full XPath 1.0 support including axes, predicates, and XPath functions.
let mut cs = new;
cs.add_html;
// text() extraction
let title = cs.select;
assert_eq!;
// With normalize-space
let clean_title = cs.select;
assert_eq!;
// Predicate-based selection — find the <td> after "VIN"
let vin = cs.select;
assert_eq!;
// All values from the second column
let all_values = cs.query;
assert_eq!;
// XPath string() on attribute
let title_id = cs.select;
assert_eq!;
Regex
Capture groups or full matches. Works on HTML, JSON, and plain text content.
let mut cs = new;
cs.add_text;
// Capture group — returns the group, not the full match
let vin = cs.select;
assert_eq!;
// Full match — no capture group
let stock = cs.select;
assert_eq!;
// Multiple capture groups — returns first group
let price_digits = cs.select;
assert_eq!;
// All matches
let all_numbers = cs.query;
// Returns all digit sequences found in the text
// No prefix — defaults to regex
let vin2 = cs.select;
assert_eq!;
Regex on HTML
Regex runs on the raw HTML string, not parsed text — useful for extracting from attributes, comments, or script tags.
let mut cs = new;
cs.add_html;
let price = cs.select;
assert_eq!;
JMESPath (JSON)
Full JMESPath expression support for structured JSON extraction.
let mut cs = new;
cs.add_json;
// Simple field access
let dealer = cs.select;
assert_eq!;
// Array indexing
let first = cs.select;
assert_eq!;
// Projection — all names
let names = cs.query;
assert_eq!;
// Filter expression
let expensive = cs.query;
assert_eq!;
// Nested access
let rating = cs.select;
assert_eq!;
// Flatten nested arrays
let all_tags = cs.query;
assert_eq!;
Post-Processing Functions
Pipe results through text transformations using >>. This operator was chosen over | because | is reserved by XPath (union) and JMESPath (pipe).
css:.selector >> function1() >> function2()
xpath://path/text() >> trim() >> uppercase()
regex:pattern >> replace('$', 'USD ')
| Function | Description | Example |
|---|---|---|
| normalize-space() | Trim + collapse internal whitespace | css:.desc >> normalize-space() |
| trim() | Trim leading/trailing whitespace | css:.title >> trim() |
| uppercase() | Convert to UPPER CASE | css:.vin >> uppercase() |
| lowercase() | Convert to lower case | css:.name >> lowercase() |
| substring(start, len) | Extract substring (0-based) | css:.code >> substring(0, 3) |
| substring-after('delim') | Text after first delimiter | css:.info >> substring-after('VIN: ') |
| substring-before('delim') | Text before first delimiter | css:.info >> substring-before(': ') |
| replace('find', 'repl') | Replace all occurrences | css:.price >> replace('$', 'USD ') |
| get-attr('name') | Element attribute (CSS only) | css:a.link >> get-attr('href') |
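For readers who want to predict what a function will return, the less obvious ones can be approximated with standard-library string operations. These are stand-ins written for this README, not the crate's code. Two behaviors are assumptions: positions are counted in characters rather than bytes, and a missing delimiter produces an empty string (the XPath convention).

```rust
/// normalize-space(): trim, then collapse runs of whitespace to one space.
fn normalize_space(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// substring(start, len): 0-based, counted in characters (assumption).
fn substring(s: &str, start: usize, len: usize) -> String {
    s.chars().skip(start).take(len).collect()
}

/// substring-after('delim'): text after the first occurrence of delim.
/// Assumption: returns "" when the delimiter is absent.
fn substring_after(s: &str, delim: &str) -> String {
    s.split_once(delim)
        .map(|(_, after)| after.to_string())
        .unwrap_or_default()
}

/// substring-before('delim'): text before the first occurrence of delim.
/// Assumption: returns "" when the delimiter is absent.
fn substring_before(s: &str, delim: &str) -> String {
    s.split_once(delim)
        .map(|(before, _)| before.to_string())
        .unwrap_or_default()
}
```

For example, `substring_after("VIN: 1HG", "VIN: ")` gives `"1HG"` and `substring("ABCDEF", 0, 3)` gives `"ABC"`.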
Chaining Functions
Functions execute left-to-right. Empty results are filtered after each step.
let mut cs = new;
cs.add_html;
// Chain: extract text → get everything after "VIN: " → first 3 chars → lowercase
let result = cs.select;
assert_eq!;
let mut cs = new;
cs.add_html;
// Attribute extraction
let href = cs.select;
assert_eq!;
let mut cs = new;
cs.add_html;
// Clean + transform
let clean_price = cs.select;
assert_eq!;
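The left-to-right rule, with empty results dropped after each step, can be sketched independently of any selector engine. Everything below is illustrative: `split_pipeline`, `run_chain`, the `Step` alias, and the three step functions are hypothetical names, the input strings in the usage note are made up, and splitting naively on `>>` is an assumption that would misfire on an expression that itself contains `>>`.

```rust
/// Split "expr >> f1() >> f2()" into the selector expression and the
/// ordered list of function calls (still as text).
fn split_pipeline(query: &str) -> (&str, Vec<&str>) {
    let mut parts = query.split(">>").map(str::trim);
    let expr = parts.next().unwrap_or("");
    (expr, parts.collect())
}

/// One post-processing step: takes a value, returns the transformed value.
type Step = fn(&str) -> String;

/// Run every value through the steps in order, dropping values that
/// become empty after any step.
fn run_chain(values: Vec<String>, steps: &[Step]) -> Vec<String> {
    let mut current = values;
    for &step in steps {
        current = current
            .iter()
            .map(|v| step(v.as_str()))
            .filter(|v| !v.is_empty())
            .collect();
    }
    current
}

// Stand-ins for substring-after('VIN: '), substring(0, 3) and lowercase().
fn after_vin(s: &str) -> String {
    s.split_once("VIN: ")
        .map(|(_, rest)| rest.to_string())
        .unwrap_or_default()
}

fn first_three(s: &str) -> String {
    s.chars().take(3).collect()
}

fn lower(s: &str) -> String {
    s.to_lowercase()
}
```

Running `[after_vin, first_three, lower]` over the hypothetical inputs `"VIN: 1HGCM82633A"` and `"no vin here"` leaves only `"1hg"`: the second value becomes empty at the first step and is filtered out before the later steps run.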
API Reference
Core Query Methods
use ChadSelect;
let mut cs = new;
cs.add_html;
// query() — returns Vec<String>, never panics
let all_matches = cs.query; // all results
let first_only = cs.query; // vec with 1st result or empty
let third = cs.query; // vec with 3rd result or empty
// select() — returns String, empty on no match
let price = cs.select; // first valid result or ""
Fallback Chains — select_first
Try queries in priority order. Returns the first result set where all values pass validation.
let mut cs = new;
cs.add_html;
// #exact-id doesn't exist, falls through to .alt-price
let result = cs.select_first;
assert_eq!;
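The fallback rule ("first result set where all values pass validation") can be expressed over precomputed result sets. This is a sketch under assumptions, not the crate's code: `first_valid_set` is a hypothetical name, and treating an empty result set as a failed candidate is an assumption.

```rust
/// Given the result sets of several queries in priority order, return the
/// first non-empty set in which every value passes the validator.
/// Returns an empty vector when no candidate qualifies.
fn first_valid_set<F>(candidates: Vec<Vec<String>>, is_valid: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    candidates
        .into_iter()
        .find(|set| !set.is_empty() && set.iter().all(|v| is_valid(v.as_str())))
        .unwrap_or_default()
}
```

With candidates `[]`, `["  "]`, and `["$28,500"]` and a validator that rejects blank strings, the third set is returned. The same shape also covers the `_where` variants: only the validator closure changes.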
Multi-Source — select_many
Combine unique results from multiple queries.
let mut cs = new;
cs.add_html;
let prices = cs.select_many;
// Contains both "$30,000" and "$28,500" (unique, unordered)
assert!;
assert!;
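Combining unique results is essentially a set union over the per-query result lists. The helper below is a hypothetical illustration (`merge_unique` is not a ChadSelect function). It happens to keep first-seen order, but since the README describes the output as unordered, callers should not rely on any ordering.

```rust
use std::collections::HashSet;

/// Merge several result lists into one list without duplicates.
/// The first occurrence of each value wins; later repeats are skipped.
fn merge_unique(result_sets: Vec<Vec<String>>) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut merged = Vec::new();
    for value in result_sets.into_iter().flatten() {
        // insert() returns false if the value was already present.
        if seen.insert(value.clone()) {
            merged.push(value);
        }
    }
    merged
}
```

Checking membership with `contains`, as the assertions above do, is the safe way to test such a result.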
Custom Validators — select_where
Filter results with a closure. The _where variants exist for select, select_first, and select_many.
let mut cs = new;
cs.add_html;
// Reject "0" as a valid price
let price = cs.select_where;
assert_eq!; // first match "0" rejected, no fallback within select_where
// With select_first_where — falls through to next query
let mut cs2 = new;
cs2.add_text;
let r = cs2.select_first_where;
assert_eq!;
Batch Queries — query_batch
Execute many queries in one call. Returns Vec<Vec<String>> in input order.
let mut cs = new;
cs.add_html;
cs.add_json;
let results = cs.query_batch;
assert_eq!;
assert_eq!;
assert_eq!;
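The key property of a batch call is that the output has one entry per input query, in the same order, with an empty vector for queries that match nothing. A minimal sketch of that contract follows; `run_batch` is a hypothetical helper and the `run_one` closure stands in for a real single-query call.

```rust
/// Run each query through `run_one` and collect the results in input order.
/// A query with no matches contributes an empty Vec, so indices line up
/// with the input slice.
fn run_batch<F>(queries: &[&str], run_one: F) -> Vec<Vec<String>>
where
    F: Fn(&str) -> Vec<String>,
{
    queries.iter().map(|&q| run_one(q)).collect()
}
```

Because positions are preserved, `results[i]` always corresponds to `queries[i]`, even when some queries return nothing.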
Multi-Content Queries
When multiple documents are loaded, queries search across all compatible content. Use query(-1, ...) to get results from every document.
let mut cs = new;
cs.add_html;
cs.add_html;
// Searches both HTML documents
let titles = cs.query;
assert_eq!;
// Mixing content types
cs.add_json;
// css: only queries HTML content — JSON is skipped
let html_titles = cs.query;
assert_eq!;
// json: only queries JSON content
let json_title = cs.select;
assert_eq!;
// regex: searches everything
let all = cs.query;
assert_eq!;
Error Handling
ChadSelect never panics. Every invalid query, malformed content, or out-of-bounds index returns empty results.
let mut cs = new;
cs.add_html;
// Invalid CSS selector — returns ""
let r = cs.select;
assert_eq!;
// Out of bounds index — returns empty vec
let r = cs.query;
assert_eq!;
// Wrong engine for content type — returns ""
cs.add_json;
let r = cs.select; // css: doesn't apply to JSON
// Only the HTML is searched, no ".something" found → ""
Design Principles
- Never panic: invalid queries, malformed content, and out-of-bounds indices all return empty results
- Prefix routing: the query string declares the engine; no mode switching or builder patterns
- >> function pipe: unambiguous across all engines; XPath | and JMESPath | work natively
- Batteries included: post-processing, text pseudo-selectors, validators, and index selection are all built in
Also Available
ChadSelect is also available as a Python package with an identical API and query syntax.
License
MIT