kumo-derive
Procedural macro crate for kumo. It generates `Extract` implementations from `#[extract(...)]` field annotations.
This crate is an implementation detail of kumo. Do not depend on it directly; enable the
`derive` feature flag on the main `kumo` crate instead.
Usage
Enable the `derive` feature on `kumo`:

```toml
[dependencies]
kumo = { version = "0.1", features = ["derive"] }
```
Then annotate your struct:
use *;
use Serialize;
Call it in your spider:
async
Supported attributes
| Attribute | Example | Description |
|---|---|---|
| `css` | `css = "h1.title"` | Required. CSS selector to match the element. |
| `attr` | `attr = "href"` | Read an HTML attribute instead of the text content. |
| `re` | `re = r"\d+"` | Apply a regex and return the first match / capture group 1. |
| `text` | `text` | Explicit text extraction (the default; can be omitted). |
| `default` | `default = "N/A"` | Fallback value for `String` fields when the selector returns empty. Ignored for `Option<String>`. |
| `transform` | `transform = "trim"` | Apply a named transform after extraction. Values: `trim`, `lowercase`, `uppercase`. An unknown value is a compile error. |
| `llm_fallback` | `llm_fallback = "the price"` | Fall back to an LLM when the selector returns empty. Requires an LLM provider feature (`claude`, `openai`, etc.) and passing a client to `extract_from`. |
| `llm_fallback` (bare) | `llm_fallback` | Same as above, using the field name as the extraction hint. |
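The `transform` values are plain string operations. As an illustration only, the sketch below models them as a hypothetical `Transform` enum parsed from the attribute string; it is not kumo's generated code. Since an ordinary program cannot reproduce the macro's compile-time check, an unknown name is rejected at runtime with an `Err` here.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the values accepted by `transform = "..."`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transform {
    Trim,
    Lowercase,
    Uppercase,
}

impl FromStr for Transform {
    type Err = String;

    // The derive macro rejects unknown names at compile time; this sketch
    // can only reject them at runtime, so it returns an Err instead.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name {
            "trim" => Ok(Transform::Trim),
            "lowercase" => Ok(Transform::Lowercase),
            "uppercase" => Ok(Transform::Uppercase),
            other => Err(format!("unknown transform: {other}")),
        }
    }
}

impl Transform {
    /// Applies the transform to an already-extracted value.
    fn apply(self, value: &str) -> String {
        match self {
            Transform::Trim => value.trim().to_string(),
            Transform::Lowercase => value.to_lowercase(),
            Transform::Uppercase => value.to_uppercase(),
        }
    }
}

fn main() {
    let trim: Transform = "trim".parse().expect("known transform");
    assert_eq!(trim.apply("  Book Title \n"), "Book Title");
    // An unrecognised name is refused rather than silently ignored.
    assert!("titlecase".parse::<Transform>().is_err());
}
```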
Field types
- `String`: uses `unwrap_or_default()` on missing matches (an empty string when not found).
- `Option<String>`: stays `None` when not found.
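A minimal sketch of these fallback rules, combined with the `default` attribute. The helpers `string_field` and `option_field` are hypothetical: they model the raw selector result as an `Option<String>` where `None` means "no match", and they are not kumo's actual generated code.

```rust
/// Hypothetical helper: the value a `String` field ends up holding, given the
/// raw selector result (`None` = no match) and an optional `default = "..."`.
fn string_field(raw: Option<String>, default: Option<&str>) -> String {
    match default {
        Some(d) => raw.unwrap_or_else(|| d.to_string()),
        // No `default`: fall back to String::default(), i.e. "".
        None => raw.unwrap_or_default(),
    }
}

/// Hypothetical helper: an `Option<String>` field keeps `None` on no match;
/// `default` is ignored for this field type.
fn option_field(raw: Option<String>, _default: Option<&str>) -> Option<String> {
    raw
}

fn main() {
    assert_eq!(string_field(None, None), "");
    assert_eq!(string_field(None, Some("N/A")), "N/A");
    assert_eq!(string_field(Some("51.77".into()), Some("N/A")), "51.77");
    assert_eq!(option_field(None, Some("N/A")), None);
}
```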
License
MIT