llmweb
Extract structured data from any webpage by combining a headless browser with an LLM.
- 5 preprocessing modes: Html (cleaned), RawHtml, Markdown, Text, Image.
- Code generation (route A): the LLM emits a JS extractor that runs in the browser via tab.evaluate. Store it, then replay it without further LLM cost.
- Selector recipe (route B): the LLM emits a declarative CSS-selector recipe, executed in pure Rust against any HTML.
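Route A only pays off if an extractor is generated once and replayed many times. As a rough illustration, here is a std-only sketch of that generate-once, replay-later caching pattern. ScriptCache, get_or_generate and the template key are hypothetical names, and the closure stands in for the LLM round trip; this is not llmweb's API (replay itself is what run_script_on_tab is documented to do).

```rust
use std::collections::HashMap;

/// Hypothetical cache of generated JS extractors, keyed by a caller-chosen
/// page "template" key (for example a site plus a path pattern).
struct ScriptCache {
    scripts: HashMap<String, String>,
    llm_calls: usize,
}

impl ScriptCache {
    fn new() -> Self {
        Self { scripts: HashMap::new(), llm_calls: 0 }
    }

    /// Return the stored extractor for `key`, invoking `generate` (a stand-in
    /// for the LLM call) only the first time the key is seen.
    fn get_or_generate<F>(&mut self, key: &str, generate: F) -> &str
    where
        F: FnOnce() -> String,
    {
        if !self.scripts.contains_key(key) {
            self.llm_calls += 1;
            self.scripts.insert(key.to_string(), generate());
        }
        &self.scripts[key]
    }
}

fn main() {
    let mut cache = ScriptCache::new();
    // Hypothetical extractor text; in route A this would come from the LLM.
    let js = || "() => document.title".to_string();
    let first = cache.get_or_generate("example.com/product/*", js).to_string();
    let second = cache.get_or_generate("example.com/product/*", js).to_string();
    assert_eq!(first, second);
    println!("LLM calls: {}", cache.llm_calls); // LLM calls: 1
}
```

Keying by page template rather than exact URL is an assumption of this sketch: it only helps when pages sharing a key really share DOM structure, otherwise a replayed extractor can silently return wrong data.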
Re-exports
pub use error::LlmWebError;
pub use error::Result as LlmWebResult;
pub use preprocess::Format;
pub use preprocess::Preprocessed;
pub use preprocess::RunOptions;
pub use recipe::ExtractRecipe;
pub use recipe::FieldRule;
pub use streaming::PartialStream;
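The two error re-exports follow a common Rust pattern: a module-local Result alias re-exported under a distinct name, presumably so that a glob import of the crate does not shadow std's Result. A minimal std-only sketch of that pattern, assuming hypothetical Browser and Llm variants and a defaulted error parameter (the real LlmWebError variants and alias definition are not shown on this page), plus a hypothetical parse_model function:

```rust
mod error {
    use std::fmt;

    /// Hypothetical error type; the real LlmWebError variants may differ.
    #[allow(dead_code)]
    #[derive(Debug)]
    pub enum LlmWebError {
        Browser(String),
        Llm(String),
    }

    impl fmt::Display for LlmWebError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                LlmWebError::Browser(msg) => write!(f, "browser error: {msg}"),
                LlmWebError::Llm(msg) => write!(f, "LLM error: {msg}"),
            }
        }
    }

    impl std::error::Error for LlmWebError {}

    /// Module-local alias; the error type defaults to LlmWebError.
    pub type Result<T, E = LlmWebError> = std::result::Result<T, E>;
}

// Re-exported under a distinct name so it cannot shadow std's Result.
pub use error::LlmWebError;
pub use error::Result as LlmWebResult;

/// Hypothetical function showing the alias in a signature.
fn parse_model(name: &str) -> LlmWebResult<String> {
    if name.is_empty() {
        Err(LlmWebError::Llm("empty model name".into()))
    } else {
        Ok(name.to_string())
    }
}

fn main() {
    assert_eq!(parse_model("example-model").unwrap(), "example-model");
    let err = parse_model("").unwrap_err();
    println!("{err}"); // LLM error: empty model name
}
```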
Modules
- error
- openai - Re-exports from async-openai, so users can build a custom client without adding their own direct dependency on async-openai (and having to match its version).
- preprocess - Page preprocessing.
- recipe - Route B: declarative selector recipes.
- streaming - Incremental JSON streaming.
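Incremental JSON streaming generally means showing a usable object before the LLM has finished emitting it. One common technique is to "close" the truncated prefix so it parses. The std-only sketch below illustrates that technique; close_partial_json is a hypothetical helper, not necessarily how PartialStream works, and it deliberately does not handle dangling keys such as {"a":.

```rust
/// Best-effort completion of a truncated JSON prefix: returns the prefix
/// plus the closing quote/brackets that would make it well-formed.
/// Hypothetical helper; a dangling key like `{"a":` is NOT handled.
fn close_partial_json(prefix: &str) -> String {
    let mut stack: Vec<char> = Vec::new(); // expected closers, innermost last
    let mut in_string = false;
    let mut escaped = false;

    for c in prefix.chars() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => {
                stack.pop();
            }
            _ => {}
        }
    }

    let mut out = prefix.to_string();
    if in_string {
        if escaped {
            out.pop(); // drop a lone trailing backslash
        }
        out.push('"');
    } else {
        // A trailing comma would make the closed document invalid.
        let keep = out.trim_end().trim_end_matches(',').len();
        out.truncate(keep);
    }
    while let Some(closer) = stack.pop() {
        out.push(closer);
    }
    out
}

fn main() {
    for so_far in [r#"{"title": "Hel"#, r#"{"title": "Hello", "tags": ["a", "#] {
        println!("{}", close_partial_json(so_far));
    }
    // {"title": "Hel"}
    // {"title": "Hello", "tags": ["a"]}
}
```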
Macros
Structs
- Browser - Thin wrapper around a headless_chrome::Browser configured with stealth flags.
- LlmWeb - The main client. Holds an async-openai client and the model name.
- Tab - A handle to a single page. Exposes methods for simulating user actions (clicking, typing) and for inspecting the DOM and other parts of the page.
Functions
- run_recipe_on_tab - Apply a recipe against the current state of a tab the caller has already navigated. No LLM call. Uses raw HTML so that the attributes the recipe depends on (href, src, etc.) are preserved.
- run_script_on_tab - Run a previously generated JS extractor against a tab the caller has already navigated. No LLM call.
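run_recipe_on_tab works from raw HTML rather than cleaned HTML. The std-only sketch below is a hypothetical illustration of why: a cleaner that drops attributes to save LLM tokens also drops the href a recipe may want to read. strip_attributes and first_attr are illustrative helpers, not llmweb's preprocessor, whose actual cleaning rules are not documented on this page; both are naive string scans, not real HTML parsers.

```rust
/// Hypothetical "cleaning" pass: keeps tag names but drops every attribute.
/// Naive: assumes no '>' appears inside attribute values.
fn strip_attributes(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut in_tag = false;
    let mut in_name = false;
    for c in html.chars() {
        match c {
            '<' => {
                in_tag = true;
                in_name = true;
                out.push(c);
            }
            '>' if in_tag => {
                in_tag = false;
                in_name = false;
                out.push(c);
            }
            // Whitespace after the tag name marks the start of attributes.
            _ if in_tag && in_name && c.is_whitespace() => in_name = false,
            // Skip attribute text entirely.
            _ if in_tag && !in_name => {}
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical lookup of the first `attr="..."` value in a string.
fn first_attr(html: &str, attr: &str) -> Option<String> {
    let needle = format!("{attr}=\"");
    let start = html.find(&needle)? + needle.len();
    let len = html[start..].find('"')?;
    Some(html[start..start + len].to_string())
}

fn main() {
    let raw = r#"<li><a href="/p/1">Item</a></li>"#;
    let cleaned = strip_attributes(raw);
    println!("{cleaned}"); // <li><a>Item</a></li>
    // The link survives in raw HTML but not in the cleaned form.
    assert_eq!(first_attr(raw, "href").as_deref(), Some("/p/1"));
    assert_eq!(first_attr(&cleaned, "href"), None);
}
```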