Crate llmweb

llmweb

Extract structured data from any webpage by combining a headless browser with an LLM.

  • 5 preprocessing modes: Html (cleaned), RawHtml, Markdown, Text, Image.
  • Code generation (route A): the LLM emits a JavaScript extractor that runs in the browser via tab.evaluate; the script can be stored and replayed later without further LLM cost.
  • Selector recipe (route B): the LLM emits a declarative CSS-selector recipe, executed in pure Rust against any HTML.
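To make route B concrete, here is a minimal, std-only sketch of what "a declarative recipe executed without an LLM" means. It is not the crate's ExtractRecipe/FieldRule API, whose fields are not documented on this page. The Element, Rule, and apply_recipe names are hypothetical, and the selector support is a toy subset (tag, .class, tag.class) run over pre-built elements instead of real parsed HTML.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a parsed DOM element. A real
/// implementation would parse HTML with a proper parser; this sketch uses
/// pre-built elements so it needs only the standard library.
pub struct Element {
    pub tag: String,
    pub classes: Vec<String>,
    pub attrs: HashMap<String, String>,
    pub text: String,
}

/// Hypothetical rule: which elements to match and what to read from them.
/// `selector` supports only the toy subset `tag`, `.class`, or `tag.class`.
pub struct Rule {
    pub field: String,
    pub selector: String,
    /// `None` reads the element's text; `Some("href")` reads that attribute.
    pub attr: Option<String>,
}

/// Returns true if `el` matches the toy selector.
fn matches(el: &Element, selector: &str) -> bool {
    let (tag, class) = match selector.split_once('.') {
        Some((t, c)) => (t, Some(c)),
        None => (selector, None),
    };
    let tag_ok = tag.is_empty() || el.tag == tag;
    let class_ok = class.map_or(true, |c| el.classes.iter().any(|k| k == c));
    tag_ok && class_ok
}

/// Applies every rule to every element and collects all matches per field.
/// The recipe is plain data, so no LLM is involved at this stage. Elements
/// lacking a requested attribute are skipped, which is why the input must
/// keep attributes such as href and src intact.
pub fn apply_recipe(rules: &[Rule], doc: &[Element]) -> HashMap<String, Vec<String>> {
    let mut out = HashMap::new();
    for rule in rules {
        let values: Vec<String> = doc
            .iter()
            .filter(|el| matches(el, &rule.selector))
            .filter_map(|el| match &rule.attr {
                Some(a) => el.attrs.get(a).cloned(),
                None => Some(el.text.trim().to_string()),
            })
            .collect();
        out.insert(rule.field.clone(), values);
    }
    out
}
```

Because the recipe is data rather than code, it can be generated once by the LLM, saved, and re-applied to any later copy of the page.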

Re-exports

pub use error::LlmWebError;
pub use error::Result as LlmWebResult;
pub use preprocess::Format;
pub use preprocess::Preprocessed;
pub use preprocess::RunOptions;
pub use recipe::ExtractRecipe;
pub use recipe::FieldRule;
pub use streaming::PartialStream;

Modules

error
openai
Re-exports from async-openai, so users can build a custom client without adding their own direct dependency on async-openai (and having to match its version).
preprocess
Page preprocessing.
recipe
Route B: declarative selector recipes.
streaming
Incremental JSON streaming.
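One common way to parse JSON incrementally while an LLM is still streaming tokens is to "close" the truncated prefix: track open strings, arrays, and objects, then append the missing closers so that each partial snapshot is syntactically complete. The sketch below shows that technique; it is an assumption about the general approach, not a description of how PartialStream works internally, and close_partial_json is a hypothetical name.

```rust
/// Hypothetical helper: given a truncated prefix of a JSON document (for
/// example, the tokens streamed so far), append the closing quote, brackets,
/// and braces needed to make it syntactically complete.
/// Sketch limitations: it does not repair a dangling `,` or `:`, nor a
/// half-written literal such as `tru`.
pub fn close_partial_json(prefix: &str) -> String {
    let mut stack: Vec<char> = Vec::new(); // expected closers, innermost last
    let mut in_string = false;
    let mut escaped = false;
    for c in prefix.chars() {
        if in_string {
            if escaped {
                escaped = false; // this char was escaped; it cannot end the string
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue; // brackets inside strings are ignored
        }
        match c {
            '"' => in_string = true,
            '{' => stack.push('}'),
            '[' => stack.push(']'),
            '}' | ']' => {
                stack.pop();
            }
            _ => {}
        }
    }
    let mut out = prefix.to_string();
    if in_string {
        if escaped {
            out.pop(); // drop a dangling backslash so the closing quote is not escaped
        }
        out.push('"');
    }
    while let Some(closer) = stack.pop() {
        out.push(closer);
    }
    out
}
```

Feeding each closed snapshot to a regular JSON parser yields a progressively more complete value as the stream advances.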

Macros

strip_markdown_backticks
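The macro carries no description on this page. Judging only by its name, it likely deals with the Markdown code fences that LLMs often wrap around generated JSON or JavaScript. The function below is a hypothetical sketch of that clean-up step (strip_code_fence is an illustrative name, not the macro's actual expansion or behavior): it removes one opening fence line, including an optional language tag, plus one closing fence.

```rust
/// Hypothetical helper (not the crate's macro): removes one leading Markdown
/// fence line (with an optional language tag such as `json`) and one trailing
/// fence from an LLM reply, returning the trimmed inner text.
/// Input without a leading fence is returned trimmed and otherwise unchanged.
pub fn strip_code_fence(reply: &str) -> &str {
    let text = reply.trim();
    let Some(rest) = text.strip_prefix("```") else {
        return text;
    };
    // Skip the remainder of the opening line, i.e. the optional language tag.
    let body = match rest.find('\n') {
        Some(i) => &rest[i + 1..],
        None => rest, // everything on one line after the opening fence
    };
    body.strip_suffix("```").unwrap_or(body).trim()
}
```

Running this on a reply before handing it to a JSON parser or to tab.evaluate avoids a syntax error caused by the fence characters themselves.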

Structs

Browser
Thin wrapper around a headless_chrome::Browser configured with stealth flags.
LlmWeb
The main client. Holds an async-openai client and the model name.
Tab
A handle to a single page. Exposes methods for simulating user actions (clicking, typing) and for inspecting the DOM and other parts of the page.

Functions

run_recipe_on_tab
Apply a recipe to the current state of a tab the caller has already navigated. No LLM call is made. Uses the raw HTML so that the attributes the recipe depends on (href, src, etc.) are preserved.
run_script_on_tab
Run a previously generated JS extractor against a tab the caller has already navigated. No LLM call is made.
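The "generate once, replay without LLM cost" workflow behind route A and run_script_on_tab can be sketched as a small cache: the expensive generation step runs only on a cache miss, and every later lookup returns the stored script. This is a std-only illustration under stated assumptions. ExtractorCache, script_for, the per-host key, and the generate closure (standing in for the LLM call) are all hypothetical; in real use the returned script would be passed to run_script_on_tab, whose exact signature is not shown on this page.

```rust
use std::collections::HashMap;

/// Hypothetical cache for route A: generate a JS extractor once per host
/// (the expensive LLM step) and replay the stored script on later visits.
/// `generate` stands in for the LLM call; its signature is an assumption.
pub struct ExtractorCache<F: FnMut(&str) -> String> {
    generate: F,
    scripts: HashMap<String, String>, // host -> stored JS source
}

impl<F: FnMut(&str) -> String> ExtractorCache<F> {
    pub fn new(generate: F) -> Self {
        Self { generate, scripts: HashMap::new() }
    }

    /// Returns the stored script for `host`, calling `generate` only on a miss.
    pub fn script_for(&mut self, host: &str) -> &str {
        if !self.scripts.contains_key(host) {
            let js = (self.generate)(host);
            self.scripts.insert(host.to_string(), js);
        }
        &self.scripts[host]
    }
}
```

Keying by host assumes that pages on the same site share a layout; a finer key, such as a URL pattern, may be needed when they do not.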