Module html

Utilities for working with HTML.

Adapted from langchain_core/utils/html.py

Constants§

PREFIXES_TO_IGNORE
Prefixes to ignore when extracting links.
SUFFIXES_TO_IGNORE
Suffixes to ignore when extracting links.
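
As a rough illustration of how prefix and suffix ignore lists are typically applied, here is a minimal sketch. The names `SKETCH_PREFIXES`, `SKETCH_SUFFIXES`, and `should_ignore` are hypothetical, and the list contents are assumptions chosen for the example; the actual values of `PREFIXES_TO_IGNORE` and `SUFFIXES_TO_IGNORE` are not shown on this page.

```rust
/// Hypothetical prefix list (assumption, not the crate's actual values).
const SKETCH_PREFIXES: &[&str] = &["javascript:", "mailto:", "#"];

/// Hypothetical suffix list (assumption, not the crate's actual values).
const SKETCH_SUFFIXES: &[&str] = &[".css", ".js", ".png"];

/// Returns true when `link` starts with an ignored prefix or ends with
/// an ignored suffix. The comparison is ASCII case-insensitive, which is
/// a design choice of this sketch only.
fn should_ignore(link: &str) -> bool {
    let lower = link.to_ascii_lowercase();
    SKETCH_PREFIXES.iter().any(|p| lower.starts_with(*p))
        || SKETCH_SUFFIXES.iter().any(|s| lower.ends_with(*s))
}
```

Under these assumed lists, `should_ignore("mailto:a@b.c")` is true and `should_ignore("/docs/intro.html")` is false.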

Functions§

default_link_regex
Default regex pattern for extracting links from HTML. The pattern captures every href value; filtering is done afterwards in Rust code.
extract_sub_links
Extract all links from a raw HTML string and convert them into absolute URLs.
find_all_links
Extract all links from a raw HTML string.
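
The two-step flow described above (collect href values, then make them absolute) might look roughly like the following sketch. It uses only the standard library and a naive string scan in place of the module's regex, and every name in it (`sketch_find_links`, `sketch_to_absolute`, `sketch_extract_sub_links`) is hypothetical. The URL resolution rules and the de-duplication step are simplifying assumptions, not the behavior of `find_all_links` or `extract_sub_links`.

```rust
/// Collect the values of double-quoted `href="..."` attributes.
/// A naive scan for illustration; it is not an HTML parser.
fn sketch_find_links(html: &str) -> Vec<String> {
    let needle = "href=\"";
    let mut links = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(needle) {
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                links.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // Unterminated attribute: stop scanning.
            None => break,
        }
    }
    links
}

/// Resolve `link` against `base_url` with simplified rules (assumption):
/// links containing "://" pass through, root-relative links attach to the
/// origin, and anything else is appended to the base URL's directory.
fn sketch_to_absolute(base_url: &str, link: &str) -> String {
    if link.contains("://") {
        return link.to_string();
    }
    // Origin is scheme plus host, e.g. "https://example.com".
    let scheme_end = base_url.find("://").map(|i| i + 3).unwrap_or(0);
    let origin_end = base_url[scheme_end..]
        .find('/')
        .map(|i| scheme_end + i)
        .unwrap_or(base_url.len());
    let origin = &base_url[..origin_end];
    if link.starts_with('/') {
        return format!("{}{}", origin, link);
    }
    // Directory part: up to and including the last '/' of the path.
    match base_url[origin_end..].rfind('/') {
        Some(i) => format!("{}{}", &base_url[..origin_end + i + 1], link),
        None => format!("{}/{}", origin, link),
    }
}

/// Extract links and convert them to absolute URLs, keeping first
/// occurrences only (de-duplication is an assumption of this sketch).
fn sketch_extract_sub_links(html: &str, base_url: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for link in sketch_find_links(html) {
        let abs = sketch_to_absolute(base_url, &link);
        if !out.contains(&abs) {
            out.push(abs);
        }
    }
    out
}
```

A real implementation would more likely rely on a dedicated URL type for resolution, since the rules here ignore `..` segments, query strings, and protocol-relative links.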