Utilities for working with HTML.
Adapted from langchain_core/utils/html.py
Constants
- PREFIXES_TO_IGNORE - Prefixes to ignore when extracting links.
- SUFFIXES_TO_IGNORE - Suffixes to ignore when extracting links.
Functions
- default_link_regex - Default regex pattern for extracting links from HTML. The pattern captures all href values; filtering is done in Rust code.
- extract_sub_links - Extract all links from a raw HTML string and convert them into absolute paths.
- find_all_links - Extract all links from a raw HTML string.
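
The split between capturing and filtering described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not this module's implementation: the helper names (`find_hrefs`, `keep_link`, `filtered_links`) are hypothetical, a plain substring scan stands in for the regex, and the prefix and suffix values are assumed examples rather than the actual contents of `PREFIXES_TO_IGNORE` and `SUFFIXES_TO_IGNORE`.

```rust
// Assumed example values; the module's real constants may differ.
const PREFIXES: &[&str] = &["javascript:", "mailto:", "#"];
const SUFFIXES: &[&str] = &[".css", ".js", ".ico", ".png"];

/// Step 1: collect every double-quoted `href` value, with no filtering.
/// (Hypothetical stand-in for a regex-based capture.)
fn find_hrefs(html: &str) -> Vec<String> {
    let needle = "href=\"";
    let mut out = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find(needle) {
        let after = &rest[start + needle.len()..];
        match after.find('"') {
            Some(end) => {
                out.push(after[..end].to_string());
                rest = &after[end + 1..];
            }
            // Unterminated attribute value: stop scanning.
            None => break,
        }
    }
    out
}

/// Step 2: decide in ordinary Rust code whether a captured link is kept.
fn keep_link(link: &str) -> bool {
    !PREFIXES.iter().any(|p| link.starts_with(*p))
        && !SUFFIXES.iter().any(|s| link.ends_with(*s))
}

/// Capture first, then filter.
fn filtered_links(html: &str) -> Vec<String> {
    find_hrefs(html)
        .into_iter()
        .filter(|link| keep_link(link))
        .collect()
}
```

Keeping the capture step permissive and applying the ignore lists afterwards means the pattern stays simple, and the lists can change without touching it.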