Module regex_cache

Cached Regex Patterns and Fast Content Checks for Markdown Linting

This module provides a centralized collection of pre-compiled, cached regex patterns for all major Markdown constructs (headings, lists, code blocks, links, images, etc.). It also includes fast-path utility functions for quickly checking if content potentially contains certain Markdown elements, allowing rules to skip expensive processing when unnecessary.

§Performance

All regexes are compiled once, on first use, via lazy_static, so no pattern is recompiled anywhere in the linter. Use these shared patterns in rules instead of compiling new regexes.
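
The compile-once behaviour can be sketched with the standard library alone. This is a minimal illustration, not this module's code: it assumes `std::sync::OnceLock` in place of lazy_static, and `CompiledPattern`, `compile`, `atx_heading_pattern`, and `compile_count` are hypothetical names. The pattern string is only stored, never interpreted, so that the example has no external dependencies.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the stand-in compile step has run.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex (a real rule would hold a regex type).
pub struct CompiledPattern {
    pub source: &'static str,
}

/// Stand-in for the expensive compile step; it only records that it ran.
fn compile(source: &'static str) -> CompiledPattern {
    COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
    CompiledPattern { source }
}

/// Returns a shared pattern. The closure runs on the first call only;
/// every later call returns the same stored value.
pub fn atx_heading_pattern() -> &'static CompiledPattern {
    static PATTERN: OnceLock<CompiledPattern> = OnceLock::new();
    PATTERN.get_or_init(|| compile(r"^#{1,6}\s"))
}

/// Number of times `compile` has been invoked so far.
pub fn compile_count() -> usize {
    COMPILE_COUNT.load(Ordering::SeqCst)
}
```

Calling `atx_heading_pattern()` from many rules returns the same `&'static` reference each time, which is the property the shared statics rely on.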

§Usage

Use the provided statics for common Markdown patterns.
Use the regex_lazy! macro for ad-hoc regexes that are not predefined.
Use the utility functions for fast content checks before running regexes.
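
The third point, checking cheaply before doing expensive work, might look like the sketch below. It is an assumption-laden illustration rather than this module's implementation: `may_contain_atx_heading`, `find_atx_heading_lines`, and `is_atx_heading` are hypothetical names, and a simplified hand-written heading test stands in for the shared regex so that the example needs only the standard library.

```rust
/// Hypothetical quick check: a cheap byte scan for the one character every
/// ATX heading needs. It can report false positives but never misses a '#'.
pub fn may_contain_atx_heading(content: &str) -> bool {
    content.as_bytes().contains(&b'#')
}

/// Returns the 1-based line numbers of ATX headings.
/// The per-line pass stands in for the expensive regex step.
pub fn find_atx_heading_lines(content: &str) -> Vec<usize> {
    // Fast path: skip the per-line work entirely when no '#' is present.
    if !may_contain_atx_heading(content) {
        return Vec::new();
    }
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| is_atx_heading(line))
        .map(|(idx, _)| idx + 1)
        .collect()
}

/// Simplified ATX heading test: at most three leading spaces, one to six
/// '#' characters, then a space, a tab, or the end of the line.
fn is_atx_heading(line: &str) -> bool {
    let trimmed = line.trim_start_matches(' ');
    if line.len() - trimmed.len() > 3 {
        return false;
    }
    let hashes = trimmed.bytes().take_while(|&b| b == b'#').count();
    if hashes == 0 || hashes > 6 {
        return false;
    }
    let rest = &trimmed[hashes..];
    rest.is_empty() || rest.starts_with(' ') || rest.starts_with('\t')
}
```

The quick check is deliberately loose: a line such as `#notheading` passes it and is then rejected by the precise test, while content with no `#` at all never reaches the precise test.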

Macros§

regex_lazy: Macro for defining a lazily-initialized, cached regex pattern.

Structs§

RegexCache: Global regex cache for dynamic patterns
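
A global cache for patterns that are only known at runtime could be structured roughly as follows. This is a sketch under stated assumptions, not the actual `RegexCache`: `PatternCache`, `CompiledPattern`, `global_cache`, and `cached_pattern` are hypothetical names, the cached value is a stand-in for a compiled regex, and a `Mutex`-guarded `HashMap` is one possible design among several.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a compiled regex.
#[derive(Debug)]
pub struct CompiledPattern {
    pub source: String,
}

/// Cache keyed by pattern source, with hit and miss counters.
#[derive(Default)]
pub struct PatternCache {
    entries: HashMap<String, Arc<CompiledPattern>>,
    hits: u64,
    misses: u64,
}

impl PatternCache {
    /// Returns the cached pattern, building and storing it on a miss.
    pub fn get_or_compile(&mut self, source: &str) -> Arc<CompiledPattern> {
        if let Some(found) = self.entries.get(source) {
            self.hits += 1;
            return Arc::clone(found);
        }
        self.misses += 1;
        let compiled = Arc::new(CompiledPattern { source: source.to_string() });
        self.entries.insert(source.to_string(), Arc::clone(&compiled));
        compiled
    }

    /// Returns `(hits, misses)`.
    pub fn stats(&self) -> (u64, u64) {
        (self.hits, self.misses)
    }
}

/// Process-wide cache instance, created on first access.
fn global_cache() -> &'static Mutex<PatternCache> {
    static CACHE: OnceLock<Mutex<PatternCache>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(PatternCache::default()))
}

/// Looks up a pattern in the process-wide cache.
pub fn cached_pattern(source: &str) -> Arc<CompiledPattern> {
    global_cache()
        .lock()
        .expect("cache mutex poisoned")
        .get_or_compile(source)
}
```

Handing out `Arc` clones keeps the lock held only for the lookup, so callers can use a pattern without blocking each other.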

Constants§

URL_IPV6_STR: Pattern for IPv6 URLs specifically.
URL_QUICK_CHECK_STR: Quick check pattern for early exits.
URL_SIMPLE_STR: Simple URL pattern for content detection.
URL_STANDARD_STR: Pattern for standard HTTP(S)/FTP(S) URLs with full path support.
URL_WWW_STR: Pattern for www URLs without protocol.
XMPP_URI_STR: Pattern for XMPP URIs per GFM extended autolinks specification.

Statics§

ABBREVIATION
ALTERNATE_FENCED_CODE_BLOCK_END
ALTERNATE_FENCED_CODE_BLOCK_START
ASTERISK_EMPHASIS
ATX_HEADING_REGEX
ATX_HEADING_WITH_CAPTURE
BLOCKQUOTE_PREFIX_RE
BOLD_ASTERISK_REGEX
BOLD_UNDERSCORE_REGEX
CLOSED_ATX_HEADING_REGEX
CODE_FENCE_REGEX
DECIMAL_NUMBER
DISPLAY_MATH_REGEX
DOUBLE_ASTERISK_EMPHASIS
DOUBLE_ASTERISK_SPACE_END
DOUBLE_ASTERISK_SPACE_START
DOUBLE_UNDERSCORE_EMPHASIS
EMAIL_PATTERN
EMOJI_SHORTCODE_REGEX
EMPHASIS_REGEX
EXTERNAL_URL_REGEX
FENCED_CODE_BLOCK_END
FENCED_CODE_BLOCK_END_REGEX
FENCED_CODE_BLOCK_START
FENCED_CODE_BLOCK_START_REGEX
FOOTNOTE_REF_REGEX
FRONT_MATTER_REGEX
HEADING_CHECK
HR_ASTERISK
HR_DASH
HR_SPACED_ASTERISK
HR_SPACED_DASH
HR_SPACED_UNDERSCORE
HR_UNDERSCORE
HTML_COMMENT_END
HTML_COMMENT_PATTERN
HTML_COMMENT_START
HTML_ENTITY_REGEX
HTML_HEADING_PATTERN
HTML_OPENING_TAG_FINDER
HTML_SELF_CLOSING_TAG_REGEX
HTML_TAG_FINDER
HTML_TAG_PATTERN
HTML_TAG_QUICK_CHECK
HTML_TAG_REGEX
HUGO_SHORTCODE_REGEX
IMAGE_REF_PATTERN
IMAGE_REGEX
INDENTED_CODE_BLOCK_PATTERN
INDENTED_CODE_BLOCK_REGEX
INLINE_CODE_REGEX
INLINE_IMAGE_FANCY_REGEX
INLINE_LINK_FANCY_REGEX
INLINE_LINK_REGEX
INLINE_MATH_REGEX
ITALIC_ASTERISK_REGEX
ITALIC_UNDERSCORE_REGEX
LINKED_IMAGE_INLINE_INLINE
LINKED_IMAGE_INLINE_REF
LINKED_IMAGE_REF_INLINE
LINKED_IMAGE_REF_REF
LINK_REFERENCE_DEFINITION_REGEX
LINK_REF_PATTERN
LINK_REGEX
LINK_TEXT_FULL_REGEX
LINK_TEXT_REGEX
LIST_ITEM
LIST_MARKER_ANY_REGEX
MULTIPLE_BLANK_LINES_REGEX
MULTIPLE_HYPHENS
ORDERED_LIST_MARKER_REGEX
REFERENCE_LINK
REF_IMAGE_REGEX
REF_LINK_REGEX
SENTENCE_END
SETEXT_HEADING_REGEX
SETEXT_HEADING_WITH_CAPTURE
SHORTCUT_REF_REGEX
SPACE_IN_EMPHASIS_REGEX
STRIKETHROUGH_FANCY_REGEX
STRIKETHROUGH_REGEX
TOC_SECTION_START
TRAILING_PUNCTUATION_REGEX
TRAILING_WHITESPACE_REGEX
UNDERSCORE_EMPHASIS
UNORDERED_LIST_MARKER_REGEX
URL_IN_TEXT: Greedy URL pattern for finding URLs in text for length calculation.
URL_IPV6_REGEX: IPv6 URL regex - for URLs with IPv6 addresses. See URL_IPV6_STR for documentation.
URL_PATTERN: Alias for URL_SIMPLE_REGEX. Used by MD013 for line length exemption.
URL_QUICK_CHECK_REGEX: Quick check regex - fast early-exit test. See URL_QUICK_CHECK_STR for documentation.
URL_SIMPLE_REGEX: Simple URL regex - for content detection and line length exemption. See URL_SIMPLE_STR for documentation.
URL_STANDARD_REGEX: Standard URL regex - primary pattern for bare URL detection (MD034). See URL_STANDARD_STR for documentation.
URL_WWW_REGEX: WWW URL regex - for URLs starting with www. without protocol. See URL_WWW_STR for documentation.
WIKI_LINK_REGEX
XMPP_URI_REGEX: XMPP URI regex - for GFM extended autolinks. See XMPP_URI_STR for documentation.

Functions§

contains_url: Fast URL detection using a character-by-character scanner, which is much faster than a regex when the content contains no URL
escape_regex: Escapes a string to be used in a regex pattern
get_cache_stats: Get cache usage statistics
get_cached_fancy_regex: Get a fancy regex from the global cache
get_cached_regex: Get a regex from the global cache
has_code_block_markers: Check if content contains any code blocks (quick check before regex)
has_emphasis_markers: Check if content contains any emphasis markers (quick check before regex)
has_heading_markers: Check if content contains any headings (quick check before regex)
has_html_tags: Check if content contains any HTML tags (quick check before regex)
has_image_markers: Check if content contains any images (quick check before regex)
has_link_markers: Check if content contains any links (quick check before regex)
has_list_markers: Check if content contains any lists (quick check before regex)
is_blank_in_blockquote_context: Check if a line is blank in the context of blockquotes.
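
To illustrate the character-by-character scanning idea described for contains_url, here is a minimal sketch. It is not the crate's implementation: `might_contain_url` is a hypothetical name, and the two triggers it looks for (a `://` preceded by a letter, or `www.` at a word start) are assumptions chosen for the example.

```rust
/// Hypothetical scanner: returns true if `content` might contain a URL.
/// A single pass over the bytes looks for "://" preceded by an ASCII letter,
/// or for "www." (any case) at the start of a word.
pub fn might_contain_url(content: &str) -> bool {
    let bytes = content.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            b':' => {
                // A scheme must end in a letter, e.g. the 's' in "https".
                let has_scheme = i > 0 && bytes[i - 1].is_ascii_alphabetic();
                if has_scheme && bytes[i + 1..].starts_with(b"//") {
                    return true;
                }
            }
            b'w' | b'W' => {
                // Only accept "www." when it is not glued to a preceding word.
                let at_word_start = i == 0 || !bytes[i - 1].is_ascii_alphanumeric();
                if at_word_start
                    && bytes.len() - i >= 4
                    && bytes[i..i + 4].eq_ignore_ascii_case(b"www.")
                {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}
```

A `true` result only means a regex pass is worth running; a `false` result lets the caller skip that pass entirely.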