Cached Regex Patterns and Fast Content Checks for Markdown Linting
This module provides a centralized collection of shared, cached, compile-once regex patterns for all major Markdown constructs (headings, lists, code blocks, links, images, etc.). It also includes fast-path utility functions that check whether content might contain a given Markdown element, so rules can skip expensive regex processing when that element cannot be present.
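A minimal sketch of the fast-path idea, assuming a heading check that only looks for the bytes #, = and - (which can start an ATX heading or underline a setext heading). The cheap test runs first, and the slower line-by-line scan, a std-only stand-in for a regex such as ATX_HEADING_REGEX, runs only if it passes. The names might_contain_heading and count_atx_headings are hypothetical, not this module's API.

```rust
/// Hypothetical fast-path check (not the module's real has_heading_markers):
/// returns false only when no byte that can start or underline a heading is present.
fn might_contain_heading(content: &str) -> bool {
    content.bytes().any(|b| matches!(b, b'#' | b'=' | b'-'))
}

/// Slower line-by-line ATX heading count, standing in for a regex.
/// Follows CommonMark: up to 3 spaces of indentation, 1 to 6 '#',
/// then a space, a tab, or the end of the line.
fn count_atx_headings(content: &str) -> usize {
    // Fast path: nothing heading-like at all, so skip the per-line work.
    if !might_contain_heading(content) {
        return 0;
    }
    content
        .lines()
        .filter(|line| {
            let rest = line.trim_start_matches(' ');
            if line.len() - rest.len() > 3 {
                return false; // indented code block, not a heading
            }
            let hashes = rest.bytes().take_while(|&b| b == b'#').count();
            (1..=6).contains(&hashes)
                && matches!(rest.as_bytes().get(hashes).copied(), None | Some(b' ') | Some(b'\t'))
        })
        .count()
}

fn main() {
    let doc = "# Title\n\nSome text.\n## Section\n#hashtag\n    # indented code";
    println!("{}", count_atx_headings(doc)); // 2
    println!("{}", count_atx_headings("plain prose only")); // fast path, 0
}
```

The fast check can return false positives (a hyphen in ordinary prose passes it) but never false negatives, which is the property an early exit needs.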
Performance
All shared regexes are compiled exactly once, lazily on first access, using lazy_static. This avoids recompiling the same pattern in every rule and on every file the linter processes. Use these shared patterns in rules instead of compiling new regexes.
Usage
- Use the provided statics for common Markdown patterns.
- Use the regex_lazy! macro for ad-hoc regexes that are not predefined.
- Use the utility functions for fast content checks before running regexes.
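The expansion of regex_lazy! is not shown on this page. One plausible shape for such a macro, sketched here as the hypothetical pattern_lazy! with a stand-in CompiledPattern type instead of a real regex, is a block that declares its own static OnceLock, so every call site compiles its pattern once and reuses it afterwards.

```rust
use std::sync::OnceLock;

/// Stand-in for a compiled regex, so the sketch needs no external crates.
struct CompiledPattern {
    source: String,
}

impl CompiledPattern {
    fn new(source: &str) -> Self {
        CompiledPattern { source: source.to_string() }
    }
}

/// Hypothetical macro in the spirit of regex_lazy!: each call site gets its
/// own static cell, initialized the first time that call site runs.
macro_rules! pattern_lazy {
    ($src:expr) => {{
        static CELL: OnceLock<CompiledPattern> = OnceLock::new();
        CELL.get_or_init(|| CompiledPattern::new($src))
    }};
}

fn wiki_link_pattern() -> &'static CompiledPattern {
    pattern_lazy!(r"\[\[[^\]]+\]\]")
}

fn footnote_pattern() -> &'static CompiledPattern {
    pattern_lazy!(r"\[\^[^\]]+\]")
}

fn main() {
    // Same call site, same static: both references point at one value.
    assert!(std::ptr::eq(wiki_link_pattern(), wiki_link_pattern()));
    // Different call sites own different statics.
    assert!(!std::ptr::eq(wiki_link_pattern(), footnote_pattern()));
    println!("{} / {}", wiki_link_pattern().source, footnote_pattern().source);
}
```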
Macros
- regex_lazy - Macro for defining a lazily initialized, cached regex pattern.
Structs
- RegexCache - Global regex cache for dynamic patterns.
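A global cache for dynamically built patterns, the role described for RegexCache, get_cached_regex and get_cache_stats, might look roughly like this. It is a minimal sketch assuming a Mutex-guarded HashMap keyed by pattern source with hit/miss counters; PatternCache, get_cached_pattern and cache_stats are hypothetical names, and CompiledPattern stands in for a real regex type.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Stand-in for a compiled regex (no external crates assumed).
struct CompiledPattern {
    source: String,
}

/// Hypothetical global cache for patterns built at runtime.
#[derive(Default)]
struct PatternCache {
    entries: HashMap<String, Arc<CompiledPattern>>,
    hits: u64,
    misses: u64,
}

fn cache() -> &'static Mutex<PatternCache> {
    static CACHE: OnceLock<Mutex<PatternCache>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(PatternCache::default()))
}

/// Returns a shared compiled pattern, compiling it only on the first request.
fn get_cached_pattern(source: &str) -> Arc<CompiledPattern> {
    let mut guard = cache().lock().unwrap();
    let existing = guard.entries.get(source).cloned();
    if let Some(found) = existing {
        guard.hits += 1;
        return found;
    }
    guard.misses += 1;
    let compiled = Arc::new(CompiledPattern { source: source.to_string() });
    guard.entries.insert(source.to_string(), Arc::clone(&compiled));
    compiled
}

/// Returns (hits, misses) so callers can inspect cache effectiveness.
fn cache_stats() -> (u64, u64) {
    let guard = cache().lock().unwrap();
    (guard.hits, guard.misses)
}

fn main() {
    let a = get_cached_pattern(r"\bTODO\b");
    let b = get_cached_pattern(r"\bTODO\b");
    assert!(Arc::ptr_eq(&a, &b)); // second lookup reuses the first compilation
    let (hits, misses) = cache_stats();
    println!("hits={hits}, misses={misses}, pattern={}", a.source);
}
```

Handing out Arc clones keeps the lock held only for the lookup, not while a rule runs the pattern.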
Constants
- URL_IPV6_STR - Pattern for IPv6 URLs specifically.
- URL_QUICK_CHECK_STR - Quick-check pattern for early exits.
- URL_SIMPLE_STR - Simple URL pattern for content detection.
- URL_STANDARD_STR - Pattern for standard HTTP(S)/FTP(S) URLs with full path support.
- URL_WWW_STR - Pattern for www URLs without protocol.
- XMPP_URI_STR - Pattern for XMPP URIs per the GFM extended autolinks specification.
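The exact text of URL_QUICK_CHECK_STR is not given here. As a hedged illustration of the early-exit idea (and of the character-by-character scanning described for contains_url), the hypothetical might_contain_url below returns true if it sees "://", a case-insensitive "www.", or an "xmpp:" scheme, and false otherwise, so the heavier URL regexes can be skipped.

```rust
/// Hypothetical early-exit scan: returns false only when no URL-like marker
/// ("://", "www.", or an "xmpp:" scheme) appears in the text.
fn might_contain_url(text: &str) -> bool {
    let bytes = text.as_bytes();
    for (i, &b) in bytes.iter().enumerate() {
        match b {
            // Scheme separator ("https://", "ftp://") or an XMPP URI ("xmpp:user@host").
            b':' => {
                if bytes[i + 1..].starts_with(b"//") || bytes[..i].ends_with(b"xmpp") {
                    return true;
                }
            }
            // Protocol-less "www." prefix, matched case-insensitively.
            b'w' | b'W' => {
                if bytes[i..].len() >= 4 && bytes[i..i + 4].eq_ignore_ascii_case(b"www.") {
                    return true;
                }
            }
            _ => {}
        }
    }
    false
}

fn main() {
    for text in ["ratio 3:1", "see https://example.com", "visit www.example.com"] {
        println!("{text:?} -> {}", might_contain_url(text));
    }
}
```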
Statics
- ABBREVIATION
- ALTERNATE_FENCED_CODE_BLOCK_END
- ALTERNATE_FENCED_CODE_BLOCK_START
- ASTERISK_EMPHASIS
- ATX_HEADING_REGEX
- ATX_HEADING_WITH_CAPTURE
- BLOCKQUOTE_PREFIX_RE
- BOLD_ASTERISK_REGEX
- BOLD_UNDERSCORE_REGEX
- CLOSED_ATX_HEADING_REGEX
- CODE_FENCE_REGEX
- DECIMAL_NUMBER
- DISPLAY_MATH_REGEX
- DOUBLE_ASTERISK_EMPHASIS
- DOUBLE_ASTERISK_SPACE_END
- DOUBLE_ASTERISK_SPACE_START
- DOUBLE_UNDERSCORE_EMPHASIS
- EMAIL_PATTERN
- EMOJI_SHORTCODE_REGEX
- EMPHASIS_REGEX
- EXTERNAL_URL_REGEX
- FENCED_CODE_BLOCK_END
- FENCED_CODE_BLOCK_END_REGEX
- FENCED_CODE_BLOCK_START
- FENCED_CODE_BLOCK_START_REGEX
- FOOTNOTE_REF_REGEX
- FRONT_MATTER_REGEX
- HEADING_CHECK
- HR_ASTERISK
- HR_DASH
- HR_SPACED_ASTERISK
- HR_SPACED_DASH
- HR_SPACED_UNDERSCORE
- HR_UNDERSCORE
- HTML_COMMENT_END
- HTML_COMMENT_PATTERN
- HTML_COMMENT_START
- HTML_ENTITY_REGEX
- HTML_HEADING_PATTERN
- HTML_OPENING_TAG_FINDER
- HTML_SELF_CLOSING_TAG_REGEX
- HTML_TAG_FINDER
- HTML_TAG_PATTERN
- HTML_TAG_QUICK_CHECK
- HTML_TAG_REGEX
- HUGO_SHORTCODE_REGEX
- IMAGE_REF_PATTERN
- IMAGE_REGEX
- INDENTED_CODE_BLOCK_PATTERN
- INDENTED_CODE_BLOCK_REGEX
- INLINE_CODE_REGEX
- INLINE_IMAGE_FANCY_REGEX
- INLINE_LINK_FANCY_REGEX
- INLINE_LINK_REGEX
- INLINE_MATH_REGEX
- ITALIC_ASTERISK_REGEX
- ITALIC_UNDERSCORE_REGEX
- LINKED_IMAGE_INLINE_INLINE
- LINKED_IMAGE_INLINE_REF
- LINKED_IMAGE_REF_INLINE
- LINKED_IMAGE_REF_REF
- LINK_REFERENCE_DEFINITION_REGEX
- LINK_REF_PATTERN
- LINK_REGEX
- LINK_TEXT_FULL_REGEX
- LINK_TEXT_REGEX
- LIST_ITEM
- LIST_MARKER_ANY_REGEX
- MULTIPLE_BLANK_LINES_REGEX
- MULTIPLE_HYPHENS
- ORDERED_LIST_MARKER_REGEX
- REFERENCE_LINK
- REF_IMAGE_REGEX
- REF_LINK_REGEX
- SENTENCE_END
- SETEXT_HEADING_REGEX
- SETEXT_HEADING_WITH_CAPTURE
- SHORTCUT_REF_REGEX
- SPACE_IN_EMPHASIS_REGEX
- STRIKETHROUGH_FANCY_REGEX
- STRIKETHROUGH_REGEX
- TOC_SECTION_START
- TRAILING_PUNCTUATION_REGEX
- TRAILING_WHITESPACE_REGEX
- UNDERSCORE_EMPHASIS
- UNORDERED_LIST_MARKER_REGEX
- URL_IN_TEXT - Greedy URL pattern for finding URLs in text for length calculation.
- URL_IPV6_REGEX - IPv6 URL regex, for URLs with IPv6 addresses. See URL_IPV6_STR for documentation.
- URL_PATTERN - Alias for URL_SIMPLE_REGEX. Used by MD013 for line-length exemption.
- URL_QUICK_CHECK_REGEX - Quick-check regex for a fast early-exit test. See URL_QUICK_CHECK_STR for documentation.
- URL_SIMPLE_REGEX - Simple URL regex, for content detection and line-length exemption. See URL_SIMPLE_STR for documentation.
- URL_STANDARD_REGEX - Standard URL regex, the primary pattern for bare URL detection (MD034). See URL_STANDARD_STR for documentation.
- URL_WWW_REGEX - WWW URL regex, for URLs starting with www. without a protocol. See URL_WWW_STR for documentation.
- WIKI_LINK_REGEX
- XMPP_URI_REGEX - XMPP URI regex, for GFM extended autolinks. See XMPP_URI_STR for documentation.
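The HR_* statics presumably match horizontal rules in their plain and spaced forms; their regex text is not shown here. A std-only sketch of the CommonMark thematic-break rule they correspond to: at most three spaces of indentation, then three or more of the same *, - or _ character, optionally separated by spaces or tabs, and nothing else. is_thematic_break is a hypothetical name, and it ignores the case where "---" under a paragraph is a setext heading underline.

```rust
/// Hypothetical check for a CommonMark thematic break (horizontal rule).
/// Does not consider context, so "---" under a paragraph (a setext
/// heading underline) is also reported as a break.
fn is_thematic_break(line: &str) -> bool {
    let rest = line.trim_start_matches(' ');
    // More than three leading spaces makes it an indented code block instead.
    if line.len() - rest.len() > 3 {
        return false;
    }
    // The first non-space character decides which marker must repeat.
    let marker = match rest.chars().next() {
        Some(c @ ('*' | '-' | '_')) => c,
        _ => return false,
    };
    let mut count = 0;
    for c in rest.chars() {
        match c {
            ' ' | '\t' => continue,        // spacing between markers is allowed
            c if c == marker => count += 1,
            _ => return false,             // mixed markers or other text
        }
    }
    count >= 3
}

fn main() {
    for line in ["---", "* * *", "__ _", "--", "    ***", "-*-"] {
        println!("{line:?}: {}", is_thematic_break(line));
    }
}
```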
Functions
- contains_url - Detect URLs with a character-by-character scanner, which is much faster than regex when the content contains no URL.
- escape_regex - Escape a string so it can be used literally in a regex pattern.
- get_cache_stats - Get cache usage statistics.
- get_cached_fancy_regex - Get a fancy regex from the global cache.
- get_cached_regex - Get a regex from the global cache.
- has_code_block_markers - Check if content contains any code blocks (quick check before regex).
- has_emphasis_markers - Check if content contains any emphasis markers (quick check before regex).
- has_heading_markers - Check if content contains any headings (quick check before regex).
- has_html_tags - Check if content contains any HTML tags (quick check before regex).
- has_image_markers - Check if content contains any images (quick check before regex).
- has_link_markers - Check if content contains any links (quick check before regex).
- has_list_markers - Check if content contains any lists (quick check before regex).
- is_blank_in_blockquote_context - Check if a line is blank in the context of blockquotes.
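This page does not define what "blank in the context of blockquotes" means. Under one plausible reading, a line is blank when nothing but whitespace remains after removing every leading > marker. The sketch below implements that assumption as the hypothetical is_blank_after_quote_markers; the module's function may also consult surrounding lines. Under this reading, a plain empty line also counts as blank.

```rust
/// Hypothetical reading of "blank in blockquote context" (not the module's
/// actual definition): a line counts as blank when nothing but whitespace
/// remains after removing every leading '>' blockquote marker.
fn is_blank_after_quote_markers(line: &str) -> bool {
    let mut rest = line.trim_start();
    // Strip nested markers such as "> > " or ">>" one level at a time.
    while let Some(stripped) = rest.strip_prefix('>') {
        rest = stripped.trim_start();
    }
    rest.is_empty()
}

fn main() {
    for line in [">", "> > ", ">>", "> text", ""] {
        println!("{line:?}: {}", is_blank_after_quote_markers(line));
    }
}
```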