
Module sentence_utils

Sentence detection utilities

This module provides shared functionality for detecting sentence boundaries in markdown text. It is used by both text reflow (MD013) and the multiple-spaces rule (MD064).

Features:

  • Common abbreviation detection (Mr., Dr., Prof., etc.)
  • CJK punctuation support (。, ！, ？)
  • Closing quote detection (straight and curly)
  • Both forward-looking (reflow) and backward-looking (MD064) sentence detection
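The character classifiers behind these features can be sketched as follows. This is a minimal, hypothetical re-implementation rather than the crate's code. The exact quote and bracket sets are assumptions, and `ends_sentence` is an illustrative helper that does not exist in this module.

```rust
// Hypothetical sketch of this module's character classifiers; the real
// character sets may differ.

/// CJK sentence enders: ideographic full stop, fullwidth `！` and `？`.
fn is_cjk_sentence_ending(c: char) -> bool {
    matches!(c, '\u{3002}' | '\u{FF01}' | '\u{FF1F}')
}

/// ASCII or CJK sentence-ending punctuation.
fn is_sentence_ending_punctuation(c: char) -> bool {
    matches!(c, '.' | '!' | '?') || is_cjk_sentence_ending(c)
}

/// Straight and curly (smart) closing quotes.
fn is_closing_quote(c: char) -> bool {
    matches!(c, '"' | '\'' | '\u{201D}' | '\u{2019}')
}

/// Closing punctuation allowed after a sentence ender: quotes, `)` and `]`.
fn is_trailing_close_punctuation(c: char) -> bool {
    is_closing_quote(c) || matches!(c, ')' | ']')
}

/// Hypothetical helper: does `s` end a sentence once trailing close
/// punctuation is skipped? e.g. `He said "stop."` or `(see above.)`.
fn ends_sentence(s: &str) -> bool {
    s.chars()
        .rev()
        .find(|&c| !is_trailing_close_punctuation(c))
        .map_or(false, is_sentence_ending_punctuation)
}
```

Skipping trailing close punctuation first is what lets a quoted or parenthesized sentence still register as ending at its inner period.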

Constants

DEFAULT_ABBREVIATIONS
Default abbreviations that should NOT be treated as sentence endings.
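A minimal sketch of how `DEFAULT_ABBREVIATIONS`, `get_abbreviations`, and `text_ends_with_abbreviation` might fit together, assuming abbreviations are stored in lowercase without their trailing period. The list contents, the `&[String]` parameter, and the word-splitting rule are illustrative assumptions, not the crate's actual signatures.

```rust
use std::collections::HashSet;

// Illustrative subset only; the real DEFAULT_ABBREVIATIONS list is not
// reproduced here. Entries are assumed to be stored without the final period.
const DEFAULT_ABBREVIATIONS: &[&str] = &["mr", "mrs", "dr", "prof", "etc", "e.g", "i.e"];

/// Sketch of `get_abbreviations`: custom entries are always merged with the
/// defaults, and everything is lowercased for case-insensitive matching.
fn get_abbreviations(custom: &[String]) -> HashSet<String> {
    DEFAULT_ABBREVIATIONS
        .iter()
        .map(|s| s.to_lowercase())
        .chain(custom.iter().map(|s| s.trim_end_matches('.').to_lowercase()))
        .collect()
}

/// Sketch of `text_ends_with_abbreviation`: the text must end with a period,
/// and the last word before it must be a known abbreviation.
fn text_ends_with_abbreviation(text: &str, abbreviations: &HashSet<String>) -> bool {
    let Some(stripped) = text.trim_end().strip_suffix('.') else {
        return false; // no trailing period, so not an abbreviation ending
    };
    // Last word, splitting on whitespace or an opening parenthesis.
    let last_word = stripped
        .rsplit(|c: char| c.is_whitespace() || c == '(')
        .next()
        .unwrap_or("");
    abbreviations.contains(&last_word.to_lowercase())
}
```

Storing entries without the period lets a configured `Approx.` and the text `approx.` normalize to the same key.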

Functions

get_abbreviations
Get the effective abbreviations set based on custom additions. All abbreviations are normalized to lowercase for case-insensitive matching. Custom abbreviations are always merged with the built-in defaults.
is_after_sentence_ending
Check if multiple spaces occur immediately after sentence-ending punctuation. This is a backward-looking check used by MD064.
is_after_sentence_ending_with_abbreviations
Check if multiple spaces occur immediately after sentence-ending punctuation, with a custom abbreviations set.
is_cjk_char
Check if a character is a CJK character (Chinese, Japanese, or Korean).
is_cjk_sentence_ending
Check if a character is CJK sentence-ending punctuation. These include 。 (ideographic full stop), ！ (fullwidth exclamation mark), and ？ (fullwidth question mark).
is_closing_quote
Check if a character is a closing quote mark. Both straight quotes and curly (smart) quotes are included.
is_opening_quote
Check if a character is an opening quote mark. Both straight quotes and curly (smart) quotes are included.
is_sentence_ending_punctuation
Check if a character is sentence-ending punctuation (ASCII or CJK).
is_trailing_close_punctuation
Check if a character is closing punctuation that can follow sentence-ending punctuation. This includes closing quotes, parentheses, and brackets.
text_ends_with_abbreviation
Check if text ends with a common abbreviation followed by a period.
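The backward-looking check used by MD064 can be sketched roughly as below. The function name `spaces_follow_sentence_end` and its `space_start` byte-offset parameter are hypothetical; the crate's `is_after_sentence_ending` may take different arguments. The sketch inspects the text before a run of spaces, skips trailing closing quotes and brackets, and accepts `!`, `?`, or a CJK ender outright. A `.` counts only when the word before it is not a known abbreviation.

```rust
/// Hypothetical sketch of the MD064 backward-looking check. `space_start` is
/// assumed to be the byte offset where a run of spaces begins and must lie on
/// a UTF-8 character boundary.
fn spaces_follow_sentence_end(line: &str, space_start: usize, abbreviations: &[&str]) -> bool {
    let before = &line[..space_start];
    // Skip closing quotes/brackets that may follow the punctuation: `"Stop."  Next`.
    let core = before.trim_end_matches(|c: char| {
        matches!(c, '"' | '\'' | ')' | ']' | '\u{201D}' | '\u{2019}')
    });
    match core.chars().last() {
        // `!`, `?`, and CJK enders always end a sentence.
        Some('!' | '?' | '\u{3002}' | '\u{FF01}' | '\u{FF1F}') => true,
        Some('.') => {
            // A period after a known abbreviation ("Dr.") is not a sentence end.
            let word = core[..core.len() - 1]
                .rsplit(char::is_whitespace)
                .next()
                .unwrap_or("");
            !abbreviations.iter().any(|a| a.eq_ignore_ascii_case(word))
        }
        _ => false,
    }
}
```

Abbreviation matching here uses `eq_ignore_ascii_case`, which is enough for ASCII abbreviations such as `Dr.`. Passing an offset that is not on a character boundary makes the slice panic.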