Module latex

LaTeX formula extraction module (R1).

Extracts LaTeX formulas from HTML content, handling multiple sources:

  • Habr: img.formula elements with source attribute
  • KaTeX: .katex elements with annotation[encoding="application/x-tex"]
  • MathJax: mjx-container elements with data-tex/data-latex attributes
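
The three sources listed above can be illustrated with a minimal sketch. It is not taken from the module or from the reference implementation: the `Element` struct is a hypothetical, simplified stand-in for a parsed HTML node, the function signatures are assumed (the page lists only function names), and the order in which `extract_formula` tries each source is a guess.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed HTML element (not the module's real type).
/// Tag names are assumed to be lowercase.
#[derive(Debug, Default, Clone)]
pub struct Element {
    pub tag: String,
    pub classes: Vec<String>,
    pub attrs: HashMap<String, String>,
    pub text: String,
    pub children: Vec<Element>,
}

impl Element {
    fn has_class(&self, name: &str) -> bool {
        self.classes.iter().any(|c| c == name)
    }

    fn attr(&self, name: &str) -> Option<&str> {
        self.attrs.get(name).map(String::as_str)
    }
}

/// Trim a candidate string and drop it if nothing is left.
fn non_empty(s: &str) -> Option<String> {
    let t = s.trim();
    if t.is_empty() { None } else { Some(t.to_string()) }
}

/// Habr: `img.formula` carrying the LaTeX in its `source` attribute.
pub fn extract_habr_formula(el: &Element) -> Option<String> {
    if el.tag == "img" && el.has_class("formula") {
        el.attr("source").and_then(non_empty)
    } else {
        None
    }
}

/// Depth-first search for `annotation[encoding="application/x-tex"]`.
fn find_tex_annotation(el: &Element) -> Option<String> {
    for child in &el.children {
        if child.tag == "annotation" && child.attr("encoding") == Some("application/x-tex") {
            if let Some(tex) = non_empty(&child.text) {
                return Some(tex);
            }
        }
        if let Some(tex) = find_tex_annotation(child) {
            return Some(tex);
        }
    }
    None
}

/// KaTeX: a `.katex` element whose descendants include the TeX annotation.
pub fn extract_katex_formula(el: &Element) -> Option<String> {
    if el.has_class("katex") { find_tex_annotation(el) } else { None }
}

/// MathJax: `mjx-container` with a `data-tex` or `data-latex` attribute.
pub fn extract_mathjax_formula(el: &Element) -> Option<String> {
    if el.tag != "mjx-container" {
        return None;
    }
    el.attr("data-tex")
        .or_else(|| el.attr("data-latex"))
        .and_then(non_empty)
}

/// Try each supported source in turn (the order here is an assumption).
pub fn extract_formula(el: &Element) -> Option<String> {
    extract_habr_formula(el)
        .or_else(|| extract_katex_formula(el))
        .or_else(|| extract_mathjax_formula(el))
}
```

Returning `Option<String>` lets a caller tell "not a formula element" or "no LaTeX found" apart from a successful extraction without relying on empty strings; whether the real functions do the same is not stated on this page.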

Based on the reference implementation at: https://github.com/link-foundation/meta-theory/blob/main/scripts/download-article.mjs

Functions

extract_formula
Extract formula from any supported element type.
extract_habr_formula
Extract LaTeX source from a formula image element (Habr-specific).
extract_katex_formula
Extract LaTeX from KaTeX elements.
extract_mathjax_formula
Extract LaTeX from MathJax elements.
is_formula_image
Check if an element is a formula image (Habr-specific).
is_math_element
Check if an element is a math element (KaTeX, MathJax, or generic math class).
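
As a rough illustration of the two predicates, the sketch below classifies an element from its tag name and class list. The signatures are assumptions (the page lists only function names), and `math` as the "generic math class" is a guess; the module may match a different class or set of classes.

```rust
/// Hypothetical signature: true for Habr formula images (`img.formula`).
pub fn is_formula_image(tag: &str, classes: &[&str]) -> bool {
    tag.eq_ignore_ascii_case("img") && classes.contains(&"formula")
}

/// Hypothetical signature: true for KaTeX, MathJax, or generic math elements.
pub fn is_math_element(tag: &str, classes: &[&str]) -> bool {
    let is_katex = classes.contains(&"katex");
    let is_mathjax = tag.eq_ignore_ascii_case("mjx-container");
    // Assumption: the generic math class is named `math`.
    let is_generic = classes.contains(&"math");
    is_katex || is_mathjax || is_generic
}
```

Predicates like these are cheap to run on every element during a tree walk, so the extraction functions only need to be called on elements that pass.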