LaTeX formula extraction module (R1).
Extracts LaTeX formulas from HTML content, handling multiple sources:
- Habr: `img.formula` elements with a `source` attribute
- KaTeX: `.katex` elements with `annotation[encoding="application/x-tex"]`
- MathJax: `mjx-container` elements with `data-tex`/`data-latex` attributes
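The three source rules above can be illustrated with a minimal, std-only sketch. It assumes a hypothetical, simplified `Element` type in place of a real parsed DOM, and the helper names `habr_latex`, `katex_latex` and `mathjax_latex` are illustrative, not this module's API. Checking `data-tex` before `data-latex` is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a parsed HTML element.
/// A real extractor would walk a DOM produced by an HTML parser instead.
#[derive(Debug, Default)]
struct Element {
    tag: String,
    classes: Vec<String>,
    attrs: HashMap<String, String>,
    text: String,
    children: Vec<Element>,
}

impl Element {
    fn new(tag: &str) -> Self {
        Element { tag: tag.to_string(), ..Default::default() }
    }
    fn with_class(mut self, c: &str) -> Self {
        self.classes.push(c.to_string());
        self
    }
    fn with_attr(mut self, k: &str, v: &str) -> Self {
        self.attrs.insert(k.to_string(), v.to_string());
        self
    }
    fn with_text(mut self, t: &str) -> Self {
        self.text = t.to_string();
        self
    }
    fn with_child(mut self, c: Element) -> Self {
        self.children.push(c);
        self
    }
    fn has_class(&self, c: &str) -> bool {
        self.classes.iter().any(|x| x == c)
    }
    fn get(&self, k: &str) -> Option<&str> {
        self.attrs.get(k).map(String::as_str)
    }
}

/// Habr: `img.formula` keeps the LaTeX source in its `source` attribute.
fn habr_latex(el: &Element) -> Option<String> {
    if el.tag == "img" && el.has_class("formula") {
        el.get("source").map(|s| s.trim().to_string())
    } else {
        None
    }
}

/// KaTeX: `.katex` wraps MathML whose `annotation[encoding="application/x-tex"]`
/// holds the original TeX; search the descendants depth-first.
fn katex_latex(el: &Element) -> Option<String> {
    fn find_annotation(el: &Element) -> Option<&Element> {
        if el.tag == "annotation" && el.get("encoding") == Some("application/x-tex") {
            return Some(el);
        }
        el.children.iter().find_map(find_annotation)
    }
    if !el.has_class("katex") {
        return None;
    }
    find_annotation(el).map(|a| a.text.trim().to_string())
}

/// MathJax: `mjx-container` carries TeX in `data-tex` or `data-latex`
/// (trying `data-tex` first is an assumption of this sketch).
fn mathjax_latex(el: &Element) -> Option<String> {
    if el.tag != "mjx-container" {
        return None;
    }
    el.get("data-tex")
        .or_else(|| el.get("data-latex"))
        .map(|s| s.trim().to_string())
}

fn main() {
    let habr = Element::new("img").with_class("formula").with_attr("source", "E = mc^2");
    let katex = Element::new("span").with_class("katex").with_child(
        Element::new("annotation")
            .with_attr("encoding", "application/x-tex")
            .with_text("\\frac{a}{b}"),
    );
    let mathjax = Element::new("mjx-container").with_attr("data-latex", "x^2");
    println!("{:?}", habr_latex(&habr)); // Some("E = mc^2")
    println!("{:?}", katex_latex(&katex));
    println!("{:?}", mathjax_latex(&mathjax)); // Some("x^2")
}
```

Each rule only fires on its own element shape, so an element that matches none of them yields `None`.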
Based on the reference implementation at: https://github.com/link-foundation/meta-theory/blob/main/scripts/download-article.mjs
Functions
- `extract_formula` - Extract formula from any supported element type.
- `extract_habr_formula` - Extract LaTeX source from a formula image element (Habr-specific).
- `extract_katex_formula` - Extract LaTeX from `KaTeX` elements.
- `extract_mathjax_formula` - Extract LaTeX from `MathJax` elements.
- `is_formula_image` - Check if an element is a formula image (Habr-specific).
- `is_math_element` - Check if an element is a math element (`KaTeX`, `MathJax`, or generic math class).
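How the functions listed above might fit together can be sketched as a small dispatcher. This is a std-only illustration, not the module's real signatures: `El` is a hypothetical flattened element whose `tex_annotation` field stands in for the text of a nested `annotation[encoding="application/x-tex"]`, the generic math class is assumed to be `math`, and the dispatch order (Habr image first, then KaTeX, then MathJax) is an assumption.

```rust
/// Hypothetical flattened view of an element, for illustration only.
struct El<'a> {
    tag: &'a str,
    classes: &'a [&'a str],
    attrs: &'a [(&'a str, &'a str)],
    /// Assumed to be located by the caller's DOM walk (KaTeX annotation text).
    tex_annotation: Option<&'a str>,
}

impl<'a> El<'a> {
    fn has_class(&self, c: &str) -> bool {
        self.classes.iter().any(|&x| x == c)
    }
    fn attr(&self, name: &str) -> Option<&'a str> {
        self.attrs.iter().find(|&&(k, _)| k == name).map(|&(_, v)| v)
    }
}

/// Habr formula images: `img.formula`.
fn is_formula_image(el: &El) -> bool {
    el.tag == "img" && el.has_class("formula")
}

/// KaTeX, MathJax, or a generic math class (`math` is an assumed name).
fn is_math_element(el: &El) -> bool {
    el.has_class("katex") || el.tag == "mjx-container" || el.has_class("math")
}

fn extract_habr_formula(el: &El) -> Option<String> {
    el.attr("source").map(str::trim).map(String::from)
}

fn extract_katex_formula(el: &El) -> Option<String> {
    el.tex_annotation.map(str::trim).map(String::from)
}

fn extract_mathjax_formula(el: &El) -> Option<String> {
    el.attr("data-tex")
        .or_else(|| el.attr("data-latex"))
        .map(str::trim)
        .map(String::from)
}

/// Dispatch to the source-specific extractor; non-math elements yield `None`.
fn extract_formula(el: &El) -> Option<String> {
    if is_formula_image(el) {
        return extract_habr_formula(el);
    }
    if !is_math_element(el) {
        return None;
    }
    // A generic math element may wrap output from either renderer.
    extract_katex_formula(el).or_else(|| extract_mathjax_formula(el))
}

fn main() {
    let habr = El { tag: "img", classes: &["formula"], attrs: &[("source", "a^2 + b^2")], tex_annotation: None };
    let plain = El { tag: "p", classes: &[], attrs: &[], tex_annotation: None };
    println!("{:?}", extract_formula(&habr)); // Some("a^2 + b^2")
    println!("{:?}", extract_formula(&plain)); // None
}
```

Keeping the predicates separate from the extractors lets a caller cheaply filter candidate elements with `is_math_element`/`is_formula_image` before attempting extraction.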