Module attributes


Parsing for Pandoc-style attributes: {#id .class key=value}

Attributes can appear after headings, fenced code blocks, fenced divs, etc. Syntax: {#identifier .class1 .class2 key1=val1 key2="val2"}

Rules:

Surrounded by { }
Identifier: #id (optional, only first one counts)
Classes: .class (can have multiple)
Key-value pairs: key=value or key="value" or key='value' (can have multiple)
Whitespace flexible between items
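
The rules above can be sketched as a small standalone parser. This is only an illustration under stated assumptions: `SketchAttrs` and `parse_attrs_sketch` are hypothetical names, not items of this module, and the handling of malformed tokens (returning `None`) is a simplification that may differ from what the real scanner does.

```rust
/// Hypothetical result type for this sketch; not the module's `AttributeBlock`.
#[derive(Debug, Default, PartialEq)]
pub struct SketchAttrs {
    pub identifier: Option<String>,
    pub classes: Vec<String>,
    pub key_values: Vec<(String, String)>,
}

/// Parse `{#id .class key=value}` following the rules listed above.
/// Assumption: any malformed token makes the whole parse return `None`.
pub fn parse_attrs_sketch(input: &str) -> Option<SketchAttrs> {
    // Rule: the attribute list is surrounded by `{` and `}`.
    let inner = input.trim().strip_prefix('{')?.strip_suffix('}')?;
    let chars: Vec<char> = inner.chars().collect();
    let mut attrs = SketchAttrs::default();
    let mut i = 0;

    while i < chars.len() {
        // Rule: whitespace between items is flexible, so skip any run of it.
        if chars[i].is_whitespace() {
            i += 1;
            continue;
        }
        // Read a bare token up to whitespace or `=`.
        let start = i;
        while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '=' {
            i += 1;
        }
        let token: String = chars[start..i].iter().collect();

        if i < chars.len() && chars[i] == '=' {
            // Rule: key=value, key="value" or key='value'.
            if token.is_empty() {
                return None;
            }
            i += 1; // consume '='
            let value: String = if i < chars.len() && (chars[i] == '"' || chars[i] == '\'') {
                let quote = chars[i];
                i += 1;
                let vstart = i;
                while i < chars.len() && chars[i] != quote {
                    i += 1;
                }
                if i == chars.len() {
                    return None; // unterminated quoted value
                }
                let v: String = chars[vstart..i].iter().collect();
                i += 1; // consume closing quote
                v
            } else {
                let vstart = i;
                while i < chars.len() && !chars[i].is_whitespace() {
                    i += 1;
                }
                chars[vstart..i].iter().collect()
            };
            attrs.key_values.push((token, value));
        } else if let Some(id) = token.strip_prefix('#') {
            if id.is_empty() {
                return None;
            }
            // Rule: only the first identifier counts; later ones are ignored.
            if attrs.identifier.is_none() {
                attrs.identifier = Some(id.to_string());
            }
        } else if let Some(class) = token.strip_prefix('.') {
            if class.is_empty() {
                return None;
            }
            attrs.classes.push(class.to_string());
        } else {
            return None; // bare word with no marker and no `=`
        }
    }
    Some(attrs)
}
```

For instance, passing `{#intro .note key="two words" #second}` to this sketch keeps `intro` as the identifier, collects `note` as a class, and stores the quoted value without its quotes; the duplicate `#second` is dropped.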

Structs

AttributeBlock

Functions

emit_attribute_node: Emit a Pandoc {...} ATTRIBUTE node by STRUCTURING the raw source slice into ATTR_* children that wrap the original bytes (no synthesis). Markers and quotes stay inside their tokens; whitespace/newlines between components, and any bytes the scanner skips (duplicate #id, malformed tokens), become standalone WHITESPACE/NEWLINE/TEXT tokens — so node.text() is exactly the source slice. Non-{...}-shaped or unrecognized input (MMD [#id] header brackets, raw-inline {=format}, empty {}) falls back to a single opaque ATTRIBUTE token, preserving the prior shape.
emit_code_info_attrs: Structure a code-block info-string region containing a {...} attribute block into ATTR_* children wrapping the source bytes, the same way emit_attribute_node does — but with a language carve-out: when carve_first_class_as_language is set, the first .class component is emitted as TEXT "." + CODE_LANGUAGE <lang> (Pandoc’s {.python …} language-first shape) instead of an ATTR_CLASS.
emit_div_info_node: Emit a fenced-div DIV_INFO node, structuring the Pandoc {...} body the same way emit_attribute_node does. Bare-word shorthand (::: Warning) and malformed/empty bodies fall back to a single opaque TEXT token, preserving the prior DIV_INFO { TEXT(...) } shape (and the bare-word class semantics the projector reads via parse_div_info).
emit_html_attrs_node: Emit a structural HTML_ATTRS node, wrapping the source bytes of each recognized HTML attribute in ATTR_ID / ATTR_CLASS / ATTR_KEY_VALUE children (bare values — HTML has no #/. marker). Bytes between/around components (names, =, quotes, whitespace, /) become gap tokens, so node.text() is exactly attrs_text. An unrecognized/empty body falls back to a single opaque TEXT token.
emit_html_span_attributes_node: As emit_html_attrs_node but for the legacy native-span SPAN_ATTRIBUTES node, which carries HTML class="..." syntax (not Pandoc {...}).
emit_span_attributes_node: Emit a bracketed-span SPAN_ATTRIBUTES node, structuring the Pandoc {...} body the same way emit_attribute_node does. Malformed/empty bodies fall back to a single opaque TEXT token, preserving the prior SPAN_ATTRIBUTES { TEXT(...) } shape.
parse_attribute_content: Parse the content inside the attribute braces into owned strings. Thin wrapper over attribute_content_spans so detection and emission share one walk.
parse_html_attribute_list: Parse a raw HTML attribute list (the bytes between a tag name and the closing >, exclusive). Accepts inputs like id="x" class="a b" data-key=v and produces an AttributeBlock. Returns None if no recognized attributes are present.
parse_html_tag_attributes: Parse HTML-style attributes from a raw HTML opening tag text such as <div id="x" class="a b" data-key="v">, returning the same AttributeBlock shape as Pandoc-style brace attributes. Whitespace-separated class="..." is split into individual classes; id="..." becomes the identifier; everything else becomes a key/value pair. Returns None if the tag has no recognized attributes.
try_parse_trailing_attributes: Try to parse an attribute block from the end of a string. Returns: (attribute_block, text_before_attributes)
try_parse_trailing_attributes_with_pos: Try to parse an attribute block from the end of a string. Returns: (attribute_block, text_before_attributes, open_brace_position_in_trimmed_text)
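
The trailing-attribute functions split a line such as a heading into its text and a final {...} block. A rough sketch of that split is given here; `split_trailing_braces` is a hypothetical helper, not part of this module, and it returns the raw brace slice rather than a parsed AttributeBlock. It also assumes braces never occur inside quoted values, which the real parser may handle differently.

```rust
/// Hypothetical helper: find a `{...}` block at the end of `text`.
/// Returns (raw_block, text_before, open_brace_position_in_trimmed_text).
pub fn split_trailing_braces(text: &str) -> Option<(&str, &str, usize)> {
    // Trailing whitespace is ignored before looking for the closing brace.
    let trimmed = text.trim_end();
    if !trimmed.ends_with('}') {
        return None;
    }
    // The last `{` opens the candidate block.
    let open = trimmed.rfind('{')?;
    let block = &trimmed[open..];
    // Simplifying assumption: reject a stray `}` inside the block body.
    let body = &block[1..block.len() - 1];
    if body.contains('}') {
        return None;
    }
    Some((block, trimmed[..open].trim_end(), open))
}
```

The returned position is a byte offset into the right-trimmed input, which mirrors the `open_brace_position_in_trimmed_text` element described for `try_parse_trailing_attributes_with_pos`; whether the real function trims the same way is an assumption.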
