Tree-sitter grammar for HTML following the WHATWG HTML Living Standard.
This grammar provides spec-compliant HTML parsing, including:
- Void elements (§13.1.2): area, base, br, col, embed, hr, img, input, link, meta, source, track, wbr
- Raw text elements (§13.1.2.1): script, style
- Escapable raw text elements (§13.1.2.2): textarea, title
- Optional end tags (§13.1.2.4): Implicit closing of elements whose end tags are omitted (e.g. p, li)
- Character references (§13.5): Named, decimal, and hex entities
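The element categories in the first three items decide how an element's content is tokenized. The standalone sketch below illustrates them. ElementKind, element_kind and raw_text_end are hypothetical names used only for illustration and are not part of this crate's API. raw_text_end mirrors our reading of the tokenizer's "appropriate end tag" rule for raw text, assuming the input has already had CR/LF normalized.

```rust
/// Element categories from the HTML syntax section of the WHATWG standard.
/// Hypothetical helper for illustration; not part of this crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    /// No content and no end tag, e.g. br or img.
    Void,
    /// Content is raw text: no tags and no character references (script, style).
    RawText,
    /// Content is text, but character references are decoded (textarea, title).
    EscapableRawText,
    /// Everything else (template and foreign elements are lumped in here for simplicity).
    Normal,
}

const VOID_ELEMENTS: &[&str] = &[
    "area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source",
    "track", "wbr",
];

/// Classifies a tag name; HTML tag names are ASCII case-insensitive.
pub fn element_kind(tag_name: &str) -> ElementKind {
    let name = tag_name.to_ascii_lowercase();
    if VOID_ELEMENTS.contains(&name.as_str()) {
        ElementKind::Void
    } else if name == "script" || name == "style" {
        ElementKind::RawText
    } else if name == "textarea" || name == "title" {
        ElementKind::EscapableRawText
    } else {
        ElementKind::Normal
    }
}

/// Returns the byte offset where raw text content ends: the first `</name`
/// (ASCII case-insensitive) followed by whitespace, `/` or `>`.
/// Everything before that offset is plain text, even `<` and `&`.
pub fn raw_text_end(content: &str, tag_name: &str) -> Option<usize> {
    let bytes = content.as_bytes();
    let name = tag_name.as_bytes();
    let mut i = 0;
    // The loop bound keeps every slice index below in range.
    while i + 2 + name.len() <= bytes.len() {
        if bytes[i] == b'<'
            && bytes[i + 1] == b'/'
            && bytes[i + 2..i + 2 + name.len()].eq_ignore_ascii_case(name)
        {
            // `</scripts>` is not an end tag, so check the following byte.
            match bytes.get(i + 2 + name.len()).copied() {
                Some(b' ' | b'\t' | b'\n' | b'\x0C' | b'/' | b'>') => return Some(i),
                _ => {}
            }
        }
        i += 1;
    }
    None
}
```

This is why a string such as "</p>" inside a script does not end the script element: only the matching end tag terminates raw text.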
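Implicit closing can be sketched with a stack of open elements. The table below lists the start tags that, per the standard's "Optional tags" rules, allow a preceding p end tag to be omitted; li, dt and dd follow the sibling rules. This is a simplified, hypothetical model (implicitly_closed_by and with_implied_end_tags are illustrative names): it only inspects the innermost open element and ignores the scope checks the real tree-construction algorithm performs.

```rust
/// Start tags that may directly follow a `p` whose end tag was omitted.
const CLOSES_P: &[&str] = &[
    "address", "article", "aside", "blockquote", "details", "dialog", "div", "dl",
    "fieldset", "figcaption", "figure", "footer", "form", "h1", "h2", "h3", "h4", "h5",
    "h6", "header", "hgroup", "hr", "main", "menu", "nav", "ol", "p", "pre", "search",
    "section", "table", "ul",
];

/// Returns true if an open `open` element is implicitly closed when the
/// start tag `next` immediately follows it. Covers only p, li, dt and dd.
pub fn implicitly_closed_by(open: &str, next: &str) -> bool {
    let next = next.to_ascii_lowercase();
    match open.to_ascii_lowercase().as_str() {
        "p" => CLOSES_P.contains(&next.as_str()),
        "li" => next == "li",
        "dt" | "dd" => next == "dt" || next == "dd",
        _ => false,
    }
}

/// Replays a sequence of start tags and inserts the end tags ("/name")
/// that the omitted-end-tag rules imply.
pub fn with_implied_end_tags(start_tags: &[&str]) -> Vec<String> {
    let mut open: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    for &tag in start_tags {
        // Pop every innermost element that this start tag closes implicitly.
        while let Some(&top) = open.last() {
            if implicitly_closed_by(top, tag) {
                out.push(format!("/{top}"));
                open.pop();
            } else {
                break;
            }
        }
        out.push(tag.to_string());
        open.push(tag);
    }
    out
}
```

For example, `<ul><li>one<li>two` yields the same tree as writing the first `</li>` explicitly.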
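The three character reference forms can be illustrated by decoding the text between "&" and ";". This is a minimal sketch, and decode_char_ref is a hypothetical name. Its named table is a tiny sample (the standard's table has over 2,000 entries), and it does not apply the standard's remapping of C1 control code points such as &#128;. It does follow the rule that NUL, surrogates and out-of-range values become U+FFFD.

```rust
/// Decodes the body of a character reference (the text between `&` and `;`).
/// Supports decimal (`#65`), hexadecimal (`#x41` / `#X41`) and a few named forms.
pub fn decode_char_ref(body: &str) -> Option<char> {
    if let Some(num) = body.strip_prefix('#') {
        // Hex references start with `x` or `X`; otherwise they are decimal.
        let (digits, radix) = match num.strip_prefix('x').or_else(|| num.strip_prefix('X')) {
            Some(hex) => (hex, 16),
            None => (num, 10),
        };
        // Reject empty bodies and stray characters such as a `+` sign,
        // which u32::from_str_radix would otherwise accept.
        if digits.is_empty() || !digits.chars().all(|c| c.is_digit(radix)) {
            return None;
        }
        // Overflow is treated as out of range.
        let code = u32::from_str_radix(digits, radix).unwrap_or(u32::MAX);
        // NUL, surrogates and values above U+10FFFF map to U+FFFD.
        return Some(match code {
            0 => '\u{FFFD}',
            _ => char::from_u32(code).unwrap_or('\u{FFFD}'),
        });
    }
    // A tiny sample of the named character reference table.
    match body {
        "amp" => Some('&'),
        "lt" => Some('<'),
        "gt" => Some('>'),
        "quot" => Some('"'),
        "nbsp" => Some('\u{A0}'),
        _ => None,
    }
}
```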
Example
use LANGUAGE;
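let mut parser = tree_sitter::Parser::new();
// Load this grammar; LANGUAGE converts into a tree_sitter::Language.
parser
    .set_language(&LANGUAGE.into())
    .expect("Error loading HTML grammar");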
let source = r#"<!DOCTYPE html>
<html>
<head>
<title>Hello World</title>
</head>
<body>
<p>Welcome to <strong>HTML</strong>!</p>
<img src="logo.png" alt="Logo">
</body>
</html>"#;
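// No previous tree is passed, so this is a full parse.
let tree = parser.parse(source, None).unwrap();
// The example document is valid HTML, so the tree should contain no ERROR nodes.
assert!(!tree.root_node().has_error());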