# CLAUDE.md -- Rust Project Conventions
## Philosophy
Functional, type-driven, domain-driven.
## Architecture
- Modules by domain context.
- One module per parsing concern (span, attr, node, entity, tokenizer, tree_builder).
- Thin lib.rs that wires them together and exposes `parse(source) -> Result<Document, Error>`.
## Types
- Newtypes for domain primitives.
- Sum types for variants.
- No public struct fields; enum variants with named fields are permitted.
- `#[must_use]` on getters and constructors.
## Error Handling
- Single project-wide `Error` enum.
- Display + std::error::Error impls by hand; no thiserror, no anyhow.
- Never panic. HTML parsing is intentionally permissive; malformed input
produces recovery actions, not fatal errors.
## Style
- Prefer match over if/else, except on bool.
- No `return` keyword.
- No `mut`.
- Combinators over loops.
- Never match on Option<_>; use combinators.
- Prefer combinators on Result<_, _>; match only when arms are complex.
- No unwrap()/expect() anywhere.
- No loop or for.
- No scan.
- No Rc/Arc.
- No naked `as` casts (f64 lossy casts must be `#[allow]`-gated and commented).
- Exhaustive matches; no `_` wildcard arm on enums.
- Iterators over indexed access.
## Traits
- No dyn Trait.
- Implement standard traits over ad-hoc methods.
## Linting
```toml
[lints.clippy]
all = { level = "deny", priority = -1 }
pedantic = { level = "warn", priority = -1 }
needless_pass_by_value = "warn"
manual_map = "warn"
```
## Verification
- Always run `RUSTFLAGS="-D warnings" cargo clippy --all-targets`.
- Always run `cargo fmt`.
## Testing
- Tests return `Result<(), Error>` and propagate failures with `?`.
- No assert!, assert_eq!, panic!, unreachable!, unwrap, expect.
- Integration tests in tests/ organized by parsing concern.
## Dependencies
- No runtime deps in v0. This crate is the foundation of the rendering
stack; everything downstream (css-cat, dom-cat, layout-cat, ...)
depends on it.
- `proptest` dev-dep.
- No path dependencies.
## Documentation
- `///` doc comments on every public item.
- `# Examples` with runnable code blocks.
- Doctests must not use unwrap/expect/unreachable.
## Layer context
This crate is the **HTML parser** in the Servo-replacement webview stack:
1. **`html-cat`** -- HTML5 tokenizer + tree builder.
2. `css-cat` -- CSS parser.
3. `dom-cat` -- DOM model + mutation API.
4. `layout-cat` -- box model + layout.
5. `paint-cat` -- display-list construction.
6. `net-cat` -- fetch/HTTP.
7. `web-api-cat` -- bindings between boa-cat Values and DOM.
8. `tauri-runtime-servocat` -- meta-crate wiring everything into Tauri.
## v0 scope
- Standard HTML5 tokenizer state machine: Data, Tag, AttrName, AttrValue,
Comment, Doctype, plus raw-text contexts (`script`, `style`,
`textarea`).
- Tree-builder insertion modes: Initial, BeforeHtml, BeforeHead, InHead,
InBody, AfterBody.
- Void elements (`br`, `img`, `input`, ...).
- Named char references for the common set (`amp`, `lt`, `gt`, `quot`,
`apos`, `nbsp`) plus numeric refs (`{`, `{`).
- Doctype recognition (HTML5 only; quirks mode flagged but not modeled).
- Comments and CDATA-like content in script/style raw-text.
## Deferred to v0.2+
- Full named-entity table (only the common subset in v0).
- Foreign content (SVG, MathML).
- Template element semantics.
- Adoption agency algorithm for misnested formatting elements.
- Document.write integration.
- Quirks mode behavior beyond a flag.
- Streaming parser (`parse` consumes the full source up front).