pub fn clean_html(html: &str) -> Result<String>
Clean HTML content and convert it to Markdown-like structured text. Structural elements such as headings, lists, and tables are preserved, since they are valuable for LLM understanding.
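The implementation of `clean_html` is not shown here, so the following is only a toy sketch of the idea of mapping structural tags to Markdown-like markers. Everything in it is an assumption: the `Result` alias is a guess at the crate's error type, the body is a stand-in using only the standard library, and it handles just headings and list items (no tables, attributes, entities, or nested structure).

```rust
use std::error::Error;

// Assumed alias: the crate's real `Result` type is not shown in the source.
type Result<T> = std::result::Result<T, Box<dyn Error>>;

/// Toy stand-in for `clean_html` (hypothetical body, not the real one).
/// Headings become `#` prefixes, list items become `- ` bullets, and
/// every other tag is dropped while its text content is kept.
pub fn clean_html(html: &str) -> Result<String> {
    let mut out = String::new();
    let mut rest = html;

    while let Some(start) = rest.find('<') {
        // Copy the text that precedes the tag unchanged.
        out.push_str(&rest[..start]);

        // Locate the end of the tag; a missing '>' is reported as an error.
        let end = rest[start..].find('>').ok_or("unterminated tag")? + start;

        // Tag name without attributes, lowercased (e.g. "h2", "/li").
        let tag = rest[start + 1..end].trim().to_ascii_lowercase();
        let name = tag.split_whitespace().next().unwrap_or("");

        match name {
            "h1" | "h2" | "h3" | "h4" | "h5" | "h6" => {
                // The digit after 'h' gives the number of '#' characters.
                let level = name[1..].parse::<usize>()?;
                out.push_str(&"#".repeat(level));
                out.push(' ');
            }
            "li" => out.push_str("- "),
            // Closing a block-level element ends the current line.
            "/h1" | "/h2" | "/h3" | "/h4" | "/h5" | "/h6" | "/li" | "/p" => out.push('\n'),
            // Any other tag is removed; its text content still passes through.
            _ => {}
        }

        rest = &rest[end + 1..];
    }

    // Text after the last tag, then trim surrounding whitespace.
    out.push_str(rest);
    Ok(out.trim().to_string())
}
```

Under these assumptions, calling `clean_html("<h1>Title</h1><ul><li>one</li><li>two</li></ul>")` yields three lines: `# Title`, `- one`, and `- two`. A real implementation would more likely build on a proper HTML parser than on string scanning, which cannot cope with comments, scripts, or malformed markup.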