decruft 0.1.2

Extract clean, readable content from web pages
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Article with Appendix</title>
</head>
<body>
<main>
<article>
<div class="article-body">
<h1>Article with Appendix Section</h1>
<p>This is the main article content discussing an important topic. It contains multiple paragraphs of text that establish it as the primary content of the page.</p>
<h2>Introduction</h2>
<p>The introduction provides background information about the subject matter. We explore the key concepts and methodologies used in our analysis of the data.</p>
<h2>Results</h2>
<p>Our analysis revealed several interesting findings that warrant further discussion. The data shows clear trends across all measured variables.</p>
<h2 id="appendix-i">Appendix I</h2>
<p>Below is a detailed breakdown of results across all test conditions and parameters used in the study.</p>
<p>The raw data tables show measurements taken at regular intervals throughout the experiment.</p>
<h2>Acknowledgements</h2>
<p>Thanks to all contributors who made this research possible.</p>
</div>
</article>
</main>
</body>
</html>