Pickaxe
Pickaxe is a Python package for structured data extraction from HTML documents. It provides a simple, intuitive API for parsing HTML documents and automatically extracting structured data from them.
Features
- Written in Rust: Pickaxe is written in Rust, which makes it fast and memory-efficient.
- Robust: Pickaxe uses the html5ever and selectors crates for browser-grade HTML parsing and CSS selector matching.
- CSS Selectors & XPath: Pickaxe supports both CSS selectors and (simple) XPath expressions for querying HTML documents.
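To make "simple XPath expressions" concrete, here is a minimal sketch of the kind of query the feature list refers to. It deliberately does not use Pickaxe's API. It relies only on Python's standard library `xml.etree.ElementTree`, which implements a limited XPath subset and requires well-formed markup, so it is an illustration of the query style rather than a substitute for a browser-grade parser. The helper name `first_text` and the sample markup are assumptions made up for this example.

```python
# Illustration only: standard-library XPath subset, NOT Pickaxe's API.
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample document; it must be well-formed for ElementTree.
HTML = "<html><body><h1 class='title'>Hello, World!</h1></body></html>"


def first_text(markup: str, path: str) -> Optional[str]:
    """Return the text of the first element matching `path`, or None."""
    root = ET.fromstring(markup)
    node = root.find(path)
    return node.text if node is not None else None


# Match any <h1> descendant of the root element.
print(first_text(HTML, ".//h1"))
# Match an <h1> whose class attribute equals "title".
print(first_text(HTML, ".//h1[@class='title']"))
```

A CSS selector such as `h1.title` expresses roughly the same query as the second path. Note that the attribute predicate here is an exact string comparison, whereas a CSS class selector also matches elements carrying additional classes.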
Quick Start
Python
Installation
Basic Usage
# Parse an HTML document
=
# Access elements using CSS selectors or XPath expressions
=
# Output: Hello, World!
=
# Output: Hello, World!
Rust
Installation
Basic Usage
use HtmlDocument;
License
This project is licensed under the MIT License.
Support & Feedback
If you encounter any issues or have feedback, please open an issue. We'd love to hear from you!
Made with ❤️ by Emergent Methods