meta_oxide 0.1.1

Universal metadata extraction library supporting 13 formats (HTML Meta, Open Graph, Twitter Cards, JSON-LD, Microdata, Microformats, RDFa, Dublin Core, Web App Manifest, oEmbed, rel-links, Images, SEO) with 7 language bindings
# Getting Started with MetaOxide

MetaOxide is a high-performance Rust library with Python bindings for extracting **ALL structured data** from HTML. It supports:

- **Standard Meta Tags** (100% of sites) - title, description, canonical, etc.
- **Open Graph** (60%+ of sites) - Facebook, LinkedIn sharing
- **Twitter Cards** (45% of sites) - Twitter/X previews
- **JSON-LD** (41% of sites, growing) - Google Rich Results, AI training
- **Microformats** (5-10% of sites) - h-card, h-entry, h-event

**433 tests | 98%+ coverage | Production ready**

## Table of Contents

- [Installation](#installation)
- [Prerequisites](#prerequisites)
- [Building from Source](#building-from-source)
- [Basic Usage](#basic-usage)
- [Next Steps](#next-steps)

## Installation

### For Python Users

Once published, you'll be able to install via pip:

```bash
pip install meta-oxide
```

### For Rust Users

Add to your `Cargo.toml`:

```toml
[dependencies]
meta_oxide = "0.1.0"
```

## Prerequisites

### Python Development

- Python 3.8 or higher
- pip (Python package manager)

### Rust Development

- Rust 1.70 or higher
- Cargo (Rust package manager)

### Building Python Bindings

For building the Python package from source:

- Python 3.8+
- Rust toolchain (install via [rustup](https://rustup.rs/))
- maturin (install via `pip install maturin`)
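
The prerequisites above can be checked with a short script before attempting a build. This is a minimal sketch using only the Python standard library; the helper name `check_prerequisites` is made up for illustration and is not part of MetaOxide. It only checks that each tool is on `PATH`, so the Rust 1.70 minimum still has to be confirmed separately with `rustc --version`.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum Python version listed above


def check_prerequisites(tools=("rustc", "cargo", "maturin")):
    """Return a dict mapping each prerequisite to True (found) or False."""
    status = {"python>=3.8": sys.version_info >= MIN_PYTHON}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Any entry reported as `MISSING` should be installed before running `maturin develop`.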

## Building from Source

### Clone the Repository

```bash
git clone https://github.com/yourusername/meta_oxide.git
cd meta_oxide
```

### Build Python Package

Using maturin for development:

```bash
# Install maturin if you haven't already
pip install maturin

# Build and install in development mode
maturin develop

# Or build a wheel
maturin build --release
```

### Build Rust Library

```bash
# Build the library
cargo build --release

# Run tests
cargo test

# Build documentation
cargo doc --open
```

## Basic Usage

### Python

```python
import meta_oxide

# Extract ALL metadata at once (recommended!)
html = """
<html>
<head>
    <title>My Article</title>
    <meta property="og:title" content="My Article">
    <meta name="twitter:card" content="summary_large_image">
    <script type="application/ld+json">
    {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": "My Article"
    }
    </script>
</head>
<body></body>
</html>
"""

# Extract everything
data = meta_oxide.extract_all(html)

print(data['meta']['title'])           # Standard meta tags
print(data['opengraph']['title'])      # Open Graph
print(data['twitter']['card'])         # Twitter Cards
print(data['jsonld'][0]['headline'])   # JSON-LD

# Or extract specific formats
meta = meta_oxide.extract_meta(html)
og = meta_oxide.extract_opengraph(html)
twitter = meta_oxide.extract_twitter(html)
jsonld = meta_oxide.extract_jsonld(html)

# Extract microformats
html_mf = """
<div class="h-card">
    <span class="p-name">Jane Doe</span>
    <a class="u-url" href="https://example.com">Website</a>
    <a class="u-email" href="mailto:jane@example.com">Email</a>
</div>
"""

# Extract all microformats at once (note: pass html_mf, not html)
result = meta_oxide.extract_microformats(html_mf)
print(result)

# Extract specific microformat types
cards = meta_oxide.extract_hcard(html_mf)
print(f"Found {len(cards)} h-cards")

for card in cards:
    print(f"Name: {card.get('name')}")
    print(f"URL: {card.get('url')}")
    print(f"Email: {card.get('email')}")
```
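
Because the same page title can appear in several formats, a common follow-up is to pick one value with a fallback order. The sketch below assumes the result of `extract_all` is shaped like the example above (`meta`, `opengraph`, and `twitter` as dicts, `jsonld` as a list of dicts). The `title` key under `twitter` is also an assumption. The helper `pick_title` is hypothetical and not part of MetaOxide, and it runs on a hand-written sample dict so it works without the library installed. Whether absent formats are omitted or returned empty is not stated here, so the lookups are written defensively.

```python
def pick_title(data):
    """Return the first non-empty title found, in order of preference."""
    candidates = [
        (data.get("opengraph") or {}).get("title"),
        (data.get("twitter") or {}).get("title"),
        (data.get("meta") or {}).get("title"),
    ]
    # JSON-LD is assumed to be a list of objects; collect any headlines
    for item in data.get("jsonld") or []:
        if isinstance(item, dict):
            candidates.append(item.get("headline"))
    for title in candidates:
        if title:
            return title
    return None


# Hand-written stand-in for the result of meta_oxide.extract_all(html)
sample = {
    "meta": {"title": "My Article"},
    "opengraph": {},
    "twitter": {"card": "summary_large_image"},
    "jsonld": [{"@type": "Article", "headline": "My Article (JSON-LD)"}],
}

print(pick_title(sample))
```

The preference order here (Open Graph, then Twitter, then standard meta, then JSON-LD) is only one reasonable choice; reorder the candidates to suit the use case.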

### Rust

```rust
// Only extract_hcard is used below; import extract_hentry / extract_hevent as needed.
use meta_oxide::extractors::extract_hcard;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = r#"
        <div class="h-card">
            <span class="p-name">Jane Doe</span>
            <a class="u-url" href="https://example.com">Website</a>
        </div>
    "#;

    let cards = extract_hcard(html, None)?;

    for card in cards {
        println!("Name: {:?}", card.name);
        println!("URL: {:?}", card.url);
    }

    Ok(())
}
```

## Supported Microformats

MetaOxide currently supports the following microformat types:

- **h-card**: Personal or organizational contact information
- **h-entry**: Blog posts, articles, and other content entries
- **h-event**: Events with dates, times, and locations
- More formats coming soon!

## Next Steps

- Read the [API Reference](api-reference.md) for detailed function documentation
- Check out [Examples](examples.md) for more usage patterns
- Learn about the [Architecture](architecture.md) to understand how it works
- See [Development Guide](development.md) to contribute to the project

## Common Issues

### Python Module Not Found

If you get `ModuleNotFoundError: No module named 'meta_oxide'`:

1. Make sure you've built and installed the package: `maturin develop`
2. Verify you're using the correct Python environment
3. Check that the build completed successfully
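
To narrow down step 2, it can help to ask the interpreter directly which environment it is running in and whether it can locate the module. This is a small diagnostic sketch using only the standard library; `module_available` is a hypothetical helper name, not a MetaOxide function.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # sys.executable shows which Python (and therefore which environment) is active
    print(f"Interpreter: {sys.executable}")
    print(f"meta_oxide importable: {module_available('meta_oxide')}")
```

If the interpreter path is not the environment where `maturin develop` was run, activate that environment and try again.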

### Rust Compilation Errors

If you encounter compilation errors:

1. Update Rust: `rustup update`
2. Clean the build: `cargo clean`
3. Rebuild: `cargo build --release`

## Getting Help

- Check the [examples](examples.md) for common use cases
- Read the [API documentation](api-reference.md)
- Open an issue on GitHub
- Join our community discussions

## License

MetaOxide is dual-licensed under MIT or Apache-2.0.