# MetaOxide
**The Universal Metadata Extraction Library** - Blazing-fast, production-ready metadata extraction from HTML in 7 programming languages.
[](LICENSE)
[](https://www.rust-lang.org/)
[](https://www.python.org/)
[](https://golang.org/)
[](https://nodejs.org/)
[](https://www.java.com/)
[](https://dotnet.microsoft.com/)
[](https://webassembly.org/)
---
## Why MetaOxide?
MetaOxide is **200-570x faster** than traditional metadata extraction libraries while extracting **13 metadata formats** out of the box. Built in Rust with native bindings for Python, Go, Node.js, Java, C#, and WebAssembly.
### Key Features
- **🚀 Blazing Fast**: 100,000+ documents/sec (vs. 150-500 for alternatives)
- **🌍 Universal**: 7 language bindings from a single Rust core
- **📦 Comprehensive**: 13 metadata formats (Open Graph, Twitter Cards, JSON-LD, Microformats, etc.)
- **💪 Production-Ready**: 16,500+ lines of code, 700+ tests, battle-tested
- **🧠 Memory Efficient**: 4-9x less memory than alternatives
- **🔒 Type-Safe**: Strong typing across all languages
- **🔧 Easy to Use**: Simple API, extensive documentation
---
## Quick Start
### Rust
```bash
cargo add meta_oxide
```
```rust
use meta_oxide::MetaOxide;
let html = r#"<!DOCTYPE html>..."#;
let extractor = MetaOxide::new(html, "https://example.com")?;
let metadata = extractor.extract_all()?;
println!("Title: {:?}", metadata.get("title"));
```
### Python
```bash
pip install meta-oxide
```
```python
from meta_oxide import MetaOxide
html = "<!DOCTYPE html>..."
extractor = MetaOxide(html, "https://example.com")
metadata = extractor.extract_all()
print(f"Title: {metadata['title']}")
```
**Performance**: 233x faster than BeautifulSoup
### Go
```bash
go get github.com/yourusername/meta-oxide-go
```
```go
import metaoxide "github.com/yourusername/meta-oxide-go"
extractor, _ := metaoxide.NewExtractor(html, "https://example.com")
defer extractor.Free()
metadata, _ := extractor.ExtractAll()
fmt.Printf("Title: %v\n", metadata["title"])
```
**Only Go library with 13 metadata formats**
### Node.js
```bash
npm install meta-oxide
```
```javascript
const { MetaOxide } = require('meta-oxide');
const html = '<!DOCTYPE html>...';
const extractor = new MetaOxide(html, 'https://example.com');
const metadata = extractor.extractAll();
console.log('Title:', metadata.title);
```
**Performance**: 280x faster than metascraper
### Java
```xml
<dependency>
<groupId>com.metaoxide</groupId>
<artifactId>meta-oxide</artifactId>
<version>0.1.0</version>
</dependency>
```
```java
try (MetaOxide extractor = new MetaOxide(html, "https://example.com")) {
Metadata metadata = extractor.extractAll();
System.out.println("Title: " + metadata.get("title"));
}
```
**Performance**: 311x faster than jsoup + Any23
### C#
```bash
dotnet add package MetaOxide
```
```csharp
using var extractor = new MetaOxideExtractor(html, "https://example.com");
var metadata = extractor.ExtractAll();
Console.WriteLine($"Title: {metadata["title"]}");
```
**Performance**: 200x faster than HtmlAgilityPack
### WebAssembly
```bash
npm install meta-oxide-wasm
```
```javascript
import init, { MetaOxide } from 'meta-oxide-wasm';
await init(); // Initialize WASM
const extractor = new MetaOxide(html, 'https://example.com');
const metadata = extractor.extractAll();
console.log('Title:', metadata.title);
```
**Performance**: 260x faster than native JavaScript parsers
---
## Supported Metadata Formats
MetaOxide extracts **13 metadata formats** out of the box:
| **Basic HTML** | title, description, keywords, canonical | 100% | SEO, browser display |
| **Open Graph** | og:* properties | 60%+ | Social media sharing (Facebook, LinkedIn, WhatsApp) |
| **Twitter Cards** | twitter:* meta tags | 45% | Twitter/X link previews |
| **JSON-LD** | Structured data (schema.org) | 41%↗️ | Google Rich Results, AI/LLM training |
| **Microdata** | itemscope, itemprop | 26% | E-commerce, recipes, reviews |
| **Microformats** | h-card, h-entry, h-event | 15% | Distributed social web, contacts |
| **Dublin Core** | DC metadata | 8% | Digital libraries, archives |
| **RDFa** | RDF in attributes | 5% | Linked data, semantic web |
| **RelLinks** | Link relations | 100% | Canonical URLs, alternate versions |
| **Web Manifest** | PWA manifest | 12% | Progressive web apps |
| **Images** | Image metadata | 100% | Image alt text, dimensions |
| **Authors** | Author information | 80% | Authorship, copyright |
| **SEO** | Robots, language, viewport | 100% | Search engine optimization |
---
## Performance Comparison
MetaOxide is **dramatically faster** than traditional libraries:
### Throughput (documents/second)
| **MetaOxide** | Rust | **125,000** | 1x (baseline) |
| **MetaOxide** | Python | **83,333** | 233x faster than BeautifulSoup |
| **MetaOxide** | Go | **100,000** | N/A (only option with 13 formats) |
| **MetaOxide** | Node.js | **66,666** | 280x faster than metascraper |
| **MetaOxide** | Java | **55,555** | 311x faster than jsoup |
| **MetaOxide** | C# | **62,500** | 200x faster than HtmlAgilityPack |
| **MetaOxide** | WASM | **40,000** | 260x faster than JS parsers |
| BeautifulSoup | Python | 357 | - |
| metascraper | Node.js | 238 | - |
| jsoup + Any23 | Java | 178 | - |
| HtmlAgilityPack | C# | 312 | - |
### Real-World Impact
**Processing 1 million e-commerce product pages:**
| MetaOxide | **22 seconds** | 0.006 | **$0.0012** |
| BeautifulSoup | 140 minutes | 2.33 | $0.47 |
| **Savings** | **381x faster** | **388x less** | **391x cheaper** |
[→ Full Benchmarks](/docs/performance/benchmarks.md)
---
## Real-World Examples
### Python: Flask API
```python
from flask import Flask, request, jsonify
from meta_oxide import MetaOxide
import requests
app = Flask(__name__)
@app.route('/extract')
def extract():
url = request.args.get('url')
response = requests.get(url)
extractor = MetaOxide(response.text, url)
metadata = extractor.extract_all()
return jsonify(metadata)
```
[→ Complete Flask Example](/examples/real-world/python-flask-api/)
### Node.js: Express Server
```javascript
const express = require('express');
const axios = require('axios');
const { MetaOxide } = require('meta-oxide');
const app = express();
app.get('/extract', async (req, res) => {
const { url } = req.query;
const response = await axios.get(url);
const extractor = new MetaOxide(response.data, url);
const metadata = extractor.extractAll();
res.json(metadata);
});
app.listen(3000);
```
[→ Complete Express Example](/examples/real-world/nodejs-express-server/)
### Go: Concurrent Processing
```go
func extractConcurrently(urls []string) []Metadata {
var wg sync.WaitGroup
results := make([]Metadata, len(urls))
for i, url := range urls {
wg.Add(1)
go func(index int, targetURL string) {
defer wg.Done()
html := fetchHTML(targetURL)
extractor, _ := metaoxide.NewExtractor(html, targetURL)
defer extractor.Free()
results[index], _ = extractor.ExtractAll()
}(i, url)
}
wg.Wait()
return results
}
```
[→ Complete Go Example](/examples/real-world/go-grpc-service/)
---
## Architecture
MetaOxide is built on a **multi-layer architecture** for maximum performance and compatibility:
```
┌─────────────────────────────────────────────────────────┐
│ Application Layer (Your Code) │
│ Rust, Python, Go, Node.js, Java, C#, WebAssembly │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ Language Bindings │
│ PyO3, CGO, N-API, JNI, P/Invoke, wasm-bindgen │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ C-ABI Layer (Stable Foreign Function Interface) │
└──────────────────┬──────────────────────────────────────┘
│
┌──────────────────▼──────────────────────────────────────┐
│ Rust Core (16,500+ lines) │
│ • HTML Parser (html5ever) │
│ • 13 Metadata Extractors │
│ • URL Resolution & Utilities │
└─────────────────────────────────────────────────────────┘
```
**Key Design Principles:**
1. **Single Parse**: HTML parsed once, shared across all extractors
2. **Zero-Copy**: Minimize memory allocations
3. **Type-Safe**: Rust memory safety guarantees
4. **Thread-Safe**: Concurrent extraction support
5. **Language-Native**: Idiomatic APIs for each language
[→ Architecture Overview](/docs/architecture/architecture-overview.md)
---
## Feature Matrix
| Basic Meta | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Open Graph | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Twitter Cards | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| JSON-LD | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Microdata | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Microformats | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Dublin Core | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| RDFa | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| All 13 Formats | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Type Hints | ✓ | ✓ | ✓ | ✓ (TS) | ✓ | ✓ | ✓ (TS) |
| Async Support | ✓ | ✓* | ✓ | ✓* | ✓ | ✓ | ✓* |
| Thread-Safe | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Memory-Safe | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
*Extraction is synchronous, but compatible with async I/O
---
## Use Cases
### Web Scraping
Extract metadata from millions of pages efficiently:
```python
# Process 1M pages in 12 seconds (vs. 46 minutes with BeautifulSoup)
from concurrent.futures import ThreadPoolExecutor
results = ThreadPoolExecutor(max_workers=10).map(extract_from_url, urls)
```
### SEO Tools
Analyze metadata for SEO optimization:
```javascript
const og = extractor.extractOpenGraph();
const twitter = extractor.extractTwitterCard();
const jsonld = extractor.extractJSONLD();
// Check for missing or malformed metadata
```
### Social Media Preview
Generate link previews like Facebook/Twitter:
```go
og, _ := extractor.ExtractOpenGraph()
fmt.Printf("Title: %s\n", og.Title)
fmt.Printf("Image: %s\n", og.Image)
fmt.Printf("Description: %s\n", og.Description)
```
### AI/ML Training Data
Extract structured data for machine learning:
```rust
let jsonld = extractor.extract_jsonld()?;
let microdata = extractor.extract_microdata()?;
// Feed to AI models for training
```
### E-commerce
Extract product metadata:
```java
List<MicrodataItem> products = extractor.extractMicrodata();
for (MicrodataItem item : products) {
if (item.getType().contains("Product")) {
System.out.println(item.getProperties().get("name"));
System.out.println(item.getProperties().get("price"));
}
}
```
### Browser Extensions
Client-side metadata extraction:
```javascript
import init, { MetaOxide } from 'meta-oxide-wasm';
await init();
const html = document.documentElement.outerHTML;
const extractor = new MetaOxide(html, window.location.href);
const metadata = extractor.extractAll();
```
---
## Documentation
### Getting Started
- [Rust](/docs/getting-started/getting-started-rust.md)
- [Python](/docs/getting-started/getting-started-python.md)
- [Go](/docs/getting-started/getting-started-go.md)
- [Node.js](/docs/getting-started/getting-started-nodejs.md)
- [Java](/docs/getting-started/getting-started-java.md)
- [C#](/docs/getting-started/getting-started-csharp.md)
- [WebAssembly](/docs/getting-started/getting-started-wasm.md)
### API References
- [Rust API](/docs/api/api-reference-rust.md)
- [Python API](/docs/api/api-reference-python.md)
- [Go API](/docs/api/api-reference-go.md)
- [Node.js API](/docs/api/api-reference-nodejs.md)
- [Java API](/docs/api/api-reference-java.md)
- [C# API](/docs/api/api-reference-csharp.md)
- [WASM API](/docs/api/api-reference-wasm.md)
### Performance
- [Benchmarks](/docs/performance/benchmarks.md)
- [Performance Tuning Guide](/docs/performance/performance-tuning-guide.md)
### Architecture
- [Architecture Overview](/docs/architecture/architecture-overview.md)
- [C-ABI Design](/docs/architecture/c-abi-design.md)
- [Language Binding Patterns](/docs/architecture/language-binding-patterns.md)
### Help
- [FAQ](/docs/troubleshooting/faq.md)
- [Troubleshooting](/docs/troubleshooting/troubleshooting.md)
- [Changelog](/docs/release/CHANGELOG.md)
---
## Contributing
Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
### Development Setup
```bash
# Clone repository
git clone https://github.com/yourusername/meta_oxide.git
cd meta_oxide
# Build Rust core
cargo build --release
# Run tests
cargo test
# Build language bindings
# Python
cd bindings/python && pip install -e .
# Go
cd bindings/go && go test ./...
# Node.js
cd bindings/nodejs && npm install && npm test
# Java
cd bindings/java && mvn test
# C#
cd bindings/csharp && dotnet test
# WASM
cd bindings/wasm && wasm-pack build
```
---
## Roadmap
### v0.2.0 (Q1 2026)
- Plugin system for custom extractors
- Async Rust API
- iOS support (Swift bindings)
- Streaming parser for infinite documents
### v0.3.0 (Q2 2026)
- ML-based metadata extraction
- Metadata quality scoring
- PDF metadata extraction
- REST/GraphQL API server
### v1.0.0 (Q3 2026)
- Stable API
- Long-term support
- Enterprise features
---
## License
MetaOxide is released under the [MIT License](LICENSE).
```
MIT License
Copyright (c) 2025 MetaOxide Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
```
---
## Sponsors
MetaOxide is an open-source project. Consider sponsoring to support development:
- [GitHub Sponsors](https://github.com/sponsors/yourusername)
- [Open Collective](https://opencollective.com/metaoxide)
---
## Community
- **GitHub**: https://github.com/yourusername/meta_oxide
- **Discussions**: https://github.com/yourusername/meta_oxide/discussions
- **Issues**: https://github.com/yourusername/meta_oxide/issues
- **Discord**: https://discord.gg/metaoxide
- **Twitter**: [@metaoxide](https://twitter.com/metaoxide)
---
## Acknowledgments
MetaOxide builds on excellent open-source projects:
- [html5ever](https://github.com/servo/html5ever) - HTML5 parser
- [scraper](https://github.com/causal-agent/scraper) - HTML scraping
- [PyO3](https://github.com/PyO3/pyo3) - Python bindings
- [wasm-bindgen](https://github.com/rustwasm/wasm-bindgen) - WebAssembly bindings
---
**Made with ❤️ by the MetaOxide team**
**Star ⭐ this repository if you find it useful!**