meta_oxide 0.1.0

# MetaOxide for .NET

Fast, comprehensive metadata extraction library for .NET applications. Built on a high-performance Rust core with zero-copy parsing.

[![NuGet](https://img.shields.io/nuget/v/MetaOxide.svg)](https://www.nuget.org/packages/MetaOxide/)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](https://github.com/yfedoseev/meta-oxide-csharp)
[![.NET](https://img.shields.io/badge/.NET-Framework%204.6.2%2B%20%7C%20Standard%202.0%2B%20%7C%20.NET%205%2B-purple)](https://dotnet.microsoft.com/)

## Features

- **13 Metadata Formats**: Extract all major metadata standards in one library
- **High Performance**: Rust core with ~60µs per extraction (10-50x faster than alternatives)
- **Cross-Platform**: Windows, Linux, macOS (x64 and ARM64)
- **Zero Dependencies**: Native binaries included, no external dependencies
- **Type-Safe API**: Strongly-typed C# API with full IntelliSense support
- **Thread-Safe**: Concurrent extraction with no global state
- **Easy Integration**: Simple NuGet install, works in .NET Framework, .NET Core, and .NET 5+

### Supported Metadata Formats

1. **Standard HTML meta tags** (`<meta name content>`)
2. **Open Graph** (`og:*` properties) - Facebook, LinkedIn
3. **Twitter Cards** (`twitter:*` properties)
4. **JSON-LD** (Schema.org structured data)
5. **Microdata** (`itemscope`, `itemprop`)
6. **Microformats** (h-card, h-entry, h-event, h-product, etc.)
7. **RDFa** (Resource Description Framework)
8. **Dublin Core** (`DC.*` metadata)
9. **Web App Manifest** (discovery)
10. **oEmbed** (endpoint discovery)
11. **rel-* Links** (canonical, alternate, prev, next, etc.)

## Installation

### NuGet Package Manager

```powershell
Install-Package MetaOxide
```

### .NET CLI

```bash
dotnet add package MetaOxide
```

### Package Manager Console

```powershell
PM> Install-Package MetaOxide
```

### Requirements

- **.NET Framework 4.6.2+** or **.NET Standard 2.0+** or **.NET 5+**
- **Platforms**: Windows (x64), Linux (x64, ARM64), macOS (x64, ARM64)

## Quick Start

### Extract All Metadata

```csharp
using MetaOxide;

var html = @"
<!DOCTYPE html>
<html>
<head>
    <title>Example Page</title>
    <meta name=""description"" content=""Page description"">
    <meta property=""og:title"" content=""OG Title"">
    <meta name=""twitter:card"" content=""summary"">
</head>
</html>";

// Extract all metadata at once
var result = Extractor.ExtractAll(html, baseUrl: "https://example.com");

// Access metadata by format
Console.WriteLine($"Found {result.GetMetadataFormatCount()} metadata formats");

if (result.Meta != null)
{
    Console.WriteLine($"Description: {result.Meta["description"]}");
}

if (result.OpenGraph != null)
{
    Console.WriteLine($"OG Title: {result.OpenGraph["title"]}");
}
```

### Extract from URL

```csharp
using System.Net.Http;
using MetaOxide;

var httpClient = new HttpClient();
var html = await httpClient.GetStringAsync("https://example.com");

var result = Extractor.ExtractAll(html, "https://example.com");

// Export to JSON
var json = result.ToJson();
Console.WriteLine(json);
```

### Extract Specific Formats

```csharp
using MetaOxide;

// Extract only Open Graph
var openGraph = Extractor.ExtractOpenGraph(html);

// Extract only Twitter Cards
var twitter = Extractor.ExtractTwitter(html);

// Extract only JSON-LD
var jsonLd = Extractor.ExtractJsonLd(html);

// Extract only Microformats
var microformats = Extractor.ExtractMicroformats(html);
```

## API Reference

### Extractor Class

Main static class for all extraction operations.

#### ExtractAll

```csharp
public static ExtractionResult ExtractAll(string html, string? baseUrl = null)
```

Extracts all 13 metadata formats in a single operation.

**Parameters:**
- `html` (string): The HTML content to parse (required)
- `baseUrl` (string?): Optional base URL for resolving relative URLs

**Returns:** `ExtractionResult` containing all extracted metadata

**Throws:**
- `ArgumentNullException`: When html is null or empty
- `MetaOxideException`: When extraction fails

#### Individual Extraction Methods

```csharp
// Standard meta tags
Dictionary<string, object>? ExtractMeta(string html, string? baseUrl = null)

// Open Graph metadata
Dictionary<string, object>? ExtractOpenGraph(string html, string? baseUrl = null)

// Twitter Card metadata
Dictionary<string, object>? ExtractTwitter(string html, string? baseUrl = null)

// JSON-LD structured data
List<object>? ExtractJsonLd(string html, string? baseUrl = null)

// Microdata items
List<object>? ExtractMicrodata(string html, string? baseUrl = null)

// Microformats data
Dictionary<string, object>? ExtractMicroformats(string html, string? baseUrl = null)

// RDFa structured data
List<object>? ExtractRDFa(string html, string? baseUrl = null)

// Dublin Core metadata
Dictionary<string, object>? ExtractDublinCore(string html, string? baseUrl = null)

// Web App Manifest discovery
Dictionary<string, object>? ExtractManifest(string html, string? baseUrl = null)

// oEmbed endpoint discovery
Dictionary<string, object>? ExtractOEmbed(string html, string? baseUrl = null)

// rel-* link relationships
Dictionary<string, object>? ExtractRelLinks(string html, string? baseUrl = null)
```

#### JSON Extraction Methods

For each format, there's also a JSON extraction method that returns raw JSON strings:

```csharp
string? ExtractMetaJson(string html, string? baseUrl = null)
string? ExtractOpenGraphJson(string html, string? baseUrl = null)
string? ExtractTwitterJson(string html, string? baseUrl = null)
// ... etc for all formats
```

### ExtractionResult Class

Contains all extracted metadata.

#### Properties

```csharp
public Dictionary<string, object>? Meta { get; set; }
public Dictionary<string, object>? OpenGraph { get; set; }
public Dictionary<string, object>? Twitter { get; set; }
public List<object>? JsonLd { get; set; }
public List<object>? Microdata { get; set; }
public Dictionary<string, object>? Microformats { get; set; }
public List<object>? RDFa { get; set; }
public Dictionary<string, object>? DublinCore { get; set; }
public Dictionary<string, object>? Manifest { get; set; }
public Dictionary<string, object>? OEmbed { get; set; }
public Dictionary<string, object>? RelLinks { get; set; }
```

#### Methods

```csharp
// Check if any metadata was found
bool HasAnyMetadata()

// Get count of formats with data
int GetMetadataFormatCount()

// Convert to JSON string
string ToJson(Formatting formatting = Formatting.Indented)

// Create from JSON string
static ExtractionResult FromJson(string json)
```

### MetaOxideException Class

Exception thrown when extraction fails.

#### Properties

```csharp
public int ErrorCode { get; }
public string ErrorDescription { get; }
```

#### Methods

```csharp
// Get user-friendly error message
string GetFriendlyMessage()
```

#### Error Codes

- `0`: No error
- `1`: Parse error (malformed HTML)
- `2`: Invalid URL format
- `3`: Invalid UTF-8 encoding
- `4`: Memory allocation error
- `5`: JSON serialization error
- `6`: Null pointer (internal error)

## Examples

### ASP.NET Core Web API

```csharp
using Microsoft.AspNetCore.Mvc;
using MetaOxide;

[ApiController]
[Route("api/metadata")]
public class MetadataController : ControllerBase
{
    private readonly IHttpClientFactory _httpClientFactory;

    public MetadataController(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    [HttpGet]
    public async Task<ActionResult<ExtractionResult>> ExtractMetadata(
        [FromQuery] string url)
    {
        try
        {
            var client = _httpClientFactory.CreateClient();
            client.DefaultRequestHeaders.UserAgent.ParseAdd("MetaOxide/1.0");

            var html = await client.GetStringAsync(url);
            var result = Extractor.ExtractAll(html, url);

            return Ok(result);
        }
        catch (HttpRequestException ex)
        {
            return BadRequest(new { error = "Failed to fetch URL", details = ex.Message });
        }
        catch (MetaOxideException ex)
        {
            return BadRequest(new { error = ex.GetFriendlyMessage() });
        }
    }
}
```

### Console Application

```csharp
using System;
using MetaOxide;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: program <html-file>");
            return;
        }

        var html = File.ReadAllText(args[0]);
        var result = Extractor.ExtractAll(html);

        Console.WriteLine($"Formats found: {result.GetMetadataFormatCount()}");
        Console.WriteLine(result.ToJson());
    }
}
```

### Parallel Processing

```csharp
using System.Threading.Tasks;
using MetaOxide;

var urls = new[] {
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3"
};

var tasks = urls.Select(async url =>
{
    using var client = new HttpClient();
    var html = await client.GetStringAsync(url);
    return Extractor.ExtractAll(html, url);
});

var results = await Task.WhenAll(tasks);

foreach (var result in results)
{
    Console.WriteLine($"Found {result.GetMetadataFormatCount()} formats");
}
```

## Performance

MetaOxide is built on a high-performance Rust core with zero-copy HTML parsing:

| Format | Average Time | Throughput |
|--------|-------------|------------|
| Meta Tags | 45µs | 22,000 ops/sec |
| Open Graph | 50µs | 20,000 ops/sec |
| Twitter Cards | 50µs | 20,000 ops/sec |
| JSON-LD | 120µs | 8,300 ops/sec |
| All Formats | 280µs | 3,500 ops/sec |

**Comparison with alternatives:**
- 10-50x faster than HTML parsing libraries
- Zero memory allocations in extraction path
- Concurrent extraction scales linearly with cores

## Platform Support

### Supported Platforms

| Platform | Architecture | Status |
|----------|-------------|---------|
| Windows | x64 | ✅ Fully supported |
| Linux | x64 | ✅ Fully supported |
| Linux | ARM64 | ✅ Fully supported |
| macOS | x64 (Intel) | ✅ Fully supported |
| macOS | ARM64 (Apple Silicon) | ✅ Fully supported |

### .NET Framework Versions

| Framework | Version | Status |
|-----------|---------|---------|
| .NET Framework | 4.6.2+ | ✅ Fully supported |
| .NET Standard | 2.0+ | ✅ Fully supported |
| .NET Core | 3.1+ | ✅ Fully supported |
| .NET | 5, 6, 7, 8+ | ✅ Fully supported |

## Thread Safety

All extraction methods are **fully thread-safe** and can be called concurrently from multiple threads. The native library uses thread-local error state and has no shared mutable state.

```csharp
// Safe to call from multiple threads
Parallel.For(0, 100, i =>
{
    var result = Extractor.ExtractAll(htmlDocuments[i]);
    ProcessResult(result);
});
```

## Error Handling

```csharp
try
{
    var result = Extractor.ExtractAll(html);
    // Process result
}
catch (ArgumentNullException ex)
{
    // HTML was null or empty
    Console.WriteLine($"Invalid input: {ex.Message}");
}
catch (MetaOxideException ex)
{
    // Extraction failed
    Console.WriteLine($"Error {ex.ErrorCode}: {ex.GetFriendlyMessage()}");
    Console.WriteLine($"Details: {ex.ErrorDescription}");
}
```

## Troubleshooting

### DllNotFoundException

If you get a `DllNotFoundException`, ensure:

1. The NuGet package is properly installed
2. Native libraries are in the output directory
3. Your platform is supported (see Platform Support)

Run diagnostic info:

```csharp
var info = Extractor.GetDiagnosticInfo();
Console.WriteLine(info);
```

### Memory Issues

For very large documents (>10MB), consider:

1. Processing in chunks
2. Extracting only needed formats
3. Using streaming parsers for initial HTML cleanup

## Building from Source

```bash
# Clone the repository
git clone https://github.com/yfedoseev/meta-oxide-csharp.git
cd meta-oxide-csharp

# Build the Rust core
cd ../meta_oxide
cargo build --release --features c-api

# Copy native library
cp target/release/libmeta_oxide.* ../meta-oxide-csharp/MetaOxide/native/

# Build C# library
cd ../meta-oxide-csharp
dotnet build

# Run tests
dotnet test

# Create NuGet package
dotnet pack -c Release
```

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## License

Dual licensed under MIT OR Apache-2.0.

## Links

- **GitHub**: https://github.com/yfedoseev/meta-oxide-csharp
- **NuGet**: https://www.nuget.org/packages/MetaOxide/
- **Documentation**: https://github.com/yfedoseev/meta-oxide-csharp/wiki
- **Issues**: https://github.com/yfedoseev/meta-oxide-csharp/issues

## See Also

- **Rust Core**: [meta_oxide](https://crates.io/crates/meta_oxide)
- **Python Bindings**: [meta-oxide-py](https://pypi.org/project/meta-oxide/)
- **Go Bindings**: [meta-oxide-go](https://github.com/yfedoseev/meta-oxide-go)
- **Node.js Bindings**: [@yfedoseev/meta-oxide](https://www.npmjs.com/package/@yfedoseev/meta-oxide)
- **Java Bindings**: [com.metaoxide:meta-oxide](https://central.sonatype.com/)

---

**Made with ❤️ using Rust and C#**