# MetaOxide for .NET
Fast, comprehensive metadata extraction library for .NET applications. Built on a high-performance Rust core with zero-copy parsing.
[](https://www.nuget.org/packages/MetaOxide/)
[](https://github.com/yfedoseev/meta-oxide-csharp)
[](https://dotnet.microsoft.com/)
## Features
- **13 Metadata Formats**: Extract all major metadata standards in one library
- **High Performance**: Rust core with ~60µs per extraction (10-50x faster than alternatives)
- **Cross-Platform**: Windows, Linux, macOS (x64 and ARM64)
- **Zero Dependencies**: Native binaries included, no external dependencies
- **Type-Safe API**: Strongly-typed C# API with full IntelliSense support
- **Thread-Safe**: Concurrent extraction with no global state
- **Easy Integration**: Simple NuGet install, works in .NET Framework, .NET Core, and .NET 5+
### Supported Metadata Formats
1. **Standard HTML meta tags** (`<meta name content>`)
2. **Open Graph** (`og:*` properties) - Facebook, LinkedIn
3. **Twitter Cards** (`twitter:*` properties)
4. **JSON-LD** (Schema.org structured data)
5. **Microdata** (`itemscope`, `itemprop`)
6. **Microformats** (h-card, h-entry, h-event, h-product, etc.)
7. **RDFa** (Resource Description Framework)
8. **Dublin Core** (`DC.*` metadata)
9. **Web App Manifest** (discovery)
10. **oEmbed** (endpoint discovery)
11. **rel-* Links** (canonical, alternate, prev, next, etc.)
## Installation
### NuGet Package Manager
```powershell
Install-Package MetaOxide
```
### .NET CLI
```bash
dotnet add package MetaOxide
```
### Package Manager Console
```powershell
PM> Install-Package MetaOxide
```
### Requirements
- **.NET Framework 4.6.2+** or **.NET Standard 2.0+** or **.NET 5+**
- **Platforms**: Windows (x64), Linux (x64, ARM64), macOS (x64, ARM64)
## Quick Start
### Extract All Metadata
```csharp
using MetaOxide;
var html = @"
<!DOCTYPE html>
<html>
<head>
<title>Example Page</title>
<meta name=""description"" content=""Page description"">
<meta property=""og:title"" content=""OG Title"">
<meta name=""twitter:card"" content=""summary"">
</head>
</html>";
// Extract all metadata at once
var result = Extractor.ExtractAll(html, baseUrl: "https://example.com");
// Access metadata by format
Console.WriteLine($"Found {result.GetMetadataFormatCount()} metadata formats");
if (result.Meta != null)
{
Console.WriteLine($"Description: {result.Meta["description"]}");
}
if (result.OpenGraph != null)
{
Console.WriteLine($"OG Title: {result.OpenGraph["title"]}");
}
```
### Extract from URL
```csharp
using System.Net.Http;
using MetaOxide;
var httpClient = new HttpClient();
var html = await httpClient.GetStringAsync("https://example.com");
var result = Extractor.ExtractAll(html, "https://example.com");
// Export to JSON
var json = result.ToJson();
Console.WriteLine(json);
```
### Extract Specific Formats
```csharp
using MetaOxide;
// Extract only Open Graph
var openGraph = Extractor.ExtractOpenGraph(html);
// Extract only Twitter Cards
var twitter = Extractor.ExtractTwitter(html);
// Extract only JSON-LD
var jsonLd = Extractor.ExtractJsonLd(html);
// Extract only Microformats
var microformats = Extractor.ExtractMicroformats(html);
```
## API Reference
### Extractor Class
Main static class for all extraction operations.
#### ExtractAll
```csharp
public static ExtractionResult ExtractAll(string html, string? baseUrl = null)
```
Extracts all 13 metadata formats in a single operation.
**Parameters:**
- `html` (string): The HTML content to parse (required)
- `baseUrl` (string?): Optional base URL for resolving relative URLs
**Returns:** `ExtractionResult` containing all extracted metadata
**Throws:**
- `ArgumentNullException`: When html is null or empty
- `MetaOxideException`: When extraction fails
#### Individual Extraction Methods
```csharp
// Standard meta tags
Dictionary<string, object>? ExtractMeta(string html, string? baseUrl = null)
// Open Graph metadata
Dictionary<string, object>? ExtractOpenGraph(string html, string? baseUrl = null)
// Twitter Card metadata
Dictionary<string, object>? ExtractTwitter(string html, string? baseUrl = null)
// JSON-LD structured data
List<object>? ExtractJsonLd(string html, string? baseUrl = null)
// Microdata items
List<object>? ExtractMicrodata(string html, string? baseUrl = null)
// Microformats data
Dictionary<string, object>? ExtractMicroformats(string html, string? baseUrl = null)
// RDFa structured data
List<object>? ExtractRDFa(string html, string? baseUrl = null)
// Dublin Core metadata
Dictionary<string, object>? ExtractDublinCore(string html, string? baseUrl = null)
// Web App Manifest discovery
Dictionary<string, object>? ExtractManifest(string html, string? baseUrl = null)
// oEmbed endpoint discovery
Dictionary<string, object>? ExtractOEmbed(string html, string? baseUrl = null)
// rel-* link relationships
Dictionary<string, object>? ExtractRelLinks(string html, string? baseUrl = null)
```
#### JSON Extraction Methods
For each format, there's also a JSON extraction method that returns raw JSON strings:
```csharp
string? ExtractMetaJson(string html, string? baseUrl = null)
string? ExtractOpenGraphJson(string html, string? baseUrl = null)
string? ExtractTwitterJson(string html, string? baseUrl = null)
// ... etc for all formats
```
### ExtractionResult Class
Contains all extracted metadata.
#### Properties
```csharp
public Dictionary<string, object>? Meta { get; set; }
public Dictionary<string, object>? OpenGraph { get; set; }
public Dictionary<string, object>? Twitter { get; set; }
public List<object>? JsonLd { get; set; }
public List<object>? Microdata { get; set; }
public Dictionary<string, object>? Microformats { get; set; }
public List<object>? RDFa { get; set; }
public Dictionary<string, object>? DublinCore { get; set; }
public Dictionary<string, object>? Manifest { get; set; }
public Dictionary<string, object>? OEmbed { get; set; }
public Dictionary<string, object>? RelLinks { get; set; }
```
#### Methods
```csharp
// Check if any metadata was found
bool HasAnyMetadata()
// Get count of formats with data
int GetMetadataFormatCount()
// Convert to JSON string
string ToJson(Formatting formatting = Formatting.Indented)
// Create from JSON string
static ExtractionResult FromJson(string json)
```
### MetaOxideException Class
Exception thrown when extraction fails.
#### Properties
```csharp
public int ErrorCode { get; }
public string ErrorDescription { get; }
```
#### Methods
```csharp
// Get user-friendly error message
string GetFriendlyMessage()
```
#### Error Codes
- `0`: No error
- `1`: Parse error (malformed HTML)
- `2`: Invalid URL format
- `3`: Invalid UTF-8 encoding
- `4`: Memory allocation error
- `5`: JSON serialization error
- `6`: Null pointer (internal error)
## Examples
### ASP.NET Core Web API
```csharp
using Microsoft.AspNetCore.Mvc;
using MetaOxide;
[ApiController]
[Route("api/metadata")]
public class MetadataController : ControllerBase
{
private readonly IHttpClientFactory _httpClientFactory;
public MetadataController(IHttpClientFactory httpClientFactory)
{
_httpClientFactory = httpClientFactory;
}
[HttpGet]
public async Task<ActionResult<ExtractionResult>> ExtractMetadata(
[FromQuery] string url)
{
try
{
var client = _httpClientFactory.CreateClient();
client.DefaultRequestHeaders.UserAgent.ParseAdd("MetaOxide/1.0");
var html = await client.GetStringAsync(url);
var result = Extractor.ExtractAll(html, url);
return Ok(result);
}
catch (HttpRequestException ex)
{
return BadRequest(new { error = "Failed to fetch URL", details = ex.Message });
}
catch (MetaOxideException ex)
{
return BadRequest(new { error = ex.GetFriendlyMessage() });
}
}
}
```
### Console Application
```csharp
using System;
using MetaOxide;
class Program
{
static void Main(string[] args)
{
if (args.Length == 0)
{
Console.WriteLine("Usage: program <html-file>");
return;
}
var html = File.ReadAllText(args[0]);
var result = Extractor.ExtractAll(html);
Console.WriteLine($"Formats found: {result.GetMetadataFormatCount()}");
Console.WriteLine(result.ToJson());
}
}
```
### Parallel Processing
```csharp
using System.Threading.Tasks;
using MetaOxide;
var urls = new[] {
"https://example.com/page1",
"https://example.com/page2",
"https://example.com/page3"
};
var tasks = urls.Select(async url =>
{
using var client = new HttpClient();
var html = await client.GetStringAsync(url);
return Extractor.ExtractAll(html, url);
});
var results = await Task.WhenAll(tasks);
foreach (var result in results)
{
Console.WriteLine($"Found {result.GetMetadataFormatCount()} formats");
}
```
## Performance
MetaOxide is built on a high-performance Rust core with zero-copy HTML parsing:
| Meta Tags | 45µs | 22,000 ops/sec |
| Open Graph | 50µs | 20,000 ops/sec |
| Twitter Cards | 50µs | 20,000 ops/sec |
| JSON-LD | 120µs | 8,300 ops/sec |
| All Formats | 280µs | 3,500 ops/sec |
**Comparison with alternatives:**
- 10-50x faster than HTML parsing libraries
- Zero memory allocations in extraction path
- Concurrent extraction scales linearly with cores
## Platform Support
### Supported Platforms
| Windows | x64 | ✅ Fully supported |
| Linux | x64 | ✅ Fully supported |
| Linux | ARM64 | ✅ Fully supported |
| macOS | x64 (Intel) | ✅ Fully supported |
| macOS | ARM64 (Apple Silicon) | ✅ Fully supported |
### .NET Framework Versions
| .NET Framework | 4.6.2+ | ✅ Fully supported |
| .NET Standard | 2.0+ | ✅ Fully supported |
| .NET Core | 3.1+ | ✅ Fully supported |
| .NET | 5, 6, 7, 8+ | ✅ Fully supported |
## Thread Safety
All extraction methods are **fully thread-safe** and can be called concurrently from multiple threads. The native library uses thread-local error state and has no shared mutable state.
```csharp
// Safe to call from multiple threads
Parallel.For(0, 100, i =>
{
var result = Extractor.ExtractAll(htmlDocuments[i]);
ProcessResult(result);
});
```
## Error Handling
```csharp
try
{
var result = Extractor.ExtractAll(html);
// Process result
}
catch (ArgumentNullException ex)
{
// HTML was null or empty
Console.WriteLine($"Invalid input: {ex.Message}");
}
catch (MetaOxideException ex)
{
// Extraction failed
Console.WriteLine($"Error {ex.ErrorCode}: {ex.GetFriendlyMessage()}");
Console.WriteLine($"Details: {ex.ErrorDescription}");
}
```
## Troubleshooting
### DllNotFoundException
If you get a `DllNotFoundException`, ensure:
1. The NuGet package is properly installed
2. Native libraries are in the output directory
3. Your platform is supported (see Platform Support)
Run diagnostic info:
```csharp
var info = Extractor.GetDiagnosticInfo();
Console.WriteLine(info);
```
### Memory Issues
For very large documents (>10MB), consider:
1. Processing in chunks
2. Extracting only needed formats
3. Using streaming parsers for initial HTML cleanup
## Building from Source
```bash
# Clone the repository
git clone https://github.com/yfedoseev/meta-oxide-csharp.git
cd meta-oxide-csharp
# Build the Rust core
cd ../meta_oxide
cargo build --release --features c-api
# Copy native library
cp target/release/libmeta_oxide.* ../meta-oxide-csharp/MetaOxide/native/
# Build C# library
cd ../meta-oxide-csharp
dotnet build
# Run tests
dotnet test
# Create NuGet package
dotnet pack -c Release
```
## Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Ensure all tests pass
5. Submit a pull request
## License
Dual licensed under MIT OR Apache-2.0.
## Links
- **GitHub**: https://github.com/yfedoseev/meta-oxide-csharp
- **NuGet**: https://www.nuget.org/packages/MetaOxide/
- **Documentation**: https://github.com/yfedoseev/meta-oxide-csharp/wiki
- **Issues**: https://github.com/yfedoseev/meta-oxide-csharp/issues
## See Also
- **Rust Core**: [meta_oxide](https://crates.io/crates/meta_oxide)
- **Python Bindings**: [meta-oxide-py](https://pypi.org/project/meta-oxide/)
- **Go Bindings**: [meta-oxide-go](https://github.com/yfedoseev/meta-oxide-go)
- **Node.js Bindings**: [@yfedoseev/meta-oxide](https://www.npmjs.com/package/@yfedoseev/meta-oxide)
- **Java Bindings**: [com.metaoxide:meta-oxide](https://central.sonatype.com/)
---
**Made with ❤️ using Rust and C#**