convert_with_metadata

Function convert_with_metadata 

pub fn convert_with_metadata(
    html: &str,
    options: Option<ConversionOptions>,
    metadata_cfg: MetadataConfig,
) -> Result<(String, ExtendedMetadata)>

Convert HTML to Markdown with comprehensive metadata extraction (requires the metadata feature).

Performs HTML-to-Markdown conversion and extracts structured metadata during the same pass over the document, so no second traversal is needed. Suited to content analysis, SEO auditing, and document indexing workflows.

§Arguments

  • html - The HTML string to convert. Line endings are normalized (CRLF → LF) before conversion.
  • options - Optional conversion configuration. Defaults to ConversionOptions::default() if None. Controls heading style, list indentation, escape behavior, wrapping, and other output formatting.
  • metadata_cfg - Configuration for metadata extraction granularity. Use MetadataConfig::default() to extract all metadata types, or customize with selective extraction flags.
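As a rough illustration of the input handling described for html and options, the std-only sketch below normalizes CRLF line endings and falls back to default options when None is passed. SketchOptions and prepare_input are hypothetical stand-ins, not part of the crate's API, and the fields shown do not reflect the real ConversionOptions defaults.

```rust
/// Hypothetical stand-in for `ConversionOptions`; fields exist only so the
/// sketch compiles and are NOT the crate's real defaults.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct SketchOptions {
    pub wrap: bool,
    pub wrap_width: usize,
}

/// Hypothetical helper: normalize CRLF to LF and use default options
/// when the caller passes `None`.
pub fn prepare_input(html: &str, options: Option<SketchOptions>) -> (String, SketchOptions) {
    // Only the two-byte CRLF sequence is rewritten; lone CR is left as-is.
    let normalized = html.replace("\r\n", "\n");
    // `unwrap_or_default` requires `SketchOptions: Default`.
    let opts = options.unwrap_or_default();
    (normalized, opts)
}
```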

§Returns

On success, returns a tuple of:

  • String: The converted Markdown output
  • ExtendedMetadata: Comprehensive metadata containing:
    • document: Title, description, author, language, Open Graph, Twitter Card, and other meta tags
    • headers: All heading elements (h1-h6) with hierarchy and IDs
    • links: Hyperlinks classified as anchor, internal, external, email, or phone
    • images: Image elements with source, dimensions, and alt text
    • structured_data: JSON-LD, Microdata, and RDFa blocks
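The five link categories listed above can be approximated with a small classifier. This is a hypothetical sketch, not the crate's implementation: LinkKind, classify_link, and the rule that a link is internal when it is relative or shares the page's host (site_host) are assumptions made for illustration.

```rust
/// Stand-in for the link categories named in the docs (not the crate's type).
#[derive(Debug, PartialEq, Eq)]
pub enum LinkKind {
    Anchor,
    Internal,
    External,
    Email,
    Phone,
}

/// Case-insensitive scheme prefix check that never slices mid-character.
fn has_scheme(href: &str, scheme: &str) -> bool {
    href.get(..scheme.len())
        .map_or(false, |prefix| prefix.eq_ignore_ascii_case(scheme))
}

/// Returns the host of an absolute (`scheme://host`) or scheme-relative
/// (`//host`) URL, or `None` for relative paths.
fn absolute_host(href: &str) -> Option<&str> {
    let rest = if let Some(rest) = href.strip_prefix("//") {
        rest
    } else {
        let (scheme, rest) = href.split_once("://")?;
        // Reject things like "/redirect?to=http://x" where "://" is not a scheme.
        let valid = !scheme.is_empty()
            && scheme.chars().all(|c| c.is_ascii_alphanumeric() || "+-.".contains(c));
        if !valid {
            return None;
        }
        rest
    };
    // Authority ends at the first '/', '?', or '#'; drop any ":port".
    let authority = rest.split(|c| c == '/' || c == '?' || c == '#').next().unwrap_or("");
    Some(authority.split(':').next().unwrap_or(""))
}

/// Hypothetical classifier: internal = relative link or same host as `site_host`.
pub fn classify_link(href: &str, site_host: &str) -> LinkKind {
    let href = href.trim();
    if href.starts_with('#') {
        LinkKind::Anchor
    } else if has_scheme(href, "mailto:") {
        LinkKind::Email
    } else if has_scheme(href, "tel:") {
        LinkKind::Phone
    } else if let Some(host) = absolute_host(href) {
        if host.eq_ignore_ascii_case(site_host) {
            LinkKind::Internal
        } else {
            LinkKind::External
        }
    } else {
        // Relative paths such as "/about" or "docs/page.html".
        LinkKind::Internal
    }
}
```

Schemes without a // authority other than mailto: and tel: (for example javascript:) fall through to the relative-path branch here; a production classifier would need an explicit rule for them.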

§Errors

Returns ConversionError if:

  • HTML parsing fails
  • Invalid UTF-8 sequences are encountered
  • An internal panic occurs during conversion (caught and wrapped in ConversionError::Panic)
  • Configured size limits are exceeded
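The panic case above can be modeled with std::panic::catch_unwind, which turns an unwinding panic into an Err value. This is a minimal sketch under stated assumptions: SketchError is a hypothetical type whose Panic variant only mirrors the name ConversionError::Panic, run_guarded is not the crate's code, and the technique only works when the binary is built with panic = "unwind" (the default).

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical error type; only the `Panic` variant name mirrors the docs.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Panic(String),
}

/// Hypothetical wrapper: run `f`, converting a panic into
/// `SketchError::Panic` instead of unwinding into the caller.
pub fn run_guarded<T, F: FnOnce() -> T>(f: F) -> Result<T, SketchError> {
    // `AssertUnwindSafe` opts the closure into `catch_unwind`'s `UnwindSafe` bound.
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually `&'static str` or `String`.
        let msg = if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        };
        SketchError::Panic(msg)
    })
}
```

The default panic hook still prints the message to stderr; catch_unwind only stops the unwinding, not the report.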

§Performance Notes

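  • Single-pass collection: metadata is gathered during the conversion traversal itself, so extraction adds minimal overhead
  • Zero cost when the metadata feature is disabled, since the extraction code is not compiled
  • Pre-allocated buffers: documents with 50+ headers, 100+ links, and 20+ images are typically handled efficiently
  • Structured data is size-limited to prevent memory exhaustion (configurable via MetadataConfig::max_structured_data_size)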
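The single-pass, pre-allocation, and size-limit notes can be illustrated with a toy converter that emits Markdown and records metadata in the same loop. Everything here is hypothetical: Node stands in for a parsed DOM, the capacities 64 and 128 are illustrative guesses rather than the crate's values, and over-limit JSON-LD blocks are simply skipped, which may not match how the crate handles them.

```rust
/// Hypothetical, heavily simplified parsed-HTML node.
pub enum Node {
    Heading { level: u8, text: String },
    Link { href: String, text: String },
    JsonLd(String),
    Text(String),
}

#[derive(Debug)]
pub struct Collected {
    pub headers: Vec<(u8, String)>,
    pub links: Vec<String>,
    pub structured_data: Vec<String>,
}

/// Emits Markdown and collects metadata in one traversal of `nodes`.
pub fn convert_single_pass(nodes: &[Node], max_structured_data_size: usize) -> (String, Collected) {
    let mut out = String::new();
    // Pre-allocate for "typical" pages (illustrative capacities).
    let mut meta = Collected {
        headers: Vec::with_capacity(64),
        links: Vec::with_capacity(128),
        structured_data: Vec::new(),
    };
    for node in nodes {
        match node {
            Node::Heading { level, text } => {
                // ATX heading: one '#' per level.
                out.push_str(&"#".repeat(*level as usize));
                out.push(' ');
                out.push_str(text);
                out.push_str("\n\n");
                meta.headers.push((*level, text.clone()));
            }
            Node::Link { href, text } => {
                out.push_str(&format!("[{text}]({href})"));
                meta.links.push(href.clone());
            }
            Node::JsonLd(block) => {
                // Enforce the size cap: oversized blocks are not buffered.
                if block.len() <= max_structured_data_size {
                    meta.structured_data.push(block.clone());
                }
            }
            Node::Text(t) => out.push_str(t),
        }
    }
    (out, meta)
}
```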

§Example: Basic Usage

use html_to_markdown_rs::{convert_with_metadata, MetadataConfig};

let html = r#"
  <html lang="en">
    <head><title>My Article</title></head>
    <body>
      <h1 id="intro">Introduction</h1>
      <p>Welcome to <a href="https://example.com">our site</a></p>
    </body>
  </html>
"#;

let (markdown, metadata) = convert_with_metadata(html, None, MetadataConfig::default())
    .expect("conversion failed");

assert_eq!(metadata.document.title, Some("My Article".to_string()));
assert_eq!(metadata.document.language, Some("en".to_string()));
assert_eq!(metadata.headers[0].text, "Introduction");
assert_eq!(metadata.headers[0].id, Some("intro".to_string()));
assert_eq!(metadata.links.len(), 1);

§Example: Selective Metadata Extraction

use html_to_markdown_rs::{convert_with_metadata, MetadataConfig};

let html = "<html><body><h1>Title</h1><a href='#anchor'>Link</a></body></html>";

// Extract only headers and document metadata, skip links/images
let config = MetadataConfig {
    extract_headers: true,
    extract_links: false,
    extract_images: false,
    extract_structured_data: false,
    max_structured_data_size: 0,
    // Any remaining fields keep their default values
    ..MetadataConfig::default()
};

let (markdown, metadata) = convert_with_metadata(html, None, config)
    .expect("conversion failed");
assert!(!metadata.headers.is_empty());
assert!(metadata.links.is_empty());  // Not extracted
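Selective extraction like the config above amounts to checking a flag before recording each item, so disabled categories stay empty. This is a hypothetical sketch: Flags, Meta, Item, and record are illustrative stand-ins that borrow the extract_headers and extract_links names from MetadataConfig; they are not the crate's types.

```rust
/// Hypothetical flags mirroring two `extract_*` fields of `MetadataConfig`.
#[derive(Clone, Copy)]
pub struct Flags {
    pub extract_headers: bool,
    pub extract_links: bool,
}

#[derive(Debug, Default, PartialEq)]
pub struct Meta {
    pub headers: Vec<String>,
    pub links: Vec<String>,
}

/// Stand-in for items encountered while walking the document.
pub enum Item<'a> {
    Header(&'a str),
    Link(&'a str),
}

/// Records an item only if its category is enabled.
pub fn record(meta: &mut Meta, flags: Flags, item: &Item) {
    match item {
        Item::Header(text) if flags.extract_headers => meta.headers.push(text.to_string()),
        Item::Link(href) if flags.extract_links => meta.links.push(href.to_string()),
        // Category disabled: skip without allocating.
        _ => {}
    }
}
```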

§Example: With Conversion Options and Metadata Config

use html_to_markdown_rs::{convert_with_metadata, ConversionOptions, MetadataConfig, HeadingStyle};

let html = "<html><head><title>Blog Post</title></head><body><h1>Hello</h1></body></html>";

let options = ConversionOptions {
    heading_style: HeadingStyle::Atx,
    wrap: true,
    wrap_width: 80,
    ..Default::default()
};

let metadata_cfg = MetadataConfig::default();

let (markdown, metadata) = convert_with_metadata(html, Some(options), metadata_cfg)
    .expect("conversion failed");
// Markdown will use ATX-style headings (# H1, ## H2, etc.)
// Lines wrapped at 80 characters
// All metadata extracted
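To give a feel for what wrap_width controls, here is a greedy word-wrap sketch that breaks a paragraph into lines of at most width characters without splitting words. wrap_greedy is hypothetical and ignores Markdown-specific concerns (links, inline code, list indentation) that the crate's wrapping presumably has to handle.

```rust
/// Greedy word wrap: fills each line with as many whole words as fit in
/// `width` characters. A word longer than `width` is kept whole on its own line.
pub fn wrap_greedy(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        // The +1 accounts for the space that would join `word` to the line.
        if !current.is_empty() && current.chars().count() + 1 + word.chars().count() > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```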

§See Also

  • convert - Simple HTML to Markdown conversion without metadata
  • convert_with_inline_images - Conversion with inline image extraction
  • MetadataConfig - Configuration for metadata extraction
  • ExtendedMetadata - Metadata structure documentation
  • metadata module - Detailed type documentation for metadata components