Webpage Quality Analyzer


High-performance webpage quality analyzer with 115 comprehensive metrics. Analyze web pages for SEO, content quality, technical standards, accessibility, and more - all in milliseconds.

🚀 Features

  • 115 Comprehensive Metrics across 7 categories (Content, SEO, Technical, Semantic, Accessibility, Network, Engagement)
  • 9 Pre-configured Profiles optimized for different page types (news, blog, ecommerce, etc.)
  • Multi-Platform Support: Native Rust, WebAssembly (browser/Node.js), C++ FFI, Python bindings (coming soon)
  • High Performance: 180+ pages/second batch processing with parallel analysis
  • Flexible Configuration: Custom profiles, metric weights, penalties, and bonuses
  • Production Ready: Battle-tested, comprehensive test suite, extensive documentation

📦 Installation

Add this to your Cargo.toml:

[dependencies]
webpage_quality_analyzer = "1.0"

Or use cargo:

cargo add webpage_quality_analyzer

🎯 Quick Start

Level 1: Simple Usage

use webpage_quality_analyzer::{analyze, analyze_with_profile};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Analyze with default settings
    let report = analyze("https://example.com", None).await?;
    
    println!("Score: {}/100", report.score);
    println!("Quality: {}", report.verdict);
    println!("Word Count: {}", report.metrics.content_metrics.word_count);
    
    // Analyze with specific profile
    let news_report = analyze_with_profile(
        "https://example.com",
        None,
        "news"
    ).await?;
    
    Ok(())
}

Level 2: Builder Pattern

use webpage_quality_analyzer::Analyzer;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build custom analyzer
    let analyzer = Analyzer::builder()
        .with_profile_name("blog")?
        .with_metric_weight("word_count", 1.5)?
        .disable_metric("grammar_score")?
        .with_timeout_secs(30)?
        .build()?;
    
    let report = analyzer.run("https://example.com", None).await?;
    println!("Custom analysis score: {}", report.score);
    
    Ok(())
}

Level 3: Advanced Configuration

use webpage_quality_analyzer::Analyzer;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load from YAML config file
    let analyzer = Analyzer::from_config_file("config.yaml").await?;
    
    // Batch analysis
    let urls = vec![
        "https://site1.com",
        "https://site2.com",
        "https://site3.com",
    ];
    
    let reports = analyzer.analyze_batch_urls(urls, 5).await?;
    
    for report in reports {
        println!("{}: {}/100", report.url, report.score);
    }
    
    Ok(())
}

Analyzing HTML Directly

let html = r#"
    <!DOCTYPE html>
    <html lang="en">
    <head>
        <title>Sample Page</title>
        <meta name="description" content="A sample page">
    </head>
    <body>
        <h1>Welcome</h1>
        <p>This is a test page with some content.</p>
    </body>
    </html>
"#;

let report = analyze("https://example.com", Some(html.to_string())).await?;
println!("HTML analysis score: {}", report.score);

📊 Metrics Categories

1. Content Metrics (20 metrics)

Word count, paragraph count, sentence complexity, content density, text-to-HTML ratio, content extraction quality, and more.
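To make one of these metrics concrete, here is a minimal, self-contained sketch of a naive text-to-HTML ratio. This is purely illustrative and not the library's implementation, which would also need to skip `<script>`/`<style>` bodies, comments, and HTML entities:

```rust
// Naive text-to-HTML ratio: the share of characters that sit outside tags.
// Illustrative only; a real extractor handles scripts, styles, and entities.
fn text_to_html_ratio(html: &str) -> f64 {
    let mut in_tag = false;
    let text_chars = html
        .chars()
        .filter(|&c| match c {
            '<' => { in_tag = true; false }
            '>' => { in_tag = false; false }
            _ => !in_tag,
        })
        .count();
    text_chars as f64 / html.chars().count().max(1) as f64
}

fn main() {
    let html = "<p>This is a test page with some content.</p>";
    println!("ratio = {:.2}", text_to_html_ratio(html));
}
```

A ratio close to 1.0 suggests content-dense markup; pages dominated by boilerplate markup score much lower.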

2. Technical Metrics (25 metrics)

Title length, meta description, heading structure, HTML validity, semantic elements, images (count, alt text, sizes), links, forms.

3. SEO Metrics (18 metrics)

Meta tags, Open Graph, Twitter Cards, canonical URLs, robots meta, schema.org structured data, sitemap links.

4. Semantic Metrics (15 metrics)

Heading hierarchy, ARIA labels, microdata, RDFa, JSON-LD, semantic HTML5 elements.

5. Accessibility Metrics (12 metrics)

ARIA attributes, roles, image alt text, form labels, color contrast, keyboard navigation.

6. Network Metrics (23 metrics)

Load time, TTFB, resource sizes (HTML, CSS, JS, images), HTTP status, redirects, compression, caching headers.

7. Engagement Metrics (2 metrics)

Interactive elements, CTAs, social sharing buttons.

🎨 Available Profiles

Choose the right profile for your page type:

| Profile | Best For | Key Focus |
|---|---|---|
| general | Any webpage | Balanced scoring across all metrics |
| news | News articles | Content freshness, readability, structure |
| blog | Blog posts | Content quality, engagement, readability |
| ecommerce | Product pages | Conversion elements, images, CTAs |
| content_article | Long-form content | Word count, structure, comprehensiveness |
| product | Product landing pages | Product details, images, specifications |
| portfolio | Portfolio sites | Visual content, project showcases |
| login_page | Login/auth pages | Forms, security, minimal content |
| homepage | Homepage | Navigation, structure, key messages |

โš™๏ธ Feature Flags

Control optional features via Cargo features:

[dependencies]
webpage_quality_analyzer = { version = "1.0", features = ["async", "linkcheck", "nlp"] }

Available features:

  • async (default) - Async runtime with tokio + reqwest
  • readability (default) - Mozilla Readability content extraction
  • linkcheck - External link validation
  • nlp - Language detection and Unicode segmentation
  • grammar - Grammar checking (via nlprule)
  • wasm - WebAssembly bindings (mutually exclusive with async)
  • ffi - C FFI for C++ integration
  • cli - Command-line tool binary

๐ŸŒ Multi-Platform Support

WebAssembly (Browser/Node.js)

# Build for npm
wasm-pack build --target bundler --no-default-features --features wasm

# Install the published package
npm install @webpage-quality-analyzer/core

// Use in JavaScript/TypeScript
import { WasmAnalyzer } from '@webpage-quality-analyzer/core';

const analyzer = new WasmAnalyzer();
const report = await analyzer.analyze('<html>...</html>');
console.log(`Score: ${report.score}/100`);

C++ Integration

#include "webpage_quality_analyzer.hpp"

CAnalyzer* analyzer = wqa_analyzer_new();
CReport* report = wqa_analyze(analyzer, "https://example.com", nullptr);
double score = wqa_report_get_score(report);

Command-Line Tool

# Download binary from releases
wqa analyze https://example.com
wqa batch urls.txt --parallel 10
wqa profiles  # List available profiles

🔧 Customization

Custom Metric Weights

let analyzer = Analyzer::builder()
    .with_profile_name("blog")?
    .with_metric_weight("word_count", 1.5)?       // Increase importance
    .with_metric_weight("readability_score", 2.0)? // Double weight
    .build()?;
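To make the effect of weights concrete, here is a self-contained sketch of how per-metric weights typically fold into an overall score via a weighted average. This is an illustration, not the crate's internal aggregation, which may differ:

```rust
// Weighted average of per-metric scores (each 0-100). A weight of 1.5 on
// word_count makes that metric count 1.5x as much as a weight-1.0 metric.
// Illustrative only; the crate's actual aggregation may differ.
fn weighted_score(metrics: &[(&str, f64, f64)]) -> f64 {
    let total_weight: f64 = metrics.iter().map(|(_, _, w)| w).sum();
    let weighted_sum: f64 = metrics.iter().map(|(_, s, w)| s * w).sum();
    weighted_sum / total_weight
}

fn main() {
    let metrics = [
        ("word_count", 80.0, 1.5),        // increased importance
        ("readability_score", 60.0, 2.0), // double weight
        ("title_length", 90.0, 1.0),      // default weight
    ];
    println!("overall = {:.1}", weighted_score(&metrics));
}
```

Raising a metric's weight pulls the overall score toward that metric, which is why boosting `readability_score` to 2.0 drags this example down despite two strong metrics.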

Custom Penalties & Bonuses

let analyzer = Analyzer::builder()
    .with_profile_name("news")?
    // Penalty: -5 points if word count < 500
    .add_penalty_below("word_count", 500.0, 5.0)?
    // Bonus: +3 points if word count > 2000
    .add_bonus_above("word_count", 2000.0, 3.0)?
    .build()?;
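A hedged sketch of how threshold rules of this shape could adjust a base score; the crate's exact semantics may differ:

```rust
// Apply the two rules configured above: -5 points when word_count < 500,
// +3 points when word_count > 2000, clamped to the 0-100 range.
// Illustrative only, not the library's implementation.
fn apply_rules(base_score: f64, word_count: f64) -> f64 {
    let mut score = base_score;
    if word_count < 500.0 {
        score -= 5.0; // add_penalty_below("word_count", 500.0, 5.0)
    }
    if word_count > 2000.0 {
        score += 3.0; // add_bonus_above("word_count", 2000.0, 3.0)
    }
    score.clamp(0.0, 100.0)
}

fn main() {
    println!("{}", apply_rules(80.0, 450.0));  // short article is penalized
    println!("{}", apply_rules(80.0, 2500.0)); // long article earns the bonus
}
```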

Disable Metrics

let analyzer = Analyzer::builder()
    .with_profile_name("general")?
    .disable_metric("grammar_score")?     // Skip grammar analysis
    .disable_metric("language_detection")? // Skip language detection
    .build()?;

Output Customization

// Compact JSON (98.8% size reduction)
let compact = analyzer.run_compact(url, html).await?;

// Select specific fields
let minimal = analyzer.run_with_fields(
    url, 
    html,
    vec!["score", "verdict", "word_count"]
).await?;

📈 Performance

  • Single page: < 1 second (typical)
  • Batch processing: 180+ pages/second with parallel analysis
  • Memory: ~50-100 MB per analyzer instance
  • Thread-safe: analyzer instances can be shared safely across threads

// High-performance batch processing
use webpage_quality_analyzer::analyze_batch_high_performance;

let urls = vec![/* ... 100 URLs ... */];
let reports = analyze_batch_high_performance(urls, 10).await?; // 10 concurrent

🧪 Testing

cargo test                              # Run all tests
cargo test --features linkcheck         # With network features
cargo bench                             # Run benchmarks

📄 License

Dual licensed under MIT OR Apache-2.0. You can choose either license.

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.

📦 Related Packages

  • NPM: @webpage-quality-analyzer/core - JavaScript/TypeScript (WASM)
  • CLI: Download binaries for Linux/Windows/macOS
  • C++: Pre-compiled libraries with headers
  • Python: Coming soon (PyO3 bindings)

🌟 Why Choose This Analyzer?

  1. Comprehensive: 115 metrics covering all aspects of webpage quality
  2. Fast: Rust-powered performance, 180+ pages/sec batch processing
  3. Flexible: 9 profiles + full customization of weights, penalties, bonuses
  4. Multi-Platform: Works everywhere - Rust, WASM, C++, CLI
  5. Production-Ready: Extensive tests, documentation, real-world usage
  6. Modern: Async/await, latest Rust features, clean API design

📊 Example Report

{
  "score": 87.5,
  "verdict": "Excellent",
  "url": "https://example.com",
  "metrics": {
    "content_metrics": {
      "word_count": 1250,
      "paragraph_count": 15,
      "avg_sentence_length": 18.5
    },
    "technical_metrics": {
      "title_length": 55,
      "has_meta_description": true,
      "image_count": 8
    },
    "seo_metrics": {
      "has_og_tags": true,
      "has_schema_org": true
    }
  }
}

Made with ❤️ in Rust | Version 1.0.0 | October 2025