
# 🔍 Webpage Quality Analyzer

> **High-performance webpage quality analysis with 115 metrics and 8 built-in profiles**
> 
> **Latest: v1.0.2** - Profile scoring improvements, stricter login page validation, enhanced baseline logic

[![Crates.io](https://img.shields.io/crates/v/webpage_quality_analyzer)](https://crates.io/crates/webpage_quality_analyzer)
[![docs.rs](https://docs.rs/webpage_quality_analyzer/badge.svg)](https://docs.rs/webpage_quality_analyzer)
[![npm](https://img.shields.io/npm/v/@webpage-quality-analyzer/core)](https://www.npmjs.com/package/@webpage-quality-analyzer/core)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue)](LICENSE)
[![Build Status](https://img.shields.io/github/actions/workflow/status/NotGyashu/webpage-quality-analyser/ci.yml?branch=main)](https://github.com/NotGyashu/webpage-quality-analyser/actions)

A blazing-fast, multi-platform library for analyzing webpage quality with **115 metrics** (92 HTML-based + 23 network-based) organized across **7 categories**, with **8 professionally-tuned profiles** for different page types.

> **⚠️ Breaking Change in v1.0.2**: The product profile has been reverted to an e-commerce focus. Software/SaaS pages should use the "general" or "homepage" profile instead. See [CHANGELOG.md](CHANGELOG.md#breaking-changes) for the migration guide.

---

## ✨ Features

- **🚀 High Performance**: Analyze typical webpages in <100ms, large pages in <1s
- **📊 115 Comprehensive Metrics**: Content, SEO, Performance, Accessibility, and more
- **🎯 8 Built-in Profiles**: Optimized for news, blogs, products, portfolios, etc.
- **🌍 Multi-Platform**: Rust, WebAssembly (Browser/Node.js), C++, CLI tool
- **⚡ Parallel Batch Processing**: 180+ pages/second with concurrent analysis
- **🎨 Customizable Scoring**: Adjust weights, thresholds, penalties, bonuses
- **📱 Mobile-Friendly**: Responsive design and mobile usability metrics
- **♿ Accessibility**: WCAG 2.1 AA/AAA compliance checking
- **🔒 Security**: HTTPS, CSP, HSTS, XSS protection validation
- **📈 Real-time Analysis**: No external API calls, runs entirely locally

---

## 🚀 Quick Start

### Rust

```toml
[dependencies]
webpage_quality_analyzer = "1.0.2"
```

```rust
use webpage_quality_analyzer::analyze;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let report = analyze("https://example.com", None).await?;
    println!("Score: {}/100, Quality: {}", report.score, report.verdict);
    Ok(())
}
```

### JavaScript/TypeScript (WASM)

```bash
npm install @webpage-quality-analyzer/core
```

```javascript
import init, { WasmAnalyzer } from '@webpage-quality-analyzer/core';

await init();
const analyzer = new WasmAnalyzer();
const report = await analyzer.analyze('<html>...</html>');
console.log(`Score: ${report.score}/100`);
```

### C++

```bash
wget https://github.com/NotGyashu/webpage-quality-analyser/releases/download/v1.0.0/cpp-package-v1.0.0-linux-x64.tar.gz
tar -xzf cpp-package-v1.0.0-linux-x64.tar.gz
```

```cpp
#include <iostream>  // std::cout, std::endl

#include "webpage_quality_analyzer/webpage_quality_analyzer.hpp"

int main() {
    auto report = wqa::Analyzer::analyze("https://example.com");
    std::cout << "Score: " << report.score() << "/100" << std::endl;
    return 0;
}
```

### CLI Tool

```bash
# Linux (x64)
wget https://github.com/NotGyashu/webpage-quality-analyser/releases/download/v1.0.0/wqa-cli-v1.0.0-linux-x64.tar.gz
tar -xzf wqa-cli-v1.0.0-linux-x64.tar.gz
sudo mv wqa /usr/local/bin/

# Analyze a webpage
wqa analyze --url https://example.com --profile news --output report.json
```

---

## 📚 Documentation

| Resource | Description |
|----------|-------------|
| **[📖 Documentation Index](DOCUMENTATION_INDEX.md)** | Complete documentation hub |
| **[🚀 Installation Guide](docs/getting-started/INSTALLATION.md)** | Platform-specific setup |
| **[🎯 First Analysis Tutorial](docs/getting-started/FIRST_ANALYSIS.md)** | 5-minute quick start |
| **[📊 Understanding Metrics](docs/getting-started/UNDERSTANDING_METRICS.md)** | All 115 metrics explained |
| **[🏗️ Build & Release Guide](docs/guides/BUILD_AND_RELEASE_GUIDE.md)** | Complete build workflows |
| **[🔧 API Reference](docs/api-reference/)** | Complete API documentation |
| **[💡 Examples](examples/)** | 40+ working code examples |

---

## 🎯 Platform Support Matrix

| Platform | Metrics | Async | Batch | Status |
|----------|---------|-------|-------|--------|
| **Rust Library** | All 115 | ✅ Tokio | ✅ Yes | 🟢 Production |
| **WASM/Browser** | 92 HTML | ✅ Promise | ✅ Yes | 🟢 Production |
| **C++ FFI** | All 115 | ❌ Blocking | ✅ Yes | 🟢 Production |
| **CLI Tool** | All 115 | ✅ Tokio | ✅ Yes | 🟢 Production |
| **Python** | All 115 | 🚧 Planned | 🚧 Planned | 🟡 Roadmap |

**Note**: WASM provides 92 HTML-based metrics (network metrics require server-side fetching).
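
The split between HTML-based and network-based metrics can be pictured as a simple filter. The following is only a sketch: `MetricSource`, `MetricDef`, and `available_metrics` are hypothetical names invented for illustration, not the crate's API, and `https_enabled` is an assumed identifier for the "HTTPS enabled" metric.

```rust
/// Hypothetical: where a metric gets its input from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum MetricSource {
    Html,
    Network,
}

/// Hypothetical metric descriptor (not the crate's real type).
struct MetricDef {
    name: &'static str,
    source: MetricSource,
}

/// Returns the names of metrics that can run on the current platform.
/// Without network access (for example in a browser sandbox), only HTML-based metrics remain.
fn available_metrics(all: &[MetricDef], has_network: bool) -> Vec<&'static str> {
    all.iter()
        .filter(|m| has_network || m.source == MetricSource::Html)
        .map(|m| m.name)
        .collect()
}

fn main() {
    let all = [
        MetricDef { name: "word_count", source: MetricSource::Html },
        MetricDef { name: "readability_score", source: MetricSource::Html },
        MetricDef { name: "https_enabled", source: MetricSource::Network }, // assumed name
    ];
    println!("HTML only:    {:?}", available_metrics(&all, false));
    println!("with network: {:?}", available_metrics(&all, true));
}
```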

---

## 📊 Metrics Overview

### 115 Total Metrics

```
HTML-Based Metrics (92)          Network-Based Metrics (23)
├── Content (11)                 ├── Performance (11)
│   ├── Word count              │   ├── Largest Contentful Paint
│   ├── Readability             │   ├── First Contentful Paint
│   ├── Sentence count          │   ├── Time to Interactive
│   └── ...                     │   └── ...
├── Structure (5)                ├── Security (6)
├── Media (8)                    │   ├── HTTPS enabled
├── SEO (9)                      │   ├── CSP header
├── Links (8)                    │   └── ...
├── Technical (6)                ├── Analytics (3)
├── Accessibility (7)            └── Error Handling (3)
├── Mobile (4)
├── Authority (3)
├── Forms (6)
├── Structured Data (4)
├── Branding (4)
├── User Experience (5)
├── Business (3)
└── Internationalization (2)
```

**See detailed breakdown**: [Metrics Reference →](docs/getting-started/UNDERSTANDING_METRICS.md)

---

## 🎨 Built-in Profiles

Choose the right profile for your page type to get optimized scoring:

| Profile | Best For | Content Weight | SEO Weight | Key Metrics |
|---------|----------|----------------|------------|-------------|
| **news** | News articles | 35% | 25% | Freshness, metadata, social sharing |
| **blog** | Blog posts | 30% | 25% | Readability, structure, engagement |
| **product** | E-commerce | 20% | 30% | Images, structured data, conversion |
| **portfolio** | Personal sites | 25% | 20% | Visual design, branding, projects |
| **content_article** | Long-form content | 40% | 20% | Word count, depth, citations |
| **login_page** | Authentication | 10% | 5% | Security, forms, usability |
| **homepage** | Site homepages | 20% | 30% | Navigation, branding, performance |
| **general** | Default | 30% | 25% | Balanced across all categories |

**Learn more**: [Choosing Profiles →](docs/getting-started/CHOOSING_PROFILES.md)
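
As a rough illustration of how category weights like those in the table could combine into one score, here is a minimal weighted-average sketch. `CategoryScore` and `weighted_score` are hypothetical and are not the crate's API. Only the 35% content and 25% SEO weights for the `news` profile come from the table; the remaining 40% is lumped into an assumed "other" bucket, and the category scores are made up.

```rust
/// Hypothetical: one category's score (0-100) and its profile weight (0.0-1.0).
struct CategoryScore {
    name: &'static str,
    score: f64,
    weight: f64,
}

/// Weighted average of the category scores, normalized by the sum of weights.
/// Returns 0.0 when there are no categories or all weights are zero.
fn weighted_score(categories: &[CategoryScore]) -> f64 {
    let total_weight: f64 = categories.iter().map(|c| c.weight).sum();
    if total_weight == 0.0 {
        return 0.0;
    }
    let weighted_sum: f64 = categories.iter().map(|c| c.score * c.weight).sum();
    weighted_sum / total_weight
}

fn main() {
    // Weights for "content" and "seo" follow the news row; "other" is an assumption.
    let news = [
        CategoryScore { name: "content", score: 80.0, weight: 0.35 },
        CategoryScore { name: "seo", score: 60.0, weight: 0.25 },
        CategoryScore { name: "other", score: 70.0, weight: 0.40 },
    ];
    for c in &news {
        println!("{:<8} score={:>5.1} weight={:.2}", c.name, c.score, c.weight);
    }
    println!("overall: {:.1}", weighted_score(&news));
}
```

The same page scored under a different profile would shift toward whichever categories that profile weights most heavily, which is why picking the matching profile matters.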

---

## 🌟 Key Features

### 3-Tier API Design

**Level 1: Simple** - One function call
```rust
let report = analyze("https://example.com", None).await?;
```

**Level 2: Builder** - Custom configuration
```rust
let analyzer = Analyzer::<DefaultRuntime>::builder()
    .with_profile_name("news")?
    .enable_linkcheck(true)
    .build()?;
```

**Level 3: Config Files** - YAML/JSON/TOML
```rust
let analyzer = from_config_file("config.yaml")?;
```

### Advanced Customization

**Adjust Metric Weights**:
```rust
analyzer.with_metric_weight("word_count", 1.5)?
        .with_metric_weight("readability_score", 2.0)?
```

**Custom Penalties & Bonuses**:
```rust
analyzer.add_penalty_below("word_count", 300.0, 10.0)?
        .add_bonus_above("readability_score", 80.0, 5.0)?
```
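
One plausible reading of these threshold rules, sketched in standalone Rust. This assumes the three arguments mean (metric name, threshold, points) and that the adjusted score is clamped to 0-100; both are assumptions, and the `Rule` enum and `apply_rules` function are hypothetical rather than part of the crate.

```rust
use std::collections::HashMap;

/// Hypothetical rule type; the crate's internal representation may differ.
enum Rule {
    PenaltyBelow { metric: &'static str, threshold: f64, points: f64 },
    BonusAbove { metric: &'static str, threshold: f64, points: f64 },
}

/// Applies each rule to the base score and clamps the result to 0-100.
/// Rules that refer to a metric missing from `metrics` are skipped.
fn apply_rules(base: f64, metrics: &HashMap<&str, f64>, rules: &[Rule]) -> f64 {
    let mut score = base;
    for rule in rules {
        match rule {
            Rule::PenaltyBelow { metric, threshold, points } => {
                if let Some(value) = metrics.get(*metric) {
                    if *value < *threshold {
                        score -= *points;
                    }
                }
            }
            Rule::BonusAbove { metric, threshold, points } => {
                if let Some(value) = metrics.get(*metric) {
                    if *value > *threshold {
                        score += *points;
                    }
                }
            }
        }
    }
    score.clamp(0.0, 100.0)
}

fn main() {
    let metrics = HashMap::from([("word_count", 250.0), ("readability_score", 85.0)]);
    let rules = [
        Rule::PenaltyBelow { metric: "word_count", threshold: 300.0, points: 10.0 },
        Rule::BonusAbove { metric: "readability_score", threshold: 80.0, points: 5.0 },
    ];
    // 70 - 10 (short page) + 5 (readable) = 65
    println!("adjusted score: {}", apply_rules(70.0, &metrics, &rules));
}
```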

**Output Filtering** (98.8% size reduction):
```rust
let compact_report = analyzer.run_compact(url, html).await?;
```

### Batch Processing

**Parallel Analysis** (180+ pages/second):
```rust
let reports = analyze_batch_urls_parallel(urls, None).await?;
```

**High-Performance Mode**:
```rust
let reports = analyze_batch_high_performance(urls).await?;
```

---

## 🔧 Use Cases

- **🔍 SEO Auditing**: Analyze title, meta tags, headings, structured data
- **📝 Content Quality**: Measure readability, word count, structure
- **♿ Accessibility**: Check WCAG compliance, ARIA labels, contrast
- **⚡ Performance**: Track Core Web Vitals, page size, load times
- **🤖 CI/CD Integration**: Automated quality checks in build pipelines
- **📊 Competitive Analysis**: Compare your pages against competitors
- **🚨 Monitoring**: Track quality metrics over time
- **🎯 A/B Testing**: Measure quality impact of design changes
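
For the CI/CD use case, a pipeline step typically fails the build when the score drops below a threshold. The sketch below shows only that gating logic: `quality_gate` is a hypothetical helper, the score is a hard-coded placeholder standing in for a real report, and the threshold of 75 is arbitrary.

```rust
use std::process::ExitCode;

/// Hypothetical gate: succeeds when `score` meets `min_score`, otherwise explains why not.
fn quality_gate(score: f64, min_score: f64) -> Result<(), String> {
    if score >= min_score {
        Ok(())
    } else {
        Err(format!("score {score:.1} is below the required {min_score:.1}"))
    }
}

fn main() -> ExitCode {
    // Placeholder value; in a real pipeline this would come from an analysis report.
    let score = 82.0;
    match quality_gate(score, 75.0) {
        Ok(()) => {
            println!("quality gate passed");
            ExitCode::SUCCESS
        }
        Err(msg) => {
            eprintln!("quality gate failed: {msg}");
            ExitCode::FAILURE // a non-zero exit status fails the CI job
        }
    }
}
```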

---

## 📈 Performance

Benchmarked on typical webpages:

| Page Size | Metrics | Time | Memory | Throughput |
|-----------|---------|------|--------|------------|
| Small (<10KB) | 92 HTML | <100ms | <10MB | 200+ pages/s |
| Medium (50KB) | 115 Full | <500ms | <15MB | 150+ pages/s |
| Large (100KB+) | 115 Full | <1000ms | <20MB | 100+ pages/s |

**Optimizations**:
- ✅ DOM caching (parsed elements are cached and reused across all 115 metrics)
- ✅ Connection pooling (persistent HTTP connections)
- ✅ Parallel batch processing (`Arc<Semaphore>`, max 10 concurrent)
- ✅ Zero-copy metric scorers (`Arc<dyn MetricScorer>`)
- ✅ Optimized JSON serialization (field selectors)
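
To show what a concurrency cap like "max 10 concurrent" means in practice, here is a standard-library-only sketch. It swaps in a fixed pool of scoped threads pulling work from a shared counter instead of a Tokio semaphore, so it illustrates the idea rather than the crate's implementation. `analyze_stub` and `analyze_bounded` are hypothetical names, and the stub returns a fake score.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Stand-in for a real page analysis: derives a fake score from the input length.
fn analyze_stub(html: &str) -> usize {
    html.len() % 101
}

/// Analyzes all pages using at most `max_concurrent` worker threads.
/// Results are returned in the same order as the input pages.
fn analyze_bounded(pages: &[String], max_concurrent: usize) -> Vec<usize> {
    let next = AtomicUsize::new(0); // index of the next unclaimed page
    let results = Mutex::new(vec![0usize; pages.len()]);
    let workers = max_concurrent.max(1).min(pages.len().max(1));

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim one page; stop once every page has been claimed.
                let i = next.fetch_add(1, Ordering::Relaxed);
                if i >= pages.len() {
                    break;
                }
                let score = analyze_stub(&pages[i]);
                results.lock().unwrap()[i] = score;
            });
        }
    });

    results.into_inner().unwrap()
}

fn main() {
    let pages: Vec<String> = (0..25).map(|i| "<p>x</p>".repeat(i)).collect();
    let scores = analyze_bounded(&pages, 10);
    println!("analyzed {} pages with at most 10 workers", scores.len());
}
```

Capping concurrency keeps memory and open connections bounded regardless of how many URLs are queued; an async semaphore achieves the same effect without dedicating a thread per in-flight page.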

---

## 🛠️ Development

### Building from Source

```bash
# Clone repository
git clone https://github.com/NotGyashu/webpage-quality-analyser.git
cd webpage-quality-analyser

# Build Rust library
cargo build --release

# Build WASM
wasm-pack build --target bundler --no-default-features --features wasm

# Build C++ bindings
cargo build --release --features ffi
./scripts/build_ffi.sh

# Build CLI tool
cargo build --release --bin wqa --features cli

# Run tests
cargo test --all-features

# Run benchmarks
cargo bench
```

### Testing

```bash
# All tests (40+ test files)
cargo test

# Specific test suites
cargo test comprehensive_metrics   # 115-metric validation
cargo test phase3                  # Profile-aware scoring
cargo test output_customization    # Field selectors
cargo test weight_customization    # Metric weight adjustment

# WASM tests
wasm-pack test --headless --firefox

# C++ examples
cd build && ./bindings/cpp/examples/level1_simple
```

**See**: [Development Guide →](docs/development/setup.md)

---

## 🤝 Contributing

We welcome contributions! See [CONTRIBUTING.md](docs/contributing.md) for guidelines.

### Areas for Contribution

- 🐛 Bug fixes and issue reporting
- 📚 Documentation improvements
- 🌍 New language bindings (Python, Go, etc.)
- 📊 Additional metrics and profiles
- ⚡ Performance optimizations
- 🧪 Test coverage expansion

---

## 📝 License

Licensed under either of:

- **Apache License, Version 2.0** ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
- **MIT License** ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

at your option.

---

## 🔗 Links

- **Documentation**: [docs.rs/webpage_quality_analyzer](https://docs.rs/webpage_quality_analyzer)
- **Crates.io**: [crates.io/crates/webpage_quality_analyzer](https://crates.io/crates/webpage_quality_analyzer)
- **npm**: [npmjs.com/package/@webpage-quality-analyzer/core](https://www.npmjs.com/package/@webpage-quality-analyzer/core)
- **GitHub**: [github.com/NotGyashu/webpage-quality-analyser](https://github.com/NotGyashu/webpage-quality-analyser)
- **Issues**: [github.com/NotGyashu/webpage-quality-analyser/issues](https://github.com/NotGyashu/webpage-quality-analyser/issues)
- **Discussions**: [github.com/NotGyashu/webpage-quality-analyser/discussions](https://github.com/NotGyashu/webpage-quality-analyser/discussions)

---

## 🙏 Acknowledgments

Built with:
- [Rust](https://www.rust-lang.org/) - Systems programming language
- [Tokio](https://tokio.rs/) - Async runtime
- [wasm-bindgen](https://rustwasm.github.io/wasm-bindgen/) - WASM bindings
- [tl](https://crates.io/crates/tl) - HTML parsing
- [readability](https://crates.io/crates/readability) - Content extraction

Special thanks to the Rust community and all contributors!

---

## 📊 Project Stats

- **Lines of Code**: ~25,000+ (Rust)
- **Test Coverage**: 40+ test files, 279+ tests
- **Benchmarks**: 15+ performance benchmarks
- **Documentation**: 50+ markdown files
- **Examples**: 40+ working examples
- **Supported Platforms**: 4 (Rust, WASM, C++, CLI)

---

## 🎯 Roadmap

### v1.1.0 (Q1 2026)
- [ ] Python bindings (PyO3)
- [ ] Enhanced NLP features
- [ ] Real-time browser extension
- [ ] Cloud API service

### v1.2.0 (Q2 2026)
- [ ] Machine learning-based scoring
- [ ] Historical trend analysis
- [ ] Competitive benchmarking database
- [ ] Advanced visualization tools

**See complete roadmap**: [docs/ROADMAP.md](docs/ROADMAP.md)

---

## ⭐ Star History

If you find this project useful, please consider giving it a star! ⭐

---

**Made with ❤️ by [@NotGyashu](https://github.com/NotGyashu) and [contributors](https://github.com/NotGyashu/webpage-quality-analyser/graphs/contributors)**

**Last Updated**: October 9, 2025 | **Version**: 1.0.2 | **Status**: Production Ready

---

**Navigation**: [Documentation Index →](DOCUMENTATION_INDEX.md) | [Installation Guide →](docs/getting-started/INSTALLATION.md) | [Quick Start Tutorial →](docs/getting-started/FIRST_ANALYSIS.md)