Crate webpage_quality_analyzer

§Webpage Quality Analyzer

Version: 1.0.2

A high-performance Rust crate for analyzing webpage quality, with three usage levels: simple functions (Level 1), a configurable builder (Level 2), and a configuration file (Level 3).

What’s New in 1.0.2:

  • Fixed MetricEquals penalty trigger implementation
  • Product profile reverted to its e-commerce focus (breaking change; use the “general” profile for software)
  • Login page validation now stricter (Forms category weight: 40%)
  • Homepage and General profiles more balanced with new bonuses
  • Enhanced minimum baseline scoring with better documentation
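The 1.0.2 changes touch several scoring ingredients: category weights, a `MetricEquals` penalty trigger, bonuses, and a minimum baseline score. How the crate combines them is not spelled out on this page, so the following is only a sketch under an assumed model: a weighted average of 0-100 category scores, minus penalties whose trigger fires, plus bonus points, clamped between a baseline floor and 100. `Category`, `Trigger`, `Penalty`, `score_page`, and the metric names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical scoring category; `weight` is a fraction (0.40 means 40%).
struct Category {
    name: &'static str,
    weight: f64,
}

/// Hypothetical penalty trigger: `MetricEquals` fires when a page metric
/// has exactly the given value (e.g. a count of zero).
enum Trigger {
    MetricEquals { metric: &'static str, value: f64 },
}

/// Hypothetical penalty: subtract `points` when `trigger` fires.
struct Penalty {
    trigger: Trigger,
    points: f64,
}

impl Penalty {
    /// True if the trigger matches the page's metrics.
    fn fires(&self, metrics: &HashMap<&str, f64>) -> bool {
        match &self.trigger {
            Trigger::MetricEquals { metric, value } => metrics.get(*metric) == Some(value),
        }
    }
}

/// Assumed model: weighted average of category scores (missing scores count
/// as 0), minus fired penalties, plus bonuses, clamped to [baseline, 100].
/// Assumes at least one category and `baseline <= 100`.
fn score_page(
    categories: &[Category],
    category_scores: &HashMap<&str, f64>,
    metrics: &HashMap<&str, f64>,
    penalties: &[Penalty],
    bonus_points: f64,
    baseline: f64,
) -> f64 {
    let total_weight: f64 = categories.iter().map(|c| c.weight).sum();
    let weighted = categories
        .iter()
        .map(|c| c.weight * category_scores.get(c.name).copied().unwrap_or(0.0))
        .sum::<f64>()
        / total_weight;
    let penalty: f64 = penalties
        .iter()
        .filter(|p| p.fires(metrics))
        .map(|p| p.points)
        .sum();
    (weighted - penalty + bonus_points).clamp(baseline, 100.0)
}
```

With hypothetical weights forms 0.40, content 0.35, security 0.25 and category scores 50, 80, 100, the weighted average is 0.40 × 50 + 0.35 × 80 + 0.25 × 100 = 73. If `form_count` equals 0, the 30-point `MetricEquals` penalty fires and, with a 5-point bonus, the result is 48; a page whose raw total falls below the baseline is lifted to the baseline. In this model, giving Forms 40% of the weight makes a weak form section the biggest single drag on a login page's score.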

§Quick Start (Level 1 - Simple)

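use webpage_quality_analyzer::{analyze, analyze_with_profile};

// `.await` and `?` need an async function returning a Result; this assumes the
// crate's error type converts into Box<dyn std::error::Error>.
async fn quick_start() -> Result<(), Box<dyn std::error::Error>> {
    // Analyze by fetching the URL
    let report = analyze("https://example.com", None).await?;

    // Analyze HTML you provide instead of fetching it
    let html = "<html><head><title>Test</title></head><body><p>Content</p></body></html>";
    let report = analyze("https://example.com", Some(html)).await?;

    // Use a specific profile
    let report = analyze_with_profile("https://example.com", None, "news").await?;
    Ok(())
}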

§Custom Configuration (Level 2 - Builder)

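use webpage_quality_analyzer::{Analyzer, async_runtime::DefaultRuntime};

// Assumes the crate's error type converts into Box<dyn std::error::Error>.
async fn with_builder() -> Result<(), Box<dyn std::error::Error>> {
    let analyzer = Analyzer::<DefaultRuntime>::builder()
        .with_profile_name("content_article")? // returns a Result, hence the `?`
        .enable_linkcheck(true)
        .enable_nlp(true)
        .build()?;

    let report = analyzer.run("https://example.com", None).await?;
    Ok(())
}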
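Two details of the builder example are worth unpacking: `with_profile_name` returns a `Result`, which is why `?` appears in the middle of the chain, and `Analyzer` takes the runtime as a type parameter. The self-contained sketch below reproduces that shape with stand-in types; `Runtime`, `StdRuntime`, `MiniAnalyzer`, `MiniBuilder`, `KNOWN_PROFILES`, and the validation rules are hypothetical, not the crate's definitions.

```rust
use std::marker::PhantomData;

/// Hypothetical runtime abstraction (the crate's `async_runtime` module fills this role).
trait Runtime {
    fn name() -> &'static str;
}

/// Hypothetical stand-in for a default runtime.
struct StdRuntime;

impl Runtime for StdRuntime {
    fn name() -> &'static str {
        "std"
    }
}

/// Hypothetical profile names accepted by this sketch.
const KNOWN_PROFILES: &[&str] = &["general", "news", "content_article"];

struct MiniAnalyzer<R> {
    profile: String,
    linkcheck: bool,
    nlp: bool,
    _runtime: PhantomData<R>,
}

struct MiniBuilder<R> {
    profile: Option<String>,
    linkcheck: bool,
    nlp: bool,
    _runtime: PhantomData<R>,
}

impl<R: Runtime> MiniAnalyzer<R> {
    fn builder() -> MiniBuilder<R> {
        MiniBuilder { profile: None, linkcheck: false, nlp: false, _runtime: PhantomData }
    }

    /// Resolved statically from the type parameter.
    fn runtime_name(&self) -> &'static str {
        R::name()
    }
}

impl<R: Runtime> MiniBuilder<R> {
    /// Validates eagerly, so a bad name stops the chain at its `?`.
    fn with_profile_name(mut self, name: &str) -> Result<Self, String> {
        if !KNOWN_PROFILES.contains(&name) {
            return Err(format!("unknown profile: {name}"));
        }
        self.profile = Some(name.to_string());
        Ok(self)
    }

    fn enable_linkcheck(mut self, on: bool) -> Self {
        self.linkcheck = on;
        self
    }

    fn enable_nlp(mut self, on: bool) -> Self {
        self.nlp = on;
        self
    }

    /// Fails if no profile was selected (a rule of this sketch only).
    fn build(self) -> Result<MiniAnalyzer<R>, String> {
        let profile = self.profile.ok_or("no profile selected")?;
        Ok(MiniAnalyzer { profile, linkcheck: self.linkcheck, nlp: self.nlp, _runtime: PhantomData })
    }
}
```

Validating the profile name inside the chain surfaces a typo at the call that introduced it, and a type parameter (rather than a runtime value) lets the compiler resolve runtime-specific code at build time.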

§Advanced Setup (Level 3 - Config File)

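use webpage_quality_analyzer::from_config_file;

// Assumes the crate's error type converts into Box<dyn std::error::Error>.
async fn with_config_file() -> Result<(), Box<dyn std::error::Error>> {
    let analyzer = from_config_file("my-config.yaml")?;
    let report = analyzer.run("https://example.com", None).await?; // same call as in Level 2
    Ok(())
}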

Re-exports§

pub use models::models::AnalysisMode;
pub use models::models::AnalyzeError;
pub use models::models::AppliedBonusInfo;
pub use models::models::AppliedPenaltyInfo;
pub use models::models::ContentChunk;
pub use models::models::ContentChunkType;
pub use models::models::ContentComplianceInfo;
pub use models::models::ExtractedMetadata;
pub use models::models::FormInfo;
pub use models::models::Heading;
pub use models::models::ImageInfo;
pub use models::models::LinkInfo;
pub use models::models::MediaInfo;
pub use models::models::PageMetrics;
pub use models::models::PageMetricsFromHTML;
pub use models::models::PageMetricsFullFetch;
pub use models::models::PageQualityReport;
pub use models::models::Phase3ScoringDetails;
pub use models::models::ProcessedDocument;
pub use models::models::QualityBand;
pub use models::models::Result;
pub use models::models::StructuredData;
pub use content_extraction::create_content_extractor;
pub use content_extraction::create_heuristic_extractor;
pub use content_extraction::create_readability_extractor;
pub use content_extraction::ContentExtractor;
pub use content_extraction::ContentScore;
pub use content_extraction::ExtractionStrategy;
pub use content_extraction::HeuristicExtractor;
pub use content_extraction::MultiStrategyExtractor;
pub use content_extraction::ReadabilityExtractor;
pub use extractor::create_extractor;
pub use extractor::DefaultExtractor;
pub use extractor::Extractor;
pub use parser::parse_html;
pub use parser::HtmlParser;
pub use parser::Parser;
pub use config::config_manager::ConfigFormat;
pub use config::config_manager::ConfigManager;
pub use config::config_models::PluginConfig;
pub use config::config_models::ProfileConfig;
pub use config::config_models::Verbosity;
pub use config::enhanced_models::EnhancedScoringProfile;
pub use config::profile_modifier::ProfileModifier;
pub use config::templates::ProfileTemplates;
pub use content::create_basic_content_processor;
pub use content::create_content_processor;
pub use content::ContentProcessor;
pub use content::DefaultContentProcessor;
pub use fetcher::FetchOptions;
pub use fetcher::RetryConfig;
pub use fetcher::WebFetcher;
pub use scoring::ContentValidator;
pub use scoring::Phase3ScoringSystem;
pub use scoring::ProfileAwareScorer;
pub use scoring::ProfileCompiler;
pub use utils::json_optimizer::FieldSelector;
pub use utils::json_optimizer::FieldSelectorBuilder;
pub use utils::json_optimizer::OptimizedSerializer;
pub use utils::json_optimizer::SerializationOptions;

Modules§

analysis
Analysis module for webpage quality assessment
async_runtime
Async runtime abstraction layer
config
constants
String constants used throughout the application. Centralizing these avoids repeated allocations.
content
content_extraction
Positive Content Selection System
extractor
fetcher
Web fetching module for URL-based analysis
metrics
models
parser
scoring
test_fixtures
Centralized test fixtures and HTML constants
utils
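The `constants` module notes that centralizing string constants avoids repeated allocations. The reason, sketched below under the assumption that the module uses `&'static str` constants: a string literal is stored once in the compiled binary and can be returned or compared without touching the heap, whereas `to_string()` allocates a new `String` on every call. The module name `constants_sketch` and its values are hypothetical.

```rust
/// Hypothetical constants module; each `&'static str` lives in the binary itself.
pub mod constants_sketch {
    pub const PROFILE_GENERAL: &str = "general";
    pub const PROFILE_NEWS: &str = "news";
}

/// Returns a borrowed constant: no heap allocation per call.
fn default_profile() -> &'static str {
    constants_sketch::PROFILE_GENERAL
}

/// For contrast: allocates a fresh `String` on every call.
fn default_profile_owned() -> String {
    "general".to_string()
}

/// Comparing against a shared constant avoids both allocation and
/// scattered, typo-prone literals at call sites.
fn is_news(profile: &str) -> bool {
    profile == constants_sketch::PROFILE_NEWS
}
```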

Structs§

Analyzer
AnalyzerBuilder
Builder for configuring the analyzer

Constants§

VERSION
Library version

Functions§

analyze
Level 1 - Simple Usage: Primary entry point for webpage quality analysis
analyze_batch_high_performance
High-Performance Batch Processing: Analyze multiple URLs concurrently
analyze_with_profile
Level 1 - Simple Usage: Profile-specific analysis for different content types
from_config_file
Level 3 - Advanced Setup: Create analyzer from configuration file
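`analyze_batch_high_performance` is described only as analyzing multiple URLs concurrently; its signature and scheduling strategy are not shown here. As a std-only illustration of the general technique, the sketch below runs a fixed pool of worker threads that pull URLs from a shared queue, so at most `workers` analyses run at once, and writes each result back by its input index. `analyze_stub` and `analyze_batch` are hypothetical, and OS threads stand in for whatever async tasks the crate actually uses.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a per-URL analysis; returns a fake "score".
fn analyze_stub(url: &str) -> usize {
    url.len()
}

/// Processes `urls` with at most `workers` threads (minimum 1);
/// results are returned in input order.
fn analyze_batch(urls: Vec<String>, workers: usize) -> Vec<usize> {
    let n = urls.len();
    // Shared work queue of (input index, url) pairs.
    let queue = Arc::new(Mutex::new(urls.into_iter().enumerate().collect::<Vec<_>>()));
    let results = Arc::new(Mutex::new(vec![0usize; n]));

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Take the next job; the queue lock is released at the end of this statement.
                let job = queue.lock().unwrap().pop();
                match job {
                    Some((i, url)) => {
                        let score = analyze_stub(&url);
                        // Write by index, so completion order does not matter.
                        results.lock().unwrap()[i] = score;
                    }
                    None => break, // queue drained
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    // All workers have exited, so this is the last reference to `results`.
    Arc::try_unwrap(results).unwrap().into_inner().unwrap()
}
```

Writing results by index keeps the output aligned with the input even though workers finish in arbitrary order, and the pool size caps how many analyses are in flight at once.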