Crate content_extractor_rl

Content Extractor RL - RL-based HTML article extraction library

This library extracts article content from HTML using reinforcement learning, with a fallback to heuristic-based extraction.
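The fallback idea can be illustrated with a minimal, self-contained sketch. Everything named here (`Article`, `ExtractError`, `Extractor`, `LearnedExtractor`, `HeuristicExtractor`, `extract_with_fallback`) is hypothetical and is not this crate's API; the "learned" extractor is a trivial stand-in for an RL agent, used only to show how a primary strategy might hand over to a heuristic one.

```rust
/// Hypothetical result type for this sketch (not the crate's `ExtractedArticle`).
#[derive(Debug, Clone, PartialEq)]
pub struct Article {
    pub text: String,
    /// Which strategy produced the text.
    pub method: &'static str,
}

/// Hypothetical error type for this sketch (not the crate's `ExtractionError`).
#[derive(Debug, PartialEq)]
pub enum ExtractError {
    NoContent,
}

/// Common interface for both strategies (an assumption of this sketch).
pub trait Extractor {
    fn extract(&self, html: &str) -> Result<Article, ExtractError>;
}

/// Stand-in for a learned policy: succeeds only when an `<article>` element exists.
pub struct LearnedExtractor;

impl Extractor for LearnedExtractor {
    fn extract(&self, html: &str) -> Result<Article, ExtractError> {
        let open = "<article>";
        let start = html.find(open).ok_or(ExtractError::NoContent)? + open.len();
        let end = html[start..].find("</article>").ok_or(ExtractError::NoContent)? + start;
        Ok(Article { text: html[start..end].trim().to_string(), method: "learned" })
    }
}

/// Heuristic fallback: drop everything inside angle brackets, keep the rest.
pub struct HeuristicExtractor;

impl Extractor for HeuristicExtractor {
    fn extract(&self, html: &str) -> Result<Article, ExtractError> {
        let mut raw = String::new();
        let mut in_tag = false;
        for c in html.chars() {
            match c {
                '<' => in_tag = true,
                '>' => {
                    in_tag = false;
                    raw.push(' '); // keep words from adjacent elements apart
                }
                _ if !in_tag => raw.push(c),
                _ => {}
            }
        }
        // Collapse runs of whitespace into single spaces.
        let text = raw.split_whitespace().collect::<Vec<_>>().join(" ");
        if text.is_empty() {
            Err(ExtractError::NoContent)
        } else {
            Ok(Article { text, method: "heuristic" })
        }
    }
}

/// Try the primary extractor first; use the fallback only if it fails.
pub fn extract_with_fallback(
    primary: &dyn Extractor,
    fallback: &dyn Extractor,
    html: &str,
) -> Result<Article, ExtractError> {
    primary.extract(html).or_else(|_| fallback.extract(html))
}
```

Calling `extract_with_fallback(&LearnedExtractor, &HeuristicExtractor, html)` returns the primary result when it succeeds and the heuristic result otherwise. The `method` field records which path was taken, which can be useful when comparing strategies.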

Re-exports§

pub use evaluation::GroundTruthData;
pub use evaluation::GroundTruthEvaluator;
pub use evaluation::EvaluationMetrics;
pub use evaluation::algorithm_comparison::AlgorithmComparator;
pub use evaluation::algorithm_comparison::ComparisonReport;
pub use models::ModelMetadata;
pub use agents::AgentFactory;
pub use agents::AlgorithmType;
pub use agents::RLAgent;
pub use checkpoint::Checkpoint;
pub use checkpoint::CheckpointManager;
pub use config::Config;
pub use site_profile::SiteProfile;
pub use site_profile::SiteProfileMemory;
pub use baseline_extractor::BaselineExtractor;
pub use environment::ArticleExtractionEnvironment;
pub use training::train_standard;
pub use training::train_with_improvements;
pub use training::TrainingMetrics;
pub use hyperparameter_tuner::TPEOptimizer;
pub use hyperparameter_tuner::Hyperparameters;
pub use hyperparameter_tuner::HyperparameterSpace;
pub use hyperparameter_tuner::TrialResult;
pub use plotting::TrainingPlotter;
pub use plotting::PlotConfig;
pub use device::get_device;
pub use device::cuda_is_available;
pub use device::get_device_info;
pub use device::print_device_info;
pub use cli_utils::*;

Modules§

agents
baseline_extractor
checkpoint
Model checkpoint management
cli_utils
High-level command interface for the CLI. This module contains the main logic for each CLI command.
config
curriculum
Curriculum learning manager
device
Device selection for CPU/CUDA
environment
evaluation
html_parser
HTML parsing and DOM manipulation utilities
hyperparameter_tuner
Hyperparameter tuning using TPE (Tree-structured Parzen Estimator) with resume capability
models
plotting
Training visualization and plotting using the plotters library
replay_buffer
reward
site_profile
text_utils
training

Structs§

BatchExtractionResult
Batch extraction result
ExtractedArticle
Extracted article result

Enums§

ExtractionError
Errors that can occur during article extraction

Type Aliases§

Result
Result type for article extraction operations
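As an illustration of how a crate-level `Result` alias is commonly paired with an error enum in Rust, here is a minimal sketch. The variants `EmptyInput` and `Parse` and the helper `non_empty` are hypothetical; the actual definitions of `ExtractionError` and `Result` in this crate may differ.

```rust
use std::fmt;

/// Hypothetical variants; the crate's real `ExtractionError` may look different.
#[derive(Debug)]
pub enum ExtractionError {
    EmptyInput,
    Parse(String),
}

impl fmt::Display for ExtractionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExtractionError::EmptyInput => write!(f, "input HTML is empty"),
            ExtractionError::Parse(msg) => write!(f, "parse error: {msg}"),
        }
    }
}

impl std::error::Error for ExtractionError {}

/// Alias that fixes the error type, so signatures can be written as `Result<T>`.
pub type Result<T> = std::result::Result<T, ExtractionError>;

/// Hypothetical helper: reject blank input, otherwise return the trimmed HTML.
pub fn non_empty(html: &str) -> Result<&str> {
    let trimmed = html.trim();
    if trimmed.is_empty() {
        return Err(ExtractionError::EmptyInput);
    }
    Ok(trimmed)
}
```

With an alias of this shape, callers can use `?` to propagate extraction errors without repeating the error type in every function signature.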