Content Extractor RL - RL-based HTML article extraction library
This library extracts article content from HTML using reinforcement learning, with a fallback to heuristic-based extraction.
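The "learned extractor first, heuristic fallback second" flow described above can be sketched as follows. This is a minimal illustration only: `Article`, `extract_with_fallback`, and the closure-based extractors are hypothetical names chosen for the example and are not taken from this crate's API, and the condition that triggers the fallback (an error or empty output) is an assumption.

```rust
// Hypothetical sketch: none of these names come from the crate's actual API.

/// Stand-in for an extraction result, recording which path produced it.
#[derive(Debug, PartialEq)]
pub struct Article {
    pub text: String,
    pub method: &'static str,
}

/// Run the primary (e.g. learned) extractor first. If it returns an error
/// or only whitespace, fall back to the heuristic extractor instead.
pub fn extract_with_fallback<P, H>(
    html: &str,
    primary: P,
    heuristic: H,
) -> Result<Article, String>
where
    P: Fn(&str) -> Result<String, String>,
    H: Fn(&str) -> Result<String, String>,
{
    match primary(html) {
        // Accept the primary result only when it contains real text.
        Ok(text) if !text.trim().is_empty() => Ok(Article {
            text,
            method: "primary",
        }),
        // Otherwise delegate to the heuristic path and propagate its error.
        _ => heuristic(html).map(|text| Article {
            text,
            method: "heuristic",
        }),
    }
}
```

Passing the extractors as closures keeps the sketch independent of how the crate actually wires its agent and its `BaselineExtractor` together.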
Re-exports§
pub use evaluation::GroundTruthData;
pub use evaluation::GroundTruthEvaluator;
pub use evaluation::EvaluationMetrics;
pub use evaluation::algorithm_comparison::AlgorithmComparator;
pub use evaluation::algorithm_comparison::ComparisonReport;
pub use models::ModelMetadata;
pub use agents::AgentFactory;
pub use agents::AlgorithmType;
pub use agents::RLAgent;
pub use checkpoint::Checkpoint;
pub use checkpoint::CheckpointManager;
pub use config::Config;
pub use site_profile::SiteProfile;
pub use site_profile::SiteProfileMemory;
pub use baseline_extractor::BaselineExtractor;
pub use environment::ArticleExtractionEnvironment;
pub use training::train_standard;
pub use training::train_with_improvements;
pub use training::TrainingMetrics;
pub use hyperparameter_tuner::TPEOptimizer;
pub use hyperparameter_tuner::Hyperparameters;
pub use hyperparameter_tuner::HyperparameterSpace;
pub use hyperparameter_tuner::TrialResult;
pub use plotting::TrainingPlotter;
pub use plotting::PlotConfig;
pub use device::get_device;
pub use device::cuda_is_available;
pub use device::get_device_info;
pub use device::print_device_info;
pub use cli_utils::*;
Modules§
- agents
- baseline_extractor
- checkpoint - Model checkpoint management
- cli_utils - High-level command interface for CLI. This module contains the main logic for each CLI command
- config
- curriculum - Curriculum learning manager
- device - Device selection for CPU/CUDA
- environment
- evaluation
- html_parser - HTML parsing and DOM manipulation utilities
- hyperparameter_tuner - Hyperparameter tuning using TPE (Tree-structured Parzen Estimator) with resume capability
- models
- plotting - Training visualization and plotting using plotters library
- replay_buffer
- reward
- site_profile
- text_utils
- training
Structs§
- BatchExtractionResult - Batch extraction result
- ExtractedArticle - Extracted article result
Enums§
- ExtractionError - Errors that can occur during article extraction
Type Aliases§
- Result - Result type for article extraction operations
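A crate-level `Result` alias paired with an error enum is a common Rust pattern, and a minimal sketch of it is shown below. The variants `EmptyInput` and `NoContentFound`, the `extraction` module, and the `first_paragraph` helper are assumptions made up for illustration; the real `ExtractionError` variants and the exact definition of `Result` are not listed on this page.

```rust
// Hypothetical sketch of the alias pattern; variant names and the helper
// function are illustrative and not taken from the crate.
pub mod extraction {
    use std::fmt;

    /// Illustrative error enum (the crate's real variants may differ).
    #[derive(Debug, PartialEq)]
    pub enum ExtractionError {
        EmptyInput,
        NoContentFound,
    }

    impl fmt::Display for ExtractionError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                ExtractionError::EmptyInput => write!(f, "input HTML is empty"),
                ExtractionError::NoContentFound => write!(f, "no article content found"),
            }
        }
    }

    impl std::error::Error for ExtractionError {}

    /// Alias that fixes the error type, so signatures read `Result<T>`.
    pub type Result<T> = std::result::Result<T, ExtractionError>;

    /// Toy extractor: returns the trimmed text of the first `<p>` element.
    pub fn first_paragraph(html: &str) -> Result<String> {
        if html.trim().is_empty() {
            return Err(ExtractionError::EmptyInput);
        }
        // Locate the end of the opening tag, then the closing tag after it.
        let start = html.find("<p>").ok_or(ExtractionError::NoContentFound)? + "<p>".len();
        let end = start
            + html[start..]
                .find("</p>")
                .ok_or(ExtractionError::NoContentFound)?;
        Ok(html[start..end].trim().to_string())
    }
}
```

Keeping the alias inside its own module avoids shadowing the prelude's two-parameter `Result` in calling code, while `?` still propagates `ExtractionError` in functions that return the alias.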