pub struct HybridExtractor { /* private fields */ }
End-to-end hybrid extractor: the classifier (supervised) picks the content
node, then the RL-tuned ExtractionParams drive block-level extraction
within it. When no trained classifier is supplied it falls back to the
Readability-style NodeFeatures::heuristic_content_score, so it is useful
even before any training has happened.
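The fallback described above can be pictured with a minimal sketch. Everything in it is a stand-in invented for illustration: `Features`, `Scorer`, `pick_content_node`, and the scoring formula are hypothetical and are not this crate's `NodeFeatures`, `NodeClassifier`, or its actual heuristic. The sketch only shows the shape of the idea: use the learned scorer when one is supplied, otherwise use a heuristic score, then take the best-scoring candidate.

```rust
/// Hypothetical stand-in for per-node features (not the crate's `NodeFeatures`).
#[derive(Debug, Clone, Copy)]
struct Features {
    text_len: f64,
    link_density: f64,
}

impl Features {
    /// Illustrative Readability-style score (an assumption, not the real formula):
    /// long text with a low share of link text scores higher.
    fn heuristic_score(&self) -> f64 {
        self.text_len * (1.0 - self.link_density)
    }
}

/// Scoring strategy. A trained model is represented here by a plain function pointer.
enum Scorer {
    Heuristic,
    Learned(fn(&Features) -> f64),
}

impl Scorer {
    /// Scores one candidate with whichever strategy is configured.
    fn score(&self, f: &Features) -> f64 {
        match self {
            Scorer::Heuristic => f.heuristic_score(),
            Scorer::Learned(model) => model(f),
        }
    }
}

/// Returns the index of the highest-scoring candidate, or `None` for an empty slice.
fn pick_content_node(scorer: &Scorer, candidates: &[Features]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| scorer.score(a).total_cmp(&scorer.score(b)))
        .map(|(i, _)| i)
}
```

Under these assumptions, switching between the heuristic and a learned model changes only how the `Scorer` is constructed. The selection logic stays the same, which is one plausible reason an extractor of this kind remains usable before any training.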
Implementations
impl HybridExtractor
pub fn heuristic(stopwords: HashSet<String>) -> Self
Heuristic-only extractor (no learned model).
pub fn with_classifier(
    classifier: NodeClassifier,
    stopwords: HashSet<String>,
) -> Self
Extractor backed by a trained node classifier.
pub fn extract(
    &self,
    html: &str,
    num_candidates: usize,
    params: &ExtractionParams,
) -> Result<Option<HybridExtraction>>
Extracts article content from a page. Returns Ok(None) when the document
exposes no candidate nodes.
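Because extract returns a Result wrapping an Option, a caller has three outcomes to handle: an error, no candidate nodes, or an extraction. The sketch below shows that pattern with stand-ins only. `Extraction`, `extract_stub`, and `summarize` are hypothetical, the `String` error type is an assumption (the crate's `Result` alias and its error type are not shown on this page), and the stub's rules for choosing an outcome are arbitrary and say nothing about how the real extractor behaves.

```rust
/// Hypothetical stand-in for `HybridExtraction`; the real type's fields are not shown here.
#[derive(Debug, PartialEq)]
struct Extraction {
    text: String,
}

/// Stub with the same return shape as `extract`.
/// The rules are arbitrary: blank input is an error, and the first `<p>` element
/// is treated as the only candidate node.
fn extract_stub(html: &str) -> Result<Option<Extraction>, String> {
    if html.trim().is_empty() {
        return Err("empty input".to_string());
    }
    let start = match html.find("<p>") {
        Some(i) => i,
        None => return Ok(None), // no candidate nodes
    };
    let body = &html[start + "<p>".len()..];
    let end = body.find("</p>").unwrap_or(body.len());
    Ok(Some(Extraction {
        text: body[..end].trim().to_string(),
    }))
}

/// Handles all three outcomes explicitly.
fn summarize(html: &str) -> String {
    match extract_stub(html) {
        Ok(Some(e)) => format!("extracted {} bytes", e.text.len()),
        Ok(None) => "no candidate nodes".to_string(),
        Err(e) => format!("extraction failed: {}", e),
    }
}
```

Matching on `Ok(None)` separately keeps "nothing to extract" distinct from a real failure, so a caller can skip such pages without logging them as errors.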
Auto Trait Implementations
impl !RefUnwindSafe for HybridExtractor
impl !UnwindSafe for HybridExtractor
impl Freeze for HybridExtractor
impl Send for HybridExtractor
impl Sync for HybridExtractor
impl Unpin for HybridExtractor
impl UnsafeUnpin for HybridExtractor
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.
impl<ST, DT> CastableFrom<ST, Initialized, Initialized> for DT
impl<ST, DT> CastableFrom<ST, Uninit, Uninit> for DT
impl<T> ErasedDestructor for T
where
    T: 'static,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true.
Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self> otherwise.