DiscoveryConfig

Struct DiscoveryConfig 

pub struct DiscoveryConfig {
    pub mode: DiscoveryMode,
    pub discover_sitemaps: bool,
    pub max_sitemap_depth: usize,
    pub extract_page_metadata: bool,
    pub link_extract_options: LinkExtractOptions,
    pub rules: Vec<DiscoveryRule>,
}

Discovery-specific runtime configuration, part of the core runtime types and traits used to define and run a crawl.

Fields§

§mode: DiscoveryMode

How the runtime should discover follow-up work from responses.

§discover_sitemaps: bool

Whether sitemap XML should be parsed into follow-up requests.

§max_sitemap_depth: usize

Maximum recursion depth for nested sitemap indexes.

§extract_page_metadata: bool

Whether page metadata should be extracted and attached to response metadata.

§link_extract_options: LinkExtractOptions

Base link extraction options used for HTML discovery.

§rules: Vec<DiscoveryRule>

Optional rule-like link discovery behavior matched against source responses.

Implementations§

impl DiscoveryConfig

pub fn new() -> DiscoveryConfig

Creates a new discovery config with default values.
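
As a rough illustration of how the consuming builder methods on this page chain together, the sketch below uses a hypothetical stand-in type, `DiscoveryConfigSketch`, that mirrors only three of the documented fields. The default values are assumptions made for the example; this page does not state the real defaults.

```rust
// Hypothetical stand-in mirroring a subset of the documented fields.
// It is NOT the crate's `DiscoveryConfig`; the defaults below are assumed.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DiscoveryConfigSketch {
    pub discover_sitemaps: bool,
    pub max_sitemap_depth: usize,
    pub extract_page_metadata: bool,
}

impl Default for DiscoveryConfigSketch {
    fn default() -> Self {
        // Assumed defaults, chosen only for illustration.
        Self {
            discover_sitemaps: false,
            max_sitemap_depth: 2,
            extract_page_metadata: false,
        }
    }
}

impl DiscoveryConfigSketch {
    /// Mirrors `new()`: starts from the default values.
    pub fn new() -> Self {
        Self::default()
    }

    /// Mirrors `with_sitemaps`: consumes `self` and returns the updated value.
    pub fn with_sitemaps(mut self, enabled: bool) -> Self {
        self.discover_sitemaps = enabled;
        self
    }

    /// Mirrors `with_max_sitemap_depth`.
    pub fn with_max_sitemap_depth(mut self, depth: usize) -> Self {
        self.max_sitemap_depth = depth;
        self
    }

    /// Mirrors `with_page_metadata`.
    pub fn with_page_metadata(mut self, enabled: bool) -> Self {
        self.extract_page_metadata = enabled;
        self
    }
}

/// Builds a config by chaining the builder methods.
pub fn example_config() -> DiscoveryConfigSketch {
    DiscoveryConfigSketch::new()
        .with_sitemaps(true)
        .with_max_sitemap_depth(3)
        .with_page_metadata(true)
}
```

Each `with_*` method takes `self` by value and returns the updated config, so calls can be chained without intermediate bindings.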

pub fn with_mode(self, mode: DiscoveryMode) -> DiscoveryConfig

Sets the discovery mode.

pub fn with_sitemaps(self, enabled: bool) -> DiscoveryConfig

Enables or disables sitemap parsing.

pub fn with_max_sitemap_depth(self, depth: usize) -> DiscoveryConfig

Sets the maximum nested sitemap depth.
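
To make the idea of a depth limit on nested sitemap indexes concrete, here is a minimal sketch over a hypothetical in-memory `Sitemap` tree. The `Sitemap` type, the `collect_urls` function, and the exact off-by-one semantics (an index sitting at the depth limit is not expanded) are assumptions for illustration, not the crate's documented behavior.

```rust
/// Hypothetical in-memory model of an already parsed sitemap document.
pub enum Sitemap {
    /// A URL set: plain page URLs.
    UrlSet(Vec<String>),
    /// A sitemap index: references to further sitemaps.
    Index(Vec<Sitemap>),
}

/// Collects page URLs, expanding nested indexes at most `max_depth` levels deep.
pub fn collect_urls(sitemap: &Sitemap, max_depth: usize) -> Vec<String> {
    let mut out = Vec::new();
    walk(sitemap, 0, max_depth, &mut out);
    out
}

fn walk(sitemap: &Sitemap, depth: usize, max_depth: usize, out: &mut Vec<String>) {
    match sitemap {
        Sitemap::UrlSet(urls) => out.extend(urls.iter().cloned()),
        Sitemap::Index(children) => {
            // Assumed semantics: an index found at `depth == max_depth` is skipped.
            if depth < max_depth {
                for child in children {
                    walk(child, depth + 1, max_depth, out);
                }
            }
        }
    }
}

/// A two-level example: an index containing a URL set and another index.
pub fn sample_tree() -> Sitemap {
    Sitemap::Index(vec![
        Sitemap::UrlSet(vec!["https://example.com/a".to_string()]),
        Sitemap::Index(vec![Sitemap::UrlSet(vec![
            "https://example.com/b".to_string(),
        ])]),
    ])
}
```

Bounding the recursion this way keeps a misconfigured or self-referencing index from producing unbounded follow-up work.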

pub fn with_page_metadata(self, enabled: bool) -> DiscoveryConfig

Enables or disables page metadata extraction.

Replaces the base link extraction options.

pub fn with_rules(self, rules: impl IntoIterator<Item = DiscoveryRule>) -> DiscoveryConfig

Replaces the configured discovery rules.

pub fn with_rule(self, rule: DiscoveryRule) -> DiscoveryConfig

Adds a single discovery rule.
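
The difference between replacing the rule list and appending to it can be sketched as follows. `RuleSketch` and `ConfigSketch` are hypothetical stand-ins for `DiscoveryRule` and `DiscoveryConfig`; only the replace-versus-append behavior described above is modeled.

```rust
/// Hypothetical stand-in for `DiscoveryRule`, identified only by a name.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RuleSketch(pub String);

/// Hypothetical stand-in holding just the `rules` field.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct ConfigSketch {
    pub rules: Vec<RuleSketch>,
}

impl ConfigSketch {
    /// Mirrors `with_rules`: discards any existing rules and stores the new ones.
    pub fn with_rules(mut self, rules: impl IntoIterator<Item = RuleSketch>) -> Self {
        self.rules = rules.into_iter().collect();
        self
    }

    /// Mirrors `with_rule`: appends one rule to the existing list.
    pub fn with_rule(mut self, rule: RuleSketch) -> Self {
        self.rules.push(rule);
        self
    }
}
```

Under these assumptions, calling `with_rules` after several `with_rule` calls drops the earlier rules, so the order of builder calls matters.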

pub fn with_same_site_only(self, enabled: bool) -> DiscoveryConfig

Sets whether only same-site links should be discovered.

Sets whether text content should be scanned for plain-text URLs.

pub fn with_allow_patterns(self, patterns: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts discovery to URLs that match at least one glob-style pattern.

pub fn with_deny_patterns(self, patterns: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Excludes URLs that match any glob-style pattern.
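
One way allow and deny patterns could combine is sketched below. The page does not specify the glob dialect or the precedence between the two lists, so this example assumes that only `*` is a wildcard (matching any run of characters), that a deny match overrides an allow match, and that an empty allow list imposes no restriction. `glob_match` and `is_allowed` are hypothetical helpers, not part of the crate.

```rust
/// Minimal glob matcher: `*` matches any (possibly empty) run of bytes.
/// Assumption: no other wildcard syntax is supported in this sketch.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0usize, 0usize);
    let mut star: Option<usize> = None; // index of the most recent `*` in the pattern
    let mut mark = 0usize; // text index to resume from when backtracking

    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            // Remember the star and first try matching it against nothing.
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last star absorb one more byte.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Any trailing stars can match the empty string.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}

/// Assumed combination rule: deny wins; an empty allow list allows everything.
pub fn is_allowed(url: &str, allow: &[&str], deny: &[&str]) -> bool {
    if deny.iter().any(|p| glob_match(p, url)) {
        return false;
    }
    allow.is_empty() || allow.iter().any(|p| glob_match(p, url))
}
```

Checking the deny list first keeps the "excludes URLs that match any pattern" reading independent of what the allow list contains.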

pub fn with_allow_domains(self, domains: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts discovery to the given domains or subdomains.

pub fn with_deny_domains(self, domains: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Excludes discovery for the given domains or subdomains.
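
The phrase "domains or subdomains" suggests a suffix match on label boundaries. The sketch below shows one plausible reading; `host_matches_domain` is a hypothetical helper, and the case-insensitive comparison and trailing-dot handling are assumptions rather than documented behavior.

```rust
/// Returns true when `host` equals `domain` or is a subdomain of it.
/// Assumptions: comparison is ASCII case-insensitive and ignores a trailing dot.
pub fn host_matches_domain(host: &str, domain: &str) -> bool {
    let host = host.trim_end_matches('.').to_ascii_lowercase();
    let domain = domain.trim_end_matches('.').to_ascii_lowercase();
    // Requiring a leading dot on the suffix stops `notexample.com`
    // from matching `example.com`.
    host == domain || host.ends_with(&format!(".{}", domain))
}
```

Matching on the dot boundary rather than a raw string suffix is the design choice that keeps unrelated hosts sharing a suffix out of scope.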

pub fn with_allow_path_prefixes(self, prefixes: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts discovery to URL paths with one of the provided prefixes.

pub fn with_deny_path_prefixes(self, prefixes: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Excludes URL paths with one of the provided prefixes.
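
Path-prefix filtering operates on the path component of a URL rather than the whole string. The following sketch extracts the path by hand and applies the two prefix lists; `url_path` and `path_allowed` are hypothetical helpers, and the plain string-prefix comparison, the deny-over-allow precedence, and the treatment of an empty allow list as unrestricted are all assumptions.

```rust
/// Extracts the path of a URL without using a URL parsing library.
/// Simplified on purpose: it handles `scheme://authority/path?query#fragment`.
pub fn url_path(url: &str) -> &str {
    // Skip the scheme separator if present.
    let rest = match url.find("://") {
        Some(i) => &url[i + 3..],
        None => url,
    };
    // Drop the query string and fragment before looking for the path.
    let end = rest
        .find(|c: char| c == '?' || c == '#')
        .unwrap_or(rest.len());
    let rest = &rest[..end];
    // Everything from the first slash onward is the path.
    match rest.find('/') {
        Some(i) => &rest[i..],
        None => "/",
    }
}

/// Assumed combination rule: deny wins; an empty allow list allows everything.
pub fn path_allowed(url: &str, allow: &[&str], deny: &[&str]) -> bool {
    let path = url_path(url);
    if deny.iter().any(|p| path.starts_with(*p)) {
        return false;
    }
    allow.is_empty() || allow.iter().any(|p| path.starts_with(*p))
}
```

A real implementation would more likely rely on a dedicated URL parser; the manual extraction here only keeps the example free of dependencies.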

pub fn with_allowed_tags(self, tags: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts attribute extraction to specific HTML tags.

pub fn with_allowed_attributes(self, attributes: impl IntoIterator<Item = impl Into<String>>) -> DiscoveryConfig

Restricts attribute extraction to specific attributes.
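
Restricting extraction by tag and by attribute can be pictured as two filters applied to candidate `(tag, attribute, value)` triples. The sketch below assumes the HTML has already been parsed into such triples; `AttrCandidate` and `extract_links` are hypothetical names, and the case-insensitive comparison is an assumption.

```rust
/// Hypothetical candidate produced by an HTML parser: one attribute on one tag.
pub struct AttrCandidate<'a> {
    pub tag: &'a str,
    pub attribute: &'a str,
    pub value: &'a str,
}

/// Keeps only values whose tag AND attribute both appear in the allowed lists.
pub fn extract_links<'a>(
    candidates: &[AttrCandidate<'a>],
    allowed_tags: &[&str],
    allowed_attributes: &[&str],
) -> Vec<&'a str> {
    candidates
        .iter()
        // Tag filter, compared without regard to ASCII case.
        .filter(|c| allowed_tags.iter().any(|t| t.eq_ignore_ascii_case(c.tag)))
        // Attribute filter, applied to whatever survived the tag filter.
        .filter(|c| {
            allowed_attributes
                .iter()
                .any(|a| a.eq_ignore_ascii_case(c.attribute))
        })
        .map(|c| c.value)
        .collect()
}
```

With tags `a` and `img` and the single attribute `href`, an `<img src>` candidate is dropped by the attribute filter even though its tag is allowed.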

Restricts discovery to the provided link types.

Excludes the provided link types from discovery.

Returns the effective link extraction options for the configured mode.

Returns the effective link extraction options for a specific rule or override.

pub fn should_extract_metadata(&self) -> bool

Returns true when metadata extraction should run.

Trait Implementations§

Source§

impl Clone for DiscoveryConfig

Source§

fn clone(&self) -> DiscoveryConfig

Returns a duplicate of the value.
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.
Source§

impl Debug for DiscoveryConfig

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

Formats the value using the given formatter.
Source§

impl Default for DiscoveryConfig

Source§

fn default() -> DiscoveryConfig

Returns the “default value” for a type.
Source§

impl PartialEq for DiscoveryConfig

Source§

fn eq(&self, other: &DiscoveryConfig) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · Source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
Source§

impl Eq for DiscoveryConfig

Source§

impl StructuralPartialEq for DiscoveryConfig

Auto Trait Implementations§

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self.
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value.
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.
Source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

Source§

fn equivalent(&self, key: &K) -> bool

Checks if this value is equivalent to the given key.
Source§

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

Source§

fn equivalent(&self, key: &K) -> bool

Compare self to key and return true if they are equal.
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> Pointable for T

Source§

const ALIGN: usize

The alignment of pointer.
Source§

type Init = T

The type for initializers.
Source§

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.
Source§

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.
Source§

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.
Source§

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.
Source§

impl<T> PolicyExt for T
where T: ?Sized,

Source§

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.
Source§

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.
Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

Source§

fn vzip(self) -> V

Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.