Struct Crawler 

pub struct Crawler<A, C> { /* private fields */ }

The core of this library. Create one with Crawler::new or Crawler::new_async to get started. Also see the examples.

Implementations

impl Crawler<NonAsync, NoContext>

pub fn new() -> Self

Creates a new non-async, parallel Crawler without any context.

let parallel_crawler = Crawler::new();

impl<C> Crawler<NonAsync, C>
where C: Send + Sync,

pub fn start_dir<P: AsRef<Path>>(self, path: P) -> Self

Sets the directory the crawler should start in. The default is the current directory, which is resolved when Crawler::run is called. If that resolution fails, run panics before doing anything else.

 use file_crawler::prelude::*;

 use std::collections::HashSet;
 use std::io;
 use std::sync::Mutex;

 //Assuming that the content of C:\foo isn't changing during execution
 //and this program is executed in that same folder
 let crawler_1_result =
 Crawler::new()
    .start_dir("C:\\foo")
    .context(Mutex::new(HashSet::new()))
    .run(|ctx, path| {
        ctx.lock().unwrap().insert(path.display().to_string());
        //the error type has to be named, it cannot be inferred from Ok(()) alone
        Ok::<(), io::Error>(())
    })?
    //unwrap the returned context to get the collected set
    .into_inner()
    .unwrap();
 let crawler_2_result =
 Crawler::new()
    .context(Mutex::new(HashSet::new()))
    .run(|ctx, path| {
        ctx.lock().unwrap().insert(path.display().to_string());
        Ok::<(), io::Error>(())
    })?
    .into_inner()
    .unwrap();

 //then (not guaranteed when using a Vec instead of a HashSet by design)
 assert_eq!(crawler_1_result, crawler_2_result);
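
The fallback to the current directory can be pictured with a std-only sketch. `resolve_start_dir` is a hypothetical helper and not part of file_crawler; it only mirrors the documented behaviour that an unset start directory is looked up when the crawl begins. Unlike the Crawler, which is documented to panic, this sketch returns the error to the caller.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of file_crawler): picks the configured
/// start directory, or falls back to the current directory at call time.
fn resolve_start_dir(configured: Option<&Path>) -> io::Result<PathBuf> {
    match configured {
        Some(path) => Ok(path.to_path_buf()),
        // Looked up only now, so a changed working directory is picked up.
        None => env::current_dir(),
    }
}

fn main() -> io::Result<()> {
    let explicit = resolve_start_dir(Some(Path::new("some/dir")))?;
    let fallback = resolve_start_dir(None)?;
    println!("explicit: {}", explicit.display());
    println!("fallback: {}", fallback.display());
    Ok(())
}
```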

pub fn file_regex<STR: AsRef<str>>(self, regex: STR) -> Self

Only applies the closure in run to a file if the given regex matches.

//prints all text files in the current directory (and all its subfolders)
Crawler::new()
    .file_regex(r"^.*\.txt$")
    .run(|_, path| {
        println!("{}", path.display());
        Ok::<(), std::io::Error>(())
    })?;
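
To see which files a pattern like `^.*\.txt$` lets through, here is a std-only stand-in that does not use a regex engine at all. For names without line breaks, that pattern accepts exactly the strings ending in ".txt", so `ends_with` is used instead. `is_txt` and `filter_txt` are hypothetical names for this sketch. For this particular pattern the result is the same whether the file name or the whole path is tested.

```rust
use std::path::{Path, PathBuf};

/// Std-only stand-in for the pattern `^.*\.txt$`: for names without line
/// breaks, that pattern accepts exactly the strings ending in ".txt".
fn is_txt(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.ends_with(".txt"))
}

/// Keeps only the paths the filter accepts; everything else is skipped.
fn filter_txt(paths: Vec<PathBuf>) -> Vec<PathBuf> {
    paths.into_iter().filter(|path| is_txt(path)).collect()
}

fn main() {
    let found = vec![
        PathBuf::from("notes.txt"),
        PathBuf::from("src/main.rs"),
        PathBuf::from("docs/readme.txt"),
        PathBuf::from("archive.txt.bak"),
    ];
    // Only notes.txt and docs/readme.txt end in ".txt".
    for path in filter_txt(found) {
        println!("{}", path.display());
    }
}
```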

pub fn folder_regex<STR: AsRef<str>>(self, regex: STR) -> Self

Only descends into a folder if it matches the given regex. A folder that does not match is skipped together with all of its files and subfolders.

//given this folder structure:
//foo
// |--bar
// |   |--foo.txt
// |--foobar
// |    |---barbar
// |           |---baz.txt
// |--foo
//     |--baz.txt

//this prints *only* the baz.txt inside foo\foo: the regex matches "foo" and "foobar", but not "bar" or "barbar"
Crawler::new()
    .start_dir("path\\to\\foo")
    .folder_regex("foo")
    .run(|_, path| {
        println!("{}", path.display());
        Ok::<(), std::io::Error>(())
    })?;
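
The pruning effect can be reproduced on an in-memory copy of the folder structure above. This is a sketch, not the library's implementation: `Node`, `walk` and `example_tree` are hypothetical, and a substring test stands in for the regex "foo" (an assumption about how the pattern is applied). The filter is only applied to subfolders here, not to the start folder.

```rust
/// A tiny in-memory stand-in for a directory tree.
enum Node {
    File(&'static str),
    Dir(&'static str, Vec<Node>),
}

/// Collects file paths below `children`, entering a subfolder only when
/// `enter` accepts its name. A rejected folder is skipped with all it contains.
fn walk(prefix: &str, children: &[Node], enter: &dyn Fn(&str) -> bool, out: &mut Vec<String>) {
    for child in children {
        match child {
            Node::File(name) => out.push(format!("{prefix}/{name}")),
            Node::Dir(name, inner) => {
                if enter(name) {
                    walk(&format!("{prefix}/{name}"), inner, enter, out);
                }
            }
        }
    }
}

/// The contents of the start folder `foo` from the example above.
fn example_tree() -> Vec<Node> {
    vec![
        Node::Dir("bar", vec![Node::File("foo.txt")]),
        Node::Dir("foobar", vec![Node::Dir("barbar", vec![Node::File("baz.txt")])]),
        Node::Dir("foo", vec![Node::File("baz.txt")]),
    ]
}

fn main() {
    let mut found = Vec::new();
    // Assumption: a substring test stands in for the regex "foo".
    walk("foo", &example_tree(), &|name: &str| name.contains("foo"), &mut found);
    println!("{found:?}");
}
```

Only the file in `foo/foo` is collected: `bar` is never entered, and `barbar` is rejected even though its parent `foobar` is entered.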

pub fn search_depth(self, depth: u32) -> Self

Sets how deep the Crawler should go, counted in nested folder levels below the start directory. A depth of 0 restricts the crawl to the start directory itself.

//prints all text files in the current directory, but not its subfolders
Crawler::new()
    //exchanging the 0 with a 1 means that it also traverses the subfolders, but not their subfolders
    .search_depth(0)
    .file_regex(r"^.*\.txt$")
    .run(|_, path| {
        println!("{}", path.display());
        Ok::<(), std::io::Error>(())
    })?;
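
The depth rule from the comment above (0 means the start directory only, 1 adds its direct subfolders) can be sketched with plain `std::fs`. `collect_files` is a hypothetical, sequential helper for illustration; it says nothing about how the Crawler traverses in parallel.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collects files below `dir`, descending at most `depth` folder levels.
/// Depth 0 means: only files directly inside `dir`.
fn collect_files(dir: &Path, depth: u32, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Subfolders are only entered while there is depth budget left.
            if depth > 0 {
                collect_files(&path, depth - 1, out)?;
            }
        } else {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Throwaway tree: root/a.txt, root/sub/b.txt, root/sub/deep/c.txt
    let root = std::env::temp_dir().join(format!("depth_sketch_{}", std::process::id()));
    fs::create_dir_all(root.join("sub").join("deep"))?;
    fs::write(root.join("a.txt"), "")?;
    fs::write(root.join("sub").join("b.txt"), "")?;
    fs::write(root.join("sub").join("deep").join("c.txt"), "")?;

    for depth in 0..3 {
        let mut files = Vec::new();
        collect_files(&root, depth, &mut files)?;
        println!("depth {depth}: {} file(s)", files.len());
    }
    fs::remove_dir_all(&root)
}
```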

pub fn context<CNEW: Send + Sync>( self, context: CNEW, ) -> Crawler<NonAsync, CNEW>

Adds a context, that is, a value of type CNEW that is passed to the closure on every invocation via an Arc. The context is returned from the run function after execution. Defaults to the zero-sized NoContext.

 use std::sync::atomic::{AtomicU16, Ordering};

 //bind the context to a variable
 let result =
 Crawler::new()
    //adds a counter (for everything not representable with Atomics, a Mutex is recommended)
     .context(AtomicU16::new(0))
     .run(|count, path| {
         //count every file the closure is called for
         count.fetch_add(1, Ordering::Relaxed);
         println!("{}", path.display());
         Ok::<(), std::io::Error>(())
     })?;
 println!("{} files in the current directory", result.into_inner());
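
The pattern of sharing a context through an Arc and handing it back afterwards can be shown with std threads alone. `run_with_context` is a hypothetical stand-in written for this sketch, with one thread per item for simplicity; it is not how file_crawler schedules its work.

```rust
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `context` + `run`: shares `context` with every
/// invocation through an `Arc` and returns it once all workers are done.
fn run_with_context<C, F>(context: C, items: Vec<String>, action: F) -> C
where
    C: Send + Sync + 'static,
    F: Fn(Arc<C>, String) + Clone + Send + 'static,
{
    let shared = Arc::new(context);
    let handles: Vec<_> = items
        .into_iter()
        .map(|item| {
            let ctx = Arc::clone(&shared);
            let action = action.clone();
            thread::spawn(move || action(ctx, item))
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    // Every clone was dropped when its worker finished, so this succeeds.
    Arc::try_unwrap(shared).ok().expect("context still shared")
}

fn main() {
    let paths = vec!["a.txt".to_string(), "b.txt".to_string(), "c.txt".to_string()];
    let counter = run_with_context(AtomicU16::new(0), paths, |ctx, path| {
        ctx.fetch_add(1, Ordering::Relaxed);
        println!("{path}");
    });
    println!("{} paths visited", counter.into_inner());
}
```

The `Send + Sync` bound on the context is what allows the `Arc` to cross thread boundaries, which matches the bound on CNEW in the signature above.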

pub fn run<A, E>(self, action: A) -> Result<C, Box<dyn Error + Send + 'static>>
where A: FnMut(Arc<C>, PathBuf) -> Result<(), E> + Clone + Send + Sync, E: Error + Send + 'static,

Runs the (modified) Crawler returned from Crawler::new, executing a closure for every file in the specified directory. The closure is passed the context and the path of the file. For exceptions, see search_depth, file_regex and folder_regex.

 use file_crawler::prelude::*;

 use std::io;

 Crawler::new()
    .start_dir("C:\\user\\foo")
    .file_regex(r"^.*\.txt$")
    .search_depth(3)
    .run(|_, path| {
        println!("{}", path.display());
        //the error type E has to be named, it cannot be inferred from Ok(()) alone
        Ok::<(), io::Error>(())
    })?;
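
The error bounds in the signature of run (a closure returning `Result<(), E>`, a boxed `dyn Error + Send` coming out) can be tried in isolation. `for_each_path` is a hypothetical, sequential stand-in, not the library's implementation. It stops at the first error, which is an assumption of this sketch rather than documented behaviour.

```rust
use std::error::Error;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for `run`: calls `action` for each path and boxes
/// the first error it returns.
fn for_each_path<F, E>(
    paths: Vec<PathBuf>,
    mut action: F,
) -> Result<(), Box<dyn Error + Send + 'static>>
where
    F: FnMut(PathBuf) -> Result<(), E>,
    E: Error + Send + 'static,
{
    for path in paths {
        action(path).map_err(|e| Box::new(e) as Box<dyn Error + Send + 'static>)?;
    }
    Ok(())
}

fn main() {
    let paths = vec![PathBuf::from("a.txt"), PathBuf::from("b.bin")];

    // `E` cannot be inferred from a bare `Ok(())`, so the turbofish names it.
    let ok = for_each_path(paths.clone(), |path| {
        println!("{}", path.display());
        Ok::<(), io::Error>(())
    });
    println!("all succeeded: {}", ok.is_ok());

    // Here `E` is inferred from the `Err` value the closure returns.
    let failed = for_each_path(paths, |path| {
        if path.extension().map_or(false, |ext| ext == "bin") {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "unexpected binary file"));
        }
        Ok(())
    });
    if let Err(err) = failed {
        println!("stopped early: {err}");
    }
}
```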

impl Crawler<Async, NoContext>

pub fn new_async() -> Self

Creates a new async (parallel) Crawler without any context.

let async_crawler = Crawler::new_async();

impl<C> Crawler<Async, C>
where C: Send + Sync + 'static,

pub fn start_dir<P: AsRef<Path>>(self, path: P) -> Self

pub fn file_regex<STR: AsRef<str>>(self, regex: STR) -> Self

pub fn folder_regex<STR: AsRef<str>>(self, regex: STR) -> Self

pub fn search_depth(self, depth: u32) -> Self

pub fn context<CNEW: Send + Sync + 'static>( self, context: CNEW, ) -> Crawler<Async, CNEW>

pub async fn run<Fun, Fut, E>( self, action: Fun, ) -> Result<C, Box<dyn Error + Send + 'static>>
where E: Send + Error + 'static, Fun: Fn(Arc<C>, PathBuf) -> Fut + Send + 'static + Clone, Fut: Future<Output = Result<(), E>> + Send + 'static,

Runs a (modified) asynchronous file crawler from Crawler::new_async using tokio. Requires a runtime with at least two threads (3). Otherwise, it behaves the same as the synchronous version. Where possible, it is recommended to use the tokio dependency exposed through the prelude instead of std.

 use file_crawler::prelude::*;

 //brings read_to_string into scope for tokio::fs::File
 use tokio::io::AsyncReadExt;

 Crawler::new_async()
    .start_dir("C:\\user\\foo")
    .file_regex(r"^.*\.txt$")
    .run(|_, path| async move {
        let mut contents = String::new();
        let mut file = tokio::fs::File::open(&path).await?;
        file.read_to_string(&mut contents).await?;
        println!("{}:\n{}", path.display(), contents);
        Ok::<(), std::io::Error>(())
    })
    .await?;

Trait Implementations

impl<A: Clone, C: Clone> Clone for Crawler<A, C>

fn clone(&self) -> Crawler<A, C>

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<A: Debug, C: Debug> Debug for Crawler<A, C>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl<A: Default, C: Default> Default for Crawler<A, C>

fn default() -> Crawler<A, C>

Returns the “default value” for a type.

Auto Trait Implementations

impl<A, C> Freeze for Crawler<A, C>
where C: Freeze,

impl<A, C> RefUnwindSafe for Crawler<A, C>

impl<A, C> Send for Crawler<A, C>
where C: Send, A: Send,

impl<A, C> Sync for Crawler<A, C>
where C: Sync, A: Sync,

impl<A, C> Unpin for Crawler<A, C>
where C: Unpin, A: Unpin,

impl<A, C> UnwindSafe for Crawler<A, C>
where C: UnwindSafe, A: UnwindSafe,

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.