Struct TextPreprocessingStep 

pub struct TextPreprocessingStep { /* private fields */ }

Built-in text preprocessing step for content normalization

Applies a sequence of text transformations to clean and normalize content before further processing. Supports common operations like whitespace normalization, case conversion, and special character handling.

§Supported Operations

  • Whitespace Normalization: Collapse multiple spaces into single spaces
  • Case Conversion: Convert text to lowercase for consistency
  • Special Character Removal: Remove non-alphanumeric characters
  • Regex Replacement: Custom pattern-based text replacement
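
The operations listed above can be approximated with the standard library alone. The following is a minimal sketch and not the crate's implementation: the function names are hypothetical, keeping whitespace when removing special characters is an assumption, and the digit-run helper is an ASCII-only stand-in for a regex replacement of `\d+`.

```rust
// Illustrative stand-ins for the four operations; not the rrag implementation.

/// Collapse every run of whitespace into one space and trim both ends.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Lowercase the text using Unicode-aware case mapping.
fn to_lowercase(text: &str) -> String {
    text.to_lowercase()
}

/// Keep alphanumeric characters and whitespace; drop everything else.
/// Assumption: whitespace is preserved so that words stay separated.
fn remove_special_chars(text: &str) -> String {
    text.chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect()
}

/// ASCII-only approximation of replacing the pattern `\d+`:
/// each run of ASCII digits is replaced by `replacement` once.
fn replace_digit_runs(text: &str, replacement: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut in_digits = false;
    for c in text.chars() {
        if c.is_ascii_digit() {
            // Emit the replacement only at the start of a digit run.
            if !in_digits {
                out.push_str(replacement);
                in_digits = true;
            }
        } else {
            in_digits = false;
            out.push(c);
        }
    }
    out
}
```

Under these assumptions, `remove_special_chars("Hi, there!")` yields `"Hi there"` and `replace_digit_runs("order 123 and 45", "[NUMBER]")` yields `"order [NUMBER] and [NUMBER]"`.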

§Example

use rrag::prelude::*;

let step = TextPreprocessingStep::new(vec![
    TextOperation::NormalizeWhitespace,
    TextOperation::RemoveSpecialChars,
    TextOperation::ToLowercase,
]);

// Can also be built fluently
let step = TextPreprocessingStep::new(vec![])
    .with_operation(TextOperation::NormalizeWhitespace)
    .with_operation(TextOperation::RegexReplace {
        pattern: r"\d+".to_string(),
        replacement: "[NUMBER]".to_string(),
    });

§Performance

  • Operations are applied in sequence for predictable results
  • String allocations are minimized where possible
  • Regex operations are compiled once and reused
  • Supports batch processing for multiple documents
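
Because operations run in sequence, their order can change the result. The sketch below illustrates this with a hypothetical `Op` enum standing in for `TextOperation`; the helper names `apply`, `apply_all`, and `process_batch` are assumptions made for illustration, and the semantics of each variant are guesses rather than the crate's documented behavior.

```rust
// Hypothetical stand-in for TextOperation; variants only mirror names used on this page.
#[derive(Clone, Copy, Debug)]
enum Op {
    NormalizeWhitespace,
    ToLowercase,
    RemoveSpecialChars,
}

/// Apply a single operation to a piece of text.
fn apply(op: Op, text: &str) -> String {
    match op {
        Op::NormalizeWhitespace => text.split_whitespace().collect::<Vec<_>>().join(" "),
        Op::ToLowercase => text.to_lowercase(),
        // Assumption: whitespace survives so that words stay separated.
        Op::RemoveSpecialChars => text
            .chars()
            .filter(|c| c.is_alphanumeric() || c.is_whitespace())
            .collect(),
    }
}

/// Apply the operations in the order given, feeding each result into the next.
fn apply_all(ops: &[Op], text: &str) -> String {
    ops.iter().fold(text.to_string(), |acc, &op| apply(op, &acc))
}

/// Run the same operation sequence over every document in a batch.
fn process_batch(ops: &[Op], docs: &[&str]) -> Vec<String> {
    docs.iter().map(|doc| apply_all(ops, doc)).collect()
}
```

With these definitions, applying `RemoveSpecialChars` and then `NormalizeWhitespace` to `"A - B"` gives `"A B"`, while the reverse order leaves two spaces between the letters, since removing the hyphen happens after whitespace has already been collapsed.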

Implementations§

impl TextPreprocessingStep

pub fn new(operations: Vec<TextOperation>) -> Self

Create a new text preprocessing step with the specified operations, which are applied in the order given

Trait Implementations§

impl PipelineStep for TextPreprocessingStep

fn name(&self) -> &str

Step name/identifier

fn description(&self) -> &str

Step description

fn input_types(&self) -> Vec<&'static str>

Input data types this step accepts

fn output_type(&self) -> &'static str

Output data type this step produces

fn execute<'life0, 'async_trait>(&'life0 self, context: PipelineContext) -> Pin<Box<dyn Future<Output = RragResult<PipelineContext>> + Send + 'async_trait>>
where Self: 'async_trait, 'life0: 'async_trait,

Execute the step

fn validate_input(&self, _data: &PipelineData) -> RragResult<()>

Validate input data

fn is_parallelizable(&self) -> bool

Whether this step can run in parallel with others

fn dependencies(&self) -> Vec<&str>

Dependencies on other steps (step names)
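
The `execute` signature above has the shape that the `async_trait` macro generates for an `async fn` in a trait: a method that returns a pinned, boxed, `Send` future. The following sketch shows that shape using only the standard library. `Ctx`, `Step`, `Lowercase`, and `block_on` are hypothetical names introduced for illustration, `String` stands in for the crate's error type, and the busy-polling `block_on` is only suitable for futures that complete without waiting.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for PipelineContext.
#[derive(Debug, PartialEq)]
struct Ctx {
    text: String,
}

// Hypothetical, reduced version of a pipeline step trait.
trait Step {
    // Same shape as the expanded signature: a boxed, pinned, Send future
    // that is allowed to borrow `self` for as long as it lives.
    fn execute<'a>(
        &'a self,
        context: Ctx,
    ) -> Pin<Box<dyn Future<Output = Result<Ctx, String>> + Send + 'a>>;
}

struct Lowercase;

impl Step for Lowercase {
    fn execute<'a>(
        &'a self,
        context: Ctx,
    ) -> Pin<Box<dyn Future<Output = Result<Ctx, String>> + Send + 'a>> {
        // The async block takes ownership of the context and returns a new one.
        Box::pin(async move {
            Ok(Ctx {
                text: context.text.to_lowercase(),
            })
        })
    }
}

// Waker that does nothing; enough for futures that are ready on first poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: poll the future in a loop until it completes.
fn block_on<F: Future>(future: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut future = Box::pin(future);
    loop {
        if let Poll::Ready(value) = future.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

In real code the future returned by `execute` would normally be awaited inside an async runtime rather than driven by a hand-written loop.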

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> ErasedDestructor for T
where T: 'static,