pub struct TextPreprocessingStep { /* private fields */ }
Built-in text preprocessing step for content normalization
Applies a sequence of text transformations to clean and normalize content before further processing. Supports common operations like whitespace normalization, case conversion, and special character handling.
§Supported Operations
- Whitespace Normalization: Collapse multiple spaces into single spaces
- Case Conversion: Convert text to lowercase for consistency
- Special Character Removal: Remove non-alphanumeric characters
- Regex Replacement: Custom pattern-based text replacement
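To make the first three operations concrete, here is a minimal standard-library sketch of what each transformation might do to a string. It is not the crate's implementation: `SketchOperation` and `apply_operation` are hypothetical names invented for illustration, and keeping whitespace during special character removal is an assumption. Regex replacement is left out because it needs an external regex engine.

```rust
/// Hypothetical stand-in for the crate's `TextOperation`; only the three
/// variants that need no regex engine are modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchOperation {
    NormalizeWhitespace,
    ToLowercase,
    RemoveSpecialChars,
}

/// Apply one operation to `input` and return the transformed text.
pub fn apply_operation(op: SketchOperation, input: &str) -> String {
    match op {
        // Split on any Unicode whitespace and rejoin with single spaces.
        // This also trims leading and trailing whitespace.
        SketchOperation::NormalizeWhitespace => {
            input.split_whitespace().collect::<Vec<_>>().join(" ")
        }
        // Unicode-aware lowercase conversion.
        SketchOperation::ToLowercase => input.to_lowercase(),
        // Assumption: whitespace is kept so that word boundaries survive;
        // every other non-alphanumeric character is dropped.
        SketchOperation::RemoveSpecialChars => input
            .chars()
            .filter(|c| c.is_alphanumeric() || c.is_whitespace())
            .collect(),
    }
}
```

Under these assumptions, `apply_operation(SketchOperation::RemoveSpecialChars, "Hi, there! #1")` yields `"Hi there 1"`.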
§Example
use rrag::prelude::*;

let step = TextPreprocessingStep::new(vec![
    TextOperation::NormalizeWhitespace,
    TextOperation::RemoveSpecialChars,
    TextOperation::ToLowercase,
]);

// Can also be built fluently
let step = TextPreprocessingStep::new(vec![])
    .with_operation(TextOperation::NormalizeWhitespace)
    .with_operation(TextOperation::RegexReplace {
        pattern: r"\d+".to_string(),
        replacement: "[NUMBER]".to_string(),
    });
§Performance
- Operations are applied in sequence for predictable results
- String allocations are minimized where possible
- Regex operations are compiled once and reused
- Supports batch processing for multiple documents
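Because operations run in sequence, their order can change the result. The sketch below illustrates this with plain functions and the standard library only. The names `normalize_whitespace`, `remove_special_chars`, and `apply_in_sequence` are hypothetical and do not reflect how the crate is implemented internally.

```rust
/// Collapse runs of whitespace into single spaces (hypothetical helper).
pub fn normalize_whitespace(input: &str) -> String {
    input.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Drop characters that are neither alphanumeric nor whitespace
/// (hypothetical helper; keeping whitespace is an assumption).
pub fn remove_special_chars(input: &str) -> String {
    input
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect()
}

/// Feed the output of each operation into the next one, in slice order.
pub fn apply_in_sequence(ops: &[fn(&str) -> String], input: &str) -> String {
    ops.iter().fold(input.to_string(), |text, op| op(&text))
}
```

With the input `"rust -- lang"`, running `remove_special_chars` before `normalize_whitespace` gives `"rust lang"`. The reverse order leaves a double space behind, because removing `--` creates a new whitespace run after normalization has already happened. Placing whitespace normalization last is therefore usually the safer ordering.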
Implementations§
impl TextPreprocessingStep
pub fn new(operations: Vec<TextOperation>) -> Self
Create a new text preprocessing step that applies the given operations in order.
Trait Implementations§
impl PipelineStep for TextPreprocessingStep
fn description(&self) -> &str
Step description
fn input_types(&self) -> Vec<&'static str>
Input data types this step accepts
fn output_type(&self) -> &'static str
Output data type this step produces
fn execute<'life0, 'async_trait>(
    &'life0 self,
    context: PipelineContext,
) -> Pin<Box<dyn Future<Output = RragResult<PipelineContext>> + Send + 'async_trait>>
where
    Self: 'async_trait,
    'life0: 'async_trait,
Execute the step
fn validate_input(&self, _data: &PipelineData) -> RragResult<()>
Validate input data
fn is_parallelizable(&self) -> bool
Whether this step can run in parallel with others
fn dependencies(&self) -> Vec<&str>
Dependencies on other steps (step names)
Auto Trait Implementations§
impl Freeze for TextPreprocessingStep
impl RefUnwindSafe for TextPreprocessingStep
impl Send for TextPreprocessingStep
impl Sync for TextPreprocessingStep
impl Unpin for TextPreprocessingStep
impl UnwindSafe for TextPreprocessingStep
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.