data-proc
A lightweight, composable data processing pipeline framework for Rust with async support.
Overview
data-proc provides a simple yet powerful abstraction for building data processing pipelines using Rust's async streams. The library is built around three core traits:
- Source: Produces a stream of data items
- Transformer: Processes a stream of input items into a stream of output items
- Sink: Consumes a stream of items
This design allows for composable, reusable components that can be combined to build complex data processing workflows.
Features
- 🔄 Stream-based: Built on top of Rust's
futures::Streamfor efficient async processing - 🧩 Composable: Mix and match sources, transformers, and sinks to build custom pipelines
- 🚀 Concurrent: Easily parallelize data processing with async/await
- 🔌 Extensible: Implement the traits for your own types to integrate with existing components
Installation
Add data-proc to your Cargo.toml:
[]
= "0.1.0"
Usage
Basic Example
use ;
use stream;
use StreamExt;
// Define a simple source that produces numbers
;
// Define a transformer that doubles each number
;
// Define a sink that prints each number
;
// Use them together in a pipeline
async
Real-world Example
Check out the parquet-elasticsearch example for a more complex use case that:
- Reads Parquet files from a directory
- Transforms the data into Elasticsearch bulk API format
- Sends the data to an Elasticsearch cluster
Core Traits
Source
Implement this trait for types that produce data. The into_stream method consumes the source and returns a stream of items.
Transformer
Implement this trait for types that transform data. The proc method takes a stream of input items and returns a future that resolves to a stream of output items.
Sink
Implement this trait for types that consume data. The sink method takes a stream of items and returns a future that resolves to a result indicating success or failure.
Common Patterns
Parallel Processing
You can easily parallelize processing using the buffer_unordered and for_each_concurrent methods from the futures crate:
Error Handling
For robust error handling, you can use the Result type in your stream items:
License
This project is licensed under the MIT License - see the LICENSE file for details.