Dataflow-rs
A thread-safe, vertically-scalable async workflow engine for building high-performance data processing pipelines in Rust.
Dataflow-rs is designed to maximize CPU utilization through intelligent concurrency management, letting you build complex workflows that automatically scale with your hardware. Whether you're building REST APIs, processing Kafka streams, or creating sophisticated data transformation pipelines, Dataflow-rs delivers enterprise-grade performance out of the box.
🚀 Key Features
- Thread-Safe & Scalable: Built-in concurrency management with automatic vertical scaling to utilize all available CPU cores.
- Zero-Cost Workflow Updates: Arc-Swap architecture allows lock-free reads and atomic workflow updates.
- Intelligent Resource Pooling: DataLogic instance pooling eliminates contention and maximizes throughput.
- Asynchronous by Design: Built on Tokio for non-blocking, high-performance concurrent processing.
- Dynamic Workflows: Use JSONLogic to control workflow execution based on your data (see the condition snippet after this list).
- Extensible: Easily add your own custom processing steps (tasks) to the engine.
- Built-in Functions: Comes with thread-safe implementations of HTTP requests, data mapping, and validation.
- Resilient: Built-in error handling and retry mechanisms to handle transient failures.
- Auditing: Keep track of all the changes that happen to your data as it moves through the pipeline.
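To illustrate the Dynamic Workflows point above: a workflow's condition is an ordinary JSONLogic expression evaluated against the message, so a workflow can be restricted to, say, order messages only. The `data.type` path below is purely illustrative.

```json
{ "==": [{ "var": "data.type" }, "order"] }
```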
🏁 Getting Started
Here's a quick example to get you up and running.
1. Add to Cargo.toml
```toml
[dependencies]
dataflow-rs = "1.0"
tokio = { version = "1.0", features = ["full"] }
serde_json = "1.0"
```
2. Create a Workflow
Workflows are defined in JSON and consist of a series of tasks.
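As a sketch, a minimal workflow with one mapping task could look like the following. The field names (`condition`, `tasks`, `function`, and the `map` function's `mappings` input) are modeled on the crate's built-in functions but should be checked against the current documentation:

```json
{
  "id": "hello_workflow",
  "name": "Hello Workflow",
  "description": "Adds a greeting to the message data",
  "condition": { "==": [true, true] },
  "tasks": [
    {
      "id": "add_greeting",
      "name": "Add Greeting",
      "function": {
        "name": "map",
        "input": {
          "mappings": [
            {
              "path": "data.greeting",
              "logic": { "cat": ["Hello, ", { "var": "payload.name" }] }
            }
          ]
        }
      }
    }
  ]
}
```

Save it as `workflow.json` (the file name is just a convention used in the next step).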
3. Run the Engine
```rust
use dataflow_rs::{Engine, Workflow};
use dataflow_rs::engine::message::Message;
use serde_json::json;

#[tokio::main]
async fn main() {
    // Sketch only: constructor and method names are assumptions, so verify them
    // against the crate documentation.
    let mut engine = Engine::new();

    // Load and register the workflow definition from step 2
    // ("workflow.json" is a hypothetical path).
    let workflow_json = std::fs::read_to_string("workflow.json").expect("read workflow file");
    let workflow = Workflow::from_json(&workflow_json).expect("valid workflow definition");
    engine.add_workflow(&workflow);

    // Wrap a payload in a Message and run it through the engine.
    let mut message = Message::new(&json!({ "name": "world" }));
    engine.process_message(&mut message).await.expect("workflow processing failed");

    println!("{}", message.data);
}
```
4. Concurrent Processing (New in v1.0)
Process multiple messages concurrently with automatic resource management:
```rust
use dataflow_rs::Engine;
use dataflow_rs::engine::message::Message;
use serde_json::json;
use std::sync::Arc;
use tokio::task::JoinSet;

#[tokio::main]
async fn main() {
    // Sketch only: method names are assumptions, so verify them against the crate docs.
    // A single engine is shared across tasks; the engine itself is thread-safe.
    // (Workflow registration, as in step 3, is omitted for brevity.)
    let engine = Arc::new(Engine::new());

    let mut tasks = JoinSet::new();
    for i in 0..100 {
        let engine = Arc::clone(&engine);
        tasks.spawn(async move {
            let mut message = Message::new(&json!({ "id": i }));
            engine.process_message(&mut message).await
        });
    }

    // Wait for every message to finish processing.
    while let Some(result) = tasks.join_next().await {
        result.expect("task panicked").expect("workflow processing failed");
    }
}
```
✨ Core Concepts
- Engine: The heart of the library, now thread-safe with configurable concurrency levels.
- Workflow: A sequence of tasks that are executed in order, stored using Arc-Swap for lock-free reads.
- Task: A single step in a workflow, like making an HTTP request or transforming data.
- Message: The data that flows through the engine, with each message getting its own DataLogic instance.
- Concurrency: Unified concurrency model where pool size matches max concurrent messages to eliminate contention.
⚡ Performance
Dataflow-rs v1.0 introduces significant performance improvements through its unified concurrency model:
- Improved Scalability: Performance scales with available CPU cores
- Zero Contention: Pool size matches concurrent tasks to eliminate resource contention
- Lock-Free Reads: Arc-Swap architecture enables zero-cost workflow reads
- High Throughput: Concurrent message processing delivers substantially higher throughput than sequential execution
Run the included benchmark to test performance on your hardware:
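Assuming the benchmark is wired up as a standard Cargo bench target (the exact target name isn't shown here):

```sh
cargo bench
```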
🛠️ Custom Functions
You can extend the engine with your own custom logic by implementing the AsyncFunctionHandler
trait. Note that in v1.0, functions receive a DataLogic instance for thread-safe JSONLogic evaluation.
```rust
use dataflow_rs::engine::message::{Change, Message};
use dataflow_rs::{AsyncFunctionHandler, Result};
use async_trait::async_trait;
use serde_json::Value;
use datalogic_rs::DataLogic;

struct MyCustomFunction;

#[async_trait]
impl AsyncFunctionHandler for MyCustomFunction {
    // Sketch only: the import paths, method signature, and return type here are
    // assumptions modeled on the built-in functions; check the trait definition
    // in the crate documentation.
    async fn execute(
        &self,
        message: &mut Message,
        _input: &Value,
        _datalogic: &DataLogic,
    ) -> Result<(usize, Vec<Change>)> {
        // Your custom logic goes here: read the input, mutate `message.data`,
        // and return a status code plus any audit changes.
        Ok((200, vec![]))
    }
}

// Then, register it with the engine:
// engine.register_task_function("my_custom_function", Box::new(MyCustomFunction));
```
🤝 Contributing
We welcome contributions! Feel free to fork the repository, make your changes, and submit a pull request. Please make sure to add tests for any new features.
🏢 About Plasmatic
Dataflow-rs is developed by the team at Plasmatic. We're passionate about building open-source tools for data processing.
📄 License
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more details.