# TaskFlow-RS
A Rust implementation of TaskFlow — a general-purpose task-parallel programming library with heterogeneous CPU/GPU support.
## Features
- ✅ Task Graph Construction — DAGs with flexible dependency management
- ✅ Lock-Free Work-Stealing Executor — Per-worker queues, near-linear scalability
- ✅ Subflows — Nested task graphs for recursive parallelism
- ✅ Condition Tasks — Conditional branching and loop constructs
- ✅ Cycle Detection — Catches cycles at graph construction time
- ✅ Parallel Algorithms — `for_each`, `reduce`, `transform`, `sort`, `scan`
- ✅ Async Task Support — Full `async`/`await` integration with Tokio
- ✅ Pipeline Support — Stream processing with parallel/serial stages and backpressure
- ✅ Composition — Reusable, parameterized task graph components
- ✅ GPU Support — CUDA, OpenCL, and ROCm/HIP with async transfers and multiple streams
- ✅ Task Priorities — Static `Low / Normal / High / Critical` priority levels
- ✅ Preemptive Cancellation — Watchdog timeouts, RAII deadline guards, signal preemption
- ✅ Dynamic Priority Adjustment — O(log n) live reprioritization of queued tasks
- ✅ Hardware Topology Integration — hwloc2 / sysfs cache hierarchy, NUMA binding, CPU affinity
- ✅ Real-time Dashboard — HTTP server with SSE; live throughput chart and worker utilisation bars
- ✅ Flamegraph Generation — Interactive SVG flamegraphs from profiles or folded stacks
- ✅ Automated Regression Detection — Statistical baseline comparison with JSON persistence
- ✅ Tooling — Profiler, DOT/SVG/HTML visualization, performance monitoring, debug logging
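Construction-time cycle detection, as listed above, boils down to a topological pass over the dependency graph. The sketch below uses Kahn's algorithm over a plain adjacency list; it illustrates the idea and is not the library's internal code:

```rust
use std::collections::VecDeque;

/// Returns true if the directed graph (adjacency list) contains a cycle.
/// Kahn's algorithm: repeatedly remove zero-in-degree nodes; if any node
/// is never removed, it sits on a cycle.
fn has_cycle(adj: &[Vec<usize>]) -> bool {
    let n = adj.len();
    let mut indeg = vec![0usize; n];
    for edges in adj {
        for &t in edges {
            indeg[t] += 1;
        }
    }
    let mut queue: VecDeque<usize> = (0..n).filter(|&i| indeg[i] == 0).collect();
    let mut seen = 0;
    while let Some(u) = queue.pop_front() {
        seen += 1;
        for &t in &adj[u] {
            indeg[t] -= 1;
            if indeg[t] == 0 {
                queue.push_back(t);
            }
        }
    }
    seen < n // some node never reached in-degree 0, so a cycle exists
}

fn main() {
    let dag = vec![vec![1], vec![2], vec![]];     // 0→1→2
    let cyclic = vec![vec![1], vec![2], vec![0]]; // 0→1→2→0
    assert!(!has_cycle(&dag));
    assert!(has_cycle(&cyclic));
}
```

Running the check at construction time means an invalid graph is rejected before any worker thread ever starts.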
## Quick Start
```toml
[dependencies]
taskflow-rs = "0.1.0"
```

### Basic task graph

```rust
use taskflow_rs::prelude::*;

let mut taskflow = Taskflow::new();
let a = taskflow.emplace(|| println!("A"));
let b = taskflow.emplace(|| println!("B runs after A"));
a.precede(&b);

Executor::new().run(&taskflow).wait();
```
## Preemptive Cancellation

TaskFlow-RS provides three escalating levels of cancellation, all sharing the same `PreemptiveCancellationToken` type.
```rust
use taskflow_rs::PreemptiveCancellationToken;
use std::time::Duration;

let token = PreemptiveCancellationToken::new();

// Layer 1 — cooperative: task checks the flag manually
token.check()?;

// Layer 2 — watchdog: fires automatically after a deadline
token.cancel_after(Duration::from_secs(5));

// Layer 3 — RAII deadline guard (scope-bound)
let _guard = token.deadline_guard(Duration::from_secs(5));
```
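The cooperative and watchdog layers can be built from nothing more than an `AtomicBool` and a timer thread. The sketch below is a simplified stand-in for illustration, not the actual `PreemptiveCancellationToken`:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

#[derive(Clone)]
struct Token(Arc<AtomicBool>);

impl Token {
    fn new() -> Self {
        Token(Arc::new(AtomicBool::new(false)))
    }

    /// Layer 1: cooperative check, called by the task at safe points.
    fn check(&self) -> Result<(), &'static str> {
        if self.0.load(Ordering::Acquire) { Err("cancelled") } else { Ok(()) }
    }

    /// Layer 2: watchdog. A background thread sets the flag after `d`,
    /// so the next cooperative check fails automatically.
    fn cancel_after(&self, d: Duration) {
        let flag = Arc::clone(&self.0);
        thread::spawn(move || {
            thread::sleep(d);
            flag.store(true, Ordering::Release);
        });
    }
}

fn main() {
    let token = Token::new();
    assert!(token.check().is_ok());
    token.cancel_after(Duration::from_millis(50));
    thread::sleep(Duration::from_millis(200));
    assert!(token.check().is_err()); // watchdog fired
}
```

The RAII guard layer is the same flag wrapped in a type whose `Drop` impl disarms the watchdog when the scope exits.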
## Real-time Dashboard

`DashboardServer` starts an HTTP server that streams live metrics to any browser via Server-Sent Events. No external JavaScript libraries or CDN required — the page is entirely self-contained.
```rust
use std::sync::Arc;
use taskflow_rs::{DashboardConfig, DashboardServer, PerformanceMetrics};

let metrics = Arc::new(PerformanceMetrics::new());
metrics.start();

let server = DashboardServer::new(DashboardConfig::default(), Arc::clone(&metrics));
let handle = server.start(); // non-blocking
println!("Dashboard: http://localhost:9090");

// … run your executor, record metrics …

handle.stop();
```
The dashboard provides:
- Live throughput chart (tasks/sec over a rolling 60 s window)
- Worker utilisation bars (colour-coded: green > 80%, yellow > 40%, red otherwise)
- Stat cards — tasks completed, throughput, avg task duration, steal rate
- SSE history replay — new clients immediately receive the full history buffer
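Server-Sent Events need no protocol machinery on the wire: each push is one or more `data:` lines terminated by a blank line on a long-lived HTTP response. A minimal frame encoder, shown for illustration rather than taken from the dashboard's source:

```rust
/// Encode one SSE event. Multi-line payloads become multiple `data:` lines,
/// and a blank line terminates the event (per the WHATWG EventSource format).
fn sse_frame(event: &str, payload: &str) -> String {
    let mut out = format!("event: {}\n", event);
    for line in payload.lines() {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n'); // blank line = end of event
    out
}

fn main() {
    let frame = sse_frame("metrics", "{\"throughput\": 1234}");
    assert_eq!(frame, "event: metrics\ndata: {\"throughput\": 1234}\n\n");
    print!("{frame}");
}
```

History replay then amounts to writing the buffered frames to a newly connected client before switching to live pushes.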
## Flamegraph Generation

`FlamegraphGenerator` produces fully interactive SVG flamegraphs with zero external dependencies. The output opens in any browser and supports click-to-zoom, search/highlight, and hover tooltips.
### From an `ExecutionProfile`

```rust
use taskflow_rs::FlamegraphGenerator;

let gen = FlamegraphGenerator::new();
gen.save_from_profile(&profile, "flamegraph.svg")?;
// open flamegraph.svg in any browser
```
### From folded stacks (perf / dtrace compatible)

```rust
// Standard "stack;frame N" format — compatible with `perf script | stackcollapse-perf`
let folded = std::fs::read_to_string("stacks.folded")?;
gen.save_from_folded(&folded, "flamegraph.svg")?;
```
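The folded format is one stack per line: semicolon-separated frames, a space, then a sample count. A tiny parser for it, written as a self-contained illustration:

```rust
/// Parse "a;b;c 42"-style lines into (frames, count) pairs.
/// Malformed lines are skipped rather than aborting the whole input.
fn parse_folded(input: &str) -> Vec<(Vec<String>, u64)> {
    input
        .lines()
        .filter_map(|line| {
            let (stack, count) = line.rsplit_once(' ')?;
            let count: u64 = count.trim().parse().ok()?;
            let frames = stack.split(';').map(str::to_string).collect();
            Some((frames, count))
        })
        .collect()
}

fn main() {
    let parsed = parse_folded("main;run;hot_loop 97\nmain;idle 3\n");
    assert_eq!(parsed.len(), 2);
    assert_eq!(parsed[0].0, vec!["main", "run", "hot_loop"]);
    assert_eq!(parsed[0].1, 97);
}
```

Splitting on the *last* space matters: frame names themselves may contain spaces (e.g. C++ template signatures), but the trailing count never does.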
SVG features:
- Click a frame to zoom into its subtree
- Ctrl+click (or press Reset) to restore the full view
- Search box dims non-matching frames in real time
- Deterministic colours — same frame name always gets the same colour
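Deterministic colouring only requires that a frame's colour be a pure function of its name. An FNV-1a hash mapped onto a warm hue band gives stable flamegraph-style colours; this sketch illustrates the idea, not the generator's exact palette:

```rust
/// FNV-1a hash of the frame name: stable across runs, no RNG involved.
fn fnv1a(s: &str) -> u64 {
    s.bytes().fold(0xcbf29ce484222325u64, |h, b| {
        (h ^ b as u64).wrapping_mul(0x100000001b3)
    })
}

/// Same name in, same colour out, every time.
fn frame_color(name: &str) -> String {
    let h = fnv1a(name);
    let r = 200 + (h % 56) as u8;        // 200–255: warm reds
    let g = 50 + ((h >> 8) % 130) as u8; // 50–179: oranges and yellows
    let b = (h >> 16) % 30;              // 0–29: keep blue low
    format!("rgb({r},{g},{b})")
}

fn main() {
    // The same frame always renders in the same colour, across runs and files.
    assert_eq!(frame_color("hot_loop"), frame_color("hot_loop"));
    println!("{}", frame_color("hot_loop"));
}
```

This is what makes two flamegraphs visually diffable: a frame that appears in both renders identically in both.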
## Automated Regression Detection

`RegressionDetector` compares a finished `ExecutionProfile` against a stored `Baseline` and produces a structured `RegressionReport` suitable for CI pipelines.
```rust
use taskflow_rs::{Baseline, RegressionDetector};

// ── First run: record the baseline ──────────────────────────────────────────
let baseline = Baseline::from_profile(&profile);
baseline.save("baseline.json")?;

// ── Subsequent runs: detect regressions ─────────────────────────────────────
let baseline = Baseline::load("baseline.json")?;
let detector = RegressionDetector::new(baseline);
let report = detector.detect(&profile);
println!("{}", report.summary);

// Fail CI on any regression
assert!(report.violations.is_empty());

// Save JSON artefact for diff tracking
std::fs::write("regression_report.json", report.to_json())?;
```
### Threshold presets

| Preset | Total | Avg | P95 | P99 | Notes |
|---|---|---|---|---|---|
| `default()` | 10% | 10% | 15% | 20% | Balanced — recommended starting point |
| `strict()` | 5% | 5% | 8% | 10% | Critical paths; fail on small regressions |
| `lenient()` | 20% | 20% | 30% | 40% | Nightly builds; noise-tolerant |
Violations are classified as WARNING (> threshold) or CRITICAL (> 2× threshold).
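The WARNING/CRITICAL split is plain arithmetic on the relative change. A self-contained sketch of that classification rule, using the 10% total-time threshold from the `default()` row:

```rust
#[derive(Debug, PartialEq)]
enum Severity {
    Ok,
    Warning,  // change > threshold
    Critical, // change > 2 × threshold
}

/// Classify a metric against its baseline and a relative threshold
/// (e.g. threshold = 0.10 for a 10% preset).
fn classify(baseline: f64, current: f64, threshold: f64) -> Severity {
    let change = (current - baseline) / baseline;
    if change > 2.0 * threshold {
        Severity::Critical
    } else if change > threshold {
        Severity::Warning
    } else {
        Severity::Ok
    }
}

fn main() {
    assert_eq!(classify(100.0, 105.0, 0.10), Severity::Ok);       // +5%
    assert_eq!(classify(100.0, 112.0, 0.10), Severity::Warning);  // +12%
    assert_eq!(classify(100.0, 125.0, 0.10), Severity::Critical); // +25%
}
```

Note that improvements (negative change) never trip a violation; only slowdowns beyond the threshold do.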
## GPU Support
TaskFlow-RS provides a backend-agnostic GPU API supporting CUDA (NVIDIA), OpenCL (NVIDIA/AMD/Intel), and ROCm/HIP (AMD).
### Building

```sh
# CUDA (NVIDIA) — default CUDA 12.0
cargo build --features cuda

# OpenCL (NVIDIA / AMD / Intel)
#   Ubuntu: sudo apt install ocl-icd-opencl-dev
cargo build --features opencl

# ROCm / HIP (AMD) — requires ROCm ≥ 5.0
ROCM_PATH=/opt/rocm cargo build --features rocm

# All backends
cargo build --features cuda,opencl,rocm

# Stub backend (always available, no flags needed)
cargo build
```
### Usage

```rust
use taskflow_rs::gpu::*;

// Auto-select: CUDA → ROCm → OpenCL → Stub
let device = GpuDevice::new()?;

// Allocate and transfer
let mut buf: GpuBuffer<f32> = device.allocate(1024)?;
buf.copy_from_host(&host_data)?;

// Async transfer via stream
let stream = device.create_stream()?;
unsafe { buf.copy_from_host_async(&host_data, &stream)?; }
stream.synchronize()?;
```
## Async Support

```toml
[dependencies]
taskflow-rs = { version = "0.1", features = ["async"] }
tokio = { version = "1", features = ["full"] }
```

```rust
use taskflow_rs::prelude::*;

#[tokio::main]
async fn main() {
    // build a task graph and await its completion from async context
}
```
## Tooling

```rust
use taskflow_rs::tooling::*;

// Profiling
let profiler = Profiler::new();
profiler.enable();
executor.run(&taskflow).wait();
let profile = profiler.get_profile().unwrap();
println!("{:#?}", profile);

// DOT / SVG / HTML visualization
generate_dot_graph(&taskflow, "taskflow.dot");
// dot -Tsvg taskflow.dot -o taskflow.svg

// Flamegraph (new)
use taskflow_rs::FlamegraphGenerator;
FlamegraphGenerator::new()
    .save_from_profile(&profile, "flamegraph.svg")?;

// Real-time dashboard (new)
use taskflow_rs::{DashboardConfig, DashboardServer};
let handle = DashboardServer::new(DashboardConfig::default(), metrics).start();
// → http://localhost:9090

// Regression detection (new)
use taskflow_rs::{Baseline, RegressionDetector};
let baseline = Baseline::from_profile(&profile);
baseline.save("baseline.json")?;
let report = RegressionDetector::new(baseline).detect(&profile);
assert!(report.violations.is_empty());
```
## Building

```sh
# Core (no GPU, no hwloc)
cargo build

# With hwloc topology
cargo build --features hwloc

# With CUDA GPU
cargo build --features cuda

# Everything
cargo build --all-features

# Release
cargo build --release

# Tests
cargo test
```
## Examples

```sh
# New tooling features
# Advanced scheduling
# Advanced features (priorities, cooperative cancellation, NUMA)
# Async
# GPU (stub — no hardware required)
# GPU with hardware
```
## Architecture

### Work-stealing executor

```text
Worker 0: [Task] [Task] [Task] ← push/pop own queue (LIFO, cache-warm)
                          ↓ steal (FIFO)
Worker 1: [Task] [Task]        ← idle workers steal from busy ones
```
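The LIFO-own / FIFO-steal discipline can be shown with a plain `VecDeque`: the owner works at the back of its queue (newest tasks, cache-warm), while thieves take from the front (oldest tasks, typically the largest remaining subtrees). A single-threaded illustration of the queue discipline only, not the lock-free implementation:

```rust
use std::collections::VecDeque;

struct WorkerQueue(VecDeque<&'static str>);

impl WorkerQueue {
    // Owner pushes and pops at the back: LIFO, so the hottest task runs next.
    fn push(&mut self, t: &'static str) { self.0.push_back(t); }
    fn pop(&mut self) -> Option<&'static str> { self.0.pop_back() }

    // A thief takes from the front: FIFO, so it grabs the oldest work.
    fn steal(&mut self) -> Option<&'static str> { self.0.pop_front() }
}

fn main() {
    let mut q = WorkerQueue(VecDeque::new());
    q.push("A");
    q.push("B");
    q.push("C");
    assert_eq!(q.pop(), Some("C"));   // owner gets the newest (cache-warm)
    assert_eq!(q.steal(), Some("A")); // thief gets the oldest
}
```

Taking from opposite ends is also what lets real deques (e.g. Chase–Lev) make the owner's fast path nearly contention-free: owner and thieves only race over the last remaining item.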
### Scheduling layer

```text
SharedDynamicScheduler
├── index:   BTreeMap<(RevPriority, SeqNum), TaskId>   O(log n) pop
└── reverse: HashMap<TaskId, (RevPriority, SeqNum)>    O(1) lookup

PriorityHandle ──weak──► SharedDynamicScheduler
    reprioritize() / cancel() without owning the queue

EscalationPolicy ──tick──► sched.reprioritize(id, bumped_priority)
    anti-starvation for Low / Normal tasks
```
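The dual-index layout pairs a `BTreeMap` keyed by `(reversed priority, sequence)` for ordered O(log n) pop with a `HashMap` for O(1) task lookup, so reprioritisation is a remove-and-reinsert. A std-only sketch of that structure; names and the `u8` priority encoding are illustrative:

```rust
use std::cmp::Reverse;
use std::collections::{BTreeMap, HashMap};

type TaskId = u64;
type Prio = u8; // higher value = more urgent

struct DynScheduler {
    index: BTreeMap<(Reverse<Prio>, u64), TaskId>,   // pop order
    reverse: HashMap<TaskId, (Reverse<Prio>, u64)>,  // O(1) key lookup by task
    seq: u64, // FIFO tiebreaker within a priority level
}

impl DynScheduler {
    fn new() -> Self {
        DynScheduler { index: BTreeMap::new(), reverse: HashMap::new(), seq: 0 }
    }

    fn push(&mut self, id: TaskId, prio: Prio) {
        let key = (Reverse(prio), self.seq);
        self.seq += 1;
        self.index.insert(key, id);
        self.reverse.insert(id, key);
    }

    /// O(log n): look up the old key, remove it, reinsert with the new priority.
    fn reprioritize(&mut self, id: TaskId, prio: Prio) {
        if let Some(key) = self.reverse.remove(&id) {
            self.index.remove(&key);
            self.push(id, prio);
        }
    }

    /// Highest priority first; FIFO within a priority level.
    fn pop(&mut self) -> Option<TaskId> {
        let (&key, &id) = self.index.iter().next()?;
        self.index.remove(&key);
        self.reverse.remove(&id);
        Some(id)
    }
}

fn main() {
    let mut s = DynScheduler::new();
    s.push(1, 0); // Low
    s.push(2, 1); // Normal
    s.reprioritize(1, 3); // escalate task 1 to Critical
    assert_eq!(s.pop(), Some(1)); // escalated task now runs first
    assert_eq!(s.pop(), Some(2));
}
```

`Reverse` makes the BTreeMap's ascending iteration order yield the highest priority first, and the monotonically increasing sequence number preserves FIFO order among equal priorities.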
### Tooling layer

```text
DashboardServer ──SSE──► browser (live charts, no CDN)
└── DashboardConfig { port, push_interval_ms, history_len, title }

FlamegraphGenerator
├── from_profile(ExecutionProfile) → interactive SVG
└── from_folded("a;b;c N")         → compatible with perf / dtrace

RegressionDetector
├── Baseline { total_us, avg_us, P50/P95/P99, efficiency } ← JSON file
└── detect(profile) → RegressionReport { violations, summary, to_json }
```
## Documentation
| Document | Contents |
|---|---|
| DESIGN.md | Architecture, implementation status, design decisions |
| ADVANCED_FEATURES.md | Priorities, cancellation, schedulers, NUMA |
| GPU.md | Full GPU API: backends, streams, async transfers |
| GPU_SETUP.md | CUDA version config, ROCm install, troubleshooting |
| ASYNC_TASKS.md | Async executor and task documentation |
| PIPELINE.md | Concurrent pipeline documentation |
| TOOLING.md | Profiler, visualization, monitoring, dashboard, flamegraphs, regression |
## License
MIT