cudf-polars
GPU execution engine for Polars DataFrames using NVIDIA libcudf.
cudf-polars transparently offloads Polars DataFrame operations to the GPU,
which can speed up filter, sort, group-by, join, and other
data-intensive operations.
Prerequisites
- NVIDIA GPU with CUDA support
- CUDA Toolkit 12.x
- libcudf (built from cudf-sys/cudf-cxx in this workspace)
- Rust 1.85+
Quick Start
use *;
use *;
use collect_gpu;
Supported Operations
| Category | Operation | API | Status |
|---|---|---|---|
| Transfer | CPU -> GPU | GpuDataFrame::from_polars() | Done |
| | GPU -> CPU | GpuDataFrame::to_polars() | Done |
| Selection | Column select | GpuDataFrame::select_columns() | Done |
| | Row slice | GpuDataFrame::slice() | Done |
| Filter | Boolean mask | GpuDataFrame::apply_boolean_mask() | Done |
| Sort | Sort by key columns | GpuDataFrame::sort_by_key() | Done |
| GroupBy | Aggregation | GpuDataFrame::groupby() | Done |
| Dedup | Distinct rows | GpuDataFrame::distinct() | Done |
| Join | Inner/Left/Full | Table::inner_join() etc. | Done |
| | Semi/Anti/Cross | Table::left_semi_join() etc. | Done |
| Union | Vertical concat | concatenate_tables() | Done |
| HConcat | Horizontal concat | Column collection | Done |
| Binary Ops | Column-column | Column::binary_op() | Done |
| | Column-scalar | Column::binary_op_scalar() | Done |
| Ternary | when/then/otherwise | Column::copy_if_else() | Done |
| Expression | Polars expr -> GPU | cudf_polars::expr | Done |
| Plan Exec | Full plan execution | cudf_polars::execute_plan() | Done |
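As a rough CPU-side reference for what the Filter and Ternary rows compute, the sketch below mimics the element-wise semantics in plain Rust. It assumes the libcudf convention that a null mask entry behaves like false; the function names `apply_boolean_mask_ref` and `copy_if_else_ref` are hypothetical and are not part of this crate's API.

```rust
/// CPU reference for a boolean-mask filter (hypothetical helper, not the crate API).
/// Keeps `values[i]` only when `mask[i]` is non-null and true.
/// Assumption: null mask entries drop the row, as in libcudf.
/// If the slices differ in length, the extra tail is ignored.
pub fn apply_boolean_mask_ref<T: Clone>(values: &[T], mask: &[Option<bool>]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter(|(_, m)| **m == Some(true))
        .map(|(v, _)| v.clone())
        .collect()
}

/// CPU reference for when/then/otherwise (hypothetical helper, not the crate API).
/// Picks `lhs[i]` when `mask[i]` is non-null and true, otherwise `rhs[i]`.
pub fn copy_if_else_ref<T: Clone>(mask: &[Option<bool>], lhs: &[T], rhs: &[T]) -> Vec<T> {
    mask.iter()
        .zip(lhs.iter().zip(rhs))
        .map(|(m, (l, r))| if *m == Some(true) { l.clone() } else { r.clone() })
        .collect()
}
```

A reference like this can be handy when checking GPU results against small hand-written inputs in tests.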
Supported Aggregations (GroupBy)
Sum, Min, Max, Count, Mean, Median, Variance, Std, Nunique, First, Last, Quantile.
Supported Data Types
| Polars Type | cudf Type |
|---|---|
| Int8 | INT8 |
| Int16 | INT16 |
| Int32 | INT32 |
| Int64 | INT64 |
| UInt8 | UINT8 |
| UInt16 | UINT16 |
| UInt32 | UINT32 |
| UInt64 | UINT64 |
| Float32 | FLOAT32 |
| Float64 | FLOAT64 |
| Boolean | BOOL8 |
| String | STRING |
| Date | TIMESTAMP_DAYS |
| Datetime | TIMESTAMP_{MS,US,NS} |
| Duration | DURATION_{MS,US,NS} |
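The mapping in the table can be pictured as a single match over the Polars dtype. The following is a minimal sketch, not the crate's actual conversion code: `PolarsType`, `TimeUnit`, `UnsupportedType`, and `cudf_type_name` are hypothetical stand-ins, and the cudf names are spelled exactly as in the table above.

```rust
/// Hypothetical stand-in for the time units of Datetime/Duration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Milliseconds,
    Microseconds,
    Nanoseconds,
}

/// Hypothetical stand-in for the Polars dtypes discussed in this README.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PolarsType {
    Int8, Int16, Int32, Int64,
    UInt8, UInt16, UInt32, UInt64,
    Float32, Float64,
    Boolean, String, Date,
    Datetime(TimeUnit),
    Duration(TimeUnit),
    // Types this README lists as unsupported.
    Categorical, List, Struct,
}

/// Error returned for dtypes with no GPU mapping (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedType(pub PolarsType);

fn unit_suffix(unit: TimeUnit) -> &'static str {
    match unit {
        TimeUnit::Milliseconds => "MS",
        TimeUnit::Microseconds => "US",
        TimeUnit::Nanoseconds => "NS",
    }
}

/// Returns the cudf type name from the table, or an explicit error.
pub fn cudf_type_name(dtype: &PolarsType) -> Result<String, UnsupportedType> {
    let name = match dtype {
        PolarsType::Int8 => "INT8",
        PolarsType::Int16 => "INT16",
        PolarsType::Int32 => "INT32",
        PolarsType::Int64 => "INT64",
        PolarsType::UInt8 => "UINT8",
        PolarsType::UInt16 => "UINT16",
        PolarsType::UInt32 => "UINT32",
        PolarsType::UInt64 => "UINT64",
        PolarsType::Float32 => "FLOAT32",
        PolarsType::Float64 => "FLOAT64",
        PolarsType::Boolean => "BOOL8",
        PolarsType::String => "STRING",
        PolarsType::Date => "TIMESTAMP_DAYS",
        PolarsType::Datetime(u) => return Ok(format!("TIMESTAMP_{}", unit_suffix(*u))),
        PolarsType::Duration(u) => return Ok(format!("DURATION_{}", unit_suffix(*u))),
        PolarsType::Categorical | PolarsType::List | PolarsType::Struct => {
            return Err(UnsupportedType(dtype.clone()))
        }
    };
    Ok(name.to_string())
}
```

Returning a `Result` rather than panicking mirrors the behaviour described under Limitations, where unsupported types produce explicit errors.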
Benchmark
Architecture
Polars DataFrame
|
v (Arrow C Data Interface)
cudf-polars::convert -- zero-copy CPU <-> GPU bridge
|
v
cudf-polars::GpuDataFrame -- named GPU columns
|
v
cudf (Rust) -> cudf-cxx (C++ bridge) -> libcudf (NVIDIA)
execute_plan() takes an IRPlan obtained from polars-lazy's to_alp_optimized().
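To give a feel for what executing a plan involves, here is a toy, CPU-only sketch of bottom-up evaluation over an arena of plan nodes. Everything in it is hypothetical: `Ir`, `ToyPlan`, and `execute_toy_plan` are illustrative names, the node set is far smaller than a real Polars plan, and the arena-of-nodes layout is an assumption made for the example rather than a description of how `execute_plan()` is implemented.

```rust
/// Toy plan node (hypothetical). Inputs are indices into the arena.
pub enum Ir {
    Scan { rows: Vec<i64> },
    Filter { input: usize, min: i64 },
    Sort { input: usize },
    Slice { input: usize, offset: usize, len: usize },
}

/// Toy plan: a flat arena of nodes plus the index of the root node.
pub struct ToyPlan {
    pub arena: Vec<Ir>,
    pub root: usize,
}

/// Evaluates the plan by recursing from the root down to the scans.
pub fn execute_toy_plan(plan: &ToyPlan) -> Result<Vec<i64>, String> {
    eval(&plan.arena, plan.root)
}

// Assumes the node graph is acyclic; a self-referencing node would recurse forever.
fn eval(arena: &[Ir], node: usize) -> Result<Vec<i64>, String> {
    let ir = arena
        .get(node)
        .ok_or_else(|| format!("dangling node index {}", node))?;
    match ir {
        Ir::Scan { rows } => Ok(rows.clone()),
        Ir::Filter { input, min } => {
            // Keep rows whose value is at least `min`.
            let rows = eval(arena, *input)?;
            Ok(rows.into_iter().filter(|v| v >= min).collect())
        }
        Ir::Sort { input } => {
            let mut rows = eval(arena, *input)?;
            rows.sort();
            Ok(rows)
        }
        Ir::Slice { input, offset, len } => {
            let rows = eval(arena, *input)?;
            Ok(rows.into_iter().skip(*offset).take(*len).collect())
        }
    }
}
```

In this picture each node's result would be a GPU table instead of a `Vec<i64>`, and each match arm would call the corresponding operation from the Supported Operations table.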
LazyFrame Integration
use LazyFrame;
use execute_plan;
let lf: LazyFrame = df.lazy().filter(...).group_by(...);
let plan = lf.to_alp_optimized()?;
let result = execute_plan(...)?;
Testing
# Run GPU e2e tests (56 tests + 1 doctest)
# Python polars-gpu integration (81 tests)
Limitations
- Polars version: Compatible with polars 0.53.0.
- Unsupported types: Categorical, List, Struct return explicit errors.
- Unsupported expressions: Window functions with order_by, expression-level Sort/Filter/Slice.
- Unsupported IR nodes: Cache, MapFunction (rename, explode, melt), ExtContext.
- Multi-file Parquet: Only reads the first file in multi-file scans.
- GroupBy maintain_order: Approximated by key-column sort (not true input-order preservation).
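The maintain_order limitation is easiest to see on a small example. The sketch below is plain CPU Rust with hypothetical helper names; it contrasts true input-order preservation with a key-sorted approximation, assuming an ascending sort on a single string key. How the GPU path actually orders ties or multiple keys is not specified here.

```rust
use std::collections::HashMap;

/// Group sums with groups emitted in order of first appearance,
/// which is what true maintain_order semantics would give.
pub fn group_sum_input_order(keys: &[&str], values: &[i64]) -> Vec<(String, i64)> {
    let mut slot: HashMap<&str, usize> = HashMap::new();
    let mut out: Vec<(String, i64)> = Vec::new();
    for (&k, &v) in keys.iter().zip(values) {
        match slot.get(k).copied() {
            // Key seen before: accumulate into its existing output slot.
            Some(i) => out[i].1 += v,
            // New key: append a slot, so output order follows first appearance.
            None => {
                slot.insert(k, out.len());
                out.push((k.to_string(), v));
            }
        }
    }
    out
}

/// Same sums, but with groups ordered by key (ascending).
/// This models the key-column-sort approximation described above.
pub fn group_sum_key_sorted(keys: &[&str], values: &[i64]) -> Vec<(String, i64)> {
    let mut out = group_sum_input_order(keys, values);
    out.sort_by(|a, b| a.0.cmp(&b.0));
    out
}
```

For keys `["b", "a", "b", "c", "a"]` the first function yields the groups in the order b, a, c, while the second yields a, b, c. The aggregated values are identical; only the row order differs, which matters for code that relies on positional output.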
License
Apache-2.0 OR MIT