cudf

Safe Rust bindings for NVIDIA's libcudf -- GPU-accelerated DataFrame operations.

Features

Near-zero unsafe public API -- the only unsafe function exposed publicly is DLPackTensor::from_raw_ptr; all other unsafe code is confined to the internal FFI layer
Zero-cost ownership -- Column and Table map directly to libcudf's RAII types
Compile-time safety -- Rust's borrow checker prevents use-after-free on GPU memory
Arrow interop -- zero-copy conversion to/from arrow-rs arrays
Full libcudf coverage -- all operations including groupby, join, sort, I/O, strings
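
The ownership and borrow-checker points above can be illustrated without a GPU. The following is a host-only sketch, not code from this crate: DeviceBuffer is a hypothetical stand-in for a GPU allocation, and a shared counter takes the place of the real allocator so that frees can be observed.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a GPU allocation (not a type from this crate).
/// The shared counter records how many buffers are currently live.
pub struct DeviceBuffer {
    len: usize,
    live: Rc<Cell<usize>>,
}

impl DeviceBuffer {
    /// "Allocates" a buffer and bumps the live count.
    pub fn new(len: usize, live: &Rc<Cell<usize>>) -> Self {
        live.set(live.get() + 1);
        DeviceBuffer { len, live: Rc::clone(live) }
    }

    /// Number of elements the buffer claims to hold.
    pub fn len(&self) -> usize {
        self.len
    }
}

impl Drop for DeviceBuffer {
    /// Runs exactly once, when the owning value goes out of scope.
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

/// Borrowing lets a function read buffers without taking ownership of them.
pub fn total_len(buffers: &[DeviceBuffer]) -> usize {
    buffers.iter().map(DeviceBuffer::len).sum()
}
```

In this model, moving a buffer (let b = a;) and then touching a is rejected at compile time, which is the same mechanism that rules out use-after-free on a wrapper that owns device memory.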

Quick Start

use cudf::{Column, Table, Result};

fn main() -> Result<()> {
    // Create GPU columns from host data
    let ids = Column::from_slice(&[1i32, 2, 3, 4, 5])?;
    let values = Column::from_slice(&[10.0f64, 20.0, 30.0, 40.0, 50.0])?;

    // Build a table
    let table = Table::new(vec![ids, values])?;
    assert_eq!(table.num_columns(), 2);
    assert_eq!(table.num_rows(), 5);

    // Read back
    let col = table.column(0)?;
    let data: Vec<i32> = col.to_vec()?;
    assert_eq!(data, vec![1, 2, 3, 4, 5]);

    Ok(())
}

Prerequisites

NVIDIA GPU (Volta or newer, compute capability 7.0+)
CUDA 12.2+
libcudf installed (see cudf-sys for installation instructions)
Linux (libcudf does not support macOS or Windows)

Crate Structure

cudf-rs/
├── cudf-sys   -- links libcudf.so (build script only)
├── cudf-cxx   -- cxx-based FFI bridge + C++ shim layer
└── cudf       -- this crate: safe, idiomatic Rust API

Public API Surface

Core Types

Type	Description
`Column`	GPU-resident column with typed data and optional null bitmask
`Table`	Ordered collection of `Column`s (DataFrame equivalent)
`Scalar`	GPU-resident single typed value with validity flag
`DataType` / `TypeId`	Type system mirroring libcudf
`CudfError` / `Result<T>`	Unified error handling with C++ exception conversion
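
The "optional null bitmask" on Column follows the Arrow-style validity layout used by libcudf: one bit per row, least-significant bit first, where a set bit means the row is valid. The sketch below models that layout on the host. The helper names are hypothetical and are not part of this crate, and the 32-bit word size is an assumption chosen to mirror libcudf's mask word.

```rust
/// Bits per mask word (assumed to be 32, mirroring libcudf's mask word).
const BITS: usize = 32;

/// Packs per-row validity flags into words, least-significant bit first.
/// A set bit means "valid"; a cleared bit means "null".
pub fn pack_validity(valid: &[bool]) -> Vec<u32> {
    let mut words = vec![0u32; (valid.len() + BITS - 1) / BITS];
    for (row, &is_valid) in valid.iter().enumerate() {
        if is_valid {
            words[row / BITS] |= 1u32 << (row % BITS);
        }
    }
    words
}

/// Returns true if the given row is marked valid in the mask.
pub fn is_valid(words: &[u32], row: usize) -> bool {
    (words[row / BITS] >> (row % BITS)) & 1 == 1
}

/// Counts null rows among the first `num_rows` rows.
pub fn null_count(words: &[u32], num_rows: usize) -> usize {
    (0..num_rows).filter(|&row| !is_valid(words, row)).count()
}
```

Padding bits past the last row are left cleared here, which is why null_count takes the row count explicitly instead of counting zero bits in whole words.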

Re-exports

pub use Column, CudfType, Table, TableWithMetadata, Scalar;
pub use DataType, TypeId, NullHandling;
pub use CudfError, Result;
pub use SortOrder, NullOrder;
pub use OutOfBoundsPolicy;
pub use UnaryOp, BinaryOp;
pub use DuplicateKeepOption;
pub use AggregationKind, GroupBy, GroupByGroups, GroupByReplacePolicy, GroupByScan, GroupByScanOp;
pub use ReduceOp, ScanOp, MinMaxResult;
pub use Interpolation;
pub use RollingAgg;
pub use JoinResult, HashJoin, SemiJoinResult;
pub use JsonObjectOptions;
pub use PartitionResult;
pub use NullReplacePolicy;
pub use DLPackTensor, PackedTable, SplitResult;

Compute Operations

Function / Method	Module	Description
`table.sort()`	`sorting`	Sort table by one or more columns
`GroupBy::new(&keys).agg(...).execute(&values)`	`groupby`	Groupby aggregation
`col.reduce(op, dtype)`	`reduction`	Reduce column to scalar
`col.scan(op, inclusive)`	`reduction`	Prefix-sum / scan
`col.quantile(q, interp)`	`quantiles`	Compute quantiles
`col.rolling_agg(agg, window, min_periods)`	`rolling`	Rolling window aggregation
`col.binary_op(op, &rhs, out_type)`	`binaryop`	Element-wise binary ops
`col.unary_op(op)`	`unary`	Element-wise unary ops
`col.round(decimals)`	`round`	Numeric rounding
`table.hash(algo)`	`hashing`	Row-wise hashing
`col.extract_year()`, `.extract_month()`, ...	`datetime`	Datetime component extraction
`table.lower_bound(...)`, `table.upper_bound(...)`	`search`	Binary search on sorted tables
`col.nans_to_nulls()`	`transform`	NaN-to-null conversion
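
To make the scan row concrete, here is a host-side reference for what a prefix sum computes in its inclusive and exclusive forms. It is a plain-Rust sketch for checking expectations, not the crate's implementation; treating the inclusive argument as a bool is an assumption based on the signature shown in the table.

```rust
/// Host-side reference for a prefix-sum scan.
/// inclusive: out[i] = v[0] + ... + v[i]
/// exclusive: out[i] = v[0] + ... + v[i-1], with out[0] = 0
pub fn prefix_sum(values: &[i64], inclusive: bool) -> Vec<i64> {
    let mut running = 0i64;
    values
        .iter()
        .map(|&v| {
            if inclusive {
                running += v;
                running
            } else {
                let before = running;
                running += v;
                before
            }
        })
        .collect()
}
```

For the input [1, 2, 3, 4] the inclusive form yields [1, 3, 6, 10] and the exclusive form yields [0, 1, 3, 6]. The last element of the inclusive result equals the sum a reduction over the same column would return.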

Data Manipulation

Function / Method	Module	Description
`table.gather(&map)`	`copying`	Gather rows by index
`table.scatter(...)`	`copying`	Scatter values to indices
`table.slice(offset, size)`	`copying`	Slice a contiguous range
`table.split(indices)`	`copying`	Split at given indices
`col.fill(value, begin, end)`	`filling`	Fill range with value
`Table::concatenate(&[tables])`	`concatenate`	Vertical stacking
`Table::merge(...)`	`merge`	Merge pre-sorted tables
`table.inner_join(...)`	`join`	Inner / left / full / cross join
`table.drop_nulls(...)`	`stream_compaction`	Drop null rows
`table.apply_boolean_mask(...)`	`stream_compaction`	Filter by boolean mask
`table.unique(...)`	`stream_compaction`	Remove duplicates
`table.interleave()`	`reshape`	Interleave columns
`table.transpose()`	`transpose`	Swap rows and columns
`table.hash_partition(...)`	`partitioning`	Hash / round-robin partition
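
Gather and scatter are easy to mix up, so the sketch below spells out their semantics on host slices. It is a reference model only: the function signatures are hypothetical, and returning None for an out-of-range index is this sketch's choice, not a statement about how the crate or OutOfBoundsPolicy behaves.

```rust
/// Gather: out[i] = source[map[i]].
/// The output has one element per map entry; indices may repeat.
pub fn gather<T: Clone>(source: &[T], map: &[usize]) -> Option<Vec<T>> {
    map.iter().map(|&i| source.get(i).cloned()).collect()
}

/// Scatter: out[map[i]] = source[i], starting from a copy of `target`.
/// The output has the same length as `target`.
pub fn scatter<T: Clone>(source: &[T], map: &[usize], target: &[T]) -> Option<Vec<T>> {
    if source.len() != map.len() {
        return None;
    }
    let mut out = target.to_vec();
    for (value, &i) in source.iter().zip(map) {
        *out.get_mut(i)? = value.clone();
    }
    Some(out)
}
```

Gathering [10, 20, 30, 40] with the map [3, 0, 0] gives [40, 10, 10]; scattering [7, 8] with the map [2, 0] into four zeros gives [8, 0, 7, 0].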

I/O

Function	Module	Description
`parquet::read_parquet(path)`	`io::parquet`	Read Parquet file to GPU
`parquet::write_parquet(&table, path)`	`io::parquet`	Write table to Parquet
`csv::read_csv(path)`	`io::csv`	Read CSV file to GPU
`csv::write_csv(&table, path)`	`io::csv`	Write table to CSV
`json::read_json(path)`	`io::json`	Read JSON file to GPU
`json::write_json(&table, path)`	`io::json`	Write table to JSON
`orc::read_orc(path)`	`io::orc`	Read ORC file to GPU
`orc::write_orc(&table, path)`	`io::orc`	Write table to ORC
`avro::read_avro(path)`	`io::avro`	Read Avro file to GPU

String Operations

All string operations are methods on Column (for string-typed columns):

Method	Module	Description
`col.str_to_upper()` / `str_to_lower()`	`strings::case`	Case conversion
`col.str_find(target)`	`strings::find`	Find substring position
`col.str_contains(target)` / `str_contains_re(pattern)`	`strings::contains`	Containment checks
`col.str_replace(target, repl)` / `str_replace_re(...)`	`strings::replace`	Replacement
`col.str_split(delimiter)`	`strings::split`	Split into columns
`col.str_strip(chars)`	`strings::strip`	Trim leading/trailing chars
`col.str_slice(start, stop)`	`strings::slice`	Substring extraction
`col.str_cat(separator)`	`strings::combine`	Concatenation
`col.str_to_integers(dtype)` / `col.integers_to_str()`	`strings::convert`	Type conversion
`col.str_extract(pattern)`	`strings::extract`	Regex capture groups
`col.str_findall(pattern)`	`strings::findall`	All regex matches
`col.str_like(pattern, escape)`	`strings::like`	SQL LIKE matching
`col.str_pad(width, side, fill)`	`strings::padding`	Pad strings to width
`col.str_partition(delimiter)`	`strings::partition`	Split at first delimiter
`col.str_repeat(count)`	`strings::repeat`	Repeat each string N times
`col.str_reverse()`	`strings::reverse`	Reverse strings
`col.str_split_re(pattern)`	`strings::split_re`	Regex split
`col.str_count_characters()`	`strings::attributes`	Character/byte counts
`col.str_all_characters_of_type(type)`	`strings::char_types`	Character type checks
`col.str_translate(...)`	`strings::translate`	Character translation
`col.str_wrap(width)`	`strings::wrap`	Word-wrap strings
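
"Split at first delimiter" differs from a general split in that it always produces three parts. The host-side sketch below shows the per-string behavior, assuming the same convention as Python's str.partition (before, delimiter, after), which libcudf's partition also follows. The function is hypothetical, assumes a non-empty delimiter, and says nothing about how this crate lays out the resulting columns.

```rust
/// Splits `s` at the first occurrence of `delimiter`.
/// Returns (before, delimiter, after); if the delimiter is absent,
/// returns the whole string followed by two empty strings.
pub fn partition<'a>(s: &'a str, delimiter: &'a str) -> (&'a str, &'a str, &'a str) {
    match s.split_once(delimiter) {
        Some((before, after)) => (before, delimiter, after),
        None => (s, "", ""),
    }
}
```

Only the first delimiter is consumed, so "key=value=x" partitions into "key", "=", and "value=x".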

Arrow Interop

Method	Module	Description
`col.to_arrow_array()`	`interop`	Export column to `arrow::ArrayRef` (C Data Interface, preferred)
`Column::from_arrow_array(array)`	`interop`	Import column from `arrow::ArrayRef` (C Data Interface)
`table.to_arrow_batch()`	`interop`	Export table to `arrow::RecordBatch` (C Data Interface, preferred)
`Table::from_arrow_batch(batch)`	`interop`	Import table from `arrow::RecordBatch` (C Data Interface)
`col.to_arrow_ipc()`	`interop`	Export column to Arrow IPC bytes (legacy)
`Column::from_arrow_ipc(data)`	`interop`	Import column from Arrow IPC bytes (legacy)
`table.to_arrow_ipc()`	`interop`	Export table to Arrow IPC bytes (legacy)
`Table::from_arrow_ipc(data)`	`interop`	Import table from Arrow IPC bytes (legacy)
`table.to_record_batch()`	`interop`	Export to `RecordBatch` via IPC (legacy)
`Table::from_record_batch(batch)`	`interop`	Import from `RecordBatch` via IPC (legacy)

Feature Flags

Feature	Default	Description
`arrow-interop`	Yes	Zero-copy conversion to/from `arrow` arrays

cudf 0.2.0
