cudf 0.1.0

Safe Rust bindings for NVIDIA libcudf -- GPU-accelerated DataFrame operations

Features

  • Near-zero unsafe public API -- all unsafe is confined to the internal FFI layer, with the sole exception of DLPackTensor::from_raw_ptr
  • Zero-cost ownership -- Column and Table map directly to libcudf's RAII types
  • Compile-time safety -- Rust's borrow checker prevents use-after-free on GPU memory
  • Arrow interop -- zero-copy conversion to/from arrow-rs arrays
  • Full libcudf coverage -- all operations including groupby, join, sort, I/O, strings
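
The ownership and borrow-checking points above can be illustrated with a host-only sketch. `DeviceBuffer`, `BufferView`, and `demo` are hypothetical names invented for this illustration, not types from this crate, and ordinary heap memory stands in for GPU memory.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for an owned GPU allocation (host memory here).
pub struct DeviceBuffer {
    data: Vec<i32>,
    // Shared counter so the release in `drop` is observable from outside.
    freed: Rc<Cell<usize>>,
}

impl DeviceBuffer {
    pub fn new(data: Vec<i32>, freed: Rc<Cell<usize>>) -> Self {
        DeviceBuffer { data, freed }
    }

    /// Borrow a read-only view; the lifetime ties the view to `self`.
    pub fn view(&self) -> BufferView<'_> {
        BufferView { data: &self.data }
    }
}

impl Drop for DeviceBuffer {
    // A real binding would release the device memory at this point.
    fn drop(&mut self) {
        self.freed.set(self.freed.get() + 1);
    }
}

/// Non-owning view that cannot outlive the buffer it borrows from.
pub struct BufferView<'a> {
    data: &'a [i32],
}

impl BufferView<'_> {
    pub fn sum(&self) -> i32 {
        self.data.iter().sum()
    }
}

/// Returns (sum read through the view, number of buffers released).
pub fn demo() -> (i32, usize) {
    let freed = Rc::new(Cell::new(0));
    let total = {
        let buf = DeviceBuffer::new(vec![1, 2, 3], Rc::clone(&freed));
        let view = buf.view();
        // drop(buf); // would not compile: `buf` is still borrowed by `view`
        let sum = view.sum();
        sum
    }; // `buf` goes out of scope here and is released exactly once
    (total, freed.get())
}
```

Uncommenting the `drop(buf)` line makes the compiler reject the program, which is the same mechanism the feature list relies on to rule out use-after-free.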

Quick Start

use cudf::{Column, Table, Result};

fn main() -> Result<()> {
    // Create GPU columns from host data
    let ids = Column::from_slice(&[1i32, 2, 3, 4, 5])?;
    let values = Column::from_slice(&[10.0f64, 20.0, 30.0, 40.0, 50.0])?;

    // Build a table
    let table = Table::new(vec![ids, values])?;
    assert_eq!(table.num_columns(), 2);
    assert_eq!(table.num_rows(), 5);

    // Read back
    let col = table.column(0)?;
    let data: Vec<i32> = col.to_vec()?;
    assert_eq!(data, vec![1, 2, 3, 4, 5]);

    Ok(())
}

Prerequisites

  • NVIDIA GPU (Volta or newer, compute capability 7.0+)
  • CUDA 12.2+
  • libcudf installed (see cudf-sys for installation instructions)
  • Linux (libcudf does not support macOS or Windows)

Crate Structure

cudf-rs/
├── cudf-sys   -- links libcudf.so (build script only)
├── cudf-cxx   -- cxx-based FFI bridge + C++ shim layer
└── cudf       -- this crate: safe, idiomatic Rust API

Public API Surface

Core Types

| Type | Description |
|---|---|
| Column | GPU-resident column with typed data and optional null bitmask |
| Table | Ordered collection of Columns (DataFrame equivalent) |
| Scalar | GPU-resident single typed value with validity flag |
| DataType / TypeId | Type system mirroring libcudf |
| CudfError / Result<T> | Unified error handling with C++ exception conversion |
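
To make the "optional null bitmask" concrete, here is a host-side model. It assumes the Arrow-style layout that libcudf documents (32-bit words, least-significant bit first, a set bit meaning "valid"). `Validity` and its methods are hypothetical names for this sketch and are not part of the crate's API.

```rust
/// Host-side model of a validity bitmask: bit i set means row i is valid.
pub struct Validity {
    words: Vec<u32>,
    len: usize,
}

impl Validity {
    /// Build a mask from per-row flags (true = valid, false = null).
    pub fn from_flags(flags: &[bool]) -> Self {
        // One 32-bit word covers 32 rows; round up for a partial last word.
        let mut words = vec![0u32; (flags.len() + 31) / 32];
        for (i, &valid) in flags.iter().enumerate() {
            if valid {
                words[i / 32] |= 1u32 << (i % 32);
            }
        }
        Validity { words, len: flags.len() }
    }

    /// Check whether a single row holds a value.
    pub fn is_valid(&self, row: usize) -> bool {
        assert!(row < self.len, "row out of bounds");
        (self.words[row / 32] >> (row % 32)) & 1 == 1
    }

    /// Rows minus set bits; padding bits in the last word are never set.
    pub fn null_count(&self) -> usize {
        let valid: usize = self.words.iter().map(|w| w.count_ones() as usize).sum();
        self.len - valid
    }
}
```

For five rows with flags `[true, false, true, true, false]`, rows 1 and 4 are null and `null_count()` is 2.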

Re-exports

pub use Column, CudfType, Table, TableWithMetadata, Scalar;
pub use DataType, TypeId, NullHandling;
pub use CudfError, Result;
pub use SortOrder, NullOrder;
pub use OutOfBoundsPolicy;
pub use UnaryOp, BinaryOp;
pub use DuplicateKeepOption;
pub use AggregationKind, GroupBy, GroupByGroups, GroupByReplacePolicy, GroupByScan, GroupByScanOp;
pub use ReduceOp, ScanOp, MinMaxResult;
pub use Interpolation;
pub use RollingAgg;
pub use JoinResult, HashJoin, SemiJoinResult;
pub use JsonObjectOptions;
pub use PartitionResult;
pub use NullReplacePolicy;
pub use DLPackTensor, PackedTable, SplitResult;

Compute Operations

| Function / Method | Module | Description |
|---|---|---|
| table.sort() | sorting | Sort table by one or more columns |
| GroupBy::new(&keys).agg(...).execute(&values) | groupby | Groupby aggregation |
| col.reduce(op, dtype) | reduction | Reduce column to scalar |
| col.scan(op, inclusive) | reduction | Prefix-sum / scan |
| col.quantile(q, interp) | quantiles | Compute quantiles |
| col.rolling_agg(agg, window, min_periods) | rolling | Rolling window aggregation |
| col.binary_op(op, &rhs, out_type) | binaryop | Element-wise binary ops |
| col.unary_op(op) | unary | Element-wise unary ops |
| col.round(decimals) | round | Numeric rounding |
| table.hash(algo) | hashing | Row-wise hashing |
| col.extract_year(), .extract_month(), ... | datetime | Datetime component extraction |
| table.lower_bound(...), table.upper_bound(...) | search | Binary search on sorted tables |
| col.nans_to_nulls() | transform | NaN-to-null conversion |
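
The difference between a reduction and an inclusive or exclusive scan is easy to mix up, so here is a CPU reference for the sum case. It assumes the `inclusive` flag carries the usual meaning (each output includes the current element) and ignores nulls. `scan_sum` and `reduce_sum` are hypothetical helper names, not functions from this crate.

```rust
/// Prefix sum over a host slice.
/// inclusive = true:  out[i] = in[0] + ... + in[i]
/// inclusive = false: out[i] = in[0] + ... + in[i - 1], with out[0] = 0
pub fn scan_sum(input: &[i64], inclusive: bool) -> Vec<i64> {
    let mut out = Vec::with_capacity(input.len());
    let mut running = 0i64;
    for &x in input {
        if inclusive {
            running += x;
            out.push(running);
        } else {
            out.push(running);
            running += x;
        }
    }
    out
}

/// A reduction collapses the whole column to one value.
pub fn reduce_sum(input: &[i64]) -> i64 {
    input.iter().sum()
}
```

For the input `[1, 2, 3, 4]`, the inclusive scan is `[1, 3, 6, 10]`, the exclusive scan is `[0, 1, 3, 6]`, and the reduction is `10`.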

Data Manipulation

| Function / Method | Module | Description |
|---|---|---|
| table.gather(&map) | copying | Gather rows by index |
| table.scatter(...) | copying | Scatter values to indices |
| table.slice(offset, size) | copying | Slice a contiguous range |
| table.split(indices) | copying | Split at given indices |
| col.fill(value, begin, end) | filling | Fill range with value |
| Table::concatenate(&[tables]) | concatenate | Vertical stacking |
| Table::merge(...) | merge | Merge pre-sorted tables |
| table.inner_join(...) | join | Inner / left / full / cross join |
| table.drop_nulls(...) | stream_compaction | Drop null rows |
| table.apply_boolean_mask(...) | stream_compaction | Filter by boolean mask |
| table.unique(...) | stream_compaction | Remove duplicates |
| table.interleave() | reshape | Interleave columns |
| table.transpose() | transpose | Swap rows and columns |
| table.hash_partition(...) | partitioning | Hash / round-robin partition |
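
Gather, scatter, and boolean-mask filtering are often confused with one another. The sketch below gives single-column CPU reference semantics for each, as conventionally defined. The function signatures are hypothetical and do not reproduce the crate's methods. Returning `None` on an out-of-range index is a choice made for this sketch; the crate's behavior presumably depends on `OutOfBoundsPolicy`.

```rust
/// gather: out[i] = source[map[i]]. Returns None if any index is out of range.
pub fn gather<T: Clone>(source: &[T], map: &[usize]) -> Option<Vec<T>> {
    map.iter().map(|&i| source.get(i).cloned()).collect()
}

/// scatter: start from a copy of `target`, then set out[map[i]] = source[i].
/// Returns None on a length mismatch or an out-of-range index.
pub fn scatter<T: Clone>(source: &[T], map: &[usize], target: &[T]) -> Option<Vec<T>> {
    if source.len() != map.len() {
        return None;
    }
    let mut out = target.to_vec();
    for (value, &i) in source.iter().zip(map) {
        *out.get_mut(i)? = value.clone();
    }
    Some(out)
}

/// Boolean mask: keep row i only when mask[i] is true.
pub fn apply_boolean_mask<T: Clone>(rows: &[T], mask: &[bool]) -> Vec<T> {
    rows.iter()
        .zip(mask)
        .filter_map(|(row, &keep)| keep.then(|| row.clone()))
        .collect()
}
```

Gather reads through the index map and may repeat or drop rows; scatter writes through the map into an existing target, so its output always has the target's length.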

I/O

| Function | Module | Description |
|---|---|---|
| parquet::read_parquet(path) | io::parquet | Read Parquet file to GPU |
| parquet::write_parquet(&table, path) | io::parquet | Write table to Parquet |
| csv::read_csv(path) | io::csv | Read CSV file to GPU |
| csv::write_csv(&table, path) | io::csv | Write table to CSV |
| json::read_json(path) | io::json | Read JSON file to GPU |
| json::write_json(&table, path) | io::json | Write table to JSON |
| orc::read_orc(path) | io::orc | Read ORC file to GPU |
| orc::write_orc(&table, path) | io::orc | Write table to ORC |
| avro::read_avro(path) | io::avro | Read Avro file to GPU |

String Operations

All string operations are methods on Column (for string-typed columns):

| Method | Module | Description |
|---|---|---|
| col.str_to_upper() / str_to_lower() | strings::case | Case conversion |
| col.str_find(target) | strings::find | Find substring position |
| col.str_contains(target) / str_contains_re(pattern) | strings::contains | Containment checks |
| col.str_replace(target, repl) / str_replace_re(...) | strings::replace | Replacement |
| col.str_split(delimiter) | strings::split | Split into columns |
| col.str_strip(chars) | strings::strip | Trim leading/trailing chars |
| col.str_slice(start, stop) | strings::slice | Substring extraction |
| col.str_cat(separator) | strings::combine | Concatenation |
| col.str_to_integers(dtype) / col.integers_to_str() | strings::convert | Type conversion |
| col.str_extract(pattern) | strings::extract | Regex capture groups |
| col.str_findall(pattern) | strings::findall | All regex matches |
| col.str_like(pattern, escape) | strings::like | SQL LIKE matching |
| col.str_pad(width, side, fill) | strings::padding | Pad strings to width |
| col.str_partition(delimiter) | strings::partition | Split at first delimiter |
| col.str_repeat(count) | strings::repeat | Repeat each string N times |
| col.str_reverse() | strings::reverse | Reverse strings |
| col.str_split_re(pattern) | strings::split_re | Regex split |
| col.str_count_characters() | strings::attributes | Character/byte counts |
| col.str_all_characters_of_type(type) | strings::char_types | Character type checks |
| col.str_translate(...) | strings::translate | Character translation |
| col.str_wrap(width) | strings::wrap | Word-wrap strings |
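
"Split at first delimiter" differs from a general split in that it always yields exactly three pieces. The per-string CPU sketch below assumes the usual partition convention (text before, the delimiter itself, text after, with the whole string first when no delimiter is found). `partition` is a hypothetical helper; the crate's handling of empty delimiters and nulls may differ.

```rust
/// Split `s` at the first occurrence of `delimiter`.
/// Returns (before, delimiter, after); if the delimiter is absent or empty,
/// returns (s, "", "").
pub fn partition<'a>(s: &'a str, delimiter: &str) -> (&'a str, &'a str, &'a str) {
    if delimiter.is_empty() {
        return (s, "", "");
    }
    match s.find(delimiter) {
        Some(pos) => {
            let end = pos + delimiter.len();
            (&s[..pos], &s[pos..end], &s[end..])
        }
        None => (s, "", ""),
    }
}
```

With delimiter `"_"`, the string `"a_b_c"` partitions into `("a", "_", "b_c")`: only the first delimiter is consumed and the remainder stays intact.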

Arrow Interop

| Method | Module | Description |
|---|---|---|
| col.to_arrow_array() | interop | Export column to arrow::ArrayRef (C Data Interface, preferred) |
| Column::from_arrow_array(array) | interop | Import column from arrow::ArrayRef (C Data Interface) |
| table.to_arrow_batch() | interop | Export table to arrow::RecordBatch (C Data Interface, preferred) |
| Table::from_arrow_batch(batch) | interop | Import table from arrow::RecordBatch (C Data Interface) |
| col.to_arrow_ipc() | interop | Export column to Arrow IPC bytes (legacy) |
| Column::from_arrow_ipc(data) | interop | Import column from Arrow IPC bytes (legacy) |
| table.to_arrow_ipc() | interop | Export table to Arrow IPC bytes (legacy) |
| Table::from_arrow_ipc(data) | interop | Import table from Arrow IPC bytes (legacy) |
| table.to_record_batch() | interop | Export to RecordBatch via IPC (legacy) |
| Table::from_record_batch(batch) | interop | Import from RecordBatch via IPC (legacy) |

Feature Flags

| Feature | Default | Description |
|---|---|---|
| arrow-interop | Yes | Zero-copy conversion to/from arrow arrays |