RustFrames
A blazing-fast, memory-safe alternative to NumPy + Pandas, written in Rust
Overview
RustFrames is a data and array library for Rust that provides high-performance, memory-safe alternatives to NumPy and Pandas. Built from the ground up in Rust, it targets fast numerical computing through zero-cost abstractions and fearless concurrency.
Why RustFrames?
- Memory Safety → No segfaults, buffer overflows, or undefined behavior in safe Rust code
- Performance First → SIMD acceleration, multithreading, and GPU support
- Seamless Interop → Native Apache Arrow, Parquet, CSV, and NumPy compatibility
- Unified API → One library for both tabular DataFrames and n-dimensional Arrays
- Zero-Cost Abstractions → High-level APIs with no performance overhead
- Rust Ecosystem → Integrates with the broader Rust data ecosystem
Features
Arrays (rustframes::array)
- N-dimensional arrays with efficient memory layout
- SIMD-accelerated operations for maximum performance
- Broadcasting support for NumPy-style operations
- Linear algebra operations (matrix multiplication, decompositions)
- GPU acceleration support (CUDA/ROCm)
- Type safety with compile-time shape checking where possible
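NumPy-style broadcasting follows a simple rule: align the two shapes from the right; each pair of dimensions must either match or have one side equal to 1, and that side is stretched. The sketch below shows the rule in plain Rust over a row-major buffer. `NdArray`, `broadcast_shape`, and `broadcast_add` are hypothetical names for illustration only, not the RustFrames API.

```rust
/// A minimal row-major (C-order) n-dimensional array, for illustration only.
/// Hypothetical type: this is NOT the RustFrames `Array`.
#[derive(Debug, Clone, PartialEq)]
pub struct NdArray {
    pub shape: Vec<usize>,
    pub data: Vec<f64>,
}

impl NdArray {
    pub fn new(shape: Vec<usize>, data: Vec<f64>) -> Self {
        assert_eq!(shape.iter().product::<usize>(), data.len(), "shape/data mismatch");
        NdArray { shape, data }
    }

    /// Row-major strides: the last axis is contiguous (stride 1).
    fn strides(&self) -> Vec<usize> {
        let mut strides = vec![1; self.shape.len()];
        for i in (0..self.shape.len().saturating_sub(1)).rev() {
            strides[i] = strides[i + 1] * self.shape[i + 1];
        }
        strides
    }
}

/// NumPy-style rule: align shapes from the right; missing leading dims count
/// as 1; each pair must be equal or contain a 1. Returns None if incompatible.
pub fn broadcast_shape(a: &[usize], b: &[usize]) -> Option<Vec<usize>> {
    let n = a.len().max(b.len());
    let (pad_a, pad_b) = (n - a.len(), n - b.len());
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        let da = if i < pad_a { 1 } else { a[i - pad_a] };
        let db = if i < pad_b { 1 } else { b[i - pad_b] };
        out.push(match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return None,
        });
    }
    Some(out)
}

/// Flat offset into an input for a given output multi-index. Axes where the
/// input has length 1 use an effective stride of 0 (that is the broadcast).
fn offset(idx: &[usize], shape: &[usize], strides: &[usize]) -> usize {
    let lead = idx.len() - shape.len();
    (0..shape.len())
        .map(|k| if shape[k] == 1 { 0 } else { idx[lead + k] * strides[k] })
        .sum()
}

/// Element-wise addition with broadcasting.
pub fn broadcast_add(a: &NdArray, b: &NdArray) -> Option<NdArray> {
    let shape = broadcast_shape(&a.shape, &b.shape)?;
    let total: usize = shape.iter().product();
    let (sa, sb) = (a.strides(), b.strides());
    let mut idx = vec![0usize; shape.len()];
    let mut data = Vec::with_capacity(total);
    for _ in 0..total {
        data.push(a.data[offset(&idx, &a.shape, &sa)] + b.data[offset(&idx, &b.shape, &sb)]);
        // Advance the multi-index in row-major order (last axis fastest).
        for ax in (0..shape.len()).rev() {
            idx[ax] += 1;
            if idx[ax] < shape[ax] {
                break;
            }
            idx[ax] = 0;
        }
    }
    Some(NdArray::new(shape, data))
}
```

Broadcasting never copies the smaller operand: an axis of length 1 gets an effective stride of 0, so the same element is reread for every output position along that axis.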
DataFrames (rustframes::dataframe)
- High-performance DataFrames and Series
- Advanced operations: GroupBy, window functions, joins
- Apache Arrow backend for zero-copy interoperability
- Multiple I/O formats: CSV, Parquet, JSON, Arrow IPC
- Memory-efficient columnar storage
- Lazy evaluation for complex query optimization
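Columnar storage keeps each column in its own contiguous buffer, so a group-by aggregation only touches the key and value columns it needs. The following is a minimal group-by-sum sketch over such a layout. The `Table` type and `group_sum` method are hypothetical and do not reflect the actual RustFrames DataFrame API.

```rust
use std::collections::BTreeMap;

/// A toy columnar table: each column is its own contiguous Vec.
/// Hypothetical type for illustration; not the RustFrames DataFrame API.
pub struct Table {
    pub keys: Vec<String>, // string "key" column
    pub values: Vec<f64>,  // numeric "value" column
}

impl Table {
    /// Equivalent of GROUP BY key, SUM(value).
    /// A BTreeMap is used so the output is sorted by key (deterministic).
    pub fn group_sum(&self) -> Vec<(String, f64)> {
        assert_eq!(self.keys.len(), self.values.len(), "columns must have equal length");
        let mut acc: BTreeMap<&str, f64> = BTreeMap::new();
        // Walk both columns in lockstep, one row at a time.
        for (k, v) in self.keys.iter().zip(&self.values) {
            *acc.entry(k.as_str()).or_insert(0.0) += *v;
        }
        acc.into_iter().map(|(k, s)| (k.to_string(), s)).collect()
    }
}
```

A production engine would typically hash the keys and aggregate in parallel, but the access pattern, scanning whole columns rather than whole rows, is the same.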
Performance Features
- SIMD acceleration on x86_64 and ARM64
- Parallel processing with Rayon
- GPU kernels for CUDA and ROCm
- Memory mapping for large datasets
- Vectorized string operations
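The parallel-processing item refers to Rayon, which splits work across a thread pool. To keep the sketch dependency-free, it uses the standard library's `std::thread::scope` instead of Rayon to show the same data-parallel pattern: partition a slice, reduce each chunk on its own thread, then combine the partial results. `parallel_sum` is a hypothetical helper, not a RustFrames function.

```rust
use std::thread;

/// Sum `data` by splitting it into roughly equal chunks, one per thread.
/// Uses std::thread::scope (not Rayon) so borrowed slices can be shared.
pub fn parallel_sum(data: &[f64], n_threads: usize) -> f64 {
    let n_threads = n_threads.max(1);
    if data.is_empty() {
        return 0.0;
    }
    // Ceiling division so every element lands in some chunk.
    let chunk = (data.len() + n_threads - 1) / n_threads;
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().sum::<f64>()))
            .collect();
        // Combine the per-thread partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    })
}
```

Because floating-point addition is not associative, a parallel sum can differ from a sequential one in the last bits; Rayon-based reductions have the same property.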
Installation
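Add RustFrames to your Cargo.toml:

```toml
[dependencies]
rustframes = "1.0"

# Optional features (use this line instead of the one above)
# rustframes = { version = "1.0", features = ["gpu", "arrow", "parquet"] }
```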
Feature Flags
| Feature | Description | Default |
|---|---|---|
| gpu | Enable CUDA/ROCm GPU acceleration | ❌ |
| arrow | Apache Arrow integration | ✅ |
| parquet | Parquet file format support | ✅ |
| simd | SIMD acceleration | ✅ |
| rayon | Parallel processing | ✅ |
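Cargo features like these are declared in a crate's `[features]` table, where the `default` list names the ones enabled unless a dependent sets `default-features = false`. Inside the crate, code paths are gated with `#[cfg(feature = "...")]`. The sketch assumes a `simd` feature matching the table and shows a gated function with a scalar fallback. `dot` and `backend` are hypothetical names, and the "SIMD" path only mimics lane-wise structure with plain arrays rather than calling real intrinsics.

```rust
/// Built only when compiled with the `simd` feature (assumed name).
/// Accumulates 4 independent "lanes", the shape a real SIMD kernel would have.
#[cfg(feature = "simd")]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    let (ca, cb) = (a.chunks_exact(4), b.chunks_exact(4));
    // Elements that do not fill a whole 4-lane chunk.
    let tail: f32 = ca.remainder().iter().zip(cb.remainder()).map(|(x, y)| x * y).sum();
    let mut lanes = [0.0f32; 4];
    for (xa, xb) in ca.zip(cb) {
        for i in 0..4 {
            lanes[i] += xa[i] * xb[i];
        }
    }
    lanes.iter().sum::<f32>() + tail
}

/// Scalar fallback, built when the `simd` feature is disabled.
#[cfg(not(feature = "simd"))]
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "inputs must have equal length");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// `cfg!` evaluates the same condition as a boolean at compile time.
pub fn backend() -> &'static str {
    if cfg!(feature = "simd") { "simd" } else { "scalar" }
}
```

Only one `dot` definition is ever compiled, so callers see a single function regardless of which features are enabled.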
Quick Start
Arrays
```rust
use rustframes::array::Array;
```
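As a plain-Rust illustration of the matrix multiplication listed under Arrays, the sketch below multiplies two row-major matrices stored in flat slices. It deliberately avoids the RustFrames `Array` API; `matmul` is a hypothetical helper.

```rust
/// Naive row-major matrix multiply: C (m x n) = A (m x k) * B (k x n).
/// Hypothetical helper for illustration; not the RustFrames API.
pub fn matmul(a: &[f64], b: &[f64], m: usize, k: usize, n: usize) -> Vec<f64> {
    assert_eq!(a.len(), m * k, "A must be m x k");
    assert_eq!(b.len(), k * n, "B must be k x n");
    let mut c = vec![0.0; m * n];
    // i-p-j loop order walks B and C row by row, which suits row-major storage.
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
    c
}
```

Optimized libraries replace this triple loop with blocked, vectorized kernels, but the index arithmetic (`row * cols + col`) is the same.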
DataFrames
```rust
use rustframes::dataframe::DataFrame;
```
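Window functions, listed under DataFrames, compute one value per row from a sliding range of neighboring rows. The sketch computes a trailing rolling mean over one numeric column with a running sum, so each step is O(1). `rolling_mean` is a hypothetical helper, not the RustFrames API; returning `None` for incomplete windows mirrors the `NaN` that pandas' `rolling(window).mean()` yields by default.

```rust
/// Trailing rolling mean of `col` over `window` rows.
/// Rows with fewer than `window` values so far yield None.
/// Hypothetical helper; a running sum keeps it O(n) overall, at the cost of
/// possible rounding drift on very long columns.
pub fn rolling_mean(col: &[f64], window: usize) -> Vec<Option<f64>> {
    assert!(window > 0, "window must be positive");
    let mut out = Vec::with_capacity(col.len());
    let mut sum = 0.0;
    for i in 0..col.len() {
        sum += col[i];
        // Drop the value that just slid out of the window.
        if i >= window {
            sum -= col[i - window];
        }
        out.push(if i + 1 >= window { Some(sum / window as f64) } else { None });
    }
    out
}
```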
Advanced Examples
GPU-Accelerated Computing
```rust
use rustframes::array::Array;
```
Apache Arrow Integration
```rust
use rustframes::dataframe::DataFrame;
use arrow::array::Int32Array;
```
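Arrow's zero-copy interoperability rests on a fixed memory layout. A primitive column such as the `Int32Array` named above is a contiguous values buffer plus an optional validity bitmap in which bit i is 1 when slot i is non-null, with bits numbered from the least-significant end of each byte. The sketch hand-rolls that layout to show how nulls are tracked; `NullableInt32` is a hypothetical type, and real code would use the `arrow` crate directly.

```rust
/// Arrow-style nullable Int32 column: values buffer + validity bitmap
/// (1 = valid, 0 = null, least-significant-bit numbering).
/// Hypothetical type for illustration; not the arrow crate's Int32Array.
pub struct NullableInt32 {
    values: Vec<i32>,
    validity: Vec<u8>,
    len: usize,
}

impl NullableInt32 {
    pub fn from_options(items: &[Option<i32>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        // One bit per slot, rounded up to whole bytes.
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1u8 << (i % 8); // mark slot i valid
                }
                // Null slots still occupy space in the values buffer;
                // their contents are unspecified (0 here).
                None => values.push(0),
            }
        }
        NullableInt32 { values, validity, len: items.len() }
    }

    pub fn is_valid(&self, i: usize) -> bool {
        ((self.validity[i / 8] >> (i % 8)) & 1) == 1
    }

    /// Returns None for null slots and out-of-range indices.
    pub fn get(&self, i: usize) -> Option<i32> {
        if i < self.len && self.is_valid(i) { Some(self.values[i]) } else { None }
    }

    pub fn null_count(&self) -> usize {
        (0..self.len).filter(|&i| !self.is_valid(i)).count()
    }
}
```

Because the values buffer is dense even where slots are null, another Arrow-aware library can read the same bytes without conversion.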
API Documentation
Complete API documentation is available at docs.rs/rustframes.
Key Modules
- rustframes::array - N-dimensional arrays and linear algebra
- rustframes::dataframe - DataFrames, Series, and tabular operations
- rustframes::io - File I/O for CSV, Parquet, and Arrow formats
- rustframes::gpu - GPU acceleration utilities
- rustframes::simd - SIMD optimization helpers
Ecosystem Integration
RustFrames integrates with the following parts of the Rust data ecosystem:
- Polars - Use RustFrames arrays in Polars expressions
- DataFusion - Query RustFrames DataFrames with SQL
- Candle - Deep learning with RustFrames tensors
- PyO3 - Expose RustFrames to Python
- Wasm-pack - Run RustFrames in the browser
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
Running Benchmarks
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.