PandRS
A high-performance DataFrame library for Rust, providing a pandas-like API with advanced features including SIMD optimization, parallel processing, and distributed computing capabilities.
🚀 Version 0.1.0 - Production Ready: With 1334+ tests, zero clippy warnings, enhanced documentation, and optimized performance, PandRS delivers a robust pandas-like experience for Rust developers. Published to crates.io in December 2025.
Overview
PandRS is a comprehensive data manipulation library that brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:
- Type-safe operations leveraging Rust's ownership system
- High-performance computing through SIMD vectorization and parallel processing
- Memory-efficient design with columnar storage and string pooling
- Comprehensive functionality matching pandas' core features
- Seamless interoperability with Python, Arrow, and various data formats
Quick Start
use ;
use HashMap;
// Create a DataFrame
let mut df = new;
df.add_column?;
df.add_column?;
df.add_column?;
// Perform operations
let filtered = df.filter?;
let mean_salary = df.column?.mean?;
let grouped = df.groupby?.agg?;
Core Features
Data Structures
- Series: One-dimensional labeled array capable of holding any data type
- DataFrame: Two-dimensional, size-mutable, heterogeneous tabular data structure
- MultiIndex: Hierarchical indexing for advanced data organization
- Categorical: Memory-efficient representation for string data with limited cardinality
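To make the idea behind a categorical column concrete, here is a minimal standard-library sketch of dictionary encoding. It is not PandRS's implementation or API; `CategoricalSketch`, `from_values`, and `get` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded column: each distinct string is stored once
/// in `categories`, and every row holds a small integer code pointing into it.
#[derive(Debug)]
pub struct CategoricalSketch {
    pub categories: Vec<String>,
    pub codes: Vec<u32>,
}

impl CategoricalSketch {
    /// Build the encoding, assigning codes in order of first appearance.
    pub fn from_values(values: &[&str]) -> Self {
        let mut lookup: HashMap<&str, u32> = HashMap::new();
        let mut categories: Vec<String> = Vec::new();
        let mut codes = Vec::with_capacity(values.len());
        for &v in values {
            // The code a new category would receive if `v` has not been seen yet.
            let next = categories.len() as u32;
            let code = *lookup.entry(v).or_insert_with(|| {
                categories.push(v.to_string());
                next
            });
            codes.push(code);
        }
        CategoricalSketch { categories, codes }
    }

    /// Decode row `i` back to its string value, or None if the row does not exist.
    pub fn get(&self, i: usize) -> Option<&str> {
        let code = *self.codes.get(i)?;
        self.categories.get(code as usize).map(|s| s.as_str())
    }
}
```

With five rows drawn from three distinct cities, the sketch stores three strings and five `u32` codes instead of five strings, which is where the memory saving for low-cardinality data comes from.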
Data Types
- Numeric: `i32`, `i64`, `f32`, `f64`, `u32`, `u64`
- String: UTF-8 encoded with automatic string pooling
- Boolean: Native boolean support
- DateTime: Timezone-aware datetime with nanosecond precision
- Categorical: Efficient storage for repeated string values
- Missing Values: First-class `NA` support across all types
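One common way to model missing values in plain Rust is `Option<T>`. The sketch below assumes that representation purely for illustration; how PandRS represents `NA` internally is not stated here, and `mean_skip_na` and `fill_na` are hypothetical helper names.

```rust
/// Mean of a nullable f64 column, skipping missing entries.
/// Returns None when the column has no non-missing values.
pub fn mean_skip_na(values: &[Option<f64>]) -> Option<f64> {
    let mut sum = 0.0;
    let mut count = 0usize;
    // `flatten` yields only the `Some` entries.
    for v in values.iter().flatten() {
        sum += *v;
        count += 1;
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}

/// Replace every missing entry with `fill`, producing a dense column.
pub fn fill_na(values: &[Option<f64>], fill: f64) -> Vec<f64> {
    values.iter().map(|v| v.unwrap_or(fill)).collect()
}
```

For example, `mean_skip_na(&[Some(1.0), None, Some(3.0)])` averages only the two present values.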
Operations
Data Manipulation
- Column addition, removal, and renaming
- Row and column selection with boolean indexing
- Sorting by single or multiple columns
- Duplicate detection and removal
- Data type conversion and casting
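Boolean indexing and multi-column sorting can be illustrated without any library. The following is a standard-library sketch of the two techniques, not the PandRS API; both function names and the `(department, salary)` row shape are assumptions made for the example.

```rust
/// Keep only the values whose corresponding mask entry is `true`.
/// Extra entries in the longer of the two slices are ignored.
pub fn filter_by_mask<T: Clone>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| v.clone())
        .collect()
}

/// Sort rows of (department, salary) by department ascending,
/// breaking ties by salary descending.
pub fn sort_by_dept_then_salary_desc(rows: &mut [(String, u32)]) {
    rows.sort_by(|a, b| a.0.cmp(&b.0).then_with(|| b.1.cmp(&a.1)));
}
```

`Ordering::then_with` is what makes the second key apply only when the first key compares equal.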
Aggregation & Grouping
- GroupBy operations with multiple aggregation functions
- Window functions (rolling, expanding, exponentially weighted)
- Pivot tables and cross-tabulation
- Custom aggregation functions
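As a rough picture of what a grouped sum and a rolling window compute, here is a standard-library sketch. It does not use or reproduce the PandRS API; `group_sum` and `rolling_mean` are hypothetical names, and the choice to emit `None` until the window is full is an assumption of this example.

```rust
use std::collections::BTreeMap;

/// Sum `values` per key. Keys and values are paired by position.
pub fn group_sum(keys: &[&str], values: &[f64]) -> BTreeMap<String, f64> {
    let mut out = BTreeMap::new();
    for (k, v) in keys.iter().zip(values) {
        *out.entry(k.to_string()).or_insert(0.0) += *v;
    }
    out
}

/// Fixed-size rolling mean. Positions that do not yet have a full
/// window of `window` values are reported as None.
pub fn rolling_mean(values: &[f64], window: usize) -> Vec<Option<f64>> {
    if window == 0 {
        return vec![None; values.len()];
    }
    let mut out = Vec::with_capacity(values.len());
    let mut sum = 0.0;
    for (i, v) in values.iter().enumerate() {
        sum += v;
        // Drop the value that just slid out of the window.
        if i >= window {
            sum -= values[i - window];
        }
        out.push(if i + 1 >= window {
            Some(sum / window as f64)
        } else {
            None
        });
    }
    out
}
```

The running-sum form keeps the rolling mean at one addition and one subtraction per row, at the cost of accumulating floating-point error on very long inputs.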
Joining & Merging
- Inner, left, right, and outer joins
- Merge on single or multiple keys
- Concat operations with axis control
- Append with automatic index alignment
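The difference between an inner and a left join is easiest to see on a tiny example. The sketch below implements both as a plain hash join over `(key, value)` pairs using only the standard library; it is an illustration of the join semantics, not PandRS's join implementation, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Index the right-hand rows by key so each left row is matched in O(1) on average.
fn build_index<'a>(right: &[(i64, &'a str)]) -> HashMap<i64, Vec<&'a str>> {
    let mut index: HashMap<i64, Vec<&'a str>> = HashMap::new();
    for &(k, v) in right {
        index.entry(k).or_default().push(v);
    }
    index
}

/// Inner join: only keys present on both sides produce output rows.
pub fn inner_join<'a>(
    left: &[(i64, &'a str)],
    right: &[(i64, &'a str)],
) -> Vec<(i64, &'a str, &'a str)> {
    let index = build_index(right);
    let mut out = Vec::new();
    for &(k, lv) in left {
        if let Some(matches) = index.get(&k) {
            for &rv in matches {
                out.push((k, lv, rv));
            }
        }
    }
    out
}

/// Left join: every left row is kept; unmatched rows get None on the right.
pub fn left_join<'a>(
    left: &[(i64, &'a str)],
    right: &[(i64, &'a str)],
) -> Vec<(i64, &'a str, Option<&'a str>)> {
    let index = build_index(right);
    let mut out = Vec::new();
    for &(k, lv) in left {
        match index.get(&k) {
            Some(matches) => out.extend(matches.iter().map(|&rv| (k, lv, Some(rv)))),
            None => out.push((k, lv, None)),
        }
    }
    out
}
```

Right and outer joins follow the same pattern with the roles of the two inputs swapped or combined.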
Time Series
- DateTime indexing and slicing
- Resampling and frequency conversion
- Time zone handling and conversion
- Date range generation
- Business day calculations
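Resampling to a coarser frequency amounts to bucketing timestamps and aggregating each bucket. The sketch below shows a daily mean over `(unix_seconds, value)` samples with the standard library only. It assumes UTC day boundaries and ignores time zones; it is not the PandRS resampling API, and `resample_daily_mean` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Group samples into UTC days and return (day start in Unix seconds, mean value),
/// ordered by day. Days with no samples are not emitted.
pub fn resample_daily_mean(samples: &[(i64, f64)]) -> Vec<(i64, f64)> {
    const SECS_PER_DAY: i64 = 86_400;
    // day number -> (running sum, sample count)
    let mut buckets: BTreeMap<i64, (f64, u32)> = BTreeMap::new();
    for &(ts, v) in samples {
        // div_euclid floors toward negative infinity, so pre-1970 timestamps
        // also land in the correct day.
        let day = ts.div_euclid(SECS_PER_DAY);
        let entry = buckets.entry(day).or_insert((0.0, 0));
        entry.0 += v;
        entry.1 += 1;
    }
    buckets
        .into_iter()
        .map(|(day, (sum, n))| (day * SECS_PER_DAY, sum / n as f64))
        .collect()
}
```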
Performance Optimizations
SIMD Vectorization
- Automatic SIMD optimization for numerical operations
- Hand-tuned implementations for common operations
- Support for AVX2 and AVX-512 instruction sets
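A frequently used way to write SIMD-friendly code in safe Rust is to accumulate into several independent lanes, which gives the compiler a loop shape it can often turn into vector instructions. The sketch below shows that pattern only; it is not PandRS's SIMD code, the lane count of 8 is an arbitrary assumption, and whether vector instructions are actually emitted depends on the compiler and target.

```rust
/// Sum a slice using 8 independent accumulators plus a scalar tail.
/// Note: the summation order differs from a simple left-to-right loop,
/// so results can differ in the last bits for non-integer inputs.
pub fn sum_lanes(values: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    // Elements left over after the last full chunk.
    let tail = chunks.remainder();
    for chunk in chunks {
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }
    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}
```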
Parallel Processing
- Multi-threaded execution for large datasets
- Configurable thread pool sizing
- Parallel aggregations and transformations
- Load-balanced work distribution
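The basic shape of a parallel aggregation is to split the data into chunks, reduce each chunk on its own thread, and combine the partial results. Here is a minimal sketch using `std::thread::scope` from the standard library. It is not how PandRS schedules work; `parallel_sum` and its `n_threads` parameter are hypothetical.

```rust
use std::thread;

/// Sum `values` by splitting the slice into roughly `n_threads` equal chunks
/// and summing each chunk on its own scoped thread.
pub fn parallel_sum(values: &[f64], n_threads: usize) -> f64 {
    let n_threads = n_threads.max(1);
    // Ceiling division so every element lands in exactly one chunk;
    // `.max(1)` keeps the chunk length valid for an empty slice.
    let chunk_len = ((values.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<f64>()))
            .collect();
        // Combine the per-chunk partial sums.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<f64>()
    })
}
```

Scoped threads let the workers borrow the input slice directly, so no data is copied. Spawning a thread per call is only worthwhile for large inputs; a reusable thread pool avoids that overhead.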
Memory Efficiency
- Columnar storage format
- String interning with global string pool
- Copy-on-write semantics
- Memory-mapped file support
- Lazy evaluation for chained operations
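Two of the techniques listed above, string interning and copy-on-write, can be sketched with the standard library alone. The code below is illustrative and makes no claim about PandRS internals; `StringPool`, `intern`, `resolve`, `unique_count`, and `set_value` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical string pool: each distinct string is allocated once and
/// referred to by a compact integer id.
#[derive(Default)]
pub struct StringPool {
    ids: HashMap<Arc<str>, u32>,
    strings: Vec<Arc<str>>,
}

impl StringPool {
    /// Return the id for `s`, allocating it only on first sight.
    pub fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        let shared: Arc<str> = Arc::from(s);
        self.strings.push(Arc::clone(&shared));
        self.ids.insert(shared, id);
        id
    }

    /// Look an id back up; None if the id was never issued.
    pub fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(|s| &**s)
    }

    /// Number of distinct strings stored.
    pub fn unique_count(&self) -> usize {
        self.strings.len()
    }
}

/// Copy-on-write update: the buffer is cloned only if another handle still
/// shares it. Panics if `index` is out of bounds.
pub fn set_value(column: &mut Arc<Vec<f64>>, index: usize, value: f64) {
    Arc::make_mut(column)[index] = value;
}
```

Cloning an `Arc<Vec<f64>>` is cheap because it only bumps a reference count; the actual data copy is deferred until one of the holders writes through `set_value`.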
I/O Capabilities
File Formats
- CSV: Fast parallel CSV reader/writer
- Parquet: Apache Parquet with compression support
- JSON: Both records and columnar JSON formats
- Excel: XLSX/XLS read/write with multi-sheet support
- SQL: Direct database read/write
- Arrow: Zero-copy Arrow integration
Database Support
- PostgreSQL
- MySQL/MariaDB
- SQLite
- ODBC connectivity
- Connection pooling
Cloud Storage
- AWS S3
- Google Cloud Storage
- Azure Blob Storage
- HTTP/HTTPS endpoints
Installation
Add to your Cargo.toml:
[dependencies]
pandrs = "0.1.0"
Feature Flags
Enable additional functionality with feature flags:
[dependencies]
pandrs = { version = "0.1.0", features = ["stable"] }
Available features:
- Core features:
  - `stable`: Recommended stable feature set
  - `optimized`: Performance optimizations and SIMD
  - `backward_compat`: Backward compatibility support
- Data formats:
  - `parquet`: Parquet file support
  - `excel`: Excel file support
  - `sql`: Database connectivity
- Advanced features:
  - `distributed`: Distributed computing with DataFusion
  - `visualization`: Plotting capabilities
  - `streaming`: Real-time data processing
  - `serving`: Model serving and deployment
- Experimental:
  - `cuda`: GPU acceleration (requires CUDA toolkit)
  - `wasm`: WebAssembly compilation support
  - `jit`: Just-in-time compilation
- Feature bundles:
  - `all-safe`: All stable features (recommended)
  - `test-safe`: Features safe for testing
Performance Benchmarks
Performance comparison with pandas (Python) and Polars (Rust):
| Operation | PandRS | Pandas | Polars | Speedup vs Pandas |
|---|---|---|---|---|
| CSV Read (1M rows) | 0.18s | 0.92s | 0.15s | 5.1x |
| GroupBy Sum | 0.09s | 0.31s | 0.08s | 3.4x |
| Join Operations | 0.21s | 0.87s | 0.19s | 4.1x |
| String Operations | 0.14s | 1.23s | 0.16s | 8.8x |
| Rolling Window | 0.11s | 0.43s | 0.12s | 3.9x |
Benchmarks were performed on an AMD Ryzen 9 5950X with 64 GB RAM and an NVMe SSD.
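The "Speedup vs Pandas" column is consistent with dividing the pandas time by the PandRS time and rounding to one decimal place. A small sketch of that arithmetic follows; `speedup` is a hypothetical helper, and the inputs are simply the numbers from the table.

```rust
/// Ratio of baseline time to candidate time, rounded to one decimal place.
pub fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    let ratio = baseline_secs / candidate_secs;
    (ratio * 10.0).round() / 10.0
}
```

For the CSV row, `speedup(0.92, 0.18)` gives 5.1, since 0.92 / 0.18 is about 5.11.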
Documentation
Examples
Basic Data Analysis
use *;
let df = read_csv?;
// Basic statistics
let stats = df.describe?;
println!;
// Filtering and aggregation
let result = df
.filter?
.groupby?
.agg?
.sort_values?;
Time Series Analysis
use *;
use ;
let mut df = read_csv?;
df.set_index?;
// Resample to daily frequency
let daily = df.resample?.mean?;
// Calculate rolling statistics
let rolling_stats = daily
.rolling?
.agg?;
// Exponentially weighted moving average
let ewm = daily.ewm?;
Machine Learning Pipeline
use *;
// Load and preprocess data
let df = read_parquet?;
// Handle missing values
let df_filled = df.fillna?;
// Encode categorical variables
let df_encoded = df_filled.get_dummies?;
// Normalize numerical features
let features = vec!;
let df_normalized = df_encoded.apply_columns?;
// Split features and target
let X = df_normalized.drop?;
let y = df_normalized.column?;
Contributing
We welcome contributions! Please see our Contributing Guide for details.
Development Setup
# Clone the repository
# Install development dependencies
# Run tests
# Run benchmarks
# Check code quality
License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
Acknowledgments
PandRS is inspired by the excellent pandas library and incorporates ideas from:
- Pandas - API design and functionality
- Polars - Performance optimizations
- Apache Arrow - Columnar format
- DataFusion - Query engine
Support
PandRS is a Cool Japan project, bringing high-performance data analysis to the Rust ecosystem.