# PandRS
[crates.io](https://crates.io/crates/pandrs) · [MIT License](https://opensource.org/licenses/MIT) · [docs.rs](https://docs.rs/pandrs)
A high-performance DataFrame library for Rust, providing a pandas-like API with advanced features including SIMD optimization, parallel processing, and distributed computing.
> **🚀 Version 0.1.0 - Production Ready**: PandRS is ready for production use. With 1334+ tests, zero clippy warnings, enhanced documentation, and optimized performance, it delivers a robust pandas-like experience for Rust developers. Published to crates.io in December 2025.
## Overview
PandRS is a comprehensive data manipulation library that brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:
- **Type-safe operations** leveraging Rust's ownership system
- **High-performance computing** through SIMD vectorization and parallel processing
- **Memory-efficient design** with columnar storage and string pooling
- **Comprehensive functionality** matching pandas' core features
- **Seamless interoperability** with Python, Arrow, and various data formats
## Quick Start
```rust
use pandrs::{DataFrame, Series};
use std::collections::HashMap;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a DataFrame
    let mut df = DataFrame::new();
    df.add_column("name".to_string(),
        Series::from_vec(vec!["Alice", "Bob", "Carol"], Some("name")))?;
    df.add_column("age".to_string(),
        Series::from_vec(vec![30, 25, 35], Some("age")))?;
    df.add_column("department".to_string(),
        Series::from_vec(vec!["Eng", "Sales", "Eng"], Some("department")))?;
    df.add_column("salary".to_string(),
        Series::from_vec(vec![75000.0, 65000.0, 85000.0], Some("salary")))?;

    // Perform operations
    let filtered = df.filter("age > 25")?;
    let mean_salary = df.column("salary")?.mean()?;
    let grouped = df.groupby(vec!["department"])?.agg(HashMap::from([
        ("salary".to_string(), vec!["mean", "sum"]),
        ("age".to_string(), vec!["max"]),
    ]))?;
    Ok(())
}
```
## Core Features
### Data Structures
- **Series**: One-dimensional labeled array capable of holding any data type
- **DataFrame**: Two-dimensional, size-mutable, heterogeneous tabular data structure
- **MultiIndex**: Hierarchical indexing for advanced data organization
- **Categorical**: Memory-efficient representation for string data with limited cardinality
### Data Types
- Numeric: `i32`, `i64`, `f32`, `f64`, `u32`, `u64`
- String: UTF-8 encoded with automatic string pooling
- Boolean: Native boolean support
- DateTime: Timezone-aware datetime with nanosecond precision
- Categorical: Efficient storage for repeated string values
- Missing Values: First-class `NA` support across all types
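The same constructor pattern from the Quick Start covers each of these types. A minimal sketch, assuming `Series::from_vec` is generic over the element types listed above and that pandrs errors convert to `Box<dyn std::error::Error>`:

```rust
use pandrs::{DataFrame, Series};

fn typed_columns() -> Result<DataFrame, Box<dyn std::error::Error>> {
    let mut df = DataFrame::new();
    // Integer, float, string, and boolean columns via the same constructor.
    df.add_column("id".to_string(),
        Series::from_vec(vec![1i64, 2, 3], Some("id")))?;
    df.add_column("score".to_string(),
        Series::from_vec(vec![9.5f64, 8.0, 7.25], Some("score")))?;
    df.add_column("label".to_string(),
        Series::from_vec(vec!["a", "b", "a"], Some("label")))?;
    df.add_column("active".to_string(),
        Series::from_vec(vec![true, false, true], Some("active")))?;
    Ok(df)
}
```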
### Operations
#### Data Manipulation
- Column addition, removal, and renaming
- Row and column selection with boolean indexing
- Sorting by single or multiple columns
- Duplicate detection and removal
- Data type conversion and casting
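A minimal sketch combining these manipulation calls, using only methods that appear elsewhere in this README (`filter`, `sort_values`, `drop`); the boolean vector passed to `sort_values` is assumed to mirror pandas' ascending flags:

```rust
use pandrs::prelude::*;

fn tidy(df: &DataFrame) -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Keep adult rows, sort by age descending, drop a helper column.
    let result = df
        .filter("age >= 18")?
        .sort_values(vec!["age"], vec![false])?
        .drop(vec!["row_id"])?;
    Ok(result)
}
```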
#### Aggregation & Grouping
- GroupBy operations with multiple aggregation functions
- Window functions (rolling, expanding, exponentially weighted)
- Pivot tables and cross-tabulation
- Custom aggregation functions
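Grouped aggregation follows the Quick Start pattern above. For pivot tables, no signature appears in this README, so the sketch below assumes a pandas-style `pivot_table(values, index, columns, aggfunc)` method; the name and arguments are hypothetical, so check docs.rs before relying on them:

```rust
use pandrs::prelude::*;

fn income_by_city_and_year(df: &DataFrame) -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Hypothetical pandas-style pivot: rows keyed by city, columns by year,
    // cells holding mean income.
    let table = df.pivot_table("income", "city", "year", "mean")?;
    Ok(table)
}
```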
#### Joining & Merging
- Inner, left, right, and outer joins
- Merge on single or multiple keys
- Concat operations with axis control
- Append with automatic index alignment
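No join signature appears in this README, so the following is only a sketch assuming a pandas-like `merge` that takes the other frame, the key columns, and a join type; both `merge` and `JoinType` are hypothetical names here:

```rust
use pandrs::prelude::*;

fn enrich(orders: &DataFrame, customers: &DataFrame) -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Hypothetical signature: left-join customer attributes onto orders by key.
    let joined = orders.merge(customers, vec!["customer_id"], JoinType::Left)?;
    Ok(joined)
}
```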
#### Time Series
- DateTime indexing and slicing
- Resampling and frequency conversion
- Time zone handling and conversion
- Date range generation
- Business day calculations
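A full resampling walk-through appears under Examples below. The sketch here only shows building a datetime index by hand with `chrono` (a real dependency of the Time Series example) and promoting it with `set_index`; passing RFC 3339 strings as the timestamp column is an assumption:

```rust
use chrono::{Duration, TimeZone, Utc};
use pandrs::{DataFrame, Series};

fn daily_frame() -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Seven daily timestamps rendered as RFC 3339 strings.
    let start = Utc.with_ymd_and_hms(2024, 1, 1, 0, 0, 0).unwrap();
    let stamps: Vec<String> = (0..7i64)
        .map(|d| (start + Duration::days(d)).to_rfc3339())
        .collect();
    let mut df = DataFrame::new();
    df.add_column("timestamp".to_string(),
        Series::from_vec(stamps, Some("timestamp")))?;
    df.add_column("value".to_string(),
        Series::from_vec(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], Some("value")))?;
    df.set_index("timestamp")?;
    Ok(df)
}
```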
### Performance Optimizations
#### SIMD Vectorization
- Automatic SIMD optimization for numerical operations
- Hand-tuned implementations for common operations
- Support for AVX2 and AVX-512 instruction sets
#### Parallel Processing
- Multi-threaded execution for large datasets
- Configurable thread pool sizing
- Parallel aggregations and transformations
- Load-balanced work distribution
#### Memory Efficiency
- Columnar storage format
- String interning with global string pool
- Copy-on-write semantics
- Memory-mapped file support
- Lazy evaluation for chain operations
### I/O Capabilities
#### File Formats
- **CSV**: Fast parallel CSV reader/writer
- **Parquet**: Apache Parquet with compression support
- **JSON**: Both records and columnar JSON formats
- **Excel**: XLSX/XLS read/write with multi-sheet support
- **SQL**: Direct database read/write
- **Arrow**: Zero-copy Arrow integration
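Both readers used below appear in the Examples section; the options structs default-construct. A minimal sketch:

```rust
use pandrs::prelude::*;

fn load() -> Result<(), Box<dyn std::error::Error>> {
    // CSV with default options, Parquet straight from a path.
    let df = DataFrame::read_csv("data.csv", CsvReadOptions::default())?;
    println!("{}", df.describe()?);
    let features = DataFrame::read_parquet("features.parquet")?;
    println!("{}", features.describe()?);
    Ok(())
}
```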
#### Database Support
- PostgreSQL
- MySQL/MariaDB
- SQLite
- ODBC connectivity
- Connection pooling
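The database API is not shown in this README; the sketch below only illustrates the shape a pandas-style `read_sql(query, connection_string)` might take. Both the function name and its arguments are hypothetical, so consult docs.rs (with the `sql` feature enabled) for the real entry points:

```rust
use pandrs::prelude::*;

fn load_orders() -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Hypothetical: query PostgreSQL and materialize the result as a DataFrame.
    let df = DataFrame::read_sql(
        "SELECT id, total FROM orders WHERE total > 100",
        "postgres://user:pass@localhost/shop",
    )?;
    Ok(df)
}
```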
#### Cloud Storage
- AWS S3
- Google Cloud Storage
- Azure Blob Storage
- HTTP/HTTPS endpoints
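Assuming the file readers accept URL-style paths for these stores when the corresponding features are enabled (an assumption, not confirmed by this README):

```rust
use pandrs::prelude::*;

fn from_s3() -> Result<DataFrame, Box<dyn std::error::Error>> {
    // Assumption: read_parquet resolves s3:// URLs directly.
    let df = DataFrame::read_parquet("s3://my-bucket/events.parquet")?;
    Ok(df)
}
```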
## Installation
Add to your `Cargo.toml`:
```toml
[dependencies]
pandrs = "0.1.0"
```
### Feature Flags
Enable additional functionality with feature flags:
```toml
[dependencies]
pandrs = { version = "0.1.0", features = ["stable"] }
```
Available features:
- **Core features:**
  - `stable`: Recommended stable feature set
  - `optimized`: Performance optimizations and SIMD
  - `backward_compat`: Backward compatibility support
- **Data formats:**
  - `parquet`: Parquet file support
  - `excel`: Excel file support
  - `sql`: Database connectivity
- **Advanced features:**
  - `distributed`: Distributed computing with DataFusion
  - `visualization`: Plotting capabilities
  - `streaming`: Real-time data processing
  - `serving`: Model serving and deployment
- **Experimental:**
  - `cuda`: GPU acceleration (requires CUDA toolkit)
  - `wasm`: WebAssembly compilation support
  - `jit`: Just-in-time compilation
- **Feature bundles:**
  - `all-safe`: All stable features (recommended)
  - `test-safe`: Features safe for testing
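Features compose, so a build that needs Parquet files plus database connectivity can list both:

```toml
[dependencies]
pandrs = { version = "0.1.0", features = ["parquet", "sql"] }
```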
## Performance Benchmarks
Performance comparison with pandas (Python) and Polars (Rust):
| Operation | PandRS | pandas | Polars | Speedup vs pandas |
|---|---|---|---|---|
| CSV Read (1M rows) | 0.18s | 0.92s | 0.15s | 5.1x |
| GroupBy Sum | 0.09s | 0.31s | 0.08s | 3.4x |
| Join Operations | 0.21s | 0.87s | 0.19s | 4.1x |
| String Operations | 0.14s | 1.23s | 0.16s | 8.8x |
| Rolling Window | 0.11s | 0.43s | 0.12s | 3.9x |
*Benchmarks performed on AMD Ryzen 9 5950X, 64GB RAM, NVMe SSD*
## Documentation
- [API Documentation](https://docs.rs/pandrs)
- [User Guide](https://github.com/cool-japan/pandrs/wiki)
- [Examples](https://github.com/cool-japan/pandrs/tree/main/examples)
- [Migration from Pandas](https://github.com/cool-japan/pandrs/wiki/Migration-Guide)
## Examples
### Basic Data Analysis
```rust
use pandrs::prelude::*;
use std::collections::HashMap;

fn analyze() -> Result<(), Box<dyn std::error::Error>> {
    let df = DataFrame::read_csv("data.csv", CsvReadOptions::default())?;

    // Basic statistics
    let stats = df.describe()?;
    println!("Data statistics:\n{}", stats);

    // Filtering and aggregation
    let result = df
        .filter("age >= 18 && income > 50000")?
        .groupby(vec!["city", "occupation"])?
        .agg(HashMap::from([
            ("income".to_string(), vec!["mean", "median", "std"]),
            ("age".to_string(), vec!["mean"]),
        ]))?
        .sort_values(vec!["income_mean"], vec![false])?;
    Ok(())
}
```
### Time Series Analysis
```rust
use pandrs::prelude::*;
use std::collections::HashMap;

fn rolling_analysis() -> Result<(), Box<dyn std::error::Error>> {
    let mut df = DataFrame::read_csv("timeseries.csv", CsvReadOptions::default())?;
    df.set_index("timestamp")?;

    // Resample to daily frequency
    let daily = df.resample("D")?.mean()?;

    // Calculate rolling statistics
    let rolling_stats = daily
        .rolling(RollingOptions {
            window: 7,
            min_periods: Some(1),
            center: false,
        })?
        .agg(HashMap::from([
            ("value".to_string(), vec!["mean", "std"]),
        ]))?;

    // Exponentially weighted moving average
    let ewm = daily.ewm(EwmOptions {
        span: Some(10.0),
        ..Default::default()
    })?;
    Ok(())
}
```
### Machine Learning Pipeline
```rust
use pandrs::prelude::*;

fn preprocess() -> Result<(), Box<dyn std::error::Error>> {
    // Load and preprocess data
    let df = DataFrame::read_parquet("features.parquet")?;

    // Handle missing values
    let df_filled = df.fillna(FillNaOptions::Forward)?;

    // Encode categorical variables
    let df_encoded = df_filled.get_dummies(vec!["category1", "category2"], None)?;

    // Normalize numerical features (z-score: subtract mean, divide by std)
    let features = vec!["feature1", "feature2", "feature3"];
    let df_normalized = df_encoded.apply_columns(&features, |series| {
        let mean = series.mean()?;
        let std = series.std(1)?;
        series.sub_scalar(mean)?.div_scalar(std)
    })?;

    // Split features and target
    let x = df_normalized.drop(vec!["target"])?;
    let y = df_normalized.column("target")?;
    Ok(())
}
```
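The normalization closure computes a z-score per feature: each column has its mean subtracted and is then divided by its sample standard deviation (`std(1)`, i.e. one delta degree of freedom, matching pandas' default).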
## Contributing
We welcome contributions! Please see our [Contributing Guide](CONTRIBUTING.md) for details.
### Development Setup
```bash
# Clone the repository
git clone https://github.com/cool-japan/pandrs
cd pandrs
# Install development dependencies
cargo install cargo-nextest cargo-criterion
# Run tests
cargo nextest run
# Run benchmarks
cargo criterion
# Check code quality
cargo clippy -- -D warnings
cargo fmt -- --check
```
## License
Licensed under either of:
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or <http://opensource.org/licenses/MIT>)
at your option.
## Acknowledgments
PandRS is inspired by the excellent pandas library and incorporates ideas from:
- [Pandas](https://pandas.pydata.org/) - API design and functionality
- [Polars](https://www.pola.rs/) - Performance optimizations
- [Apache Arrow](https://arrow.apache.org/) - Columnar format
- [DataFusion](https://arrow.apache.org/datafusion/) - Query engine
## Support
- [Issue Tracker](https://github.com/cool-japan/pandrs/issues)
- [Discussions](https://github.com/cool-japan/pandrs/discussions)
- [Stack Overflow](https://stackoverflow.com/questions/tagged/pandrs)
---
PandRS is a Cool Japan project, bringing high-performance data analysis to the Rust ecosystem.