§PandRS
A high-performance DataFrame library for Rust that provides a pandas-like API with advanced features, including SIMD optimization, parallel processing, and distributed computing capabilities.
§Overview
PandRS brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:
- Type-safe operations leveraging Rust’s ownership system
- High-performance computing through SIMD vectorization and parallel processing
- Memory-efficient design with columnar storage and string pooling
- Comprehensive functionality matching pandas’ core features
- Seamless interoperability with Python, Arrow, and various data formats
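The columnar-storage point above can be illustrated with a small, standard-library-only sketch. It is not PandRS's implementation: `Col`, `Table`, and `sample_table` are hypothetical names invented for this example. It only shows the general idea that each column lives in its own contiguous vector, so an aggregation over one column never touches the others.

```rust
/// Hypothetical typed column: each variant stores its values contiguously.
#[derive(Debug)]
enum Col {
    Int64(Vec<i64>),
    Str(Vec<String>),
}

impl Col {
    /// Number of values stored in this column.
    fn len(&self) -> usize {
        match self {
            Col::Int64(values) => values.len(),
            Col::Str(values) => values.len(),
        }
    }
}

/// Hypothetical column-oriented table: one vector per column
/// instead of one struct per row.
#[derive(Debug)]
struct Table {
    columns: Vec<(String, Col)>,
}

impl Table {
    fn new() -> Self {
        Table { columns: Vec::new() }
    }

    fn add(&mut self, name: &str, col: Col) {
        self.columns.push((name.to_string(), col));
    }

    fn get(&self, name: &str) -> Option<&Col> {
        self.columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, c)| c)
    }

    /// Row count, taken from the first column (0 for an empty table).
    fn row_count(&self) -> usize {
        self.columns.first().map(|(_, c)| c.len()).unwrap_or(0)
    }

    /// Sums an integer column by scanning a single vector.
    /// Returns None if the column is missing or is not an integer column.
    fn sum_i64(&self, name: &str) -> Option<i64> {
        match self.get(name)? {
            Col::Int64(values) => Some(values.iter().sum()),
            Col::Str(_) => None,
        }
    }
}

/// Builds a small table with the same data as the Quick Start example.
fn sample_table() -> Table {
    let mut t = Table::new();
    let names = ["Alice", "Bob", "Carol"];
    t.add("name", Col::Str(names.iter().map(|s| s.to_string()).collect()));
    t.add("age", Col::Int64(vec![30, 25, 35]));
    t
}
```

Calling `sample_table().sum_i64("age")` reads only the `age` vector; the `name` strings are never visited, which is the property that makes columnar layouts friendly to vectorized and parallel scans.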
§Quick Start
use pandrs::{DataFrame, Series};
// Create a DataFrame
let mut df = DataFrame::new();
df.add_column("name".to_string(),
    Series::new(vec!["Alice", "Bob", "Carol"], Some("name".to_string())).unwrap()).unwrap();
df.add_column("age".to_string(),
    Series::new(vec![30i64, 25, 35], Some("age".to_string())).unwrap()).unwrap();
// Basic operations
let nrows = df.row_count();
let ncols = df.column_count();
§Feature Flags
PandRS supports various feature flags for optional functionality:
- Core features:
  - stable: Recommended stable feature set
  - optimized: Performance optimizations and SIMD
  - backward_compat: Backward compatibility support
- Data formats:
  - parquet: Apache Parquet file support
  - excel: Excel file support (read/write)
  - sql: Database connectivity (PostgreSQL, MySQL, SQLite)
- Advanced features:
  - distributed: Distributed computing with DataFusion
  - visualization: Plotting capabilities
  - streaming: Real-time data processing
  - serving: Model serving and deployment
- Experimental:
  - cuda: GPU acceleration (requires CUDA toolkit)
  - wasm: WebAssembly compilation support
  - jit: Just-in-time compilation
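Cargo features such as the ones listed above are switched on in the dependent crate's `Cargo.toml`, and code can then be compiled conditionally on them. The sketch below shows the general Rust mechanism using the standard `cfg!` macro. The function `enabled_formats` and the always-present `"in_memory"` entry are hypothetical and exist only for illustration; the feature names come from the list above, and nothing here describes how PandRS gates its own code internally.

```rust
/// Lists which optional data-format features were compiled into this build.
/// "in_memory" is a placeholder entry that is always present (an assumption
/// made for this example, not a PandRS feature name).
fn enabled_formats() -> Vec<&'static str> {
    let mut formats = vec!["in_memory"];

    // `cfg!` evaluates to a compile-time boolean: true only when the
    // crate is built with the named Cargo feature enabled.
    if cfg!(feature = "parquet") {
        formats.push("parquet");
    }
    if cfg!(feature = "excel") {
        formats.push("excel");
    }
    if cfg!(feature = "sql") {
        formats.push("sql");
    }

    formats
}
```

For whole items (functions, modules, `impl` blocks) that should not exist at all without a feature, the attribute form `#[cfg(feature = "...")]` is the usual choice, since it removes the item from compilation rather than leaving a dead branch.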
§Core Data Structures
- Series: One-dimensional labeled array capable of holding any data type
- DataFrame: Two-dimensional, size-mutable, heterogeneous tabular data structure
- MultiIndex: Hierarchical indexing for advanced data organization
- Categorical: Memory-efficient representation for string data with limited cardinality
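To make the memory argument behind a categorical representation concrete, here is a minimal dictionary-encoding sketch using only the standard library. `DictEncoded` and its methods are hypothetical names for this example and are not the API of PandRS's `Categorical` type. Each distinct string is stored once, and every row holds only a small integer code.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column.
struct DictEncoded {
    /// Distinct values, in order of first appearance.
    categories: Vec<String>,
    /// One code per row, indexing into `categories`.
    codes: Vec<u32>,
}

impl DictEncoded {
    /// Encodes a slice of strings, assigning codes in first-seen order.
    fn from_values(values: &[&str]) -> Self {
        let mut categories: Vec<String> = Vec::new();
        let mut lookup: HashMap<String, u32> = HashMap::new();
        let mut codes = Vec::with_capacity(values.len());

        for &value in values {
            let code = match lookup.get(value) {
                Some(&existing) => existing,
                None => {
                    // First time this value is seen: give it the next code.
                    let next = categories.len() as u32;
                    categories.push(value.to_string());
                    lookup.insert(value.to_string(), next);
                    next
                }
            };
            codes.push(code);
        }

        DictEncoded { categories, codes }
    }

    /// Decodes the value at `row`, or None if the row is out of range.
    fn get(&self, row: usize) -> Option<&str> {
        let code = *self.codes.get(row)? as usize;
        self.categories.get(code).map(|s| s.as_str())
    }
}
```

With low-cardinality data the savings grow with row count: a million rows drawn from ten distinct labels need ten heap-allocated strings plus one 4-byte code per row, rather than a million separate strings.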
§Modules
- dataframe: DataFrame operations and manipulation
- series: Series operations and manipulation
- stats: Statistical functions and analysis
- ml: Machine learning algorithms and utilities
- io: Input/output operations for various file formats
- streaming: Real-time streaming data processing
- time_series: Time series analysis and forecasting
- graph: Graph analytics and algorithms
§Version
Current version: 0.1.0
Re-exports§
pub use core::column::BitMask as CoreBitMask;
pub use core::column::Column as CoreColumn;
pub use core::column::ColumnCast;
pub use core::column::ColumnTrait;
pub use core::column::ColumnType as CoreColumnType;
pub use core::data_value::DataValue;
pub use core::data_value::DataValueExt;
pub use core::data_value::DisplayExt;
pub use core::error::Error;
pub use core::error::Result;
pub use core::index::Index as CoreIndex;
pub use core::index::IndexTrait;
pub use core::multi_index::MultiIndex as CoreMultiIndex;
pub use config::credentials::CredentialBuilder;
pub use config::credentials::CredentialMetadata;
pub use config::credentials::CredentialStore;
pub use config::credentials::CredentialStoreConfig;
pub use config::credentials::CredentialType;
pub use config::credentials::EncryptedCredential;
pub use config::AccessControlConfig;
pub use config::AuditConfig;
pub use config::AwsConfig;
pub use config::AzureConfig;
pub use config::CachingConfig;
pub use config::CloudConfig;
pub use config::ConnectionPoolConfig;
pub use config::DatabaseConfig;
pub use config::EncryptionConfig;
pub use config::GcpConfig;
pub use config::GlobalCloudConfig;
pub use config::JitConfig;
pub use config::LogRotationConfig;
pub use config::LoggingConfig;
pub use config::MemoryConfig;
pub use config::PandRSConfig;
pub use config::PerformanceConfig;
pub use config::SecurityConfig;
pub use config::SslConfig;
pub use config::ThreadingConfig;
pub use config::TimeoutConfig;
pub use column::BooleanColumn;
pub use column::Column;
pub use column::ColumnType;
pub use column::Float64Column;
pub use column::Int64Column;
pub use column::StringColumn;
pub use dataframe::DataFrame;
pub use dataframe::MeltOptions;
pub use dataframe::StackOptions;
pub use dataframe::UnstackOptions;
pub use error::PandRSError;
pub use groupby::GroupBy;
pub use index::DataFrameIndex;
pub use index::Index;
pub use index::IndexTrait as LegacyIndexTrait;
pub use index::MultiIndex;
pub use index::RangeIndex;
pub use index::StringIndex;
pub use index::StringMultiIndex;
pub use na::NA;
pub use optimized::AggregateOp;
pub use optimized::JoinType;
pub use optimized::LazyFrame;
pub use optimized::OptimizedDataFrame;
pub use parallel::ParallelUtils;
pub use series::Categorical;
pub use series::CategoricalOrder;
pub use series::NASeries;
pub use series::Series;
pub use series::StringCategorical;
pub use stats::DescriptiveStats;
pub use stats::LinearRegressionResult;
pub use stats::TTestResult;
pub use vis::OutputFormat;
pub use vis::PlotConfig;
pub use vis::PlotType;
pub use jupyter::get_jupyter_config;
pub use jupyter::init_jupyter;
pub use jupyter::jupyter_dark_mode;
pub use jupyter::jupyter_light_mode;
pub use jupyter::set_jupyter_config;
pub use jupyter::JupyterColorScheme;
pub use jupyter::JupyterConfig;
pub use jupyter::JupyterDisplay;
pub use jupyter::JupyterMagics;
pub use jupyter::TableStyle;
pub use jupyter::TableWidth;
pub use ml::anomaly::IsolationForest;
pub use ml::anomaly::LocalOutlierFactor;
pub use ml::anomaly::OneClassSVM;
pub use ml::clustering::AgglomerativeClustering;
pub use ml::clustering::DistanceMetric;
pub use ml::clustering::KMeans;
pub use ml::clustering::Linkage;
pub use ml::clustering::DBSCAN;
pub use ml::dimension::TSNEInit;
pub use ml::dimension::PCA;
pub use ml::dimension::TSNE;
pub use ml::metrics::classification::accuracy_score;
pub use ml::metrics::classification::f1_score;
pub use ml::metrics::classification::precision_score;
pub use ml::metrics::classification::recall_score;
pub use ml::metrics::regression::explained_variance_score;
pub use ml::metrics::regression::mean_absolute_error;
pub use ml::metrics::regression::mean_squared_error;
pub use ml::metrics::regression::r2_score;
pub use ml::metrics::regression::root_mean_squared_error;
pub use ml::models::ensemble::GradientBoostingClassifier;
pub use ml::models::ensemble::GradientBoostingConfig;
pub use ml::models::ensemble::GradientBoostingRegressor;
pub use ml::models::ensemble::RandomForestClassifier;
pub use ml::models::ensemble::RandomForestConfig;
pub use ml::models::ensemble::RandomForestRegressor;
pub use ml::models::linear::LinearRegression;
pub use ml::models::linear::LogisticRegression;
pub use ml::models::neural::Activation;
pub use ml::models::neural::LossFunction;
pub use ml::models::neural::MLPClassifier;
pub use ml::models::neural::MLPConfig;
pub use ml::models::neural::MLPConfigBuilder;
pub use ml::models::neural::MLPRegressor;
pub use ml::models::tree::DecisionTreeClassifier;
pub use ml::models::tree::DecisionTreeConfig;
pub use ml::models::tree::DecisionTreeRegressor;
pub use ml::models::tree::SplitCriterion;
pub use ml::models::train_test_split;
pub use ml::models::CrossValidation;
pub use ml::models::ModelEvaluator;
pub use ml::models::ModelMetrics;
pub use ml::models::SupervisedModel;
pub use ml::models::UnsupervisedModel;
pub use ml::pipeline::Pipeline;
pub use ml::pipeline::PipelineStage;
pub use ml::pipeline::PipelineTransformer;
pub use ml::preprocessing::Binner;
pub use ml::preprocessing::FeatureSelector;
pub use ml::preprocessing::ImputeStrategy;
pub use ml::preprocessing::Imputer;
pub use ml::preprocessing::MinMaxScaler;
pub use ml::preprocessing::OneHotEncoder;
pub use ml::preprocessing::PolynomialFeatures;
pub use ml::preprocessing::StandardScaler;
pub use large::ChunkedDataFrame;
pub use large::DiskBasedDataFrame;
pub use large::DiskBasedOptimizedDataFrame;
pub use large::DiskConfig;
pub use streaming::AggregationType;
pub use streaming::BackpressureBuffer;
pub use streaming::BackpressureChannel;
pub use streaming::BackpressureConfig;
pub use streaming::BackpressureConfigBuilder;
pub use streaming::BackpressureStats;
pub use streaming::BackpressureStrategy;
pub use streaming::DataStream;
pub use streaming::FlowController;
pub use streaming::MetricType;
pub use streaming::MultiColumnAggregator;
pub use streaming::RealTimeAnalytics;
pub use streaming::StreamAggregator;
pub use streaming::StreamConfig;
pub use streaming::StreamConnector;
pub use streaming::StreamProcessor;
pub use streaming::StreamRecord;
pub use streaming::TimeWindow;
pub use streaming::WindowAggregation;
pub use streaming::WindowConfig;
pub use streaming::WindowConfigBuilder;
pub use streaming::WindowResult;
pub use streaming::WindowType;
pub use streaming::WindowedAggregator;
pub use time_series::ArimaForecaster;
pub use time_series::AugmentedDickeyFullerTest;
pub use time_series::AutoArima;
pub use time_series::AutocorrelationAnalysis;
pub use time_series::ChangePointDetection;
pub use time_series::DateTimeIndex;
pub use time_series::DecompositionMethod;
pub use time_series::DecompositionResult;
pub use time_series::Differencing;
pub use time_series::ExponentialSmoothingForecaster;
pub use time_series::FeatureSet;
pub use time_series::ForecastMetrics;
pub use time_series::ForecastResult;
pub use time_series::Forecaster;
pub use time_series::Frequency;
pub use time_series::KwiatkowskiPhillipsSchmidtShinTest;
pub use time_series::LinearTrendForecaster;
pub use time_series::MissingValueStrategy;
pub use time_series::ModelSelectionCriterion;
pub use time_series::ModelSelectionResult;
pub use time_series::Normalization;
pub use time_series::OutlierDetection;
pub use time_series::SarimaForecaster;
pub use time_series::SeasonalDecomposition;
pub use time_series::SeasonalTest;
pub use time_series::SeasonalityAnalysis;
pub use time_series::SimpleMovingAverageForecaster;
pub use time_series::StationarityTest;
pub use time_series::StatisticalFeatures;
pub use time_series::TimePoint;
pub use time_series::TimeSeries;
pub use time_series::TimeSeriesBuilder;
pub use time_series::TimeSeriesFeatureExtractor;
pub use time_series::TimeSeriesPreprocessor;
pub use time_series::TimeSeriesStats;
pub use time_series::TrendAnalysis;
pub use time_series::WhiteNoiseTest;
pub use time_series::WindowFeatures;
pub use compute::lazy::LazyFrame as ComputeLazyFrame;
pub use compute::parallel::ParallelUtils as ComputeParallelUtils;
pub use storage::column_store::ColumnStore;
pub use storage::disk::DiskStorage;
pub use storage::memory_mapped::MemoryMappedFile;
pub use storage::string_pool::StringPool as StorageStringPool;
pub use distributed::core::DistributedConfig;
pub use distributed::core::DistributedDataFrame;
pub use distributed::core::ToDistributed;
pub use distributed::execution::ExecutionContext;
pub use distributed::execution::ExecutionEngine;
pub use distributed::execution::ExecutionPlan;
pub use graph::bellman_ford_default;
pub use graph::betweenness_centrality;
pub use graph::bfs;
pub use graph::closeness_centrality;
pub use graph::connected_components;
pub use graph::degree_centrality;
pub use graph::dfs;
pub use graph::dijkstra;
pub use graph::dijkstra_default;
pub use graph::eigenvector_centrality_default;
pub use graph::floyd_warshall_default;
pub use graph::from_adjacency_matrix;
pub use graph::from_edge_dataframe;
pub use graph::has_cycle;
pub use graph::hits_default;
pub use graph::is_connected;
pub use graph::label_propagation;
pub use graph::louvain_default;
pub use graph::modularity;
pub use graph::pagerank;
pub use graph::pagerank_default;
pub use graph::shortest_path_bfs;
pub use graph::strongly_connected_components;
pub use graph::to_adjacency_matrix;
pub use graph::to_edge_dataframe;
pub use graph::topological_sort;
pub use graph::AllPairsShortestPaths;
pub use graph::BfsResult;
pub use graph::ComponentResult;
pub use graph::DfsResult;
pub use graph::Edge;
pub use graph::EdgeId;
pub use graph::Graph;
pub use graph::GraphBuilder;
pub use graph::GraphError;
pub use graph::GraphType;
pub use graph::Node;
pub use graph::NodeId;
pub use graph::ShortestPathResult;
pub use versioning::DataFrameVersioning;
pub use versioning::DataSchema;
pub use versioning::DataVersion;
pub use versioning::LineageConfig;
pub use versioning::LineageTracker;
pub use versioning::Operation;
pub use versioning::OperationType;
pub use versioning::TrackerStats;
pub use versioning::VersionDiff;
pub use versioning::VersionId;
pub use versioning::VersionedTransform;
pub use versioning::VersioningError;
pub use audit::global_logger;
pub use audit::init_global_logger;
pub use audit::log_global;
pub use audit::AuditConfig as AuditLogConfig;
pub use audit::AuditConfigBuilder as AuditLogConfigBuilder;
pub use audit::AuditEntry;
pub use audit::AuditLogger;
pub use audit::AuditStats;
pub use audit::EventCategory;
pub use audit::LogContext;
pub use audit::LogDestination;
pub use audit::LogLevel;
pub use multitenancy::DatasetId;
pub use multitenancy::DatasetMetadata;
pub use multitenancy::IsolationContext;
pub use multitenancy::Permission;
pub use multitenancy::ResourceQuota;
pub use multitenancy::TenantAuditEntry;
pub use multitenancy::TenantConfig;
pub use multitenancy::TenantId;
pub use multitenancy::TenantManager;
pub use multitenancy::TenantOperation;
pub use multitenancy::TenantUsage;
pub use auth::decode_jwt;
pub use auth::encode_jwt;
pub use auth::get_token_expiration;
pub use auth::is_token_expired;
pub use auth::verify_jwt;
pub use auth::ApiKeyInfo;
pub use auth::ApiKeyManager;
pub use auth::ApiKeyStats;
pub use auth::AuthEvent;
pub use auth::AuthEventType;
pub use auth::AuthManager;
pub use auth::AuthMethod;
pub use auth::AuthResult;
pub use auth::AuthorizationRequest;
pub use auth::IntrospectionResponse;
pub use auth::JwtConfig;
pub use auth::OAuthClient;
pub use auth::OAuthClientInfo;
pub use auth::OAuthConfig;
pub use auth::OAuthGrantType;
pub use auth::RefreshToken;
pub use auth::ScopedApiKey;
pub use auth::Session;
pub use auth::SessionContext;
pub use auth::SessionStore;
pub use auth::TokenClaims;
pub use auth::TokenRequest;
pub use auth::TokenResponse;
pub use auth::UserInfo;
pub use analytics::create_default_rules;
pub use analytics::global_dashboard;
pub use analytics::init_global_dashboard;
pub use analytics::record_global;
pub use analytics::time_global;
pub use analytics::ActiveAlert;
pub use analytics::AlertHandler;
pub use analytics::AlertManager;
pub use analytics::AlertMetric;
pub use analytics::AlertRule;
pub use analytics::AlertSeverity;
pub use analytics::Dashboard;
pub use analytics::DashboardConfig;
pub use analytics::DashboardSnapshot;
pub use analytics::LoggingAlertHandler;
pub use analytics::Metric;
pub use analytics::MetricStats;
pub use analytics::MetricType as AnalyticsMetricType;
pub use analytics::MetricValue;
pub use analytics::MetricsCollector;
pub use analytics::OperationCategory;
pub use analytics::OperationRecord;
pub use analytics::RateCalculator;
pub use analytics::ResourceSnapshot;
pub use analytics::ScopedTimer;
pub use analytics::ThresholdOperator;
pub use analytics::TimeResolution;
Modules§
- analytics: Real-Time Analytics Dashboard Module
- arrow_integration: Apache Arrow Integration
- audit: Audit logging module for DataFrame operations
- auth: Enterprise Authentication Module
- column
- compute
- config: Configuration management for PandRS
- connectors: Data Connectors
- core
- dataframe
- distributed: Distributed Processing Module
- error
- graph: Graph analytics module for PandRS
- groupby
- index
- io
- jupyter: Jupyter Notebook Integration for PandRS
- large: Module for handling large datasets
- ml: Machine Learning Module
- multitenancy: Multi-Tenancy Support Module
- na
- optimized
- parallel: Module providing parallel processing functionality
- pivot: Module providing pivot table functionality
- series
- stats: PandRS Statistics Module
- storage
- streaming: Module for streaming data processing
- temporal: Module for time series data manipulation
- time_series: Time Series Analysis and Forecasting Module
- versioning: Data versioning and lineage tracking module
- vis: Module providing data visualization functionality
Macros§
- agg_spec: Create aggregation specification (similar to pandas)
- column_aggs: Create multiple named aggregations for a column
- iloc: Macro for convenient indexing
- loc
- named_agg: Helper macros for creating aggregation specifications. Create a named aggregation.
- select