§PandRS
A high-performance DataFrame library for Rust that provides a pandas-like API along with advanced features, including SIMD optimization, parallel processing, and distributed computing capabilities.
§Overview
PandRS brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:
- Type-safe operations leveraging Rust’s ownership system
- High-performance computing through SIMD vectorization and parallel processing
- Memory-efficient design with columnar storage and string pooling
- Comprehensive functionality matching pandas’ core features
- Seamless interoperability with Python, Arrow, and various data formats
§Quick Start
use pandrs::{DataFrame, Series};

// Create an empty DataFrame, then add one string column and one integer column
let mut df = DataFrame::new();

let names = Series::new(vec!["Alice", "Bob", "Carol"], Some("name".to_string()))
    .expect("failed to build the `name` series");
df.add_column("name".to_string(), names)
    .expect("failed to add the `name` column");

let ages = Series::new(vec![30i64, 25, 35], Some("age".to_string()))
    .expect("failed to build the `age` series");
df.add_column("age".to_string(), ages)
    .expect("failed to add the `age` column");

// Basic operations: query the shape of the DataFrame
let nrows = df.row_count();
let ncols = df.column_count();
§Feature Flags
PandRS supports various feature flags for optional functionality:
- Core features:
  - stable: Recommended stable feature set
  - optimized: Performance optimizations and SIMD
  - backward_compat: Backward compatibility support
- Data formats:
  - parquet: Apache Parquet file support
  - excel: Excel file support (read/write)
  - sql: Database connectivity (PostgreSQL, MySQL, SQLite)
- Advanced features:
  - distributed: Distributed computing with DataFusion
  - visualization: Plotting capabilities
  - streaming: Real-time data processing
  - serving: Model serving and deployment
- Experimental:
  - cuda: GPU acceleration (requires CUDA toolkit)
  - wasm: WebAssembly compilation support
  - jit: Just-in-time compilation
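As a sketch of how these flags might be enabled, the Cargo.toml fragment below assumes the crate is published as `pandrs` (the name used in the Quick Start import) at the version listed in this document. The particular combination of flags is only an example.

```toml
[dependencies]
# Assumed crate name and version; list only the feature flags you need.
pandrs = { version = "0.1.0", features = ["stable", "parquet", "sql"] }
```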
§Core Data Structures
- Series: One-dimensional labeled array capable of holding any data type
- DataFrame: Two-dimensional, size-mutable, heterogeneous tabular data structure
- MultiIndex: Hierarchical indexing for advanced data organization
- Categorical: Memory-efficient representation for string data with limited cardinality
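To illustrate why a categorical representation can save memory for low-cardinality string data, here is a minimal dictionary-encoding sketch that uses only the standard library. The `DictEncoded` type and its methods are hypothetical names chosen for illustration; they are not the PandRS `Categorical` API, whose internals may differ.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column (not the PandRS `Categorical` type).
struct DictEncoded {
    categories: Vec<String>, // each distinct value is stored once
    codes: Vec<u32>,         // one small integer per row, indexing into `categories`
}

impl DictEncoded {
    /// Build the encoding, assigning codes in order of first appearance.
    fn from_values(values: &[&str]) -> Self {
        let mut lookup: HashMap<&str, u32> = HashMap::new();
        let mut categories = Vec::new();
        let mut codes = Vec::with_capacity(values.len());
        for &v in values {
            let code = *lookup.entry(v).or_insert_with(|| {
                categories.push(v.to_string());
                (categories.len() - 1) as u32
            });
            codes.push(code);
        }
        DictEncoded { categories, codes }
    }

    /// Decode the value at `row`, or `None` if the row is out of range.
    fn get(&self, row: usize) -> Option<&str> {
        self.codes
            .get(row)
            .map(|&c| self.categories[c as usize].as_str())
    }
}

fn main() {
    let col = DictEncoded::from_values(&["red", "blue", "red", "red", "blue"]);
    println!("{} rows, {} categories", col.codes.len(), col.categories.len());
    println!("row 2 = {:?}", col.get(2));
}
```

With this layout each repeated string costs four bytes per row instead of a separate heap allocation, which is the general idea behind categorical columns.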
§Modules
- dataframe: DataFrame operations and manipulation
- series: Series operations and manipulation
- stats: Statistical functions and analysis
- ml: Machine learning algorithms and utilities
- io: Input/output operations for various file formats
- streaming: Real-time streaming data processing
- time_series: Time series analysis and forecasting
- graph: Graph analytics and algorithms
§Version
Current version: 0.1.0
Re-exports§
pub use scirs2_integration::dataframe_ext::SciRS2Ext;pub use core::column::BitMask as CoreBitMask;pub use core::column::Column as CoreColumn;pub use core::column::ColumnCast;pub use core::column::ColumnTrait;pub use core::column::ColumnType as CoreColumnType;pub use core::data_value::DataValue;pub use core::data_value::DataValueExt;pub use core::data_value::DisplayExt;pub use core::error::Error;pub use core::error::Result;pub use core::index::Index as CoreIndex;pub use core::index::IndexTrait;pub use core::multi_index::MultiIndex as CoreMultiIndex;pub use config::credentials::CredentialBuilder;pub use config::credentials::CredentialMetadata;pub use config::credentials::CredentialStore;pub use config::credentials::CredentialStoreConfig;pub use config::credentials::CredentialType;pub use config::credentials::EncryptedCredential;pub use config::AccessControlConfig;pub use config::AuditConfig;pub use config::AwsConfig;pub use config::AzureConfig;pub use config::CachingConfig;pub use config::CloudConfig;pub use config::ConnectionPoolConfig;pub use config::DatabaseConfig;pub use config::EncryptionConfig;pub use config::GcpConfig;pub use config::GlobalCloudConfig;pub use config::JitConfig;pub use config::LogRotationConfig;pub use config::LoggingConfig;pub use config::MemoryConfig;pub use config::PandRSConfig;pub use config::PerformanceConfig;pub use config::SecurityConfig;pub use config::SslConfig;pub use config::ThreadingConfig;pub use config::TimeoutConfig;pub use column::BooleanColumn;pub use column::Column;pub use column::ColumnType;pub use column::Float64Column;pub use column::Int64Column;pub use column::StringColumn;pub use dataframe::DataFrame;pub use dataframe::MeltOptions;pub use dataframe::StackOptions;pub use dataframe::UnstackOptions;pub use error::PandRSError;pub use groupby::GroupBy;pub use index::DataFrameIndex;pub use index::Index;pub use index::IndexTrait as LegacyIndexTrait;pub use index::MultiIndex;pub use index::RangeIndex;pub use 
index::StringIndex;pub use index::StringMultiIndex;pub use na::NA;pub use optimized::AggregateOp;pub use optimized::JoinType;pub use optimized::LazyFrame;pub use optimized::OptimizedDataFrame;pub use parallel::ParallelUtils;pub use series::Categorical;pub use series::CategoricalOrder;pub use series::NASeries;pub use series::Series;pub use series::StringCategorical;pub use stats::DescriptiveStats;pub use stats::LinearRegressionResult;pub use stats::TTestResult;pub use vis::OutputFormat;pub use vis::PlotConfig;pub use vis::PlotType;pub use vis::svg::BarChart as SvgBarChart;pub use vis::svg::BarOrientation as SvgBarOrientation;pub use vis::svg::Color as SvgColor;pub use vis::svg::ColorScheme as SvgColorScheme;pub use vis::svg::DrawStyle;pub use vis::svg::HeatMap as SvgHeatMap;pub use vis::svg::LegendPosition;pub use vis::svg::LineChart as SvgLineChart;pub use vis::svg::LineSeries;pub use vis::svg::Margins as SvgMargins;pub use vis::svg::MarkerShape;pub use vis::svg::PathBuilder;pub use vis::svg::PieChart as SvgPieChart;pub use vis::svg::ScatterPlot as SvgScatterPlot;pub use vis::svg::SvgCanvas;pub use vis::svg::SvgChartConfig;pub use vis::svg::SvgHistogram;pub use vis::svg::SvgPlotType;pub use vis::svg::SvgVisualize;pub use vis::svg::Transform as SvgTransform;pub use jupyter::get_jupyter_config;pub use jupyter::init_jupyter;pub use jupyter::jupyter_dark_mode;pub use jupyter::jupyter_light_mode;pub use jupyter::set_jupyter_config;pub use jupyter::JupyterColorScheme;pub use jupyter::JupyterConfig;pub use jupyter::JupyterDisplay;pub use jupyter::JupyterMagics;pub use jupyter::TableStyle;pub use jupyter::TableWidth;pub use ml::anomaly::IsolationForest;pub use ml::anomaly::LocalOutlierFactor;pub use ml::anomaly::OneClassSVM;pub use ml::clustering::AgglomerativeClustering;pub use ml::clustering::DistanceMetric;pub use ml::clustering::KMeans;pub use ml::clustering::Linkage;pub use ml::clustering::DBSCAN;pub use ml::dimension::TSNEInit;pub use ml::dimension::PCA;pub use 
ml::dimension::TSNE;pub use ml::metrics::classification::accuracy_score;pub use ml::metrics::classification::f1_score;pub use ml::metrics::classification::precision_score;pub use ml::metrics::classification::recall_score;pub use ml::metrics::regression::explained_variance_score;pub use ml::metrics::regression::mean_absolute_error;pub use ml::metrics::regression::mean_squared_error;pub use ml::metrics::regression::r2_score;pub use ml::metrics::regression::root_mean_squared_error;pub use ml::models::ensemble::GradientBoostingClassifier;pub use ml::models::ensemble::GradientBoostingConfig;pub use ml::models::ensemble::GradientBoostingRegressor;pub use ml::models::ensemble::RandomForestClassifier;pub use ml::models::ensemble::RandomForestConfig;pub use ml::models::ensemble::RandomForestRegressor;pub use ml::models::linear::LinearRegression;pub use ml::models::linear::LogisticRegression;pub use ml::models::neural::Activation;pub use ml::models::neural::LossFunction;pub use ml::models::neural::MLPClassifier;pub use ml::models::neural::MLPConfig;pub use ml::models::neural::MLPConfigBuilder;pub use ml::models::neural::MLPRegressor;pub use ml::models::tree::DecisionTreeClassifier;pub use ml::models::tree::DecisionTreeConfig;pub use ml::models::tree::DecisionTreeRegressor;pub use ml::models::tree::SplitCriterion;pub use ml::models::train_test_split;pub use ml::models::CrossValidation;pub use ml::models::ModelEvaluator;pub use ml::models::ModelMetrics;pub use ml::models::SupervisedModel;pub use ml::models::UnsupervisedModel;pub use ml::pipeline::Pipeline;pub use ml::pipeline::PipelineStage;pub use ml::pipeline::PipelineTransformer;pub use ml::preprocessing::Binner;pub use ml::preprocessing::FeatureSelector;pub use ml::preprocessing::ImputeStrategy;pub use ml::preprocessing::Imputer;pub use ml::preprocessing::MinMaxScaler;pub use ml::preprocessing::OneHotEncoder;pub use ml::preprocessing::PolynomialFeatures;pub use ml::preprocessing::StandardScaler;pub use 
large::external_sort;pub use large::merge_sorted_chunks;pub use large::hash_join_out_of_core;pub use large::OutOfCoreJoinType;pub use large::AggOp as OutOfCoreAggOp;pub use large::OutOfCoreConfig;pub use large::OutOfCoreReader;pub use large::OutOfCoreWriter;pub use large::ChunkedDataFrame;pub use large::DiskBasedDataFrame;pub use large::DiskBasedOptimizedDataFrame;pub use large::DiskConfig;pub use streaming::AggregationType;pub use streaming::BackpressureBuffer;pub use streaming::BackpressureChannel;pub use streaming::BackpressureConfig;pub use streaming::BackpressureConfigBuilder;pub use streaming::BackpressureStats;pub use streaming::BackpressureStrategy;pub use streaming::DataStream;pub use streaming::FlowController;pub use streaming::MetricType;pub use streaming::MultiColumnAggregator;pub use streaming::RealTimeAnalytics;pub use streaming::StreamAggregator;pub use streaming::StreamConfig;pub use streaming::StreamConnector;pub use streaming::StreamProcessor;pub use streaming::StreamRecord;pub use streaming::TimeWindow;pub use streaming::WindowAggregation;pub use streaming::WindowConfig;pub use streaming::WindowConfigBuilder;pub use streaming::WindowResult;pub use streaming::WindowType;pub use streaming::WindowedAggregator;pub use time_series::ArimaForecaster;pub use time_series::AugmentedDickeyFullerTest;pub use time_series::AutoArima;pub use time_series::AutocorrelationAnalysis;pub use time_series::ChangePointDetection;pub use time_series::DateTimeIndex;pub use time_series::DecompositionMethod;pub use time_series::DecompositionResult;pub use time_series::Differencing;pub use time_series::ExponentialSmoothingForecaster;pub use time_series::FeatureSet;pub use time_series::ForecastMetrics;pub use time_series::ForecastResult;pub use time_series::Forecaster;pub use time_series::Frequency;pub use time_series::KwiatkowskiPhillipsSchmidtShinTest;pub use time_series::LinearTrendForecaster;pub use time_series::MissingValueStrategy;pub use 
time_series::ModelSelectionCriterion;pub use time_series::ModelSelectionResult;pub use time_series::Normalization;pub use time_series::OutlierDetection;pub use time_series::SarimaForecaster;pub use time_series::SeasonalDecomposition;pub use time_series::SeasonalTest;pub use time_series::SeasonalityAnalysis;pub use time_series::SimpleMovingAverageForecaster;pub use time_series::StationarityTest;pub use time_series::StatisticalFeatures;pub use time_series::TimePoint;pub use time_series::TimeSeries;pub use time_series::TimeSeriesBuilder;pub use time_series::TimeSeriesFeatureExtractor;pub use time_series::TimeSeriesPreprocessor;pub use time_series::TimeSeriesStats;pub use time_series::TrendAnalysis;pub use time_series::WhiteNoiseTest;pub use time_series::WindowFeatures;pub use compute::lazy::LazyFrame as ComputeLazyFrame;pub use compute::parallel::ParallelUtils as ComputeParallelUtils;pub use storage::column_store::ColumnStore;pub use storage::disk::DiskStorage;pub use storage::memory_mapped::MemoryMappedFile;pub use storage::string_pool::StringPool as StorageStringPool;pub use distributed::core::DistributedConfig;pub use distributed::core::DistributedDataFrame;pub use distributed::core::ToDistributed;pub use distributed::execution::ExecutionContext;pub use distributed::execution::ExecutionEngine;pub use distributed::execution::ExecutionPlan;pub use graph::bellman_ford_default;pub use graph::betweenness_centrality;pub use graph::bfs;pub use graph::closeness_centrality;pub use graph::connected_components;pub use graph::degree_centrality;pub use graph::dfs;pub use graph::dijkstra;pub use graph::dijkstra_default;pub use graph::eigenvector_centrality_default;pub use graph::floyd_warshall_default;pub use graph::from_adjacency_matrix;pub use graph::from_edge_dataframe;pub use graph::has_cycle;pub use graph::hits_default;pub use graph::is_connected;pub use graph::label_propagation;pub use graph::louvain_default;pub use graph::modularity;pub use graph::pagerank;pub use 
graph::pagerank_default;pub use graph::shortest_path_bfs;pub use graph::strongly_connected_components;pub use graph::to_adjacency_matrix;pub use graph::to_edge_dataframe;pub use graph::topological_sort;pub use graph::AllPairsShortestPaths;pub use graph::BfsResult;pub use graph::ComponentResult;pub use graph::DfsResult;pub use graph::Edge;pub use graph::EdgeId;pub use graph::Graph;pub use graph::GraphBuilder;pub use graph::GraphError;pub use graph::GraphType;pub use graph::Node;pub use graph::NodeId;pub use graph::ShortestPathResult;pub use versioning::DataFrameVersioning;pub use versioning::DataSchema;pub use versioning::DataVersion;pub use versioning::LineageConfig;pub use versioning::LineageTracker;pub use versioning::Operation;pub use versioning::OperationType;pub use versioning::TrackerStats;pub use versioning::VersionDiff;pub use versioning::VersionId;pub use versioning::VersionedTransform;pub use versioning::VersioningError;pub use schema_evolution::BreakingChange;pub use schema_evolution::ColumnSchema;pub use schema_evolution::CompatibilityReport;pub use schema_evolution::DataFrameSchema;pub use schema_evolution::DefaultValue;pub use schema_evolution::Migration;pub use schema_evolution::MigrationBuilder;pub use schema_evolution::SchemaChange;pub use schema_evolution::SchemaConstraint;pub use schema_evolution::SchemaDataType;pub use schema_evolution::SchemaFormat;pub use schema_evolution::SchemaMigrator;pub use schema_evolution::SchemaRegistry;pub use schema_evolution::SchemaVersion;pub use schema_evolution::ValidationError;pub use schema_evolution::ValidationErrorType;pub use schema_evolution::ValidationReport;pub use audit::global_logger;pub use audit::init_global_logger;pub use audit::log_global;pub use audit::AuditConfig as AuditLogConfig;pub use audit::AuditConfigBuilder as AuditLogConfigBuilder;pub use audit::AuditEntry;pub use audit::AuditLogger;pub use audit::AuditStats;pub use audit::EventCategory;pub use audit::LogContext;pub use 
audit::LogDestination;pub use audit::LogLevel;pub use multitenancy::DatasetId;pub use multitenancy::DatasetMetadata;pub use multitenancy::IsolationContext;pub use multitenancy::Permission;pub use multitenancy::ResourceQuota;pub use multitenancy::TenantAuditEntry;pub use multitenancy::TenantConfig;pub use multitenancy::TenantId;pub use multitenancy::TenantManager;pub use multitenancy::TenantOperation;pub use multitenancy::TenantUsage;pub use auth::decode_jwt;pub use auth::encode_jwt;pub use auth::get_token_expiration;pub use auth::is_token_expired;pub use auth::verify_jwt;pub use auth::ApiKeyInfo;pub use auth::ApiKeyManager;pub use auth::ApiKeyStats;pub use auth::AuthEvent;pub use auth::AuthEventType;pub use auth::AuthManager;pub use auth::AuthMethod;pub use auth::AuthResult;pub use auth::AuthorizationRequest;pub use auth::IntrospectionResponse;pub use auth::JwtConfig;pub use auth::OAuthClient;pub use auth::OAuthClientInfo;pub use auth::OAuthConfig;pub use auth::OAuthGrantType;pub use auth::RefreshToken;pub use auth::ScopedApiKey;pub use auth::Session;pub use auth::SessionContext;pub use auth::SessionStore;pub use auth::TokenClaims;pub use auth::TokenRequest;pub use auth::TokenResponse;pub use auth::UserInfo;pub use analytics::create_default_rules;pub use analytics::global_dashboard;pub use analytics::init_global_dashboard;pub use analytics::record_global;pub use analytics::time_global;pub use analytics::ActiveAlert;pub use analytics::AlertHandler;pub use analytics::AlertManager;pub use analytics::AlertMetric;pub use analytics::AlertRule;pub use analytics::AlertSeverity;pub use analytics::Dashboard;pub use analytics::DashboardConfig;pub use analytics::DashboardSnapshot;pub use analytics::LoggingAlertHandler;pub use analytics::Metric;pub use analytics::MetricStats;pub use analytics::MetricType as AnalyticsMetricType;pub use analytics::MetricValue;pub use analytics::MetricsCollector;pub use analytics::OperationCategory;pub use analytics::OperationRecord;pub use 
analytics::RateCalculator;pub use analytics::ResourceSnapshot;pub use analytics::ScopedTimer;pub use analytics::ThresholdOperator;pub use analytics::TimeResolution;
Modules§
- analytics: Real-time analytics dashboard module.
- arrow_integration: Arrow integration module for ecosystem compatibility.
- audit: Audit logging module.
- auth: Enterprise authentication module (JWT, OAuth, API Keys).
- column: Column implementations and column-level operations.
- compute: Compute module for computation functionality.
- config: Configuration management for secure settings and credentials.
- connectors: Data connectors for databases and cloud storage.
- core: Core module with fundamental data structures and traits.
- dataframe: DataFrame data structure and operations.
- distributed: Distributed processing module.
- error: Error types and error handling utilities.
- graph: Graph analytics module.
- groupby: GroupBy operations for split-apply-combine workflows.
- index: Index types for DataFrame and Series labeling.
- io: Input/output operations for reading and writing data.
- jupyter: Jupyter notebook integration and display formatting.
- large: Large dataset processing with chunking and disk-based operations.
- ml: Machine learning algorithms and utilities.
- multitenancy: Multi-tenancy support module.
- na: NA (Not Available) value handling and missing data operations.
- optimized: Optimized implementations using SIMD and vectorization.
- parallel: Parallel processing utilities and thread pool management.
- pivot: Pivot table operations for data reshaping.
- plugins: Plugin system for custom data sources, transforms, and sinks.
- schema_evolution: Schema evolution and migration tools.
- scirs2_integration: SciRS2 integration module for scientific computing capabilities.
- series: Series data structure for one-dimensional labeled data.
- stats: Statistical functions and analysis tools.
- storage: Storage module for data storage engines.
- streaming: Real-time streaming data processing.
- temporal: Temporal operations for time-based data.
- time_series: Time series analysis and forecasting.
- versioning: Data versioning and lineage tracking module.
- vis: Visualization and plotting utilities.
Macros§
- agg_spec: Create aggregation specification (similar to pandas)
- column_aggs: Create multiple named aggregations for a column
- iloc: Macro for convenient indexing
- loc
- lock_and_clone: Lock a Mutex and clone the contained value
- lock_safe: Safely acquire a Mutex lock, converting poison errors to Result
- named_agg: Create a named aggregation (helper macro for creating aggregation specifications)
- read_lock_safe: Safely acquire a RwLock read lock
- select
- write_lock_safe: Safely acquire a RwLock write lock
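The locking macros above describe converting poison errors into a Result. The following standard-library sketch shows that general technique. The `try_lock!` macro and the `increment` function are hypothetical names written for illustration; they are not the definitions of `lock_safe!` or any other PandRS macro, whose signatures and error types are not shown in this document.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical macro (not the PandRS `lock_safe!` definition): it maps a
// poisoned-lock error to a String so callers can use `?` instead of panicking.
macro_rules! try_lock {
    ($mutex:expr) => {
        $mutex
            .lock()
            .map_err(|e| format!("mutex poisoned: {}", e))
    };
}

/// Increment a shared counter, returning an error if the lock is poisoned.
fn increment(counter: &Mutex<i32>) -> Result<i32, String> {
    let mut guard = try_lock!(counter)?;
    *guard += 1;
    Ok(*guard)
}

fn main() {
    let counter = Arc::new(Mutex::new(0));
    println!("healthy lock: {:?}", increment(&counter));

    // Poison the mutex by panicking in another thread while the lock is held.
    let c = Arc::clone(&counter);
    let _ = std::thread::spawn(move || {
        let _guard = c.lock().unwrap();
        panic!("simulated failure");
    })
    .join();

    // The poisoned lock now surfaces as an Err value rather than a panic.
    println!("poisoned lock is error: {}", increment(&counter).is_err());
}
```

Returning an error here leaves the decision to the caller, who can propagate it, log it, or recover, which is the motivation usually given for wrapping `lock()` this way.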
Constants§
- VERSION: The current version of the PandRS library.