Crate pandrs

§PandRS

A high-performance DataFrame library for Rust that provides a pandas-like API with advanced features, including SIMD optimization, parallel processing, and distributed computing capabilities.

§Overview

PandRS brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:

  • Type-safe operations leveraging Rust’s ownership system
  • High-performance computing through SIMD vectorization and parallel processing
  • Memory-efficient design with columnar storage and string pooling
  • Comprehensive functionality matching pandas’ core features
  • Seamless interoperability with Python, Arrow, and various data formats

§Quick Start

use pandrs::{DataFrame, Series};

// Create a DataFrame
let mut df = DataFrame::new();
// Build each Series first, then attach it to the DataFrame under the same name
let names = Series::new(vec!["Alice", "Bob", "Carol"], Some("name".to_string()))
    .expect("operation should succeed");
df.add_column("name".to_string(), names)
    .expect("operation should succeed");
let ages = Series::new(vec![30i64, 25, 35], Some("age".to_string()))
    .expect("operation should succeed");
df.add_column("age".to_string(), ages)
    .expect("operation should succeed");

// Basic operations
let nrows = df.row_count();
let ncols = df.column_count();

§Feature Flags

PandRS supports various feature flags for optional functionality:

  • Core features:

    • stable: Recommended stable feature set
    • optimized: Performance optimizations and SIMD
    • backward_compat: Backward compatibility support
  • Data formats:

    • parquet: Apache Parquet file support
    • excel: Excel file support (read/write)
    • sql: Database connectivity (PostgreSQL, MySQL, SQLite)
  • Advanced features:

    • distributed: Distributed computing with DataFusion
    • visualization: Plotting capabilities
    • streaming: Real-time data processing
    • serving: Model serving and deployment
  • Experimental:

    • cuda: GPU acceleration (requires CUDA toolkit)
    • wasm: WebAssembly compilation support
    • jit: Just-in-time compilation
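
Feature flags are resolved at compile time, so code that depends on an optional capability is usually gated on the flag. The sketch below is not part of PandRS. It shows how a downstream crate might report which optional data formats it was built with, assuming that crate declares its own `parquet`, `excel`, and `sql` features that forward to the PandRS features of the same name. The function name `enabled_formats` is hypothetical.

```rust
/// Returns the names of the optional data-format features that were
/// enabled when this crate was compiled.
///
/// Assumption: the calling crate declares `parquet`, `excel`, and `sql`
/// features of its own that forward to the matching PandRS features.
#[allow(unexpected_cfgs)] // the features may be undeclared in a standalone build
pub fn enabled_formats() -> Vec<&'static str> {
    let mut formats = Vec::new();
    // `cfg!` expands to a compile-time boolean for the current crate.
    if cfg!(feature = "parquet") {
        formats.push("parquet");
    }
    if cfg!(feature = "excel") {
        formats.push("excel");
    }
    if cfg!(feature = "sql") {
        formats.push("sql");
    }
    formats
}
```

When none of the features are enabled, the function returns an empty vector, which lets callers fall back to formats that need no optional dependency.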

§Core Data Structures

  • Series: One-dimensional labeled array capable of holding any data type
  • DataFrame: Two-dimensional, size-mutable, heterogeneous tabular data structure
  • MultiIndex: Hierarchical indexing for advanced data organization
  • Categorical: Memory-efficient representation for string data with limited cardinality
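
To illustrate why a categorical representation saves memory for low-cardinality strings, here is a minimal dictionary-encoding sketch using only the standard library. It is not the PandRS `Categorical` implementation or API. The type `ToyCategorical` and its methods are hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

/// A toy dictionary-encoded string column: each distinct string is stored
/// once, and rows hold small integer codes that index into `categories`.
/// Hypothetical type for illustration; not the PandRS `Categorical` API.
pub struct ToyCategorical {
    categories: Vec<String>,
    codes: Vec<u32>,
}

impl ToyCategorical {
    /// Encodes `values`, assigning codes in order of first appearance.
    pub fn from_values(values: &[&str]) -> Self {
        let mut lookup: HashMap<&str, u32> = HashMap::new();
        let mut categories = Vec::new();
        let mut codes = Vec::with_capacity(values.len());
        for &value in values {
            // Code that a new category would receive.
            let next = categories.len() as u32;
            let code = *lookup.entry(value).or_insert_with(|| {
                categories.push(value.to_string());
                next
            });
            codes.push(code);
        }
        ToyCategorical { categories, codes }
    }

    /// Returns the string stored at `row`, or `None` if `row` is out of bounds.
    pub fn get(&self, row: usize) -> Option<&str> {
        let code = *self.codes.get(row)?;
        Some(self.categories[code as usize].as_str())
    }

    /// Number of distinct categories.
    pub fn cardinality(&self) -> usize {
        self.categories.len()
    }
}
```

With this layout, a column of many rows drawn from a handful of distinct strings stores each string once plus one 4-byte code per row, rather than one heap-allocated string per row.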

§Modules

  • dataframe: DataFrame operations and manipulation
  • series: Series operations and manipulation
  • stats: Statistical functions and analysis
  • ml: Machine learning algorithms and utilities
  • io: Input/output operations for various file formats
  • streaming: Real-time streaming data processing
  • time_series: Time series analysis and forecasting
  • graph: Graph analytics and algorithms

§Version

Current version: 0.1.0

Re-exports§

pub use scirs2_integration::dataframe_ext::SciRS2Ext;
pub use core::column::BitMask as CoreBitMask;
pub use core::column::Column as CoreColumn;
pub use core::column::ColumnCast;
pub use core::column::ColumnTrait;
pub use core::column::ColumnType as CoreColumnType;
pub use core::data_value::DataValue;
pub use core::data_value::DataValueExt;
pub use core::data_value::DisplayExt;
pub use core::error::Error;
pub use core::error::Result;
pub use core::index::Index as CoreIndex;
pub use core::index::IndexTrait;
pub use core::multi_index::MultiIndex as CoreMultiIndex;
pub use config::credentials::CredentialBuilder;
pub use config::credentials::CredentialMetadata;
pub use config::credentials::CredentialStore;
pub use config::credentials::CredentialStoreConfig;
pub use config::credentials::CredentialType;
pub use config::credentials::EncryptedCredential;
pub use config::AccessControlConfig;
pub use config::AuditConfig;
pub use config::AwsConfig;
pub use config::AzureConfig;
pub use config::CachingConfig;
pub use config::CloudConfig;
pub use config::ConnectionPoolConfig;
pub use config::DatabaseConfig;
pub use config::EncryptionConfig;
pub use config::GcpConfig;
pub use config::GlobalCloudConfig;
pub use config::JitConfig;
pub use config::LogRotationConfig;
pub use config::LoggingConfig;
pub use config::MemoryConfig;
pub use config::PandRSConfig;
pub use config::PerformanceConfig;
pub use config::SecurityConfig;
pub use config::SslConfig;
pub use config::ThreadingConfig;
pub use config::TimeoutConfig;
pub use column::BooleanColumn;
pub use column::Column;
pub use column::ColumnType;
pub use column::Float64Column;
pub use column::Int64Column;
pub use column::StringColumn;
pub use dataframe::DataFrame;
pub use dataframe::MeltOptions;
pub use dataframe::StackOptions;
pub use dataframe::UnstackOptions;
pub use error::PandRSError;
pub use groupby::GroupBy;
pub use index::DataFrameIndex;
pub use index::Index;
pub use index::IndexTrait as LegacyIndexTrait;
pub use index::MultiIndex;
pub use index::RangeIndex;
pub use index::StringIndex;
pub use index::StringMultiIndex;
pub use na::NA;
pub use optimized::AggregateOp;
pub use optimized::JoinType;
pub use optimized::LazyFrame;
pub use optimized::OptimizedDataFrame;
pub use parallel::ParallelUtils;
pub use series::Categorical;
pub use series::CategoricalOrder;
pub use series::NASeries;
pub use series::Series;
pub use series::StringCategorical;
pub use stats::DescriptiveStats;
pub use stats::LinearRegressionResult;
pub use stats::TTestResult;
pub use vis::OutputFormat;
pub use vis::PlotConfig;
pub use vis::PlotType;
pub use vis::svg::BarChart as SvgBarChart;
pub use vis::svg::BarOrientation as SvgBarOrientation;
pub use vis::svg::Color as SvgColor;
pub use vis::svg::ColorScheme as SvgColorScheme;
pub use vis::svg::DrawStyle;
pub use vis::svg::HeatMap as SvgHeatMap;
pub use vis::svg::LegendPosition;
pub use vis::svg::LineChart as SvgLineChart;
pub use vis::svg::LineSeries;
pub use vis::svg::Margins as SvgMargins;
pub use vis::svg::MarkerShape;
pub use vis::svg::PathBuilder;
pub use vis::svg::PieChart as SvgPieChart;
pub use vis::svg::ScatterPlot as SvgScatterPlot;
pub use vis::svg::SvgCanvas;
pub use vis::svg::SvgChartConfig;
pub use vis::svg::SvgHistogram;
pub use vis::svg::SvgPlotType;
pub use vis::svg::SvgVisualize;
pub use vis::svg::Transform as SvgTransform;
pub use jupyter::get_jupyter_config;
pub use jupyter::init_jupyter;
pub use jupyter::jupyter_dark_mode;
pub use jupyter::jupyter_light_mode;
pub use jupyter::set_jupyter_config;
pub use jupyter::JupyterColorScheme;
pub use jupyter::JupyterConfig;
pub use jupyter::JupyterDisplay;
pub use jupyter::JupyterMagics;
pub use jupyter::TableStyle;
pub use jupyter::TableWidth;
pub use ml::anomaly::IsolationForest;
pub use ml::anomaly::LocalOutlierFactor;
pub use ml::anomaly::OneClassSVM;
pub use ml::clustering::AgglomerativeClustering;
pub use ml::clustering::DistanceMetric;
pub use ml::clustering::KMeans;
pub use ml::clustering::Linkage;
pub use ml::clustering::DBSCAN;
pub use ml::dimension::TSNEInit;
pub use ml::dimension::PCA;
pub use ml::dimension::TSNE;
pub use ml::metrics::classification::accuracy_score;
pub use ml::metrics::classification::f1_score;
pub use ml::metrics::classification::precision_score;
pub use ml::metrics::classification::recall_score;
pub use ml::metrics::regression::explained_variance_score;
pub use ml::metrics::regression::mean_absolute_error;
pub use ml::metrics::regression::mean_squared_error;
pub use ml::metrics::regression::r2_score;
pub use ml::metrics::regression::root_mean_squared_error;
pub use ml::models::ensemble::GradientBoostingClassifier;
pub use ml::models::ensemble::GradientBoostingConfig;
pub use ml::models::ensemble::GradientBoostingRegressor;
pub use ml::models::ensemble::RandomForestClassifier;
pub use ml::models::ensemble::RandomForestConfig;
pub use ml::models::ensemble::RandomForestRegressor;
pub use ml::models::linear::LinearRegression;
pub use ml::models::linear::LogisticRegression;
pub use ml::models::neural::Activation;
pub use ml::models::neural::LossFunction;
pub use ml::models::neural::MLPClassifier;
pub use ml::models::neural::MLPConfig;
pub use ml::models::neural::MLPConfigBuilder;
pub use ml::models::neural::MLPRegressor;
pub use ml::models::tree::DecisionTreeClassifier;
pub use ml::models::tree::DecisionTreeConfig;
pub use ml::models::tree::DecisionTreeRegressor;
pub use ml::models::tree::SplitCriterion;
pub use ml::models::train_test_split;
pub use ml::models::CrossValidation;
pub use ml::models::ModelEvaluator;
pub use ml::models::ModelMetrics;
pub use ml::models::SupervisedModel;
pub use ml::models::UnsupervisedModel;
pub use ml::pipeline::Pipeline;
pub use ml::pipeline::PipelineStage;
pub use ml::pipeline::PipelineTransformer;
pub use ml::preprocessing::Binner;
pub use ml::preprocessing::FeatureSelector;
pub use ml::preprocessing::ImputeStrategy;
pub use ml::preprocessing::Imputer;
pub use ml::preprocessing::MinMaxScaler;
pub use ml::preprocessing::OneHotEncoder;
pub use ml::preprocessing::PolynomialFeatures;
pub use ml::preprocessing::StandardScaler;
pub use large::external_sort;
pub use large::merge_sorted_chunks;
pub use large::hash_join_out_of_core;
pub use large::OutOfCoreJoinType;
pub use large::AggOp as OutOfCoreAggOp;
pub use large::OutOfCoreConfig;
pub use large::OutOfCoreReader;
pub use large::OutOfCoreWriter;
pub use large::ChunkedDataFrame;
pub use large::DiskBasedDataFrame;
pub use large::DiskBasedOptimizedDataFrame;
pub use large::DiskConfig;
pub use streaming::AggregationType;
pub use streaming::BackpressureBuffer;
pub use streaming::BackpressureChannel;
pub use streaming::BackpressureConfig;
pub use streaming::BackpressureConfigBuilder;
pub use streaming::BackpressureStats;
pub use streaming::BackpressureStrategy;
pub use streaming::DataStream;
pub use streaming::FlowController;
pub use streaming::MetricType;
pub use streaming::MultiColumnAggregator;
pub use streaming::RealTimeAnalytics;
pub use streaming::StreamAggregator;
pub use streaming::StreamConfig;
pub use streaming::StreamConnector;
pub use streaming::StreamProcessor;
pub use streaming::StreamRecord;
pub use streaming::TimeWindow;
pub use streaming::WindowAggregation;
pub use streaming::WindowConfig;
pub use streaming::WindowConfigBuilder;
pub use streaming::WindowResult;
pub use streaming::WindowType;
pub use streaming::WindowedAggregator;
pub use time_series::ArimaForecaster;
pub use time_series::AugmentedDickeyFullerTest;
pub use time_series::AutoArima;
pub use time_series::AutocorrelationAnalysis;
pub use time_series::ChangePointDetection;
pub use time_series::DateTimeIndex;
pub use time_series::DecompositionMethod;
pub use time_series::DecompositionResult;
pub use time_series::Differencing;
pub use time_series::ExponentialSmoothingForecaster;
pub use time_series::FeatureSet;
pub use time_series::ForecastMetrics;
pub use time_series::ForecastResult;
pub use time_series::Forecaster;
pub use time_series::Frequency;
pub use time_series::KwiatkowskiPhillipsSchmidtShinTest;
pub use time_series::LinearTrendForecaster;
pub use time_series::MissingValueStrategy;
pub use time_series::ModelSelectionCriterion;
pub use time_series::ModelSelectionResult;
pub use time_series::Normalization;
pub use time_series::OutlierDetection;
pub use time_series::SarimaForecaster;
pub use time_series::SeasonalDecomposition;
pub use time_series::SeasonalTest;
pub use time_series::SeasonalityAnalysis;
pub use time_series::SimpleMovingAverageForecaster;
pub use time_series::StationarityTest;
pub use time_series::StatisticalFeatures;
pub use time_series::TimePoint;
pub use time_series::TimeSeries;
pub use time_series::TimeSeriesBuilder;
pub use time_series::TimeSeriesFeatureExtractor;
pub use time_series::TimeSeriesPreprocessor;
pub use time_series::TimeSeriesStats;
pub use time_series::TrendAnalysis;
pub use time_series::WhiteNoiseTest;
pub use time_series::WindowFeatures;
pub use compute::lazy::LazyFrame as ComputeLazyFrame;
pub use compute::parallel::ParallelUtils as ComputeParallelUtils;
pub use storage::column_store::ColumnStore;
pub use storage::disk::DiskStorage;
pub use storage::memory_mapped::MemoryMappedFile;
pub use storage::string_pool::StringPool as StorageStringPool;
pub use distributed::core::DistributedConfig;
pub use distributed::core::DistributedDataFrame;
pub use distributed::core::ToDistributed;
pub use distributed::execution::ExecutionContext;
pub use distributed::execution::ExecutionEngine;
pub use distributed::execution::ExecutionPlan;
pub use graph::bellman_ford_default;
pub use graph::betweenness_centrality;
pub use graph::bfs;
pub use graph::closeness_centrality;
pub use graph::connected_components;
pub use graph::degree_centrality;
pub use graph::dfs;
pub use graph::dijkstra;
pub use graph::dijkstra_default;
pub use graph::eigenvector_centrality_default;
pub use graph::floyd_warshall_default;
pub use graph::from_adjacency_matrix;
pub use graph::from_edge_dataframe;
pub use graph::has_cycle;
pub use graph::hits_default;
pub use graph::is_connected;
pub use graph::label_propagation;
pub use graph::louvain_default;
pub use graph::modularity;
pub use graph::pagerank;
pub use graph::pagerank_default;
pub use graph::shortest_path_bfs;
pub use graph::strongly_connected_components;
pub use graph::to_adjacency_matrix;
pub use graph::to_edge_dataframe;
pub use graph::topological_sort;
pub use graph::AllPairsShortestPaths;
pub use graph::BfsResult;
pub use graph::ComponentResult;
pub use graph::DfsResult;
pub use graph::Edge;
pub use graph::EdgeId;
pub use graph::Graph;
pub use graph::GraphBuilder;
pub use graph::GraphError;
pub use graph::GraphType;
pub use graph::Node;
pub use graph::NodeId;
pub use graph::ShortestPathResult;
pub use versioning::DataFrameVersioning;
pub use versioning::DataSchema;
pub use versioning::DataVersion;
pub use versioning::LineageConfig;
pub use versioning::LineageTracker;
pub use versioning::Operation;
pub use versioning::OperationType;
pub use versioning::SharedLineageTracker;
pub use versioning::TrackerStats;
pub use versioning::VersionDiff;
pub use versioning::VersionId;
pub use versioning::VersionedTransform;
pub use versioning::VersioningError;
pub use schema_evolution::BreakingChange;
pub use schema_evolution::ColumnSchema;
pub use schema_evolution::CompatibilityReport;
pub use schema_evolution::DataFrameSchema;
pub use schema_evolution::DefaultValue;
pub use schema_evolution::Migration;
pub use schema_evolution::MigrationBuilder;
pub use schema_evolution::SchemaChange;
pub use schema_evolution::SchemaConstraint;
pub use schema_evolution::SchemaDataType;
pub use schema_evolution::SchemaFormat;
pub use schema_evolution::SchemaMigrator;
pub use schema_evolution::SchemaRegistry;
pub use schema_evolution::SchemaVersion;
pub use schema_evolution::ValidationError;
pub use schema_evolution::ValidationErrorType;
pub use schema_evolution::ValidationReport;
pub use audit::global_logger;
pub use audit::init_global_logger;
pub use audit::log_global;
pub use audit::AuditConfig as AuditLogConfig;
pub use audit::AuditConfigBuilder as AuditLogConfigBuilder;
pub use audit::AuditEntry;
pub use audit::AuditLogger;
pub use audit::AuditStats;
pub use audit::EventCategory;
pub use audit::LogContext;
pub use audit::LogDestination;
pub use audit::LogLevel;
pub use audit::SharedAuditLogger;
pub use multitenancy::create_shared_manager;
pub use multitenancy::DatasetId;
pub use multitenancy::DatasetMetadata;
pub use multitenancy::IsolationContext;
pub use multitenancy::Permission;
pub use multitenancy::ResourceQuota;
pub use multitenancy::SharedTenantManager;
pub use multitenancy::TenantAuditEntry;
pub use multitenancy::TenantConfig;
pub use multitenancy::TenantId;
pub use multitenancy::TenantManager;
pub use multitenancy::TenantOperation;
pub use multitenancy::TenantUsage;
pub use auth::create_shared_auth_manager;
pub use auth::decode_jwt;
pub use auth::encode_jwt;
pub use auth::get_token_expiration;
pub use auth::is_token_expired;
pub use auth::verify_jwt;
pub use auth::ApiKeyInfo;
pub use auth::ApiKeyManager;
pub use auth::ApiKeyStats;
pub use auth::AuthEvent;
pub use auth::AuthEventType;
pub use auth::AuthManager;
pub use auth::AuthMethod;
pub use auth::AuthResult;
pub use auth::AuthorizationRequest;
pub use auth::IntrospectionResponse;
pub use auth::JwtConfig;
pub use auth::OAuthClient;
pub use auth::OAuthClientInfo;
pub use auth::OAuthConfig;
pub use auth::OAuthGrantType;
pub use auth::RefreshToken;
pub use auth::ScopedApiKey;
pub use auth::Session;
pub use auth::SessionContext;
pub use auth::SessionStore;
pub use auth::SharedAuthManager;
pub use auth::TokenClaims;
pub use auth::TokenRequest;
pub use auth::TokenResponse;
pub use auth::UserInfo;
pub use analytics::create_default_rules;
pub use analytics::global_dashboard;
pub use analytics::init_global_dashboard;
pub use analytics::record_global;
pub use analytics::time_global;
pub use analytics::ActiveAlert;
pub use analytics::AlertHandler;
pub use analytics::AlertManager;
pub use analytics::AlertMetric;
pub use analytics::AlertRule;
pub use analytics::AlertSeverity;
pub use analytics::Dashboard;
pub use analytics::DashboardConfig;
pub use analytics::DashboardSnapshot;
pub use analytics::LoggingAlertHandler;
pub use analytics::Metric;
pub use analytics::MetricStats;
pub use analytics::MetricType as AnalyticsMetricType;
pub use analytics::MetricValue;
pub use analytics::MetricsCollector;
pub use analytics::OperationCategory;
pub use analytics::OperationRecord;
pub use analytics::RateCalculator;
pub use analytics::ResourceSnapshot;
pub use analytics::ScopedTimer;
pub use analytics::ThresholdOperator;
pub use analytics::TimeResolution;

Modules§

analytics
Real-time analytics dashboard module.
arrow_integration
Arrow integration module for ecosystem compatibility.
audit
Audit logging module.
auth
Enterprise authentication module (JWT, OAuth, API Keys).
column
Column implementations and column-level operations.
compute
Compute module for computation functionality.
config
Configuration management for secure settings and credentials.
connectors
Data connectors for databases and cloud storage.
core
Core module with fundamental data structures and traits.
dataframe
DataFrame data structure and operations.
distributed
Distributed processing module.
error
Error types and error handling utilities.
graph
Graph analytics module.
groupby
GroupBy operations for split-apply-combine workflows.
index
Index types for DataFrame and Series labeling.
io
Input/output operations for reading and writing data.
jupyter
Jupyter notebook integration and display formatting.
large
Large dataset processing with chunking and disk-based operations.
ml
Machine learning algorithms and utilities.
multitenancy
Multi-tenancy support module.
na
NA (Not Available) value handling and missing data operations.
optimized
Optimized implementations using SIMD and vectorization.
parallel
Parallel processing utilities and thread pool management.
pivot
Pivot table operations for data reshaping.
plugins
Plugin system for custom data sources, transforms, and sinks.
schema_evolution
Schema evolution and migration tools.
scirs2_integration
SciRS2 integration module for scientific computing capabilities.
series
Series data structure for one-dimensional labeled data.
stats
Statistical functions and analysis tools.
storage
Storage module for data storage engines.
streaming
Real-time streaming data processing.
temporal
Temporal operations for time-based data.
time_series
Time series analysis and forecasting.
versioning
Data versioning and lineage tracking module.
vis
Visualization and plotting utilities.

Macros§

agg_spec
Create an aggregation specification (similar to pandas)
column_aggs
Create multiple named aggregations for a column
iloc
Macro for convenient indexing
loc
lock_and_clone
Lock a Mutex and clone the contained value
lock_safe
Safely acquire a Mutex lock, converting poison errors to Result
named_agg
Create a named aggregation (helper macro for building aggregation specifications)
read_lock_safe
Safely acquire a RwLock read lock
select
write_lock_safe
Safely acquire a RwLock write lock
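
The locking macros above describe converting a poisoned lock into a `Result` instead of panicking. The exact expansion and error type of the PandRS macros are not shown on this page, so the following is only a standard-library sketch of the same idea. The function names `lock_or_err` and `clone_locked` are hypothetical, and `String` stands in for whatever error type the library actually uses.

```rust
use std::sync::{Mutex, MutexGuard};

/// Acquires `mutex`, turning a poisoned-lock error into an `Err(String)`
/// instead of panicking. Hypothetical helper for illustration; the PandRS
/// `lock_safe!` macro may differ in signature and error type.
pub fn lock_or_err<T>(mutex: &Mutex<T>) -> Result<MutexGuard<'_, T>, String> {
    mutex
        .lock()
        .map_err(|poisoned| format!("mutex poisoned: {}", poisoned))
}

/// Locks `mutex` and returns a clone of the protected value, mirroring the
/// idea described for `lock_and_clone`.
pub fn clone_locked<T: Clone>(mutex: &Mutex<T>) -> Result<T, String> {
    let guard = lock_or_err(mutex)?;
    // Clone the inner value; the guard is released when it goes out of scope.
    Ok((*guard).clone())
}
```

Returning a `Result` lets callers propagate a poisoned lock with `?` rather than unwinding the current thread.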

Constants§

VERSION
The current version of the PandRS library.