Crate pandrs


§PandRS

A high-performance DataFrame library for Rust that provides a pandas-like API with advanced features, including SIMD optimization, parallel processing, and distributed computing capabilities.

§Overview

PandRS brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:

  • Type-safe operations leveraging Rust’s ownership system
  • High-performance computing through SIMD vectorization and parallel processing
  • Memory-efficient design with columnar storage and string pooling
  • Comprehensive functionality matching pandas’ core features
  • Seamless interoperability with Python, Arrow, and various data formats
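To make the type-safety and columnar-storage bullets concrete, here is a minimal sketch in plain Rust. It is not PandRS code: `ToyColumn` and `ToyFrame` are hypothetical names invented for illustration, and the real `Column` and `DataFrame` types may be organized quite differently. The sketch only shows the general idea that each column is one homogeneously typed vector, and that an operation such as a sum can be refused for a string column through the type system rather than failing unexpectedly at run time.

```rust
// Toy illustration only: `ToyColumn` and `ToyFrame` are hypothetical names,
// not types exported by pandrs.

/// One column is one contiguous, homogeneously typed vector.
#[derive(Debug, Clone, PartialEq)]
enum ToyColumn {
    Int64(Vec<i64>),
    Float64(Vec<f64>),
    Str(Vec<String>),
}

impl ToyColumn {
    /// Number of rows stored in this column.
    fn len(&self) -> usize {
        match self {
            ToyColumn::Int64(v) => v.len(),
            ToyColumn::Float64(v) => v.len(),
            ToyColumn::Str(v) => v.len(),
        }
    }

    /// A sum is only defined for numeric columns; a string column
    /// yields `None`, so the caller must handle that case explicitly.
    fn sum(&self) -> Option<f64> {
        match self {
            ToyColumn::Int64(v) => Some(v.iter().map(|&x| x as f64).sum()),
            ToyColumn::Float64(v) => Some(v.iter().sum()),
            ToyColumn::Str(_) => None,
        }
    }
}

/// A frame is an ordered list of named columns of equal length.
struct ToyFrame {
    columns: Vec<(String, ToyColumn)>,
}

impl ToyFrame {
    fn new() -> Self {
        ToyFrame { columns: Vec::new() }
    }

    /// Rejects a column whose length differs from the columns already present.
    fn add_column(&mut self, name: &str, col: ToyColumn) -> Result<(), String> {
        if let Some((_, first)) = self.columns.first() {
            if first.len() != col.len() {
                return Err(format!(
                    "column '{}' has length {}, expected {}",
                    name,
                    col.len(),
                    first.len()
                ));
            }
        }
        self.columns.push((name.to_string(), col));
        Ok(())
    }

    /// Looks a column up by name.
    fn column(&self, name: &str) -> Option<&ToyColumn> {
        self.columns.iter().find(|(n, _)| n == name).map(|(_, c)| c)
    }
}

fn main() {
    let mut df = ToyFrame::new();
    df.add_column(
        "name",
        ToyColumn::Str(vec!["Alice".into(), "Bob".into(), "Carol".into()]),
    )
    .unwrap();
    df.add_column("age", ToyColumn::Int64(vec![30, 25, 35])).unwrap();
    df.add_column("score", ToyColumn::Float64(vec![1.5, 2.5, 3.0])).unwrap();

    // Numeric columns can be summed; the string column reports `None`.
    println!("sum(age)   = {:?}", df.column("age").and_then(ToyColumn::sum));
    println!("sum(score) = {:?}", df.column("score").and_then(ToyColumn::sum));
    println!("sum(name)  = {:?}", df.column("name").and_then(ToyColumn::sum));

    // A column of the wrong length is rejected instead of corrupting the frame.
    let bad = df.add_column("short", ToyColumn::Int64(vec![1]));
    println!("add short column: {:?}", bad);
}
```

Storing each column as a single typed vector keeps values of the same type adjacent in memory, which is the layout that vectorized and parallel operations generally rely on.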

§Quick Start

use pandrs::{DataFrame, Series};

// Create a DataFrame
let mut df = DataFrame::new();
df.add_column("name".to_string(),
    Series::new(vec!["Alice", "Bob", "Carol"], Some("name".to_string())).unwrap()).unwrap();
df.add_column("age".to_string(),
    Series::new(vec![30i64, 25, 35], Some("age".to_string())).unwrap()).unwrap();

// Basic operations
let nrows = df.row_count();
let ncols = df.column_count();

§Feature Flags

PandRS supports various feature flags for optional functionality:

  • Core features:
    • stable: Recommended stable feature set
    • optimized: Performance optimizations and SIMD
    • backward_compat: Backward compatibility support
  • Data formats:
    • parquet: Apache Parquet file support
    • excel: Excel file support (read/write)
    • sql: Database connectivity (PostgreSQL, MySQL, SQLite)
  • Advanced features:
    • distributed: Distributed computing with DataFusion
    • visualization: Plotting capabilities
    • streaming: Real-time data processing
    • serving: Model serving and deployment
  • Experimental:
    • cuda: GPU acceleration (requires CUDA toolkit)
    • wasm: WebAssembly compilation support
    • jit: Just-in-time compilation

§Core Data Structures

  • Series: One-dimensional labeled array capable of holding any data type
  • DataFrame: Two-dimensional, size-mutable, heterogeneous tabular data structure
  • MultiIndex: Hierarchical indexing for advanced data organization
  • Categorical: Memory-efficient representation for string data with limited cardinality
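The idea behind a categorical representation can be sketched in a few lines of plain Rust. This is an illustration of the general technique (dictionary encoding), not the PandRS `Categorical` implementation: `ToyCategorical` and its methods are hypothetical names. Each distinct string is stored once, and every row holds only a small integer code pointing at it.

```rust
use std::collections::HashMap;

// Toy illustration only: `ToyCategorical` is a hypothetical type,
// not the pandrs `Categorical` API.
struct ToyCategorical {
    /// Each distinct value, stored exactly once, in first-seen order.
    categories: Vec<String>,
    /// One integer code per row, indexing into `categories`.
    codes: Vec<u32>,
}

impl ToyCategorical {
    /// Dictionary-encodes the input: repeated strings share one category entry.
    fn from_values(values: &[&str]) -> Self {
        let mut categories: Vec<String> = Vec::new();
        let mut lookup: HashMap<String, u32> = HashMap::new();
        let mut codes = Vec::with_capacity(values.len());

        for &v in values {
            let code = match lookup.get(v) {
                Some(&c) => c,
                None => {
                    let c = categories.len() as u32;
                    categories.push(v.to_string());
                    lookup.insert(v.to_string(), c);
                    c
                }
            };
            codes.push(code);
        }

        ToyCategorical { categories, codes }
    }

    /// Decodes the value stored at `row`, if the row exists.
    fn get(&self, row: usize) -> Option<&str> {
        self.codes
            .get(row)
            .map(|&c| self.categories[c as usize].as_str())
    }
}

fn main() {
    let colors = ["red", "green", "red", "blue", "green", "red"];
    let cat = ToyCategorical::from_values(&colors);

    println!("categories: {:?}", cat.categories);
    println!("codes:      {:?}", cat.codes);
    println!("row 3:      {:?}", cat.get(3));
}
```

With six rows but only three distinct values, the sketch stores three strings plus six `u32` codes instead of six separate strings; the saving grows as the ratio of rows to distinct values increases.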

§Modules

  • dataframe: DataFrame operations and manipulation
  • series: Series operations and manipulation
  • stats: Statistical functions and analysis
  • ml: Machine learning algorithms and utilities
  • io: Input/output operations for various file formats
  • streaming: Real-time streaming data processing
  • time_series: Time series analysis and forecasting
  • graph: Graph analytics and algorithms

§Version

Current version: 0.1.0

Re-exports§

pub use core::column::BitMask as CoreBitMask;
pub use core::column::Column as CoreColumn;
pub use core::column::ColumnCast;
pub use core::column::ColumnTrait;
pub use core::column::ColumnType as CoreColumnType;
pub use core::data_value::DataValue;
pub use core::data_value::DataValueExt;
pub use core::data_value::DisplayExt;
pub use core::error::Error;
pub use core::error::Result;
pub use core::index::Index as CoreIndex;
pub use core::index::IndexTrait;
pub use core::multi_index::MultiIndex as CoreMultiIndex;
pub use config::credentials::CredentialBuilder;
pub use config::credentials::CredentialMetadata;
pub use config::credentials::CredentialStore;
pub use config::credentials::CredentialStoreConfig;
pub use config::credentials::CredentialType;
pub use config::credentials::EncryptedCredential;
pub use config::AccessControlConfig;
pub use config::AuditConfig;
pub use config::AwsConfig;
pub use config::AzureConfig;
pub use config::CachingConfig;
pub use config::CloudConfig;
pub use config::ConnectionPoolConfig;
pub use config::DatabaseConfig;
pub use config::EncryptionConfig;
pub use config::GcpConfig;
pub use config::GlobalCloudConfig;
pub use config::JitConfig;
pub use config::LogRotationConfig;
pub use config::LoggingConfig;
pub use config::MemoryConfig;
pub use config::PandRSConfig;
pub use config::PerformanceConfig;
pub use config::SecurityConfig;
pub use config::SslConfig;
pub use config::ThreadingConfig;
pub use config::TimeoutConfig;
pub use column::BooleanColumn;
pub use column::Column;
pub use column::ColumnType;
pub use column::Float64Column;
pub use column::Int64Column;
pub use column::StringColumn;
pub use dataframe::DataFrame;
pub use dataframe::MeltOptions;
pub use dataframe::StackOptions;
pub use dataframe::UnstackOptions;
pub use error::PandRSError;
pub use groupby::GroupBy;
pub use index::DataFrameIndex;
pub use index::Index;
pub use index::IndexTrait as LegacyIndexTrait;
pub use index::MultiIndex;
pub use index::RangeIndex;
pub use index::StringIndex;
pub use index::StringMultiIndex;
pub use na::NA;
pub use optimized::AggregateOp;
pub use optimized::JoinType;
pub use optimized::LazyFrame;
pub use optimized::OptimizedDataFrame;
pub use parallel::ParallelUtils;
pub use series::Categorical;
pub use series::CategoricalOrder;
pub use series::NASeries;
pub use series::Series;
pub use series::StringCategorical;
pub use stats::DescriptiveStats;
pub use stats::LinearRegressionResult;
pub use stats::TTestResult;
pub use vis::OutputFormat;
pub use vis::PlotConfig;
pub use vis::PlotType;
pub use jupyter::get_jupyter_config;
pub use jupyter::init_jupyter;
pub use jupyter::jupyter_dark_mode;
pub use jupyter::jupyter_light_mode;
pub use jupyter::set_jupyter_config;
pub use jupyter::JupyterColorScheme;
pub use jupyter::JupyterConfig;
pub use jupyter::JupyterDisplay;
pub use jupyter::JupyterMagics;
pub use jupyter::TableStyle;
pub use jupyter::TableWidth;
pub use ml::anomaly::IsolationForest;
pub use ml::anomaly::LocalOutlierFactor;
pub use ml::anomaly::OneClassSVM;
pub use ml::clustering::AgglomerativeClustering;
pub use ml::clustering::DistanceMetric;
pub use ml::clustering::KMeans;
pub use ml::clustering::Linkage;
pub use ml::clustering::DBSCAN;
pub use ml::dimension::TSNEInit;
pub use ml::dimension::PCA;
pub use ml::dimension::TSNE;
pub use ml::metrics::classification::accuracy_score;
pub use ml::metrics::classification::f1_score;
pub use ml::metrics::classification::precision_score;
pub use ml::metrics::classification::recall_score;
pub use ml::metrics::regression::explained_variance_score;
pub use ml::metrics::regression::mean_absolute_error;
pub use ml::metrics::regression::mean_squared_error;
pub use ml::metrics::regression::r2_score;
pub use ml::metrics::regression::root_mean_squared_error;
pub use ml::models::ensemble::GradientBoostingClassifier;
pub use ml::models::ensemble::GradientBoostingConfig;
pub use ml::models::ensemble::GradientBoostingRegressor;
pub use ml::models::ensemble::RandomForestClassifier;
pub use ml::models::ensemble::RandomForestConfig;
pub use ml::models::ensemble::RandomForestRegressor;
pub use ml::models::linear::LinearRegression;
pub use ml::models::linear::LogisticRegression;
pub use ml::models::neural::Activation;
pub use ml::models::neural::LossFunction;
pub use ml::models::neural::MLPClassifier;
pub use ml::models::neural::MLPConfig;
pub use ml::models::neural::MLPConfigBuilder;
pub use ml::models::neural::MLPRegressor;
pub use ml::models::tree::DecisionTreeClassifier;
pub use ml::models::tree::DecisionTreeConfig;
pub use ml::models::tree::DecisionTreeRegressor;
pub use ml::models::tree::SplitCriterion;
pub use ml::models::train_test_split;
pub use ml::models::CrossValidation;
pub use ml::models::ModelEvaluator;
pub use ml::models::ModelMetrics;
pub use ml::models::SupervisedModel;
pub use ml::models::UnsupervisedModel;
pub use ml::pipeline::Pipeline;
pub use ml::pipeline::PipelineStage;
pub use ml::pipeline::PipelineTransformer;
pub use ml::preprocessing::Binner;
pub use ml::preprocessing::FeatureSelector;
pub use ml::preprocessing::ImputeStrategy;
pub use ml::preprocessing::Imputer;
pub use ml::preprocessing::MinMaxScaler;
pub use ml::preprocessing::OneHotEncoder;
pub use ml::preprocessing::PolynomialFeatures;
pub use ml::preprocessing::StandardScaler;
pub use large::ChunkedDataFrame;
pub use large::DiskBasedDataFrame;
pub use large::DiskBasedOptimizedDataFrame;
pub use large::DiskConfig;
pub use streaming::AggregationType;
pub use streaming::BackpressureBuffer;
pub use streaming::BackpressureChannel;
pub use streaming::BackpressureConfig;
pub use streaming::BackpressureConfigBuilder;
pub use streaming::BackpressureStats;
pub use streaming::BackpressureStrategy;
pub use streaming::DataStream;
pub use streaming::FlowController;
pub use streaming::MetricType;
pub use streaming::MultiColumnAggregator;
pub use streaming::RealTimeAnalytics;
pub use streaming::StreamAggregator;
pub use streaming::StreamConfig;
pub use streaming::StreamConnector;
pub use streaming::StreamProcessor;
pub use streaming::StreamRecord;
pub use streaming::TimeWindow;
pub use streaming::WindowAggregation;
pub use streaming::WindowConfig;
pub use streaming::WindowConfigBuilder;
pub use streaming::WindowResult;
pub use streaming::WindowType;
pub use streaming::WindowedAggregator;
pub use time_series::ArimaForecaster;
pub use time_series::AugmentedDickeyFullerTest;
pub use time_series::AutoArima;
pub use time_series::AutocorrelationAnalysis;
pub use time_series::ChangePointDetection;
pub use time_series::DateTimeIndex;
pub use time_series::DecompositionMethod;
pub use time_series::DecompositionResult;
pub use time_series::Differencing;
pub use time_series::ExponentialSmoothingForecaster;
pub use time_series::FeatureSet;
pub use time_series::ForecastMetrics;
pub use time_series::ForecastResult;
pub use time_series::Forecaster;
pub use time_series::Frequency;
pub use time_series::KwiatkowskiPhillipsSchmidtShinTest;
pub use time_series::LinearTrendForecaster;
pub use time_series::MissingValueStrategy;
pub use time_series::ModelSelectionCriterion;
pub use time_series::ModelSelectionResult;
pub use time_series::Normalization;
pub use time_series::OutlierDetection;
pub use time_series::SarimaForecaster;
pub use time_series::SeasonalDecomposition;
pub use time_series::SeasonalTest;
pub use time_series::SeasonalityAnalysis;
pub use time_series::SimpleMovingAverageForecaster;
pub use time_series::StationarityTest;
pub use time_series::StatisticalFeatures;
pub use time_series::TimePoint;
pub use time_series::TimeSeries;
pub use time_series::TimeSeriesBuilder;
pub use time_series::TimeSeriesFeatureExtractor;
pub use time_series::TimeSeriesPreprocessor;
pub use time_series::TimeSeriesStats;
pub use time_series::TrendAnalysis;
pub use time_series::WhiteNoiseTest;
pub use time_series::WindowFeatures;
pub use compute::lazy::LazyFrame as ComputeLazyFrame;
pub use compute::parallel::ParallelUtils as ComputeParallelUtils;
pub use storage::column_store::ColumnStore;
pub use storage::disk::DiskStorage;
pub use storage::memory_mapped::MemoryMappedFile;
pub use storage::string_pool::StringPool as StorageStringPool;
pub use distributed::core::DistributedConfig;
pub use distributed::core::DistributedDataFrame;
pub use distributed::core::ToDistributed;
pub use distributed::execution::ExecutionContext;
pub use distributed::execution::ExecutionEngine;
pub use distributed::execution::ExecutionPlan;
pub use graph::bellman_ford_default;
pub use graph::betweenness_centrality;
pub use graph::bfs;
pub use graph::closeness_centrality;
pub use graph::connected_components;
pub use graph::degree_centrality;
pub use graph::dfs;
pub use graph::dijkstra;
pub use graph::dijkstra_default;
pub use graph::eigenvector_centrality_default;
pub use graph::floyd_warshall_default;
pub use graph::from_adjacency_matrix;
pub use graph::from_edge_dataframe;
pub use graph::has_cycle;
pub use graph::hits_default;
pub use graph::is_connected;
pub use graph::label_propagation;
pub use graph::louvain_default;
pub use graph::modularity;
pub use graph::pagerank;
pub use graph::pagerank_default;
pub use graph::shortest_path_bfs;
pub use graph::strongly_connected_components;
pub use graph::to_adjacency_matrix;
pub use graph::to_edge_dataframe;
pub use graph::topological_sort;
pub use graph::AllPairsShortestPaths;
pub use graph::BfsResult;
pub use graph::ComponentResult;
pub use graph::DfsResult;
pub use graph::Edge;
pub use graph::EdgeId;
pub use graph::Graph;
pub use graph::GraphBuilder;
pub use graph::GraphError;
pub use graph::GraphType;
pub use graph::Node;
pub use graph::NodeId;
pub use graph::ShortestPathResult;
pub use versioning::DataFrameVersioning;
pub use versioning::DataSchema;
pub use versioning::DataVersion;
pub use versioning::LineageConfig;
pub use versioning::LineageTracker;
pub use versioning::Operation;
pub use versioning::OperationType;
pub use versioning::SharedLineageTracker;
pub use versioning::TrackerStats;
pub use versioning::VersionDiff;
pub use versioning::VersionId;
pub use versioning::VersionedTransform;
pub use versioning::VersioningError;
pub use audit::global_logger;
pub use audit::init_global_logger;
pub use audit::log_global;
pub use audit::AuditConfig as AuditLogConfig;
pub use audit::AuditConfigBuilder as AuditLogConfigBuilder;
pub use audit::AuditEntry;
pub use audit::AuditLogger;
pub use audit::AuditStats;
pub use audit::EventCategory;
pub use audit::LogContext;
pub use audit::LogDestination;
pub use audit::LogLevel;
pub use audit::SharedAuditLogger;
pub use multitenancy::create_shared_manager;
pub use multitenancy::DatasetId;
pub use multitenancy::DatasetMetadata;
pub use multitenancy::IsolationContext;
pub use multitenancy::Permission;
pub use multitenancy::ResourceQuota;
pub use multitenancy::SharedTenantManager;
pub use multitenancy::TenantAuditEntry;
pub use multitenancy::TenantConfig;
pub use multitenancy::TenantId;
pub use multitenancy::TenantManager;
pub use multitenancy::TenantOperation;
pub use multitenancy::TenantUsage;
pub use auth::create_shared_auth_manager;
pub use auth::decode_jwt;
pub use auth::encode_jwt;
pub use auth::get_token_expiration;
pub use auth::is_token_expired;
pub use auth::verify_jwt;
pub use auth::ApiKeyInfo;
pub use auth::ApiKeyManager;
pub use auth::ApiKeyStats;
pub use auth::AuthEvent;
pub use auth::AuthEventType;
pub use auth::AuthManager;
pub use auth::AuthMethod;
pub use auth::AuthResult;
pub use auth::AuthorizationRequest;
pub use auth::IntrospectionResponse;
pub use auth::JwtConfig;
pub use auth::OAuthClient;
pub use auth::OAuthClientInfo;
pub use auth::OAuthConfig;
pub use auth::OAuthGrantType;
pub use auth::RefreshToken;
pub use auth::ScopedApiKey;
pub use auth::Session;
pub use auth::SessionContext;
pub use auth::SessionStore;
pub use auth::SharedAuthManager;
pub use auth::TokenClaims;
pub use auth::TokenRequest;
pub use auth::TokenResponse;
pub use auth::UserInfo;
pub use analytics::create_default_rules;
pub use analytics::global_dashboard;
pub use analytics::init_global_dashboard;
pub use analytics::record_global;
pub use analytics::time_global;
pub use analytics::ActiveAlert;
pub use analytics::AlertHandler;
pub use analytics::AlertManager;
pub use analytics::AlertMetric;
pub use analytics::AlertRule;
pub use analytics::AlertSeverity;
pub use analytics::Dashboard;
pub use analytics::DashboardConfig;
pub use analytics::DashboardSnapshot;
pub use analytics::LoggingAlertHandler;
pub use analytics::Metric;
pub use analytics::MetricStats;
pub use analytics::MetricType as AnalyticsMetricType;
pub use analytics::MetricValue;
pub use analytics::MetricsCollector;
pub use analytics::OperationCategory;
pub use analytics::OperationRecord;
pub use analytics::RateCalculator;
pub use analytics::ResourceSnapshot;
pub use analytics::ScopedTimer;
pub use analytics::ThresholdOperator;
pub use analytics::TimeResolution;

Modules§

  • analytics: Real-Time Analytics Dashboard Module
  • arrow_integration: Apache Arrow Integration
  • audit: Audit logging module for DataFrame operations
  • auth: Enterprise Authentication Module
  • column
  • compute
  • config: Configuration management for PandRS
  • connectors: Data Connectors
  • core
  • dataframe
  • distributed: Distributed Processing Module
  • error
  • graph: Graph analytics module for PandRS
  • groupby
  • index
  • io
  • jupyter: Jupyter Notebook Integration for PandRS
  • large: Module for handling large datasets
  • ml: Machine Learning Module
  • multitenancy: Multi-Tenancy Support Module
  • na
  • optimized
  • parallel: Module providing parallel processing functionality
  • pivot: Module providing pivot table functionality
  • series
  • stats: PandRS Statistics Module
  • storage
  • streaming: Module for streaming data processing
  • temporal: Module for time series data manipulation
  • time_series: Time Series Analysis and Forecasting Module
  • versioning: Data versioning and lineage tracking module
  • vis: Module providing data visualization functionality

Macros§

  • agg_spec: Create an aggregation specification (similar to pandas)
  • column_aggs: Create multiple named aggregations for a column
  • iloc: Macro for convenient indexing
  • loc
  • named_agg: Create a named aggregation (one of the helper macros for creating aggregation specifications)
  • select

Constants§

  • VERSION