Crate pandrs


§PandRS

A high-performance DataFrame library for Rust, providing a pandas-like API with advanced features including SIMD optimization, parallel processing, and distributed computing.

§Overview

PandRS brings the power and familiarity of pandas to the Rust ecosystem. Built with performance, safety, and ease of use in mind, it provides:

- Type-safe operations leveraging Rust’s ownership system
- High-performance computing through SIMD vectorization and parallel processing
- Memory-efficient design with columnar storage and string pooling
- Comprehensive functionality matching pandas’ core features
- Seamless interoperability with Python, Arrow, and various data formats
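To make the columnar-storage point concrete, here is a minimal, std-only sketch of the idea. It assumes nothing about PandRS internals: `Column` and `ColumnarTable` are hypothetical types, loosely modelled on the re-exported `Int64Column`, `Float64Column`, and `StringColumn`. Each column keeps its values in one contiguous, single-typed `Vec`, so an aggregate over one column reads only that column's buffer.

```rust
/// Hypothetical typed column: every value shares one type and lives
/// in a single contiguous Vec.
enum Column {
    Int64(Vec<i64>),
    Float64(Vec<f64>),
    Str(Vec<String>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int64(v) => v.len(),
            Column::Float64(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }
}

/// Hypothetical columnar table: named columns kept in insertion order.
struct ColumnarTable {
    columns: Vec<(String, Column)>,
}

impl ColumnarTable {
    fn new() -> Self {
        Self { columns: Vec::new() }
    }

    /// Adds a column, replacing any existing column with the same name.
    fn add_column(&mut self, name: &str, col: Column) {
        let pos = self.columns.iter().position(|(n, _)| n == name);
        match pos {
            Some(i) => self.columns[i].1 = col,
            None => self.columns.push((name.to_string(), col)),
        }
    }

    fn column(&self, name: &str) -> Option<&Column> {
        self.columns.iter().find(|(n, _)| n == name).map(|(_, c)| c)
    }

    /// Mean of a numeric column; only that column's buffer is scanned.
    /// Returns None for missing, empty, or non-numeric columns.
    fn mean(&self, name: &str) -> Option<f64> {
        match self.column(name)? {
            Column::Int64(v) if !v.is_empty() => {
                Some(v.iter().sum::<i64>() as f64 / v.len() as f64)
            }
            Column::Float64(v) if !v.is_empty() => {
                Some(v.iter().sum::<f64>() / v.len() as f64)
            }
            _ => None,
        }
    }
}

fn main() {
    let mut t = ColumnarTable::new();
    t.add_column("name", Column::Str(vec!["Alice".into(), "Bob".into(), "Carol".into()]));
    t.add_column("age", Column::Int64(vec![30, 25, 35]));
    println!("{:?}", t.mean("age")); // Some(30.0)
    println!("{:?}", t.column("name").map(Column::len)); // Some(3)
}
```

A row-oriented layout would interleave `name` and `age` per record; keeping them apart is what lets per-column kernels (and SIMD) stream over one homogeneous buffer.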

§Quick Start

use pandrs::{DataFrame, Series};

fn main() {
    // Create an empty DataFrame, then add two named columns
    let mut df = DataFrame::new();
    let names = Series::new(vec!["Alice", "Bob", "Carol"], Some("name".to_string()))
        .expect("failed to create the 'name' series");
    df.add_column("name".to_string(), names)
        .expect("failed to add the 'name' column");
    let ages = Series::new(vec![30i64, 25, 35], Some("age".to_string()))
        .expect("failed to create the 'age' series");
    df.add_column("age".to_string(), ages)
        .expect("failed to add the 'age' column");

    // Basic operations: inspect the shape
    let nrows = df.row_count();
    let ncols = df.column_count();
    println!("{nrows} rows x {ncols} columns");
}

§Feature Flags

PandRS supports various feature flags for optional functionality:

Core features:
- stable: Recommended stable feature set
- optimized: Performance optimizations and SIMD
- backward_compat: Backward compatibility support
Data formats:
- parquet: Apache Parquet file support
- excel: Excel file support (read/write)
- sql: Database connectivity (PostgreSQL, MySQL, SQLite)
Advanced features:
- distributed: Distributed computing with DataFusion
- visualization: Plotting capabilities
- streaming: Real-time data processing
- serving: Model serving and deployment
Experimental:
- cuda: GPU acceleration (requires CUDA toolkit)
- wasm: WebAssembly compilation support
- jit: Just-in-time compilation

§Core Data Structures

- Series: One-dimensional labeled array capable of holding any data type
- DataFrame: Two-dimensional, size-mutable, heterogeneous tabular data structure
- MultiIndex: Hierarchical indexing for advanced data organization
- Categorical: Memory-efficient representation for string data with limited cardinality
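The memory saving behind a categorical type typically comes from dictionary encoding. The following is a minimal std-only sketch of that technique and assumes nothing about the actual `Categorical` type: `CategoricalSketch` is hypothetical. Each distinct string is stored once in `categories`, and each row stores a `u32` code into that list, so a low-cardinality column costs roughly four bytes per row plus the dictionary.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column.
struct CategoricalSketch {
    /// Distinct values, in order of first appearance.
    categories: Vec<String>,
    /// One code per row, indexing into `categories`.
    codes: Vec<u32>,
}

impl CategoricalSketch {
    fn from_values(values: &[&str]) -> Self {
        let mut lookup: HashMap<&str, u32> = HashMap::new();
        let mut categories: Vec<String> = Vec::new();
        let mut codes = Vec::with_capacity(values.len());
        for &v in values {
            // Reuse the existing code, or register a new category.
            let code = *lookup.entry(v).or_insert_with(|| {
                categories.push(v.to_string());
                (categories.len() - 1) as u32
            });
            codes.push(code);
        }
        Self { categories, codes }
    }

    /// Decodes one row back to its string value.
    fn get(&self, row: usize) -> Option<&str> {
        self.codes
            .get(row)
            .map(|&c| self.categories[c as usize].as_str())
    }
}

fn main() {
    let col = CategoricalSketch::from_values(&["red", "blue", "red", "red", "blue"]);
    println!("{:?}", col.categories); // ["red", "blue"]
    println!("{:?}", col.codes); // [0, 1, 0, 0, 1]
    println!("{:?}", col.get(2)); // Some("red")
}
```

Comparisons and group-bys can then work on the integer codes instead of the strings, which is the other usual motivation for this layout.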

§Modules

- dataframe: DataFrame operations and manipulation
- series: Series operations and manipulation
- stats: Statistical functions and analysis
- ml: Machine learning algorithms and utilities
- io: Input/output operations for various file formats
- streaming: Real-time streaming data processing
- time_series: Time series analysis and forecasting
- graph: Graph analytics and algorithms
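As an illustration of the forecasting side of `time_series`, here is a hypothetical, std-only simple-moving-average forecast. It is not the API of the re-exported `SimpleMovingAverageForecaster`; `sma_forecast` is an illustrative name, and the sketch only shows the basic idea: every future step is forecast as the mean of the last `window` observations.

```rust
/// Hypothetical SMA forecast: repeats the mean of the last `window`
/// observations for `horizon` future steps.
/// Returns None if the window is zero or longer than the history.
fn sma_forecast(history: &[f64], window: usize, horizon: usize) -> Option<Vec<f64>> {
    if window == 0 || window > history.len() {
        return None;
    }
    // Only the most recent `window` points contribute to the forecast.
    let tail = &history[history.len() - window..];
    let mean = tail.iter().sum::<f64>() / window as f64;
    Some(vec![mean; horizon])
}

fn main() {
    let history = [10.0, 12.0, 14.0, 16.0];
    println!("{:?}", sma_forecast(&history, 2, 3)); // Some([15.0, 15.0, 15.0])
}
```

This flat forecast is the simplest variant and ignores trend and seasonality, which is why the module also lists models such as ARIMA and exponential smoothing.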

§Version

Current version: 0.1.0

Re-exports§

pub use scirs2_integration::dataframe_ext::SciRS2Ext;
pub use core::column::BitMask as CoreBitMask;
pub use core::column::Column as CoreColumn;
pub use core::column::ColumnCast;
pub use core::column::ColumnTrait;
pub use core::column::ColumnType as CoreColumnType;
pub use core::data_value::DataValue;
pub use core::data_value::DataValueExt;
pub use core::data_value::DisplayExt;
pub use core::error::Error;
pub use core::error::Result;
pub use core::index::Index as CoreIndex;
pub use core::index::IndexTrait;
pub use core::multi_index::MultiIndex as CoreMultiIndex;
pub use config::credentials::CredentialBuilder;
pub use config::credentials::CredentialMetadata;
pub use config::credentials::CredentialStore;
pub use config::credentials::CredentialStoreConfig;
pub use config::credentials::CredentialType;
pub use config::credentials::EncryptedCredential;
pub use config::AccessControlConfig;
pub use config::AuditConfig;
pub use config::AwsConfig;
pub use config::AzureConfig;
pub use config::CachingConfig;
pub use config::CloudConfig;
pub use config::ConnectionPoolConfig;
pub use config::DatabaseConfig;
pub use config::EncryptionConfig;
pub use config::GcpConfig;
pub use config::GlobalCloudConfig;
pub use config::JitConfig;
pub use config::LogRotationConfig;
pub use config::LoggingConfig;
pub use config::MemoryConfig;
pub use config::PandRSConfig;
pub use config::PerformanceConfig;
pub use config::SecurityConfig;
pub use config::SslConfig;
pub use config::ThreadingConfig;
pub use config::TimeoutConfig;
pub use column::BooleanColumn;
pub use column::Column;
pub use column::ColumnType;
pub use column::Float64Column;
pub use column::Int64Column;
pub use column::StringColumn;
pub use dataframe::DataFrame;
pub use dataframe::MeltOptions;
pub use dataframe::StackOptions;
pub use dataframe::UnstackOptions;
pub use error::PandRSError;
pub use groupby::GroupBy;
pub use index::DataFrameIndex;
pub use index::Index;
pub use index::IndexTrait as LegacyIndexTrait;
pub use index::MultiIndex;
pub use index::RangeIndex;
pub use index::StringIndex;
pub use index::StringMultiIndex;
pub use na::NA;
pub use optimized::AggregateOp;
pub use optimized::JoinType;
pub use optimized::LazyFrame;
pub use optimized::OptimizedDataFrame;
pub use parallel::ParallelUtils;
pub use series::Categorical;
pub use series::CategoricalOrder;
pub use series::NASeries;
pub use series::Series;
pub use series::StringCategorical;
pub use stats::DescriptiveStats;
pub use stats::LinearRegressionResult;
pub use stats::TTestResult;
pub use vis::OutputFormat;
pub use vis::PlotConfig;
pub use vis::PlotType;
pub use vis::svg::BarChart as SvgBarChart;
pub use vis::svg::BarOrientation as SvgBarOrientation;
pub use vis::svg::Color as SvgColor;
pub use vis::svg::ColorScheme as SvgColorScheme;
pub use vis::svg::DrawStyle;
pub use vis::svg::HeatMap as SvgHeatMap;
pub use vis::svg::LegendPosition;
pub use vis::svg::LineChart as SvgLineChart;
pub use vis::svg::LineSeries;
pub use vis::svg::Margins as SvgMargins;
pub use vis::svg::MarkerShape;
pub use vis::svg::PathBuilder;
pub use vis::svg::PieChart as SvgPieChart;
pub use vis::svg::ScatterPlot as SvgScatterPlot;
pub use vis::svg::SvgCanvas;
pub use vis::svg::SvgChartConfig;
pub use vis::svg::SvgHistogram;
pub use vis::svg::SvgPlotType;
pub use vis::svg::SvgVisualize;
pub use vis::svg::Transform as SvgTransform;
pub use jupyter::get_jupyter_config;
pub use jupyter::init_jupyter;
pub use jupyter::jupyter_dark_mode;
pub use jupyter::jupyter_light_mode;
pub use jupyter::set_jupyter_config;
pub use jupyter::JupyterColorScheme;
pub use jupyter::JupyterConfig;
pub use jupyter::JupyterDisplay;
pub use jupyter::JupyterMagics;
pub use jupyter::TableStyle;
pub use jupyter::TableWidth;
pub use ml::anomaly::IsolationForest;
pub use ml::anomaly::LocalOutlierFactor;
pub use ml::anomaly::OneClassSVM;
pub use ml::clustering::AgglomerativeClustering;
pub use ml::clustering::DistanceMetric;
pub use ml::clustering::KMeans;
pub use ml::clustering::Linkage;
pub use ml::clustering::DBSCAN;
pub use ml::dimension::TSNEInit;
pub use ml::dimension::PCA;
pub use ml::dimension::TSNE;
pub use ml::metrics::classification::accuracy_score;
pub use ml::metrics::classification::f1_score;
pub use ml::metrics::classification::precision_score;
pub use ml::metrics::classification::recall_score;
pub use ml::metrics::regression::explained_variance_score;
pub use ml::metrics::regression::mean_absolute_error;
pub use ml::metrics::regression::mean_squared_error;
pub use ml::metrics::regression::r2_score;
pub use ml::metrics::regression::root_mean_squared_error;
pub use ml::models::ensemble::GradientBoostingClassifier;
pub use ml::models::ensemble::GradientBoostingConfig;
pub use ml::models::ensemble::GradientBoostingRegressor;
pub use ml::models::ensemble::RandomForestClassifier;
pub use ml::models::ensemble::RandomForestConfig;
pub use ml::models::ensemble::RandomForestRegressor;
pub use ml::models::linear::LinearRegression;
pub use ml::models::linear::LogisticRegression;
pub use ml::models::neural::Activation;
pub use ml::models::neural::LossFunction;
pub use ml::models::neural::MLPClassifier;
pub use ml::models::neural::MLPConfig;
pub use ml::models::neural::MLPConfigBuilder;
pub use ml::models::neural::MLPRegressor;
pub use ml::models::tree::DecisionTreeClassifier;
pub use ml::models::tree::DecisionTreeConfig;
pub use ml::models::tree::DecisionTreeRegressor;
pub use ml::models::tree::SplitCriterion;
pub use ml::models::train_test_split;
pub use ml::models::CrossValidation;
pub use ml::models::ModelEvaluator;
pub use ml::models::ModelMetrics;
pub use ml::models::SupervisedModel;
pub use ml::models::UnsupervisedModel;
pub use ml::pipeline::Pipeline;
pub use ml::pipeline::PipelineStage;
pub use ml::pipeline::PipelineTransformer;
pub use ml::preprocessing::Binner;
pub use ml::preprocessing::FeatureSelector;
pub use ml::preprocessing::ImputeStrategy;
pub use ml::preprocessing::Imputer;
pub use ml::preprocessing::MinMaxScaler;
pub use ml::preprocessing::OneHotEncoder;
pub use ml::preprocessing::PolynomialFeatures;
pub use ml::preprocessing::StandardScaler;
pub use large::external_sort;
pub use large::merge_sorted_chunks;
pub use large::hash_join_out_of_core;
pub use large::OutOfCoreJoinType;
pub use large::AggOp as OutOfCoreAggOp;
pub use large::OutOfCoreConfig;
pub use large::OutOfCoreReader;
pub use large::OutOfCoreWriter;
pub use large::ChunkedDataFrame;
pub use large::DiskBasedDataFrame;
pub use large::DiskBasedOptimizedDataFrame;
pub use large::DiskConfig;
pub use streaming::AggregationType;
pub use streaming::BackpressureBuffer;
pub use streaming::BackpressureChannel;
pub use streaming::BackpressureConfig;
pub use streaming::BackpressureConfigBuilder;
pub use streaming::BackpressureStats;
pub use streaming::BackpressureStrategy;
pub use streaming::DataStream;
pub use streaming::FlowController;
pub use streaming::MetricType;
pub use streaming::MultiColumnAggregator;
pub use streaming::RealTimeAnalytics;
pub use streaming::StreamAggregator;
pub use streaming::StreamConfig;
pub use streaming::StreamConnector;
pub use streaming::StreamProcessor;
pub use streaming::StreamRecord;
pub use streaming::TimeWindow;
pub use streaming::WindowAggregation;
pub use streaming::WindowConfig;
pub use streaming::WindowConfigBuilder;
pub use streaming::WindowResult;
pub use streaming::WindowType;
pub use streaming::WindowedAggregator;
pub use time_series::ArimaForecaster;
pub use time_series::AugmentedDickeyFullerTest;
pub use time_series::AutoArima;
pub use time_series::AutocorrelationAnalysis;
pub use time_series::ChangePointDetection;
pub use time_series::DateTimeIndex;
pub use time_series::DecompositionMethod;
pub use time_series::DecompositionResult;
pub use time_series::Differencing;
pub use time_series::ExponentialSmoothingForecaster;
pub use time_series::FeatureSet;
pub use time_series::ForecastMetrics;
pub use time_series::ForecastResult;
pub use time_series::Forecaster;
pub use time_series::Frequency;
pub use time_series::KwiatkowskiPhillipsSchmidtShinTest;
pub use time_series::LinearTrendForecaster;
pub use time_series::MissingValueStrategy;
pub use time_series::ModelSelectionCriterion;
pub use time_series::ModelSelectionResult;
pub use time_series::Normalization;
pub use time_series::OutlierDetection;
pub use time_series::SarimaForecaster;
pub use time_series::SeasonalDecomposition;
pub use time_series::SeasonalTest;
pub use time_series::SeasonalityAnalysis;
pub use time_series::SimpleMovingAverageForecaster;
pub use time_series::StationarityTest;
pub use time_series::StatisticalFeatures;
pub use time_series::TimePoint;
pub use time_series::TimeSeries;
pub use time_series::TimeSeriesBuilder;
pub use time_series::TimeSeriesFeatureExtractor;
pub use time_series::TimeSeriesPreprocessor;
pub use time_series::TimeSeriesStats;
pub use time_series::TrendAnalysis;
pub use time_series::WhiteNoiseTest;
pub use time_series::WindowFeatures;
pub use compute::lazy::LazyFrame as ComputeLazyFrame;
pub use compute::parallel::ParallelUtils as ComputeParallelUtils;
pub use storage::column_store::ColumnStore;
pub use storage::disk::DiskStorage;
pub use storage::memory_mapped::MemoryMappedFile;
pub use storage::string_pool::StringPool as StorageStringPool;
pub use distributed::core::DistributedConfig;
pub use distributed::core::DistributedDataFrame;
pub use distributed::core::ToDistributed;
pub use distributed::execution::ExecutionContext;
pub use distributed::execution::ExecutionEngine;
pub use distributed::execution::ExecutionPlan;
pub use graph::bellman_ford_default;
pub use graph::betweenness_centrality;
pub use graph::bfs;
pub use graph::closeness_centrality;
pub use graph::connected_components;
pub use graph::degree_centrality;
pub use graph::dfs;
pub use graph::dijkstra;
pub use graph::dijkstra_default;
pub use graph::eigenvector_centrality_default;
pub use graph::floyd_warshall_default;
pub use graph::from_adjacency_matrix;
pub use graph::from_edge_dataframe;
pub use graph::has_cycle;
pub use graph::hits_default;
pub use graph::is_connected;
pub use graph::label_propagation;
pub use graph::louvain_default;
pub use graph::modularity;
pub use graph::pagerank;
pub use graph::pagerank_default;
pub use graph::shortest_path_bfs;
pub use graph::strongly_connected_components;
pub use graph::to_adjacency_matrix;
pub use graph::to_edge_dataframe;
pub use graph::topological_sort;
pub use graph::AllPairsShortestPaths;
pub use graph::BfsResult;
pub use graph::ComponentResult;
pub use graph::DfsResult;
pub use graph::Edge;
pub use graph::EdgeId;
pub use graph::Graph;
pub use graph::GraphBuilder;
pub use graph::GraphError;
pub use graph::GraphType;
pub use graph::Node;
pub use graph::NodeId;
pub use graph::ShortestPathResult;
pub use versioning::DataFrameVersioning;
pub use versioning::DataSchema;
pub use versioning::DataVersion;
pub use versioning::LineageConfig;
pub use versioning::LineageTracker;
pub use versioning::Operation;
pub use versioning::OperationType;
pub use versioning::SharedLineageTracker;
pub use versioning::TrackerStats;
pub use versioning::VersionDiff;
pub use versioning::VersionId;
pub use versioning::VersionedTransform;
pub use versioning::VersioningError;
pub use schema_evolution::BreakingChange;
pub use schema_evolution::ColumnSchema;
pub use schema_evolution::CompatibilityReport;
pub use schema_evolution::DataFrameSchema;
pub use schema_evolution::DefaultValue;
pub use schema_evolution::Migration;
pub use schema_evolution::MigrationBuilder;
pub use schema_evolution::SchemaChange;
pub use schema_evolution::SchemaConstraint;
pub use schema_evolution::SchemaDataType;
pub use schema_evolution::SchemaFormat;
pub use schema_evolution::SchemaMigrator;
pub use schema_evolution::SchemaRegistry;
pub use schema_evolution::SchemaVersion;
pub use schema_evolution::ValidationError;
pub use schema_evolution::ValidationErrorType;
pub use schema_evolution::ValidationReport;
pub use audit::global_logger;
pub use audit::init_global_logger;
pub use audit::log_global;
pub use audit::AuditConfig as AuditLogConfig;
pub use audit::AuditConfigBuilder as AuditLogConfigBuilder;
pub use audit::AuditEntry;
pub use audit::AuditLogger;
pub use audit::AuditStats;
pub use audit::EventCategory;
pub use audit::LogContext;
pub use audit::LogDestination;
pub use audit::LogLevel;
pub use audit::SharedAuditLogger;
pub use multitenancy::create_shared_manager;
pub use multitenancy::DatasetId;
pub use multitenancy::DatasetMetadata;
pub use multitenancy::IsolationContext;
pub use multitenancy::Permission;
pub use multitenancy::ResourceQuota;
pub use multitenancy::SharedTenantManager;
pub use multitenancy::TenantAuditEntry;
pub use multitenancy::TenantConfig;
pub use multitenancy::TenantId;
pub use multitenancy::TenantManager;
pub use multitenancy::TenantOperation;
pub use multitenancy::TenantUsage;
pub use auth::create_shared_auth_manager;
pub use auth::decode_jwt;
pub use auth::encode_jwt;
pub use auth::get_token_expiration;
pub use auth::is_token_expired;
pub use auth::verify_jwt;
pub use auth::ApiKeyInfo;
pub use auth::ApiKeyManager;
pub use auth::ApiKeyStats;
pub use auth::AuthEvent;
pub use auth::AuthEventType;
pub use auth::AuthManager;
pub use auth::AuthMethod;
pub use auth::AuthResult;
pub use auth::AuthorizationRequest;
pub use auth::IntrospectionResponse;
pub use auth::JwtConfig;
pub use auth::OAuthClient;
pub use auth::OAuthClientInfo;
pub use auth::OAuthConfig;
pub use auth::OAuthGrantType;
pub use auth::RefreshToken;
pub use auth::ScopedApiKey;
pub use auth::Session;
pub use auth::SessionContext;
pub use auth::SessionStore;
pub use auth::SharedAuthManager;
pub use auth::TokenClaims;
pub use auth::TokenRequest;
pub use auth::TokenResponse;
pub use auth::UserInfo;
pub use analytics::create_default_rules;
pub use analytics::global_dashboard;
pub use analytics::init_global_dashboard;
pub use analytics::record_global;
pub use analytics::time_global;
pub use analytics::ActiveAlert;
pub use analytics::AlertHandler;
pub use analytics::AlertManager;
pub use analytics::AlertMetric;
pub use analytics::AlertRule;
pub use analytics::AlertSeverity;
pub use analytics::Dashboard;
pub use analytics::DashboardConfig;
pub use analytics::DashboardSnapshot;
pub use analytics::LoggingAlertHandler;
pub use analytics::Metric;
pub use analytics::MetricStats;
pub use analytics::MetricType as AnalyticsMetricType;
pub use analytics::MetricValue;
pub use analytics::MetricsCollector;
pub use analytics::OperationCategory;
pub use analytics::OperationRecord;
pub use analytics::RateCalculator;
pub use analytics::ResourceSnapshot;
pub use analytics::ScopedTimer;
pub use analytics::ThresholdOperator;
pub use analytics::TimeResolution;
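The `ml::metrics::regression` re-exports name the standard regression metrics. As a hedged reference point, here is a std-only sketch of three of them using their textbook definitions; `mse`, `rmse`, and `r2` are hypothetical local names chosen so as not to imply the crate's signatures. R² is computed as 1 - SS_res / SS_tot.

```rust
/// Hypothetical mean squared error; None on empty or mismatched input.
fn mse(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    if y_true.is_empty() || y_true.len() != y_pred.len() {
        return None;
    }
    let sum_sq: f64 = y_true.iter().zip(y_pred).map(|(t, p)| (t - p).powi(2)).sum();
    Some(sum_sq / y_true.len() as f64)
}

/// Hypothetical root mean squared error: square root of the MSE.
fn rmse(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    mse(y_true, y_pred).map(f64::sqrt)
}

/// Hypothetical coefficient of determination: 1 - SS_res / SS_tot.
/// None on invalid input or when y_true is constant (SS_tot = 0).
fn r2(y_true: &[f64], y_pred: &[f64]) -> Option<f64> {
    if y_true.is_empty() || y_true.len() != y_pred.len() {
        return None;
    }
    let mean = y_true.iter().sum::<f64>() / y_true.len() as f64;
    let ss_tot: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    if ss_tot == 0.0 {
        return None;
    }
    let ss_res: f64 = y_true.iter().zip(y_pred).map(|(t, p)| (t - p).powi(2)).sum();
    Some(1.0 - ss_res / ss_tot)
}

fn main() {
    let y_true = [3.0, 5.0, 7.0];
    let y_pred = [2.0, 5.0, 8.0];
    println!("MSE  = {:?}", mse(&y_true, &y_pred));
    println!("RMSE = {:?}", rmse(&y_true, &y_pred));
    println!("R2   = {:?}", r2(&y_true, &y_pred)); // Some(0.75)
}
```

Returning `None` for a constant target is one design choice; some libraries instead return 0.0 or NaN in that case, so check the crate's own behaviour before relying on it.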

Modules§

analytics: Real-time analytics dashboard module.
arrow_integration: Arrow integration module for ecosystem compatibility.
audit: Audit logging module.
auth: Enterprise authentication module (JWT, OAuth, API keys).
column: Column implementations and column-level operations.
compute: Computation utilities, including lazy evaluation (compute::lazy) and parallel helpers (compute::parallel).
config: Configuration management for secure settings and credentials.
connectors: Data connectors for databases and cloud storage.
core: Core module with fundamental data structures and traits.
dataframe: DataFrame data structure and operations.
distributed: Distributed processing module.
error: Error types and error handling utilities.
graph: Graph analytics module.
groupby: GroupBy operations for split-apply-combine workflows.
index: Index types for DataFrame and Series labeling.
io: Input/output operations for reading and writing data.
jupyter: Jupyter notebook integration and display formatting.
large: Large dataset processing with chunking and disk-based operations.
ml: Machine learning algorithms and utilities.
multitenancy: Multi-tenancy support module.
na: NA (Not Available) value handling and missing data operations.
optimized: Optimized implementations using SIMD and vectorization.
parallel: Parallel processing utilities and thread pool management.
pivot: Pivot table operations for data reshaping.
plugins: Plugin system for custom data sources, transforms, and sinks.
schema_evolution: Schema evolution and migration tools.
scirs2_integration: SciRS2 integration module for scientific computing capabilities.
series: Series data structure for one-dimensional labeled data.
stats: Statistical functions and analysis tools.
storage: Storage engines, including a column store, disk storage, memory-mapped files, and string pooling.
streaming: Real-time streaming data processing.
temporal: Temporal operations for time-based data.
time_series: Time series analysis and forecasting.
versioning: Data versioning and lineage tracking module.
vis: Visualization and plotting utilities.
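The split-apply-combine pattern named in the `groupby` entry can be sketched without the crate. The function below is hypothetical and std-only, not the `GroupBy` API: it splits values by key, applies a sum to each group, and combines the results into one map (a `BTreeMap`, so keys come out sorted).

```rust
use std::collections::BTreeMap;

/// Hypothetical split-apply-combine over two parallel columns.
/// Extra elements in the longer slice are ignored (zip semantics).
fn group_sum(keys: &[&str], values: &[f64]) -> BTreeMap<String, f64> {
    // Split: collect each key's values into its own group.
    let mut groups: BTreeMap<String, Vec<f64>> = BTreeMap::new();
    for (k, v) in keys.iter().zip(values) {
        groups.entry(k.to_string()).or_default().push(*v);
    }
    // Apply + combine: reduce each group to a single number.
    groups
        .into_iter()
        .map(|(k, vs)| (k, vs.iter().sum::<f64>()))
        .collect()
}

fn main() {
    let keys = ["a", "b", "a", "c", "b"];
    let values = [1.0, 2.0, 3.0, 4.0, 5.0];
    for (k, total) in group_sum(&keys, &values) {
        println!("{k}: {total}");
    }
}
```

Swapping the `sum` for a mean, count, or min in the apply step gives the other common aggregations with the same split and combine code.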

Macros§

agg_spec: Create an aggregation specification (similar to pandas)
column_aggs: Create multiple named aggregations for a column
iloc: Macro for convenient indexing
loc
lock_and_clone: Lock a Mutex and clone the contained value
lock_safe: Safely acquire a Mutex lock, converting poison errors to Result
named_agg: Create a named aggregation (one of the helper macros for building aggregation specifications)
read_lock_safe: Safely acquire a RwLock read lock
select
write_lock_safe: Safely acquire a RwLock write lock
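The `lock_safe` and `lock_and_clone` descriptions rest on a std detail: `Mutex::lock` returns `Err(PoisonError)` if another thread panicked while holding the lock. A hypothetical std-only sketch of the same idea follows; `lock_safe_sketch!` and `lock_and_clone_sketch` are illustrative names, and the real macros' error type is not assumed (here it is simply a `String`).

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical lock helper: turns a poisoned lock into Err(String)
/// instead of panicking via unwrap().
macro_rules! lock_safe_sketch {
    ($mutex:expr) => {
        $mutex
            .lock()
            .map_err(|e| format!("mutex poisoned: {}", e))
    };
}

/// Hypothetical lock-and-clone: copy the value out so the guard is
/// released as soon as this function returns.
fn lock_and_clone_sketch<T: Clone>(m: &Mutex<T>) -> Result<T, String> {
    let guard = lock_safe_sketch!(m)?;
    Ok((*guard).clone())
}

fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    println!("{:?}", lock_and_clone_sketch(&*data)); // Ok([1, 2, 3])

    // Poison the mutex: a thread panics while holding the guard.
    let d2 = Arc::clone(&data);
    let _ = thread::spawn(move || {
        let _guard = d2.lock().unwrap();
        panic!("simulated failure while holding the lock");
    })
    .join();

    // The helper now reports the poisoning instead of panicking.
    println!("{}", lock_and_clone_sketch(&*data).is_err()); // true
}
```

Cloning out of the guard trades a copy for a short critical section; the `read_lock_safe` and `write_lock_safe` entries presumably apply the same error conversion to `RwLock::read` and `RwLock::write`.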

Constants§

VERSION: The current version of the PandRS library.
