Veloxx: Lightweight Rust-Powered Data Processing & Analytics Library

Veloxx is a Rust library for fast, lightweight in-memory data processing and analytics. It prioritizes minimal dependencies, a small memory footprint, and compile-time guarantees, which makes it a good fit for resource-constrained environments, high-performance computing, and applications where every byte and cycle counts.

Core Principles & Design Goals

Extremely Lightweight: Aims for zero external crates, or at most a few carefully selected ones. Focuses on minimal overhead and small binary size.
Performance First: Leverages Rust's zero-cost abstractions, with potential for SIMD and parallelism. Data structures are optimized for cache efficiency.
Safety & Reliability: Fully utilizes Rust's ownership and borrowing system to ensure memory safety and prevent common data manipulation errors. Unsafe code is minimized and thoroughly audited.
Ergonomics & Idiomatic Rust API: Designed for a clean, discoverable, and user-friendly API that feels natural to Rust developers, supporting method chaining and strong static typing.
Composability & Extensibility: Features a modular design, allowing components to be independent and easily combinable, and is built to be easily extendable.

Key Features

Core Data Structures

DataFrame: A columnar data store supporting heterogeneous data types per column (i32, f64, bool, String). Efficient storage and handling of missing values.
Series (or Column): A single-typed, named column of data within a DataFrame, providing type-specific operations.
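
To make the columnar layout concrete, here is a minimal sketch of how a single-typed column with missing values could be modelled in plain Rust. The `Column` enum and its methods are hypothetical names chosen for illustration; they are not Veloxx's actual types. The sketch assumes that a null is represented as `None` inside a `Vec<Option<T>>`, which matches the `Series::new_i32("age", vec![Some(25), ...])` constructor style used in the usage example.

```rust
/// Hypothetical single-typed column; `None` marks a missing value.
/// This is an illustration, not Veloxx's real `Series` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    I32(Vec<Option<i32>>),
    F64(Vec<Option<f64>>),
    Bool(Vec<Option<bool>>),
    Str(Vec<Option<String>>),
}

impl Column {
    /// Number of rows, including nulls.
    pub fn len(&self) -> usize {
        match self {
            Column::I32(v) => v.len(),
            Column::F64(v) => v.len(),
            Column::Bool(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }

    /// True when the column has no rows.
    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    /// Number of missing values in the column.
    pub fn null_count(&self) -> usize {
        match self {
            Column::I32(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::F64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Bool(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Str(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}
```

Storing each column as one contiguous vector keeps values of the same type next to each other, which is the property the cache-efficiency goal relies on.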

Data Ingestion & Loading

From Vec<Vec<T>> / Iterator: Basic in-memory construction from Rust native collections.
CSV Support: Minimalistic, highly efficient CSV parser for loading data.
JSON Support: Efficient parsing for common JSON structures.
Custom Data Sources: Traits/interfaces for users to implement their own data loading mechanisms.
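
As a rough illustration of the custom data source idea, the sketch below defines a hypothetical `DataSource` trait and a naive in-memory CSV implementation. The trait name, the `Table` alias, and `CsvText` are assumptions made for this example and are not taken from Veloxx's API. The parser is deliberately simplistic: it splits on commas, does not handle quoted fields, skips blank lines, and treats an empty cell as null.

```rust
/// Header names plus rows of raw string cells (`None` = missing).
pub type Table = (Vec<String>, Vec<Vec<Option<String>>>);

/// Hypothetical trait for pluggable loaders; not Veloxx's actual interface.
pub trait DataSource {
    fn load(&self) -> Result<Table, String>;
}

/// Naive CSV source over an in-memory string (no quoting support).
pub struct CsvText<'a>(pub &'a str);

impl<'a> DataSource for CsvText<'a> {
    fn load(&self) -> Result<Table, String> {
        // Blank lines are ignored entirely.
        let mut lines = self.0.lines().filter(|l| !l.trim().is_empty());

        // The first non-blank line is the header.
        let header: Vec<String> = lines
            .next()
            .ok_or_else(|| "empty input".to_string())?
            .split(',')
            .map(|h| h.trim().to_string())
            .collect();

        let mut rows = Vec::new();
        for (i, line) in lines.enumerate() {
            // An empty cell becomes `None`.
            let cells: Vec<Option<String>> = line
                .split(',')
                .map(|c| {
                    let c = c.trim();
                    if c.is_empty() { None } else { Some(c.to_string()) }
                })
                .collect();
            if cells.len() != header.len() {
                return Err(format!(
                    "data row {} has {} cells, expected {}",
                    i + 1,
                    cells.len(),
                    header.len()
                ));
            }
            rows.push(cells);
        }
        Ok((header, rows))
    }
}
```

Calling `CsvText("name,age\nAlice,25\nBob,\n").load()` yields two rows, with Bob's age loaded as `None`.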

Data Cleaning & Preparation

drop_nulls(): Remove rows with any null values.
fill_nulls(value): Fill nulls with a specified value (type-aware).
interpolate_nulls(): Basic linear interpolation for numeric series.
Type Casting: Efficient conversion between compatible data types for Series (e.g., i32 to f64).
rename_column(old_name, new_name): Rename columns.
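
The null-handling and casting operations above can be sketched on plain `Vec<Option<T>>` data. The free functions below are hypothetical stand-ins that share names with the listed features but are not Veloxx's implementation. The interpolation sketch assumes evenly spaced rows and leaves leading and trailing nulls untouched, since there is no value on one side to interpolate from.

```rust
/// Replace every null with `value` (type-aware through the generic `T`).
pub fn fill_nulls<T: Clone>(data: &[Option<T>], value: T) -> Vec<Option<T>> {
    data.iter()
        .map(|x| Some(x.clone().unwrap_or_else(|| value.clone())))
        .collect()
}

/// Linearly interpolate interior gaps between known values.
/// Leading and trailing nulls stay `None`.
pub fn interpolate_nulls(data: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = data.to_vec();
    let mut prev: Option<(usize, f64)> = None; // last known (index, value)
    for (i, cell) in data.iter().enumerate() {
        if let Some(right) = *cell {
            if let Some((p, left)) = prev {
                // Slope between the two known neighbours.
                let step = (right - left) / (i - p) as f64;
                for k in (p + 1)..i {
                    out[k] = Some(left + step * (k - p) as f64);
                }
            }
            prev = Some((i, right));
        }
    }
    out
}

/// Widen i32 values to f64, preserving nulls. The conversion is lossless.
pub fn cast_i32_to_f64(data: &[Option<i32>]) -> Vec<Option<f64>> {
    data.iter().map(|x| x.map(f64::from)).collect()
}
```

For example, interpolating `[None, Some(1.0), None, None, Some(4.0), None]` fills the two interior gaps with `2.0` and `3.0` and leaves both ends as `None`.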

Data Transformation & Manipulation

Selection: select_columns(names), drop_columns(names).
Filtering: Predicate-based row selection using logical (AND, OR, NOT) and comparison operators (==, !=, <, >).
Projection: with_column(new_name, expression), apply() for user-defined functions.
Sorting: Sort DataFrame by one or more columns (ascending/descending).
Joining: Basic inner, left, and right join operations on common keys.
Concatenation/Append: Combine DataFrames vertically.
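
Predicate-based filtering with logical operators can be pictured as evaluating a small expression tree once per row. The sketch below is a simplified, hypothetical model: `Cond`, `Columns`, and `filter_rows` are illustrative names, and only `i32` columns are covered. It assumes that a comparison against a null cell or an unknown column evaluates to false; Veloxx's own `Condition` type may behave differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical column store for this sketch: named i32 columns with nulls.
pub type Columns = BTreeMap<String, Vec<Option<i32>>>;

/// Hypothetical predicate tree; only i32 comparisons are modelled.
pub enum Cond {
    Eq(String, i32),
    Lt(String, i32),
    Gt(String, i32),
    And(Box<Cond>, Box<Cond>),
    Or(Box<Cond>, Box<Cond>),
    Not(Box<Cond>),
}

/// Look up one cell; a missing column, out-of-range row, or null gives `None`.
fn cell(cols: &Columns, name: &str, row: usize) -> Option<i32> {
    cols.get(name)?.get(row).copied().flatten()
}

impl Cond {
    /// Evaluate the predicate for one row. A comparison against a null
    /// cell is false, so `Not` of such a comparison is true.
    pub fn eval(&self, cols: &Columns, row: usize) -> bool {
        match self {
            Cond::Eq(n, v) => cell(cols, n, row) == Some(*v),
            Cond::Lt(n, v) => matches!(cell(cols, n, row), Some(x) if x < *v),
            Cond::Gt(n, v) => matches!(cell(cols, n, row), Some(x) if x > *v),
            Cond::And(a, b) => a.eval(cols, row) && b.eval(cols, row),
            Cond::Or(a, b) => a.eval(cols, row) || b.eval(cols, row),
            Cond::Not(a) => !a.eval(cols, row),
        }
    }
}

/// Indices of the rows that satisfy `cond`.
pub fn filter_rows(cols: &Columns, n_rows: usize, cond: &Cond) -> Vec<usize> {
    (0..n_rows).filter(|&r| cond.eval(cols, r)).collect()
}
```

Returning row indices rather than copied rows lets the caller gather the matching values from each column in a single pass.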

Aggregation & Reduction

Simple Aggregations: sum(), mean(), median(), min(), max(), count(), std_dev().
Group By: Perform aggregations on groups defined by one or more columns.
Unique Values: unique() for a Series or DataFrame columns.
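
The aggregations listed above reduce to a few lines each once nulls are skipped. The following sketch uses hypothetical free functions over `&[Option<f64>]`; it is not Veloxx's implementation. Two assumptions are worth calling out: `std_dev` here is the sample standard deviation (n - 1 denominator), and `group_mean` ignores null values within each group. The source does not state which conventions Veloxx uses.

```rust
use std::collections::BTreeMap;

/// Mean of the non-null values, or `None` when every value is null.
pub fn mean(values: &[Option<f64>]) -> Option<f64> {
    let present: Vec<f64> = values.iter().flatten().copied().collect();
    if present.is_empty() {
        return None;
    }
    Some(present.iter().sum::<f64>() / present.len() as f64)
}

/// Median of the non-null values (average of the two middle values
/// when the count is even).
pub fn median(values: &[Option<f64>]) -> Option<f64> {
    let mut present: Vec<f64> = values.iter().flatten().copied().collect();
    if present.is_empty() {
        return None;
    }
    present.sort_by(|a, b| a.total_cmp(b));
    let mid = present.len() / 2;
    if present.len() % 2 == 0 {
        Some((present[mid - 1] + present[mid]) / 2.0)
    } else {
        Some(present[mid])
    }
}

/// Sample standard deviation (assumed n - 1 denominator).
pub fn std_dev(values: &[Option<f64>]) -> Option<f64> {
    let present: Vec<f64> = values.iter().flatten().copied().collect();
    if present.len() < 2 {
        return None;
    }
    let m = present.iter().sum::<f64>() / present.len() as f64;
    let ss: f64 = present.iter().map(|x| (x - m).powi(2)).sum();
    Some((ss / (present.len() - 1) as f64).sqrt())
}

/// Mean of `values` per distinct key, skipping null values.
pub fn group_mean(keys: &[&str], values: &[Option<f64>]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per group in one pass.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (key, value) in keys.iter().zip(values) {
        if let Some(v) = value {
            let entry = acc.entry(key.to_string()).or_insert((0.0, 0));
            entry.0 += *v;
            entry.1 += 1;
        }
    }
    acc.into_iter().map(|(k, (sum, n))| (k, sum / n as f64)).collect()
}
```

With cities `["New York", "London", "New York", "Paris"]` and ages `[25, 30, 22, 35]`, `group_mean` returns 23.5 for New York, 30.0 for London, and 35.0 for Paris.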

Basic Analytics & Statistics

describe(): Provides summary statistics for numeric columns (count, mean, std, min, max, quartiles).
correlation(): Calculate Pearson correlation between two numeric Series.
covariance(): Calculate covariance.
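
For reference, covariance and Pearson correlation can be computed as in the sketch below. These are hypothetical free functions over plain `&[f64]` slices, assuming nulls have already been removed and that the sample (n - 1) form of covariance is wanted; Veloxx's own `correlation()` and `covariance()` may use different signatures or conventions.

```rust
/// Sample covariance of two equally long slices.
/// Returns `None` for mismatched lengths or fewer than two points.
pub fn covariance(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len();
    if n != y.len() || n < 2 {
        return None;
    }
    let mx = x.iter().sum::<f64>() / n as f64;
    let my = y.iter().sum::<f64>() / n as f64;
    let s: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    Some(s / (n - 1) as f64)
}

/// Pearson correlation: cov(x, y) / (sd(x) * sd(y)).
/// Returns `None` when either input is constant (zero variance).
pub fn correlation(x: &[f64], y: &[f64]) -> Option<f64> {
    let cov = covariance(x, y)?;
    let sx = covariance(x, x)?.sqrt();
    let sy = covariance(y, y)?.sqrt();
    if sx == 0.0 || sy == 0.0 {
        return None;
    }
    Some(cov / (sx * sy))
}
```

Because the same denominator appears in the covariance and in both variances, the choice between sample and population forms cancels out in the correlation.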

Output & Export

To Vec<Vec<T>>: Export DataFrame content back to standard Rust collections.
To CSV: Efficiently write DataFrame to a CSV file.
Display/Pretty Print: User-friendly console output for DataFrame and Series.
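
Writing CSV correctly mostly comes down to quoting. The sketch below shows one way to do it with only the standard library, following the usual CSV convention of wrapping fields that contain commas, quotes, or line breaks in double quotes and doubling any embedded quotes. `write_csv` and `escape_field` are hypothetical helpers for illustration, not Veloxx functions, and nulls are assumed to be written as empty cells.

```rust
use std::io::{self, Write};

/// Quote a field if it contains a comma, quote, or line break.
fn escape_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Write a header and rows to any `Write` sink; nulls become empty cells.
pub fn write_csv<W: Write>(
    out: &mut W,
    header: &[&str],
    rows: &[Vec<Option<String>>],
) -> io::Result<()> {
    let head: Vec<String> = header.iter().map(|h| escape_field(h)).collect();
    writeln!(out, "{}", head.join(","))?;
    for row in rows {
        let cells: Vec<String> = row
            .iter()
            .map(|c| c.as_deref().map(escape_field).unwrap_or_default())
            .collect();
        writeln!(out, "{}", cells.join(","))?;
    }
    Ok(())
}
```

Taking a generic `Write` sink means the same function can target a file, a buffer, or standard output; wrapping a file in `std::io::BufWriter` avoids one system call per line.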

Installation

Add the following to your Cargo.toml file:

[dependencies]
veloxx = "0.1.1" # Or the latest version

Usage Example

Here's a quick example demonstrating how to create a DataFrame, filter it, and perform a group-by aggregation:

use veloxx::dataframe::DataFrame;
use veloxx::series::Series;
use veloxx::types::Value;
use veloxx::conditions::Condition;
use std::collections::BTreeMap;

fn main() -> Result<(), String> {
    // 1. Create a DataFrame
    let mut columns = BTreeMap::new(); // Maps each column name to its Series
    columns.insert("name".to_string(), Series::new_string("name", vec![Some("Alice".to_string()), Some("Bob".to_string()), Some("Charlie".to_string()), Some("David".to_string())]));
    columns.insert("age".to_string(), Series::new_i32("age", vec![Some(25), Some(30), Some(22), Some(35)]));
    columns.insert("city".to_string(), Series::new_string("city", vec![Some("New York".to_string()), Some("London".to_string()), Some("New York".to_string()), Some("Paris".to_string())]));

    let df = DataFrame::new(columns)?;
    println!("Original DataFrame:\n{}", df);

    // 2. Filter data: age > 25
    let condition = Condition::Gt("age".to_string(), Value::I32(25));
    let filtered_df = df.filter(&condition)?;
    println!("\nFiltered DataFrame (age > 25):\n{}", filtered_df);

    // 3. Group by city and calculate average age
    let grouped_df = df.group_by(vec!["city".to_string()])?;
    let aggregated_df = grouped_df.agg(vec![("age", "mean")])?;
    println!("\nAggregated DataFrame (average age by city):\n{}", aggregated_df);

    Ok(())
}

Non-Functional Requirements

Comprehensive Documentation: Extensive /// documentation for all public APIs, examples, and design choices.
Robust Testing: Thorough unit and integration tests covering all functionalities and edge cases.
Performance Benchmarking: Includes benchmarks to track performance and memory usage, ensuring lightweight and high-performance goals are met.
Cross-Platform Compatibility: Designed to work on common operating systems (Linux, macOS, Windows).
Safety: Upholds Rust's safety guarantees, with minimal and heavily justified unsafe code.

Future Considerations / Roadmap

Streaming Data: Support for processing data in a streaming fashion.
Time-Series Functionality: Basic time-series resampling, rolling windows.
FFI (Foreign Function Interface): Consider a C API for integration with other languages (Python, JavaScript).
Simple Plotting Integration: Provide hooks or basic data preparation for common plotting libraries.
Persistence: Basic serialization/deserialization formats (e.g., custom binary format, Parquet subset).

License

This project is licensed under the MIT License - see the LICENSE file for details.

veloxx 0.1.2