Veloxx: Lightweight Rust-Powered Data Processing & Analytics Library
New in 0.2.1: Major performance improvements across all core operations. See CHANGELOG for details.
Veloxx is a new Rust library designed for highly performant and extremely lightweight in-memory data processing and analytics. It prioritizes minimal dependencies, optimal memory footprint, and compile-time guarantees, making it an ideal choice for resource-constrained environments, high-performance computing, and applications where every byte and cycle counts.
Core Principles & Design Goals
- Extreme Lightweighting: Strives for zero or very few, carefully selected external crates. Focuses on minimal overhead and small binary size.
- Performance First: Leverages Rust's zero-cost abstractions, with potential for SIMD and parallelism. Data structures are optimized for cache efficiency.
- Safety & Reliability: Fully utilizes Rust's ownership and borrowing system to ensure memory safety and prevent common data manipulation errors. Unsafe code is minimized and thoroughly audited.
- Ergonomics & Idiomatic Rust API: Designed for a clean, discoverable, and user-friendly API that feels natural to Rust developers, supporting method chaining and strong static typing.
- Composability & Extensibility: Features a modular design, allowing components to be independent and easily combinable, and is built to be easily extendable.
Key Features
Core Data Structures
- DataFrame: A columnar data store supporting heterogeneous data types per column (i32, f64, bool, String, DateTime). Efficient storage and handling of missing values.
- Series (or Column): A single-typed, named column of data within a DataFrame, providing type-specific operations.
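The columnar layout described above can be pictured with a small sketch. The `Column` and `Frame` types below are hypothetical and exist only for illustration; they are not Veloxx's actual definitions. The sketch assumes missing values are stored as `None` and that every column in a frame has the same number of rows.

```rust
use std::collections::BTreeMap;

/// Hypothetical column type (not Veloxx's actual definition): one variant per
/// element type, with `None` marking a missing value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Column {
    I32(Vec<Option<i32>>),
    F64(Vec<Option<f64>>),
    Bool(Vec<Option<bool>>),
    Str(Vec<Option<String>>),
}

impl Column {
    /// Number of rows, nulls included.
    fn len(&self) -> usize {
        match self {
            Column::I32(v) => v.len(),
            Column::F64(v) => v.len(),
            Column::Bool(v) => v.len(),
            Column::Str(v) => v.len(),
        }
    }

    /// Number of missing values in the column.
    fn null_count(&self) -> usize {
        fn nulls<T>(v: &[Option<T>]) -> usize {
            v.iter().filter(|x| x.is_none()).count()
        }
        match self {
            Column::I32(v) => nulls(v),
            Column::F64(v) => nulls(v),
            Column::Bool(v) => nulls(v),
            Column::Str(v) => nulls(v),
        }
    }
}

/// Hypothetical frame: named columns that must all share one row count.
#[allow(dead_code)]
#[derive(Debug)]
struct Frame {
    columns: BTreeMap<String, Column>,
    row_count: usize,
}

impl Frame {
    /// Builds a frame, rejecting columns whose length differs from the first one.
    fn new(columns: Vec<(String, Column)>) -> Result<Frame, String> {
        let row_count = columns.first().map_or(0, |(_, c)| c.len());
        let mut map = BTreeMap::new();
        for (name, col) in columns {
            if col.len() != row_count {
                return Err(format!(
                    "column '{}' has {} rows, expected {}",
                    name,
                    col.len(),
                    row_count
                ));
            }
            map.insert(name, col);
        }
        Ok(Frame { columns: map, row_count })
    }
}
```

Storing each column as one typed vector keeps values of the same type contiguous in memory, which is the usual argument for columnar layouts in analytics workloads.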
Data Ingestion & Loading
- From `Vec<Vec<T>>` / Iterator: Basic in-memory construction from Rust native collections.
- CSV Support: Minimalistic, highly efficient CSV parser for loading data.
- JSON Support: Efficient parsing for common JSON structures.
- Custom Data Sources: Traits/interfaces for users to implement their own data loading mechanisms.
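As a rough idea of what a pluggable data source could look like, here is a minimal sketch. The `DataSource` trait and `InlineCsv` type are hypothetical names invented for this example and do not reflect Veloxx's real interfaces. The toy parser splits on commas only and does not handle quoting or escaping.

```rust
/// Hypothetical trait for illustration; not Veloxx's actual interface.
trait DataSource {
    /// Returns the header row and the data rows, all as strings.
    fn load(&self) -> Result<(Vec<String>, Vec<Vec<String>>), String>;
}

/// Toy source that parses comma-separated text held in memory.
/// Quoted fields and escaped commas are NOT supported.
struct InlineCsv<'a> {
    text: &'a str,
}

impl<'a> DataSource for InlineCsv<'a> {
    fn load(&self) -> Result<(Vec<String>, Vec<Vec<String>>), String> {
        // Skip blank lines so a trailing newline does not produce an empty row.
        let mut lines = self.text.lines().filter(|l| !l.trim().is_empty());

        let header: Vec<String> = lines
            .next()
            .ok_or_else(|| "empty input".to_string())?
            .split(',')
            .map(|s| s.trim().to_string())
            .collect();

        let mut rows = Vec::new();
        for (i, line) in lines.enumerate() {
            let row: Vec<String> = line.split(',').map(|s| s.trim().to_string()).collect();
            // Every data row must have exactly as many fields as the header.
            if row.len() != header.len() {
                return Err(format!(
                    "row {} has {} fields, expected {}",
                    i + 1,
                    row.len(),
                    header.len()
                ));
            }
            rows.push(row);
        }
        Ok((header, rows))
    }
}
```

Any other source (a database cursor, a network stream) would implement the same trait and hand back rows in the same shape.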
Data Cleaning & Preparation
- `drop_nulls()`: Remove rows with any null values.
- `fill_nulls(value)`: Fill nulls with a specified value (type-aware, including DateTime).
- `interpolate_nulls()`: Basic linear interpolation for numeric and DateTime series.
- Type Casting: Efficient conversion between compatible data types for Series (e.g., i32 to f64).
- `rename_column(old_name, new_name)`: Rename columns.
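To make the null-handling operations concrete, the sketch below works on a plain `&[Option<f64>]` rather than on a Veloxx Series. The free functions reuse the names `fill_nulls` and `interpolate_nulls` for readability, but their signatures are assumptions. Leaving leading and trailing nulls untouched during interpolation is also a choice made for this sketch; the library may behave differently.

```rust
/// Replaces every `None` with `fill`.
fn fill_nulls(values: &[Option<f64>], fill: f64) -> Vec<f64> {
    values.iter().map(|v| v.unwrap_or(fill)).collect()
}

/// Fills interior runs of `None` by linear interpolation between the nearest
/// known neighbours. Leading and trailing nulls have a neighbour on one side
/// only and are left as `None` (an assumption of this sketch).
fn interpolate_nulls(values: &[Option<f64>]) -> Vec<Option<f64>> {
    let mut out = values.to_vec();
    let mut prev: Option<usize> = None; // index of the last known value

    for i in 0..values.len() {
        if let Some(right) = values[i] {
            if let Some(p) = prev {
                if i - p > 1 {
                    // `values[p]` is known to be `Some` because `prev` only
                    // ever records indices of known values.
                    let left = values[p].unwrap();
                    let step = (right - left) / (i - p) as f64;
                    for k in (p + 1)..i {
                        out[k] = Some(left + step * (k - p) as f64);
                    }
                }
            }
            prev = Some(i);
        }
    }
    out
}
```

For the input `[1.0, null, null, 4.0]`, the gap is split into three equal steps, so the two missing entries become `2.0` and `3.0`.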
Data Transformation & Manipulation
- Selection: `select_columns(names)`, `drop_columns(names)`.
- Filtering: Predicate-based row selection using logical (`AND`, `OR`, `NOT`) and comparison operators (`==`, `!=`, `<`, `>`, `<=`, `>=`).
- Projection: `with_column(new_name, expression)`, `apply()` for user-defined functions.
- Sorting: Sort DataFrame by one or more columns (ascending/descending).
- Joining: Basic inner, left, and right join operations on common keys.
- Concatenation/Append: Combine DataFrames vertically.
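Predicate-based filtering with composable logical operators can be sketched as a small expression tree. The `Cond` enum, the `Row` alias, and `filter_rows` below are hypothetical and written for this example only; they are not Veloxx's condition types. Rows are modelled as maps from column name to `f64` to keep the sketch short.

```rust
use std::collections::BTreeMap;

/// One row, modelled as column name -> numeric value (a simplification).
type Row = BTreeMap<String, f64>;

/// Hypothetical condition tree: comparisons at the leaves, logic above them.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Cond {
    Gt(String, f64),
    Lt(String, f64),
    Eq(String, f64),
    And(Box<Cond>, Box<Cond>),
    Or(Box<Cond>, Box<Cond>),
    Not(Box<Cond>),
}

impl Cond {
    /// Evaluates the condition on one row.
    /// A comparison against a column the row lacks evaluates to `false`.
    fn eval(&self, row: &Row) -> bool {
        match self {
            Cond::Gt(c, v) => row.get(c).map_or(false, |x| x > v),
            Cond::Lt(c, v) => row.get(c).map_or(false, |x| x < v),
            Cond::Eq(c, v) => row.get(c).map_or(false, |x| x == v),
            Cond::And(a, b) => a.eval(row) && b.eval(row),
            Cond::Or(a, b) => a.eval(row) || b.eval(row),
            Cond::Not(a) => !a.eval(row),
        }
    }
}

/// Keeps only the rows for which `cond` holds.
fn filter_rows(rows: &[Row], cond: &Cond) -> Vec<Row> {
    rows.iter().filter(|r| cond.eval(r)).cloned().collect()
}
```

Because conditions are plain values, they can be built once, combined with `And`, `Or`, and `Not`, and reused across several frames.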
Aggregation & Reduction
- Simple Aggregations: `sum()`, `mean()`, `median()`, `min()`, `max()`, `count()`, `std_dev()`.
- Group By: Perform aggregations on groups defined by one or more columns.
- Unique Values: `unique()` for a Series or DataFrame columns.
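A group-by aggregation boils down to bucketing values by key and reducing each bucket. The sketch below does this with the standard library only; `group_sum_count` and `group_mean` are hypothetical helper names, and the key and value columns are passed as parallel slices rather than as Veloxx Series.

```rust
use std::collections::BTreeMap;

/// Groups `values` by the parallel `keys` slice and returns (sum, count) per key.
/// If the slices differ in length, the extra elements of the longer one are ignored.
fn group_sum_count(keys: &[&str], values: &[f64]) -> BTreeMap<String, (f64, usize)> {
    let mut groups: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for (k, v) in keys.iter().zip(values) {
        let entry = groups.entry(k.to_string()).or_insert((0.0, 0));
        entry.0 += v;
        entry.1 += 1;
    }
    groups
}

/// Mean per group, derived from the running sum and count.
fn group_mean(keys: &[&str], values: &[f64]) -> BTreeMap<String, f64> {
    group_sum_count(keys, values)
        .into_iter()
        .map(|(k, (sum, n))| (k, sum / n as f64))
        .collect()
}
```

Using a `BTreeMap` keeps the groups in sorted key order, which makes the output deterministic; a hash map would trade that ordering for faster inserts.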
Basic Analytics & Statistics
- `describe()`: Provides summary statistics for numeric columns (count, mean, std, min, max, quartiles).
- `correlation()`: Calculate Pearson correlation between two numeric Series.
- `covariance()`: Calculate covariance.
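For reference, covariance and Pearson correlation can be computed as in the sketch below. It operates on plain slices, and the free functions only borrow the names `covariance` and `correlation`; their signatures are assumptions. The sketch uses the sample covariance (dividing by n - 1), which may or may not match the convention Veloxx uses.

```rust
/// Arithmetic mean; callers must pass a non-empty slice.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample covariance (divides by n - 1).
/// Returns `None` for mismatched lengths or fewer than two points.
fn covariance(xs: &[f64], ys: &[f64]) -> Option<f64> {
    if xs.len() != ys.len() || xs.len() < 2 {
        return None;
    }
    let (mx, my) = (mean(xs), mean(ys));
    let s: f64 = xs.iter().zip(ys).map(|(x, y)| (x - mx) * (y - my)).sum();
    Some(s / (xs.len() - 1) as f64)
}

/// Pearson correlation: covariance divided by the product of the standard deviations.
/// Returns `None` when either input has zero variance.
fn correlation(xs: &[f64], ys: &[f64]) -> Option<f64> {
    let cov = covariance(xs, ys)?;
    let sx = covariance(xs, xs)?.sqrt();
    let sy = covariance(ys, ys)?.sqrt();
    if sx == 0.0 || sy == 0.0 {
        return None;
    }
    Some(cov / (sx * sy))
}
```

The n - 1 factors cancel in the correlation, so the result is the same whether sample or population covariance is used.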
Output & Export
- To `Vec<Vec<T>>`: Export DataFrame content back to standard Rust collections.
- To CSV: Efficiently write DataFrame to a CSV file.
- Display/Pretty Print: User-friendly console output for DataFrame and Series.
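The CSV export step can be illustrated with a short, standard-library-only sketch. `escape_field` and `to_csv` are hypothetical helpers written for this example, not Veloxx functions. The quoting follows the common convention of wrapping a field in double quotes when it contains a comma, quote, or line break, and doubling any embedded quotes.

```rust
/// Quotes a field when it contains a comma, double quote, or line break,
/// doubling any embedded double quotes.
fn escape_field(field: &str) -> String {
    if field.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        field.to_string()
    }
}

/// Serializes a header and string rows into CSV text, one line per row.
fn to_csv(header: &[&str], rows: &[Vec<String>]) -> String {
    let mut out = String::new();

    let head: Vec<String> = header.iter().map(|h| escape_field(h)).collect();
    out.push_str(&head.join(","));
    out.push('\n');

    for row in rows {
        let fields: Vec<String> = row.iter().map(|f| escape_field(f)).collect();
        out.push_str(&fields.join(","));
        out.push('\n');
    }
    out
}
```

Writing the returned string to disk is then a single call to `std::fs::write`.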
Installation
Rust
Add the following to your Cargo.toml file:
[dependencies]
veloxx = "0.2.2" # Or the latest version
Python
You can install the Python bindings using pip after building them with maturin:
# First, build the Python wheel (from the project root)
# Then install the wheel
WebAssembly (Node.js/Browser)
You can install the WebAssembly package using npm after building it with wasm-pack:
# First, build the WebAssembly package (from the project root)
# Then install the package
Usage Examples
Rust Usage
Here's a quick example demonstrating how to create a DataFrame, filter it, and perform a group-by aggregation:
use DataFrame;
use Series;
use ;
use Condition;
use Expr;
use std::collections::BTreeMap;
### Python Usage
```python
import veloxx
# 1. Create a DataFrame
df = veloxx.PyDataFrame({
"name": veloxx.PySeries("name", ["Alice", "Bob", "Charlie", "David"]),
"age": veloxx.PySeries("age", [25, 30, 22, 35]),
"city": veloxx.PySeries("city", ["New York", "London", "New York", "Paris"]),
})
print("Original DataFrame:")
print(df)
# 2. Filter data: age > 25
filtered_df = df.filter([i for i, age in enumerate(df.get_column("age").to_vec_f64()) if age > 25])
print("\nFiltered DataFrame (age > 25):")
print(filtered_df)
# 3. Select columns
selected_df = df.select_columns(["name", "city"])
print("\nSelected Columns (name, city):")
print(selected_df)
# 4. Rename a column
renamed_df = df.rename_column("age", "years")
print("\nRenamed Column (age to years):")
print(renamed_df)
# 5. Series operations
age_series = df.get_column("age")
print(f"\nAge Series Sum: {age_series.sum()}")
print(f"Age Series Mean: {age_series.mean()}")
print(f"Age Series Max: {age_series.max()}")
print(f"Age Series Unique: {age_series.unique().to_vec_f64()}")
```
### WebAssembly Usage (Node.js)
const veloxx = require;
;
## Non-Functional Requirements
- **Comprehensive Documentation:** Extensive `///` documentation for all public APIs, examples, and design choices.
- **Robust Testing:** Thorough unit and integration tests covering all functionalities and edge cases.
- **Performance Benchmarking:** Includes benchmarks to track performance and memory usage, ensuring lightweight and high-performance goals are met.
- **Cross-Platform Compatibility:** Designed to work on common operating systems (Linux, macOS, Windows).
- **Safety:** Upholds Rust's safety guarantees, with minimal and heavily justified `unsafe` code.
## Future Considerations / Roadmap
- **Streaming Data:** Support for processing data in a streaming fashion.
- **Time-Series Functionality:** Basic time-series resampling, rolling windows.
- **FFI (Foreign Function Interface):** Consider C API for integration with other languages (Python, JavaScript).
- **Simple Plotting Integration:** Provide hooks or basic data preparation for common plotting libraries.
- **Persistence:** Basic serialization/deserialization formats (e.g., custom binary format, Parquet subset).
## WebAssembly Testing
WebAssembly bindings are currently tested using `console.assert` in `test_wasm.js`. Future work includes migrating to a more robust JavaScript testing framework like Jest.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.