CSV Processor
A high-performance Rust library and CLI tool for CSV data analysis, featuring automatic type inference, statistical analysis, and professional reporting capabilities.
Library + CLI Tool
This project provides both:
- Rust Library - For embedding CSV analysis in your applications
- CLI Tool - For command-line data analysis
Features
- Automatic Type Inference: Intelligently detects integers, floats, booleans, and strings
- Missing Value Analysis: Comprehensive NA/null detection and reporting
- Statistical Operations: Built-in sum, mean, min, max calculations for all numeric types
- JSON Export: Native JSON serialization for DataFrames and columns
- Professional Output: Formatted tables and statistical reports
- Fast Processing: Rust-powered performance for large CSV files
- Self-Analyzing Columns: Each column type implements its own statistical operations
- Comprehensive Testing: 37+ tests ensuring reliability
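The automatic type inference listed above can be illustrated with a minimal sketch. It is not the library's implementation: the names `InferredType`, `is_missing`, and `infer_type` are hypothetical, and the set of missing-value markers (empty string, `NA`, `null`) is an assumption based on the sample data in this README.

```rust
/// Column types a list of raw cells can be inferred as (hypothetical names).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum InferredType {
    Integer,
    Float,
    Boolean,
    String,
}

/// Returns true for cells treated as missing in this sketch.
/// The marker set (empty, "NA", "null") is an assumption.
pub fn is_missing(cell: &str) -> bool {
    let t = cell.trim();
    t.is_empty() || t.eq_ignore_ascii_case("na") || t.eq_ignore_ascii_case("null")
}

/// Infers the narrowest type that fits every non-missing cell.
pub fn infer_type(cells: &[&str]) -> InferredType {
    let present: Vec<&str> = cells
        .iter()
        .map(|c| c.trim())
        .filter(|c| !is_missing(c))
        .collect();

    if present.is_empty() {
        // Nothing to infer from; fall back to the most general type.
        return InferredType::String;
    }
    if present.iter().all(|c| c.parse::<i64>().is_ok()) {
        InferredType::Integer
    } else if present.iter().all(|c| c.parse::<f64>().is_ok()) {
        InferredType::Float
    } else if present.iter().all(|c| c.parse::<bool>().is_ok()) {
        InferredType::Boolean
    } else {
        InferredType::String
    }
}
```

Trying integers before floats keeps a column such as `age` as integers, while a single value like `75000.50` widens the whole column to floats.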
Installation
As a Library
Add to your Cargo.toml:
[dependencies]
= "0.1.0"
As a CLI Tool
# Or build from source
Usage
Library Usage
use ;
CLI Usage
# Check for missing values
# Calculate comprehensive statistics
# Get help
Development Usage:
# When developing/building from source
Sample Output
DataFrame Display
When loading a CSV file, data is displayed in a formatted table:
┌──────────────┬──────┬──────────┬─────────────┬────────┬───────┬─────┬─────┐
│ name         │ age  │ salary   │ department  │ active │ score │ ... │ ... │
├──────────────┼──────┼──────────┼─────────────┼────────┼───────┼─────┼─────┤
│ Alice Smith  │ 28   │ 75000.5  │ Engineering │ true   │ 8.7   │ ... │ ... │
│ Bob Johnson  │ null │ 65000    │ Marketing   │ false  │ null  │ ... │ ... │
│ Carol Davis  │ 35   │ null     │ Engineering │ true   │ 9.2   │ ... │ ... │
│ null         │ 29   │ 58000.75 │ Sales       │ true   │ 7.8   │ ... │ ... │
│ ⋮            │ ⋮    │ ⋮        │ ⋮           │ ⋮      │ ⋮     │ ⋮   │ ⋮   │
│ Henry Taylor │ 38   │ 82000    │ Engineering │ false  │ 7.5   │ ... │ ... │
└──────────────┴──────┴──────────┴─────────────┴────────┴───────┴─────┴─────┘
10 rows × 8 columns
Statistical Report (Wide Format)
┌────────┬─────────┬──────────┬──────────┬─────────┐
│ column │ mean    │ sum      │ min      │ max     │
├────────┼─────────┼──────────┼──────────┼─────────┤
│ id     │ 5.5     │ 55.0     │ 1.0      │ 10.0    │
│ age    │ 31.29   │ 250.33   │ 26.0     │ 42.0    │
│ salary │ 72571.5 │ 507000.5 │ 58000.75 │ 95000.0 │
│ active │ 0.8     │ 8.0      │ 0.0      │ 1.0     │
│ score  │ 8.06    │ 56.4     │ 6.9      │ 9.2     │
└────────┴─────────┴──────────┴──────────┴─────────┘
5 rows × 5 columns
Missing Value Analysis
Column Analysis:
- id: 0 missing values (0.0%)
- name: 2 missing values (20.0%)
- age: 2 missing values (20.0%)
- salary: 3 missing values (30.0%)
- department: 1 missing values (10.0%)
- active: 1 missing values (10.0%)
- start_date: 2 missing values (20.0%)
- score: 3 missing values (30.0%)
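As a rough illustration of how a per-column line in this report could be derived, the sketch below counts missing cells and formats the percentage to one decimal place. The function names `missing_summary` and `format_missing_line` are hypothetical, and treating empty cells, `NA`, and `null` as missing is an assumption rather than the tool's documented behavior.

```rust
/// Counts missing cells in one column and returns (count, percentage).
/// Missing markers (empty string, "NA", "null") are assumptions of this sketch.
pub fn missing_summary(cells: &[&str]) -> (usize, f64) {
    let missing = cells
        .iter()
        .filter(|c| {
            let t = c.trim();
            t.is_empty() || t.eq_ignore_ascii_case("na") || t.eq_ignore_ascii_case("null")
        })
        .count();
    let pct = if cells.is_empty() {
        // Avoid dividing by zero for an empty column.
        0.0
    } else {
        100.0 * missing as f64 / cells.len() as f64
    };
    (missing, pct)
}

/// Formats one report line in the style of the sample output above.
pub fn format_missing_line(name: &str, cells: &[&str]) -> String {
    let (missing, pct) = missing_summary(cells);
    format!("- {}: {} missing values ({:.1}%)", name, missing, pct)
}
```

Calling `format_missing_line("age", &cells)` on a ten-cell column with two missing entries yields a line of the same shape as the `age` entry in the report.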
API Reference
Core Types
use ;
// Main data container
let df = from_csv?;
// Access columns polymorphically
let column: &dyn ColumnArray = df.get_column.unwrap;
// Statistical operations (all return Option<f64>)
let mean = column.mean();
let sum = column.sum();
let min = column.min();
let max = column.max();
let nulls = column.null_count();
// JSON export
let json_output = df.to_json()?;
let column_json = column.to_json();
// Generate reports
let stats_report = generate_info_report;
let na_report = generate_na_report;
Key Traits
- ColumnArray: Unified interface for column data, statistical operations, and JSON export
- Display: Formatted output for DataFrames and reports
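To make the idea of a unified column trait concrete, here is a minimal sketch of what such a trait might look like. The method names `mean`, `sum`, `min`, `max`, and `null_count` follow the API reference above; the helper `numeric_values`, the tuple-struct layout of the column types, and every signature detail are assumptions, not the crate's actual definitions.

```rust
/// Sketch of a column trait. Method names follow the API reference above;
/// the helper `numeric_values` and all internals are assumptions.
pub trait ColumnArray {
    /// Number of missing cells in the column.
    fn null_count(&self) -> usize;

    /// Non-null cells converted to f64 (hypothetical helper).
    fn numeric_values(&self) -> Vec<f64>;

    fn sum(&self) -> Option<f64> {
        let values = self.numeric_values();
        if values.is_empty() {
            None
        } else {
            Some(values.iter().sum())
        }
    }

    fn mean(&self) -> Option<f64> {
        let values = self.numeric_values();
        if values.is_empty() {
            None
        } else {
            Some(values.iter().sum::<f64>() / values.len() as f64)
        }
    }

    fn min(&self) -> Option<f64> {
        self.numeric_values().into_iter().reduce(f64::min)
    }

    fn max(&self) -> Option<f64> {
        self.numeric_values().into_iter().reduce(f64::max)
    }
}

/// Hypothetical storage: one Option per cell, None meaning missing.
pub struct IntegerColumn(pub Vec<Option<i64>>);
pub struct BooleanColumn(pub Vec<Option<bool>>);

impl ColumnArray for IntegerColumn {
    fn null_count(&self) -> usize {
        self.0.iter().filter(|v| v.is_none()).count()
    }
    fn numeric_values(&self) -> Vec<f64> {
        self.0.iter().flatten().map(|&v| v as f64).collect()
    }
}

impl ColumnArray for BooleanColumn {
    fn null_count(&self) -> usize {
        self.0.iter().filter(|v| v.is_none()).count()
    }
    fn numeric_values(&self) -> Vec<f64> {
        // true counts as 1.0 and false as 0.0, as the `active` row
        // of the sample statistical report suggests.
        self.0.iter().flatten().map(|&b| if b { 1.0 } else { 0.0 }).collect()
    }
}
```

Because the statistics are default methods built on one helper, each column type only has to say how its cells map to numbers, and callers can work through `&dyn ColumnArray` without knowing the concrete type.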
Architecture
Library + Binary Structure
src/
├── lib.rs                 # Library interface with documentation
├── bin/
│   └── csv_processor.rs   # CLI binary
├── series/                # Column-oriented data structures (Polars-style)
│   └── array.rs           # ColumnArray trait with statistical operations
├── frame/                 # DataFrame operations and CSV I/O
│   └── mod.rs             # Main DataFrame implementation
├── scalar/                # Cell-level operations and values
├── reporter.rs            # Statistical report generation
└── config.rs              # CLI parsing (exported for advanced use)
Core Design Principles
- Library First: Clean API for embedding in applications
- Self-Analyzing Columns: Statistical operations embedded in column types
- Functional Design: Pure functions over object-oriented patterns
- Rust Idioms: Leverage ownership system and proper error handling
Key Data Types
- DataFrame: Main container with typed columns and display formatting
- ColumnArray: Unified trait for data access AND statistical operations
- Column Types: IntegerColumn, FloatColumn, StringColumn, BooleanColumn
- CellValue: Enum for individual cell values with type information
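A cell-level enum of this kind might look roughly like the sketch below. The variant names, the `parse` and `type_name` helpers, and the choice of `NA` and empty strings as null markers are all assumptions made for illustration; only the name `CellValue` and the `null` rendering in the sample table come from this README.

```rust
use std::fmt;

/// Sketch of a typed cell value; variant names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum CellValue {
    Integer(i64),
    Float(f64),
    Boolean(bool),
    Text(String),
    Null,
}

impl CellValue {
    /// Parses one raw CSV cell, trying the narrowest type first.
    pub fn parse(raw: &str) -> CellValue {
        let t = raw.trim();
        if t.is_empty() || t == "NA" {
            CellValue::Null
        } else if let Ok(i) = t.parse::<i64>() {
            CellValue::Integer(i)
        } else if let Ok(x) = t.parse::<f64>() {
            CellValue::Float(x)
        } else if let Ok(b) = t.parse::<bool>() {
            CellValue::Boolean(b)
        } else {
            CellValue::Text(t.to_string())
        }
    }

    /// Human-readable type label carried by the value.
    pub fn type_name(&self) -> &'static str {
        match self {
            CellValue::Integer(_) => "integer",
            CellValue::Float(_) => "float",
            CellValue::Boolean(_) => "boolean",
            CellValue::Text(_) => "string",
            CellValue::Null => "null",
        }
    }
}

impl fmt::Display for CellValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CellValue::Integer(i) => write!(f, "{}", i),
            CellValue::Float(x) => write!(f, "{}", x),
            CellValue::Boolean(b) => write!(f, "{}", b),
            CellValue::Text(s) => write!(f, "{}", s),
            // Missing cells render as "null", as in the sample table.
            CellValue::Null => write!(f, "null"),
        }
    }
}
```

With this sketch, `CellValue::parse("75000.50")` becomes a `Float` that displays as `75000.5`, which is consistent with how the salary column appears in the DataFrame display.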
Development
# Build the project
# Run all tests (37+ test suite)
# Run specific test suite
# Check code quality
# Format code
# Check without building
Performance
- Fast Type Inference: Automatic detection of optimal column types
- Memory Efficient: Column-oriented storage following Apache Arrow patterns
- Zero-Cost Abstractions: Rust's performance with high-level ergonomics
- Parallel Processing Ready: Architecture designed for future parallelization
Examples
Sample CSV Structure
The tool handles various data types and missing values:
id,name,age,salary,department,active,start_date,score
1,Alice Smith,28,75000.50,Engineering,true,2021-03-15,8.7
2,Bob Johnson,,65000,Marketing,false,2020-11-22,
3,Carol Davis,35,NA,Engineering,true,,9.2
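To show how rows like these map onto column-oriented storage, the following sketch splits CSV text into headers and one vector of raw cells per column. The function `to_columns` is hypothetical, and the naive comma split assumes there are no quoted fields or embedded commas, which a real CSV parser would have to handle.

```rust
/// Splits simple CSV text into a header list and column-major cell vectors.
/// Assumes no quoted fields or embedded commas (a simplification).
pub fn to_columns(csv: &str) -> (Vec<String>, Vec<Vec<String>>) {
    // Skip blank lines so a trailing newline does not create an empty row.
    let mut lines = csv.lines().filter(|l| !l.trim().is_empty());

    let headers: Vec<String> = match lines.next() {
        Some(h) => h.split(',').map(|s| s.trim().to_string()).collect(),
        None => return (Vec::new(), Vec::new()),
    };

    // One vector per column, filled row by row.
    let mut columns: Vec<Vec<String>> = vec![Vec::new(); headers.len()];
    for line in lines {
        let mut fields = line.split(',');
        for col in columns.iter_mut() {
            // Short rows are padded with empty (missing) cells.
            col.push(fields.next().unwrap_or("").trim().to_string());
        }
    }
    (headers, columns)
}
```

For the sample above, the `age` column comes out as `["28", "", "35"]` and the `salary` column as `["75000.50", "65000", "NA"]`, so both the empty cell and the `NA` marker remain visible to a later missing-value pass.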
Usage Examples
CLI Usage:
# Analyze missing values
# Generate statistical report (includes JSON export demonstration)
# For development (building from source)
Library Usage:
use ;
let df = from_csv?;
let report = generate_info_report;
println!;
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Write tests for your changes
- Run the test suite (cargo test)
- Ensure code quality (cargo clippy)
- Commit your changes (git commit -am 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.