datacell
A fast, unified CLI tool for spreadsheet and columnar data manipulation.
The Problem
Working with tabular data often requires juggling multiple tools:
- Excel/LibreOffice - GUI-only, slow for batch processing, no scripting
- pandas/Python - Requires Python environment, slow startup, memory-heavy
- csvkit - CSV-only, no Excel/Parquet/Avro support
- xsv - Fast but CSV-only, no formulas
- Apache Spark - Overkill for simple transformations, complex setup
Common pain points:
- Converting between formats requires different tools
- Applying Excel-like formulas to CSV files is awkward
- Batch processing spreadsheets in CI/CD pipelines is difficult
- No single tool handles CSV, Excel, Parquet, and Avro uniformly
The Solution
datacell is a single, fast CLI tool that:
- Reads/writes all major formats: CSV, XLS, XLSX, ODS, Parquet, Avro
- Applies Excel-like formulas to any format (SUM, VLOOKUP, IF, etc.)
- Performs data operations without code (sort, filter, dedupe, transpose)
- Converts between any formats with one command
- Outputs to JSON/Markdown for easy integration
- Runs as an MCP server for AI assistant integration
Why datacell?
| Feature | datacell | pandas | csvkit | xsv | Excel |
|---|---|---|---|---|---|
| Single binary | ✅ | ❌ | ❌ | ✅ | ❌ |
| CSV support | ✅ | ✅ | ✅ | ✅ | ✅ |
| Excel support | ✅ | ✅ | ❌ | ❌ | ✅ |
| Parquet/Avro | ✅ | ✅ | ❌ | ❌ | ❌ |
| Formulas | ✅ | ❌ | ❌ | ❌ | ✅ |
| CLI-native | ✅ | ❌ | ✅ | ✅ | ❌ |
| Fast startup | ✅ | ❌ | ❌ | ✅ | ❌ |
| Scriptable | ✅ | ✅ | ✅ | ✅ | ❌ |
| No dependencies | ✅ | ❌ | ❌ | ✅ | ❌ |
Quick Start
# Install
# Convert CSV to Parquet
# Apply formula
# Filter and sort
# Output as JSON for API consumption
Features
- Read XLS, XLSX, ODS, CSV, Parquet, and Avro files
- Write data to XLS, XLSX, CSV, Parquet, and Avro files
- Convert between any formats (CSV, Excel, ODS, Parquet, Avro)
- Apply formulas to cells in both CSV and Excel files
  - Supports basic arithmetic operations (+, -, *, /)
  - Supports SUM(), AVERAGE(), MIN(), MAX(), COUNT() functions
  - Supports ROUND(), ABS(), LEN() functions
  - Supports VLOOKUP(), SUMIF(), COUNTIF() functions
  - Supports IF() for conditional logic
  - Supports CONCAT() for string concatenation
  - Supports cell references (e.g., A1, B2)
- Data operations
  - Sort rows by column (ascending/descending)
  - Filter rows by condition
  - Find and replace values
  - Remove duplicate rows
  - Transpose data (rows to columns)
  - Merge cells (Excel output)
- Pandas-style operations
  - Head/tail (first/last n rows)
  - Sample random rows
  - Select/drop columns
  - Describe (summary statistics)
  - Value counts
  - Group by with aggregations (sum, count, mean, min, max)
  - Join/merge files (inner, left, right, outer)
  - Concatenate files
  - Fill/drop missing values
  - Rename columns
- Cell range operations - read/write specific ranges like A1:C10
- Multiple output formats - CSV, JSON, Markdown
- Multi-sheet support - list sheets, read all sheets at once
- Streaming API - process large files efficiently
- Progress callbacks - track long-running operations
- MCP server for integration with AI assistants
Installation
After a release build from source (for example with cargo build --release), the binary will be available at target/release/datacell.
Usage
Read a file
# Read CSV
# Read Excel (first sheet)
# Read specific sheet
# Read specific cell range
# Read as JSON
# Read range as JSON
# Read as Markdown table
# Read Parquet file
# Read Avro file
Write a file
# Write CSV from CSV
# Write Excel from CSV
# Write Parquet from CSV
# Write Avro from CSV
# Write Excel with specific sheet name
Convert between formats
Supports conversion between: CSV, XLSX, XLS, ODS, Parquet, Avro
# CSV to Excel
# Excel to CSV
# Excel to CSV (specific sheet)
# CSV to Parquet
# Parquet to CSV
# Excel to Avro
# Avro to Parquet
# ODS to CSV
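A converter that handles these six formats needs some way to decide which reader and writer to use. The sketch below shows one common approach, choosing the format from the file extension. It is an illustration only: the `Format` enum and `detect_format` function are hypothetical names, and this README does not state that datacell selects formats this way.

```rust
use std::path::Path;

/// The six formats listed above. Hypothetical type, not datacell's own.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Format {
    Csv,
    Xlsx,
    Xls,
    Ods,
    Parquet,
    Avro,
}

/// Guess the format from the file extension (case-insensitive).
/// Returns None when there is no extension or it is not recognised.
fn detect_format(path: &str) -> Option<Format> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(Format::Csv),
        "xlsx" => Some(Format::Xlsx),
        "xls" => Some(Format::Xls),
        "ods" => Some(Format::Ods),
        "parquet" => Some(Format::Parquet),
        "avro" => Some(Format::Avro),
        _ => None,
    }
}
```

With a helper like this, a conversion reduces to detecting the input and output formats, reading rows with the matching reader, and handing them to the matching writer.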
Apply formulas
# Apply SUM formula to CSV
# Apply arithmetic formula
# Apply AVERAGE formula to Excel
Data operations
# Sort by column A (ascending)
# Sort by column B (descending)
# Filter rows where column A > 10
# Filter rows containing text
# Find and replace
# Remove duplicate rows
# Transpose (rows to columns)
# Append data from one file to another
# List sheets in Excel/ODS file
# Read all sheets at once (as JSON)
# Write data to specific cell range
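To make the semantics of two of these operations concrete, here is a minimal sketch of duplicate removal and transposition over rows of strings. The function names are hypothetical and the code is not taken from datacell's operations.rs. It assumes that deduplication keeps the first occurrence of each row and that transposition pads ragged rows with empty strings.

```rust
use std::collections::HashSet;

/// Remove duplicate rows, keeping the first occurrence and the original order.
fn dedupe_rows(rows: &[Vec<String>]) -> Vec<Vec<String>> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for row in rows {
        // insert returns false when an identical row was already seen.
        if seen.insert(row.clone()) {
            out.push(row.clone());
        }
    }
    out
}

/// Turn rows into columns. Short rows are padded with empty strings
/// so that the result is rectangular.
fn transpose(rows: &[Vec<String>]) -> Vec<Vec<String>> {
    let width = rows.iter().map(|r| r.len()).max().unwrap_or(0);
    (0..width)
        .map(|c| {
            rows.iter()
                .map(|r| r.get(c).cloned().unwrap_or_default())
                .collect()
        })
        .collect()
}
```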
Pandas-style operations
# First/last n rows
# Sample random rows
# Select specific columns
# Describe statistics
# Value counts
# Group by and aggregate
# Join two files
# Concatenate files
# Fill empty values
# Drop rows with empty values
# Drop columns
# Rename columns
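The group-by operation computes sum, count, mean, min, and max per key. The sketch below shows how those five aggregates can be accumulated in a single pass. `Stats` and `group_by` are hypothetical names chosen for illustration, the input is simplified to (key, value) pairs, and nothing here is taken from datacell's implementation.

```rust
use std::collections::BTreeMap;

/// Running aggregates for one group. Hypothetical type for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Stats {
    sum: f64,
    count: usize,
    min: f64,
    max: f64,
}

impl Stats {
    /// Mean is derived from sum and count, so it needs no extra state.
    fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Group (key, value) pairs by key and accumulate aggregates in one pass.
/// A BTreeMap keeps the groups sorted by key for stable output.
fn group_by(rows: &[(String, f64)]) -> BTreeMap<String, Stats> {
    let mut groups: BTreeMap<String, Stats> = BTreeMap::new();
    for (key, value) in rows {
        groups
            .entry(key.clone())
            .and_modify(|s| {
                s.sum += value;
                s.count += 1;
                s.min = s.min.min(*value);
                s.max = s.max.max(*value);
            })
            .or_insert(Stats {
                sum: *value,
                count: 1,
                min: *value,
                max: *value,
            });
    }
    groups
}
```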
Formula Examples
- SUM(A1:A10) - Sum of cells A1 through A10
- AVERAGE(A1:A10) - Average of cells A1 through A10
- MIN(A1:A10) - Minimum value in range
- MAX(A1:A10) - Maximum value in range
- COUNT(A1:A10) - Count of numeric cells in range
- ROUND(A1, 2) - Round to 2 decimal places
- ABS(A1) - Absolute value
- LEN(A1) - Length of text in cell
- VLOOKUP(2, A1:C10, 3) - Lookup value in table
- SUMIF(A1:A10, ">5", B1:B10) - Sum cells matching criteria
- COUNTIF(A1:A10, ">5") - Count cells matching criteria
- IF(A1>10, "High", "Low") - Conditional logic
- CONCAT(A1, " ", B1) - String concatenation
- A1+B1 - Add values in A1 and B1
- A1-B1 - Subtract B1 from A1
- A1*B1 - Multiply A1 by B1
- A1/B1 - Divide A1 by B1
- A1 - Reference a single cell
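All of these formulas rely on A1-style references, where letters name the column and the number names the row. As a rough sketch of how such references map onto a grid, the code below converts a reference like B2 into zero-based (row, column) indices and evaluates a SUM over a range. The helper names `parse_cell`, `parse_range`, and `sum_range` are hypothetical and this is not datacell's formula.rs; the grid is simplified to numeric cells only.

```rust
/// Parse an A1-style reference such as "B2" into zero-based (row, col).
/// Hypothetical helper for illustration only.
fn parse_cell(reference: &str) -> Option<(usize, usize)> {
    let split = reference.find(|c: char| c.is_ascii_digit())?;
    let (letters, digits) = reference.split_at(split);
    if letters.is_empty() || !letters.chars().all(|c| c.is_ascii_alphabetic()) {
        return None;
    }
    // Column letters are bijective base-26: A = 1, Z = 26, AA = 27.
    let col = letters.chars().fold(0usize, |acc, c| {
        acc * 26 + (c.to_ascii_uppercase() as usize - 'A' as usize + 1)
    });
    // Rows are 1-based in the reference, so row 0 is invalid.
    let row: usize = digits.parse().ok()?;
    if row == 0 {
        return None;
    }
    Some((row - 1, col - 1))
}

/// Parse a range such as "A1:C10" into its two corner cells.
fn parse_range(range: &str) -> Option<((usize, usize), (usize, usize))> {
    let (start, end) = range.split_once(':')?;
    Some((parse_cell(start)?, parse_cell(end)?))
}

/// Evaluate SUM over a range on a grid of numbers.
/// Cells outside the grid are skipped rather than treated as errors.
fn sum_range(grid: &[Vec<f64>], range: &str) -> Option<f64> {
    let ((r0, c0), (r1, c1)) = parse_range(range)?;
    let mut total = 0.0;
    for row in grid.iter().skip(r0).take(r1.checked_sub(r0)? + 1) {
        for value in row.iter().skip(c0).take(c1.checked_sub(c0)? + 1) {
            total += value;
        }
    }
    Some(total)
}
```

The other range functions (AVERAGE, MIN, MAX, COUNT) could follow the same pattern, differing only in how the visited cells are folded.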
Use Cases
Data Pipeline Automation
# Daily ETL: Excel → Parquet for analytics
Report Generation
# Calculate totals and output as Markdown for documentation
Data Cleaning
# Remove duplicates, filter invalid rows, sort
Format Migration
# Migrate legacy Excel files to modern Parquet
for; do
done
AI/LLM Integration
# Start MCP server for AI assistant integration
Example Data
See the examples/ folder for sample data files and usage examples.
Architecture
datacell/
├── src/
│ ├── main.rs # CLI entry point
│ ├── excel.rs # Excel/ODS file handling
│ ├── csv_handler.rs # CSV file handling
│ ├── columnar.rs # Parquet/Avro handling
│ ├── converter.rs # Format conversion
│ ├── formula.rs # Formula evaluation
│ ├── operations.rs # Data operations (sort, filter, etc.)
│ └── mcp.rs # MCP server for AI integration
├── examples/ # Sample data files
└── Cargo.toml
Dependencies
| Crate | Purpose |
|---|---|
| clap | CLI argument parsing |
| calamine | Excel/ODS reading |
| rust_xlsxwriter | Excel writing |
| csv | CSV handling |
| parquet + arrow | Parquet support |
| apache-avro | Avro support |
| rmcp | MCP server |
| serde_json | JSON output |
License
MIT