Profile 50GB datasets in seconds on your laptop.
DataProf is built for Data Scientists and Engineers who need to understand their data fast. No more MemoryError when trying to profile a CSV larger than your RAM.
Pandas-Profiling vs DataProf on a 10GB CSV:
| Feature | Pandas-Profiling / YData | DataProf |
|---|---|---|
| Memory Usage | 12GB+ (Crashes) | < 100MB (Streaming) |
| Speed | 15+ minutes | 45 seconds |
| Implementation | Python (Slow) | Rust (Blazing Fast) |
🔒 Privacy First: 100% local processing, no telemetry. See what dataprof analyzes →
🚀 Quick Start
Installation
The easiest way to get started is via pip:
Python Usage
Forget complex configurations. Just point to your file:
# Analyze a huge file without crashing memory
# Generates a report.html with quality metrics and distributions
CLI & Rust Usage (Advanced)
If you prefer the command line or are a Rust developer:
# Install via cargo
# Generate report from CLI
More options: dataprof-cli --help | Full CLI Guide
💡 Key Features
- No Size Limits: Profiles files larger than RAM using streaming and memory mapping.
- Blazing Fast: Written in Rust with SIMD acceleration.
- Privacy Guaranteed: Data never leaves your machine.
- Format Support: CSV, Parquet, JSON/JSONL, and Databases (Postgres, MySQL, etc.).
- Smart Detection: Automatically identifies Emails, IPs, IBANs, Credit Cards, and more.
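To illustrate why streaming keeps memory flat regardless of file size, here is a minimal Python sketch of single-pass column statistics using Welford's online algorithm. This is not DataProf's implementation (which is written in Rust); the names `ColumnStats` and `profile_numeric_csv` are hypothetical, and the sketch assumes every column is numeric, with empty cells treated as nulls.

```python
import csv
import math
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Running statistics for one column, updated one value at a time."""
    count: int = 0
    nulls: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, raw: str) -> None:
        # Assumption for this sketch: an empty cell is a null.
        if raw == "":
            self.nulls += 1
            return
        value = float(raw)
        self.count += 1
        # Welford's update: no need to keep earlier values in memory.
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 when fewer than two values were seen."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def profile_numeric_csv(lines):
    """Profile a numeric CSV from any iterable of text lines.

    Only one row is held in memory at a time, so memory use depends on
    the number of columns, not the number of rows.
    """
    reader = csv.reader(lines)
    header = next(reader, [])
    stats = [ColumnStats() for _ in header]
    for row in reader:
        for column, raw in zip(stats, row):
            column.update(raw)
    return dict(zip(header, stats))
```

To try it on a real file, pass an open file handle, for example `profile_numeric_csv(open("data.csv", newline=""))`. Because each row is discarded after its values are folded into the running totals, the same loop works on a file far larger than RAM.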
📊 Beautiful Reports
Documentation
Advanced Examples
Batch Processing (Python)
# Process a whole directory of files in parallel
=
Database Integration (Python)
# Profile a SQL query directly
await
Rust Library Usage
use *;
let profiler = auto;
let report = profiler.analyze_file?;
println!;
Development
# Setup
# Test databases (optional)
# Common tasks
Development Guide | Performance Guide
Feature Flags
# Minimal (CSV/JSON only)
# With Apache Arrow (large files >100MB)
# With Parquet support
# With databases
# Python async support
# All features
When to use Arrow: Large files (>100MB), many columns (>20), uniform types

When to use Parquet: Analytics, data lakes, Spark/Pandas integration
Documentation
User Guides: CLI Reference | Python API | Python Integrations | Database Connectors | Apache Arrow
Developer: Development Guide | Performance Guide | Benchmarks
Privacy: What DataProf Does - Complete transparency with source verification
🤝 Contributing
We welcome contributions from everyone, whether you want to:
- Fix a bug 🐛
- Add a feature ✨
- Improve documentation 📚
- Report an issue 📝
Quick Start for Contributors
- Fork & clone:
- Build & test:
- Create a feature branch:
- Before submitting a PR:
- Submit a Pull Request with a clear description
Please read CONTRIBUTING.md for guidelines and our Code of Conduct.
License
MIT License - See LICENSE for details.