20x faster than pandas, with streaming for arbitrarily large files. ISO 8000/25012-compliant quality metrics, automatic pattern detection (emails, IPs, IBANs, etc.), and comprehensive statistics (mean, median, skewness, kurtosis). Available as a CLI, a Rust library, or a Python package.
🔒 Privacy First: 100% local processing, no telemetry, read-only DB access. See what dataprof analyzes →
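To make the statistics and pattern detection mentioned above concrete, here is a minimal standard-library Python sketch of what such column-level checks can look like. It is an illustration only, not dataprof's implementation or API: the function names `describe` and `detect_pattern`, the 0.9 match threshold, the simplified email regex, and the choice of population (divide-by-n) moments are all assumptions made for this example.

```python
import ipaddress
import re
import statistics

# Deliberately simple email shape check (assumption: not RFC 5322 complete).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def describe(values):
    """Return mean, median, population skewness, and excess kurtosis."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Central moments in population form (dividing by n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        # Constant column: shape statistics are undefined.
        skewness = kurtosis = float("nan")
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (normal = 0)
    return {"mean": mean, "median": median,
            "skewness": skewness, "kurtosis": kurtosis}


def is_ip(value):
    """True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def detect_pattern(values, threshold=0.9):
    """Label a column 'email' or 'ip' if enough non-empty values match."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    checks = {"email": lambda v: EMAIL_RE.match(v) is not None, "ip": is_ip}
    for name, check in checks.items():
        hits = sum(1 for v in non_empty if check(v))
        if hits / len(non_empty) >= threshold:
            return name
    return None
```

For example, `detect_pattern(["a@example.com", "b@example.org"])` returns `"email"`, and `describe([1, 2, 3, 4, 5])` reports a skewness of 0 for the symmetric input. A real profiler would cover more patterns (such as IBANs) and would likely compute moments in a single streaming pass.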
Quick Start
CLI Installation
# Install from crates.io
# Or use Python
CLI Usage
# Analyze a file
# Generate HTML report
# Batch process directories
# Database profiling
More options: dataprof-cli --help | Full CLI Guide
Python API
# Quality analysis (ISO 8000/25012 compliant)
=
# Batch processing
=
# Async database profiling
= await
return
Python Documentation | Integrations (Pandas, scikit-learn, Jupyter, Airflow, dbt)
Rust Library
use *;
// Adaptive profiling (recommended)
let profiler = auto;
let report = profiler.analyze_file?;
// Arrow for large files (>100MB, requires --features arrow)
let profiler = columnar;
let report = profiler.analyze_csv_file?;
Development
# Setup
# Test databases (optional)
# Common tasks
Development Guide | Performance Guide
Feature Flags
# Minimal (CSV/JSON only)
# With Apache Arrow (large files >100MB)
# With Parquet support
# With databases
# Python async support
# All features
When to use Arrow: Large files (>100MB), many columns (>20), uniform types
When to use Parquet: Analytics, data lakes, Spark/Pandas integration
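The Arrow guidance above can be turned into a small pre-flight check. The following Python sketch is one way to do that, under stated assumptions: the function names, the `"arrow"` and `"default"` labels, the reading of 100MB as 100 MiB, and the choice to treat either criterion alone as sufficient are all hypothetical and not part of dataprof. The "uniform types" criterion is not checked, since it would require sampling the data.

```python
import csv
import os

# Thresholds taken from the guidance above; the exact byte count is an assumption.
ARROW_SIZE_BYTES = 100 * 1024 * 1024  # ">100MB", read here as 100 MiB
ARROW_MIN_COLUMNS = 20                # ">20 columns"


def suggest_engine(size_bytes, column_count):
    """Return 'arrow' when the file is large or wide, else 'default'.

    Assumption: either criterion alone is enough to prefer Arrow.
    """
    if size_bytes > ARROW_SIZE_BYTES or column_count > ARROW_MIN_COLUMNS:
        return "arrow"
    return "default"


def suggest_engine_for_csv(path):
    """Inspect a CSV's size on disk and header width, then suggest an engine."""
    with open(path, newline="") as f:
        # An empty file yields an empty header rather than raising.
        header = next(csv.reader(f), [])
    return suggest_engine(os.path.getsize(path), len(header))
```

Calling `suggest_engine_for_csv("data.csv")` reads only the header row, so the check stays cheap even for very large files.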
Documentation
User Guides: CLI Reference | Python API | Python Integrations | Database Connectors | Apache Arrow
Developer: Development Guide | Performance Guide | Benchmarks
Privacy: What DataProf Does - Complete transparency with source verification
🤝 Contributing
We welcome contributions from everyone, whether you want to:
- Fix a bug 🐛
- Add a feature ✨
- Improve documentation 📚
- Report an issue 📝
Quick Start for Contributors
- Fork & clone:
- Build & test:
- Create a feature branch:
- Before submitting a PR:
- Submit a Pull Request with a clear description
Please read CONTRIBUTING.md for guidelines and our Code of Conduct.
License
MIT License - See LICENSE for details.