rustsight-1.1.2 is not a library.
Dataset Analyzer (Rust)
RustSight is a fast, safe, and extensible dataset analysis CLI tool written in Rust.
This project focuses on data validation and exploratory analysis, the step that comes right before AI/ML model training.
It works on any CSV file and can also analyze arbitrary text or binary files to report basic properties such as size, UTF-8 validity, and line and word counts.
✨ Features
CSV Dataset Analysis
- Detects numeric vs categorical columns
- Counts missing values per column
- Computes basic statistics (min, max, mean) for numeric columns
- Handles large CSV files efficiently (streaming)
- Generates a clean, readable analysis report
- Saves results to a _report.txt file named after the dataset (e.g. stockdata_report.txt)
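The column profiling described above can be sketched with only the standard library. This is a minimal illustration under stated assumptions, not RustSight's actual code: the `ColumnStats` type and `profile` function are hypothetical, the input is assumed to have a header row with plain comma-separated fields (no quoted commas), and a column is treated as numeric only when every non-empty value parses as `f64`. The real tool uses the csv crate for parsing. Each row only updates running counters, so memory use does not grow with the number of rows.

```rust
use std::io::{BufRead, Cursor};

/// Running statistics for one column (hypothetical type), updated row by row
/// so the whole file never has to be held in memory.
#[derive(Debug, Default)]
struct ColumnStats {
    missing: usize,
    numeric: usize,
    non_numeric: usize,
    min: f64,
    max: f64,
    sum: f64,
}

impl ColumnStats {
    fn update(&mut self, field: &str) {
        let field = field.trim();
        if field.is_empty() {
            self.missing += 1;
        } else if let Ok(v) = field.parse::<f64>() {
            if self.numeric == 0 {
                self.min = v;
                self.max = v;
            } else {
                self.min = self.min.min(v);
                self.max = self.max.max(v);
            }
            self.numeric += 1;
            self.sum += v;
        } else {
            self.non_numeric += 1;
        }
    }

    /// Assumption: numeric only if every non-missing value parsed as a number.
    fn is_numeric(&self) -> bool {
        self.numeric > 0 && self.non_numeric == 0
    }

    fn mean(&self) -> Option<f64> {
        if self.numeric == 0 {
            None
        } else {
            Some(self.sum / self.numeric as f64)
        }
    }
}

/// Stream CSV text line by line. Assumes a header row and unquoted fields.
fn profile<R: BufRead>(reader: R) -> (Vec<String>, Vec<ColumnStats>) {
    let mut lines = reader.lines();
    let headers: Vec<String> = match lines.next() {
        Some(Ok(h)) => h.split(',').map(|s| s.trim().to_string()).collect(),
        _ => return (Vec::new(), Vec::new()),
    };
    let mut stats: Vec<ColumnStats> = headers.iter().map(|_| ColumnStats::default()).collect();
    for line in lines.map_while(Result::ok) {
        let fields: Vec<&str> = line.split(',').collect();
        for (i, col) in stats.iter_mut().enumerate() {
            // Short rows count as missing values for the absent columns.
            col.update(fields.get(i).copied().unwrap_or(""));
        }
    }
    (headers, stats)
}

fn main() {
    // Synthetic sample data, not from any real dataset.
    let data = "price,ticker,volume\n10.5,AAPL,100\n12.0,MSFT,\n9.5,GOOG,300\n";
    let (headers, stats) = profile(Cursor::new(data));
    for (name, col) in headers.iter().zip(&stats) {
        let kind = if col.is_numeric() { "numeric" } else { "categorical" };
        print!("{name}: {kind}, missing={}", col.missing);
        if let (true, Some(mean)) = (col.is_numeric(), col.mean()) {
            print!(", min={}, max={}, mean={:.2}", col.min, col.max, mean);
        }
        println!();
    }
}
```

Swapping `Cursor::new(data)` for a `std::io::BufReader` over a `std::fs::File` profiles a file on disk the same way.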
Data Validation (NEW)
- Detects columns with high missing value ratios
- Flags no-variance columns (min == max)
- Detects potential outliers
- Identifies mixed-type columns
- Prints clear validation warnings so problems can be fixed before the data is used for ML
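A sketch of how these checks could be layered on top of per-column counts. The `ColumnSummary` type, the `validate` function, the 30% missing-value threshold, and the warning wording are all assumptions for illustration; RustSight's own thresholds and messages may differ. Outlier detection is left out of this sketch.

```rust
/// Per-column counts a profiling pass could collect (hypothetical type).
struct ColumnSummary {
    name: String,
    rows: usize,
    missing: usize,
    numeric: usize,
    non_numeric: usize,
    min: f64,
    max: f64,
}

/// Assumed threshold: warn when more than 30% of values are missing.
const MAX_MISSING_RATIO: f64 = 0.3;

fn validate(col: &ColumnSummary) -> Vec<String> {
    let mut warnings = Vec::new();
    if col.rows > 0 {
        let ratio = col.missing as f64 / col.rows as f64;
        if ratio > MAX_MISSING_RATIO {
            warnings.push(format!(
                "⚠ Column '{}' has {:.0}% missing values",
                col.name,
                ratio * 100.0
            ));
        }
    }
    // Mixed type: some values parse as numbers and some do not.
    if col.numeric > 0 && col.non_numeric > 0 {
        warnings.push(format!("⚠ Column '{}' has mixed numeric and text values", col.name));
    }
    // No variance: all numeric values are identical, so the column carries no signal.
    if col.numeric > 0 && col.non_numeric == 0 && col.min == col.max {
        warnings.push(format!("⚠ Column '{}' has no variance (min == max)", col.name));
    }
    warnings
}

fn main() {
    // Synthetic example columns.
    let columns = [
        ColumnSummary { name: "age".to_string(), rows: 10, missing: 5, numeric: 5, non_numeric: 0, min: 40.0, max: 40.0 },
        ColumnSummary { name: "zip".to_string(), rows: 10, missing: 0, numeric: 7, non_numeric: 3, min: 1000.0, max: 99999.0 },
    ];
    for col in &columns {
        for warning in validate(col) {
            println!("{warning}");
        }
    }
}
```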
File Analysis
- Counts total bytes
- Detects UTF-8 validity
- Counts lines and words (if text)
- Counts non-ASCII bytes (for binaries)
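The file analysis reduces to a few standard-library calls. In this hypothetical sketch, `std::str::from_utf8` decides UTF-8 validity, line and word counts are computed only when the bytes are valid UTF-8, and every byte at or above 0x80 is counted as non-ASCII. The `FileReport` struct and `analyze_bytes` function are illustrative names, not part of RustSight.

```rust
/// Properties reported for an arbitrary file (hypothetical struct).
#[derive(Debug, PartialEq)]
struct FileReport {
    total_bytes: usize,
    is_utf8: bool,
    lines: Option<usize>,
    words: Option<usize>,
    non_ascii_bytes: usize,
}

fn analyze_bytes(bytes: &[u8]) -> FileReport {
    let text = std::str::from_utf8(bytes).ok();
    FileReport {
        total_bytes: bytes.len(),
        is_utf8: text.is_some(),
        // Line and word counts are only computed for valid UTF-8 text.
        lines: text.map(|t| t.lines().count()),
        words: text.map(|t| t.split_whitespace().count()),
        // Any byte >= 0x80 is non-ASCII; binaries usually contain many.
        non_ascii_bytes: bytes.iter().filter(|b| !b.is_ascii()).count(),
    }
}

fn main() -> std::io::Result<()> {
    // Analyze the file given as the first argument, or a built-in sample.
    let bytes = match std::env::args().nth(1) {
        Some(path) => std::fs::read(path)?,
        None => b"hello world\nsecond line\n".to_vec(),
    };
    println!("{:?}", analyze_bytes(&bytes));
    Ok(())
}
```

Reading the whole file with `std::fs::read` keeps the sketch short; a large binary would be better processed in fixed-size chunks.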
📂 Example Datasets
Used during development (not required):
- stockdata.csv: financial dataset
- CVD Dataset.csv: cardiovascular health dataset
⚠ Large datasets are not bundled; you can point the tool at any CSV file of your own.
🚀 How to Run
1️⃣ Analyze a CSV dataset
This will:
- Print column-wise analysis
- Save a report like:
stockdata_report.txt
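The report name appears to follow the dataset's file stem (stockdata.csv produces stockdata_report.txt). A minimal sketch of that rule, assuming the report is written next to the input file; `report_path` is a hypothetical helper, and the actual tool may place the report elsewhere (for example, in the current directory).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: derive "<stem>_report.txt" in the input's directory.
fn report_path(dataset: &Path) -> PathBuf {
    let stem = dataset
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_else(|| "dataset".to_string());
    dataset.with_file_name(format!("{stem}_report.txt"))
}

fn main() {
    let input = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "stockdata.csv".to_string());
    println!("{}", report_path(Path::new(&input)).display());
}
```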
2️⃣ Validate a dataset (NEW)
Example output:
File: insta_data.csv
⚠ Column 'followers_count' may contain outliers
⚠ Column 'user_engagement_score' may contain outliers
This helps detect data quality issues before ML training.
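The detection method behind these outlier warnings isn't documented here. One streaming-friendly possibility, shown as an assumption rather than RustSight's actual rule, is a z-score check: Welford's algorithm accumulates the mean and variance in a single pass, and the column is flagged when its min or max lies more than 3 standard deviations from the mean. The `Welford` type, the threshold, and the sample values are hypothetical.

```rust
/// One-pass (Welford) accumulator: mean and variance without storing the column.
#[derive(Default)]
struct Welford {
    n: u64,
    mean: f64,
    m2: f64,
    min: f64,
    max: f64,
}

impl Welford {
    fn push(&mut self, x: f64) {
        if self.n == 0 {
            self.min = x;
            self.max = x;
        } else {
            self.min = self.min.min(x);
            self.max = self.max.max(x);
        }
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Population standard deviation.
    fn std_dev(&self) -> f64 {
        if self.n == 0 { 0.0 } else { (self.m2 / self.n as f64).sqrt() }
    }

    /// Assumed rule: flag if min or max is more than `z` std devs from the mean.
    fn may_have_outliers(&self, z: f64) -> bool {
        let sd = self.std_dev();
        sd > 0.0 && ((self.max - self.mean) / sd > z || (self.mean - self.min) / sd > z)
    }
}

fn main() {
    // Synthetic sample: twenty identical readings plus one extreme value.
    let mut spiky = Welford::default();
    for x in std::iter::repeat(10.0).take(20).chain(std::iter::once(1000.0)) {
        spiky.push(x);
    }
    let mut steady = Welford::default();
    for x in [9.0, 10.0, 11.0, 10.0, 9.5, 10.5] {
        steady.push(x);
    }
    // Prints a warning for 'spiky' only.
    for (name, stats) in [("spiky", &spiky), ("steady", &steady)] {
        if stats.may_have_outliers(3.0) {
            println!("⚠ Column '{name}' may contain outliers");
        }
    }
}
```

One caveat of this choice: with the population standard deviation, no value in a column of n rows can have a z-score above sqrt(n - 1), so columns with 10 or fewer rows are never flagged at z = 3. Interquartile-range fences avoid this limit but need the values sorted, which means keeping the column in memory.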
3️⃣ Analyze any file (text or binary)
This detects:
- Total bytes
- UTF-8 validity
- Line & word counts (if text)
- Non-ASCII bytes (if binary)
📦 Installation (From Source)
Run using:
🪟 Using the Windows .exe (Recommended)
- Go to the Releases section on GitHub
- Download: dataset_analyzer.exe
- Run from terminal:
No Rust installation required.
🛠️ Tech Stack
- Rust: performance, memory safety
- csv crate: efficient CSV parsing
- CLI-first design: easy automation and scripting
📝 License
MIT License
🤝 Contributing
Contributions are welcome!
Feel free to open issues or submit pull requests.
Portfolio: https://omarnahdi.dev
RustSight: https://omarnahdi.dev/work/dataset-analyzer