rustsight 1.1.1

A fast, safe CLI tool for dataset analysis and validation. Analyzes CSV files for column types, missing values, basic statistics (min/max/mean), outliers, no-variance columns, and mixed-type columns — helping you catch data quality issues before ML/AI training. Also supports binary and text file introspection.
rustsight-1.1.1 is not a library.

Dataset Analyzer (Rust)

RustSight is a fast, safe, and extensible dataset analysis CLI tool written in Rust.
This project focuses on data validation and exploratory analysis — the exact step that comes before AI/ML model training.

It works on any CSV file and can also analyze binary or text files to extract useful properties.


✨ Features

CSV Dataset Analysis

  • Detects numeric vs categorical columns
  • Counts missing values per column
  • Computes basic statistics (min, max, mean) for numeric columns
  • Handles large CSV files efficiently (streaming)
  • Generates a clean, readable analysis report
  • Saves results to a _report.txt file

Data Validation (NEW)

  • Detects columns with high missing value ratios
  • Flags no-variance columns (min == max)
  • Detects potential outliers
  • Identifies mixed-type columns
  • Prints clear validation warnings before ML usage

File Analysis

  • Counts total bytes
  • Detects UTF-8 validity
  • Counts lines and words (if text)
  • Counts non-ASCII bytes (for binaries)

📂 Example Datasets

Used during development (not required):

  • stockdata.csv — financial dataset
  • CVD Dataset.csv — cardiovascular health dataset

⚠ Large datasets are not bundled.
You can analyze any CSV file.


🚀 How to Run

1️⃣ Analyze a CSV dataset

cargo run -- csv stockdata.csv

cargo run -- csv "CVD Dataset.csv"

This will:

  • Print column-wise analysis
  • Save a report like: stockdata_report.txt

2️⃣ Validate a dataset (NEW)

cargo run -- validate insta_data.csv

Example output:

File: insta_data.csv
⚠ Column 'followers_count' may contain outliers
⚠ Column 'user_engagement_score' may contain outliers

This helps detect data quality issues before ML training.


3️⃣ Analyze any file (text or binary)

cargo run -- analyze stockdata.csv

cargo run -- analyze target\debug\dataset_analyzer.exe

This detects:

  • Total bytes
  • UTF-8 validity
  • Line & word counts (if text)
  • Non-ASCII bytes (if binary)

📦 Installation (From Source)

git clone https://github.com/omarnahdi/Dataset-Analyzer.git

cd dataset-analyzer


cargo build --release

Run using:

./target/release/dataset_analyzer csv your_file.csv


🪟 Using the Windows .exe (Recommended)

  1. Go to the Releases section on GitHub
  2. Download: dataset_analyzer.exe
  3. Run from terminal:
dataset_analyzer.exe csv your_file.csv

dataset_analyzer.exe validate your_file.csv

No Rust installation required.


🛠️ Tech Stack

  • Rust — performance, memory safety
  • csv crate — efficient CSV parsing
  • CLI first design — easy automation & scripting

📝 License

MIT License


🤝 Contributing

Contributions are welcome!
Feel free to open issues or submit pull requests.

Portfolio: https://omarnahdi.dev RustSight: https://omarnahdi.dev/work/dataset-analyzer