RustSight is a fast, safe, and extensible **dataset analysis CLI tool written in Rust**.
This project focuses on **data validation and exploratory analysis**, the step that comes *before* AI/ML model training.
It works on **any CSV file** and can also analyze **binary or text files** to extract useful properties.
---
Used during development (not required):
- -
---
```bash
cargo run -- csv stockdata.csv
cargo run -- csv "CVD Dataset.csv"
```
This will:
- -
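The list of what `csv` reports is not recoverable here. As an illustration only, here is a minimal sketch of the kind of per-column summary such a command could produce. It assumes plain comma-separated fields with no quoting. `summarize_csv` and `ColumnSummary` are hypothetical names, not RustSight's API, and the sample data is made up.

```rust
/// One column's summary. Hypothetical type for this sketch, not RustSight's API.
#[derive(Debug, PartialEq)]
struct ColumnSummary {
    name: String,
    numeric: usize,     // cells that parse as f64
    missing: usize,     // empty cells, or cells absent from short rows
    non_numeric: usize, // non-empty cells that are not numbers
    min: Option<f64>,   // None when the column has no numeric cells
    max: Option<f64>,
    mean: Option<f64>,
}

/// Summarizes every column of a CSV string.
/// Assumption: the first non-blank line is the header, and fields are split
/// on bare commas (no quoted fields or embedded commas).
fn summarize_csv(text: &str) -> Vec<ColumnSummary> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').map(str::trim).collect(),
        None => return Vec::new(),
    };
    let rows: Vec<Vec<&str>> = lines
        .map(|l| l.split(',').map(str::trim).collect())
        .collect();

    header
        .iter()
        .enumerate()
        .map(|(i, name)| {
            let (mut numeric, mut missing, mut non_numeric) = (0usize, 0usize, 0usize);
            let (mut sum, mut min, mut max) = (0.0f64, f64::INFINITY, f64::NEG_INFINITY);
            for row in &rows {
                // A row shorter than the header counts as missing for trailing columns.
                let cell = row.get(i).copied().unwrap_or("");
                if cell.is_empty() {
                    missing += 1;
                } else if let Ok(v) = cell.parse::<f64>() {
                    numeric += 1;
                    sum += v;
                    min = f64::min(min, v);
                    max = f64::max(max, v);
                } else {
                    non_numeric += 1;
                }
            }
            let any = numeric > 0;
            ColumnSummary {
                name: name.to_string(),
                numeric,
                missing,
                non_numeric,
                min: any.then_some(min),
                max: any.then_some(max),
                mean: any.then(|| sum / numeric as f64),
            }
        })
        .collect()
}

fn main() {
    // Made-up sample: a text column, a column with a blank cell,
    // and a numeric column with one non-numeric entry.
    let sample = "ticker,price,volume\nAAA,10.5,100\nBBB,,250\nCCC,12.5,n/a\n";
    for col in summarize_csv(sample) {
        println!("{col:?}");
    }
}
```

Splitting on `','` keeps the sketch dependency-free. A real tool would more likely use a proper CSV parser such as the `csv` crate so that quoted fields containing commas are handled correctly.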
---
```bash
cargo run -- validate insta_data.csv
```
Example output:
```
File: insta_data.csv
⚠ Column 'followers_count' may contain outliers
⚠ Column 'user_engagement_score' may contain outliers
```
This helps detect **data quality issues before ML training**.
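The README does not state which rule `validate` uses to decide that a column "may contain outliers". One common choice is Tukey's fences: flag any value below Q1 - 1.5·IQR or above Q3 + 1.5·IQR, where IQR = Q3 - Q1. The sketch below implements that rule, with quartiles computed by linear interpolation between ranks. `iqr_outliers`, the 4-value minimum, and the sample numbers are assumptions for illustration. RustSight may use a different test, such as a z-score threshold.

```rust
/// Quantile of an already-sorted, non-empty slice, using linear interpolation
/// between the two closest ranks (q in 0.0..=1.0).
fn quantile(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Returns the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
/// Assumes finite values; k = 1.5 is the conventional multiplier.
fn iqr_outliers(values: &[f64], k: f64) -> Vec<f64> {
    // Hypothetical cut-off: with fewer than 4 points the quartiles say little.
    if values.len() < 4 {
        return Vec::new();
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let q1 = quantile(&sorted, 0.25);
    let q3 = quantile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low, high) = (q1 - k * iqr, q3 + k * iqr);
    values.iter().copied().filter(|&v| v < low || v > high).collect()
}

fn main() {
    // Made-up follower counts; 50000 sits far above the rest.
    let followers_count = [120.0, 95.0, 130.0, 110.0, 105.0, 50_000.0, 125.0];
    let outliers = iqr_outliers(&followers_count, 1.5);
    if !outliers.is_empty() {
        println!("⚠ Column 'followers_count' may contain outliers: {outliers:?}");
    }
}
```

Fences based on quartiles are less sensitive to the extreme values themselves than a mean/standard-deviation rule. This matters for skewed columns such as follower counts.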
---
```bash
cargo run -- analyze stockdata.csv
cargo run -- analyze target/debug/dataset_analyzer.exe
```
This detects:
- ---
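The README does not list which properties `analyze` extracts, so the following is only a minimal sketch under its own assumptions. Content counts as text if it is valid UTF-8 and contains no NUL bytes. For text, the sketch counts newline characters. For any input, it reports Shannon entropy in bits per byte, which is one common byte-level property: values near 8.0 suggest compressed or encrypted data. `FileReport`, `analyze_bytes`, and `analyze_file` are hypothetical names, not RustSight's API.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical report type for this sketch.
#[derive(Debug)]
struct FileReport {
    size: usize,
    is_text: bool,
    newline_count: Option<usize>, // only computed for text content
    entropy_bits_per_byte: f64,   // 0.0 (constant) ..= 8.0 (uniform bytes)
}

/// Heuristic: text means valid UTF-8 with no NUL bytes.
fn looks_like_text(bytes: &[u8]) -> bool {
    !bytes.contains(&0) && std::str::from_utf8(bytes).is_ok()
}

/// Shannon entropy of the byte-value distribution, in bits per byte.
fn byte_entropy(bytes: &[u8]) -> f64 {
    if bytes.is_empty() {
        return 0.0;
    }
    let mut counts = [0usize; 256];
    for &b in bytes {
        counts[b as usize] += 1;
    }
    let n = bytes.len() as f64;
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / n;
            -p * p.log2()
        })
        .sum()
}

fn analyze_bytes(bytes: &[u8]) -> FileReport {
    let is_text = looks_like_text(bytes);
    FileReport {
        size: bytes.len(),
        is_text,
        newline_count: is_text.then(|| bytes.iter().filter(|&&b| b == b'\n').count()),
        entropy_bits_per_byte: byte_entropy(bytes),
    }
}

fn analyze_file(path: &Path) -> io::Result<FileReport> {
    Ok(analyze_bytes(&fs::read(path)?))
}

fn main() -> io::Result<()> {
    // With a path argument, analyze that file; otherwise use a small in-memory sample.
    let report = match std::env::args().nth(1) {
        Some(path) => analyze_file(Path::new(&path))?,
        None => analyze_bytes(b"date,close\n2024-01-02,101.5\n"),
    };
    println!("{report:?}");
    Ok(())
}
```

Reading the whole file with `fs::read` keeps the sketch short. For very large inputs, a streaming read over fixed-size chunks would keep memory use bounded.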
---
```bash
git clone https://github.com/omarnahdi/Dataset-Analyzer.git
cd Dataset-Analyzer
cargo build --release
```
Run using:
```bash
./target/release/dataset_analyzer csv your_file.csv
```
---
1. 2.3.
```bash
dataset_analyzer.exe csv your_file.csv
dataset_analyzer.exe validate your_file.csv
```
No Rust installation required.
---
MIT License
---
Contributions are welcome!
Feel free to open issues or submit pull requests.
Portfolio: https://omarnahdi.dev
RustSight: https://omarnahdi.dev/work/dataset-analyzer