# DataDoctor CLI 🩺


[![Crates.io](https://img.shields.io/crates/v/data-doctor-cli.svg)](https://crates.io/crates/data-doctor-cli)
[![Downloads](https://img.shields.io/crates/d/data-doctor-cli.svg)](https://crates.io/crates/data-doctor-cli)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**DataDoctor CLI** brings the DataDoctor engine to your terminal, so you can validate, analyze, and repair JSON and CSV files from the command line.

---

## 🚀 Installation


### Option 1: Install from Crates.io (Recommended)


If you have Rust installed, this is the easiest way:

```bash
cargo install data-doctor-cli
```

*This installs the `data-doctor` binary into Cargo's bin directory (`~/.cargo/bin` by default), which must be on your `PATH`.*

### Option 2: Build from Source


```bash
git clone https://github.com/jeevanms003/data-doctor.git
cd data-doctor
cargo install --path cli
```

---

## 🎮 How It Works


DataDoctor provides three primary modes of operation, designed for different workflows:

### 1. `validate` (The Checkup)

**Best for:** CI/CD pipelines, pre-commit hooks, or just checking file integrity.

This command scans your file and reports issues without modifying it. It returns a non-zero exit code when errors are found, so scripts and CI jobs can fail on bad data.

```bash
data-doctor validate users.csv
```
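
The exit-code behavior described above can gate a pre-commit hook or CI step. The following is a minimal sketch that relies only on the documented `validate <INPUT>` form; the `validate_all` function and the `DATA_DOCTOR` override variable are hypothetical names introduced for this example, not part of the tool.

```bash
# Hypothetical override so the binary name can be swapped (e.g. in tests).
DATA_DOCTOR="${DATA_DOCTOR:-data-doctor}"

# Validate every file given as an argument.
# Returns 0 if all files pass, 1 if at least one fails.
validate_all() {
    failed=0
    for file in "$@"; do
        # A non-zero exit code from `validate` means errors were found.
        if ! "$DATA_DOCTOR" validate "$file"; then
            echo "validation failed: $file" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

In a hook script, a call such as `validate_all data/*.csv || exit 1` would stop the commit when any file fails. Every file is checked before returning, so one run reports all failing files.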

### 2. `fix` (The Surgery)

**Best for:** Cleaning messy data dumps, fixing "broken" JSON from APIs.

This command actively repairs the file and saves the clean version to a new output path. It applies all available auto-fix strategies (e.g., adding missing quotes, padding columns).

```bash
data-doctor fix broken_data.json --out clean_data.json
```

### 3. `doctor` (The Full Treatment)

**Best for:** Interactive analysis and reporting.

This runs a validation pass, then an auto-fix pass, and generates a comprehensive report comparing the "before" and "after" states.

```bash
data-doctor doctor input.csv --out fixed.csv
```

---

## 📋 Command Reference


### `validate`


```bash
data-doctor validate <INPUT> [OPTIONS]
```

**Options:**
- `--format <json|csv>`: Force a specific file format (overrides extension detection).
- `--report-json`: Print a machine-readable JSON object instead of the human-readable report.
- `--schema <FILE>`: Validate against a custom schema definition.

### `fix`


```bash
data-doctor fix <INPUT> --out <OUTPUT> [OPTIONS]
```

**Options:**
- `--out <FILE>`: (Required) Where to save the fixed file.
- `--format <json|csv>`: Force a specific file format.

### `doctor`


```bash
data-doctor doctor <INPUT> --out <OUTPUT> [OPTIONS]
```

Combines the `validate` and `fix` passes and adds detailed logging.

---

## 🔍 What Can It Fix?


### JSON Fixes (Advanced)

| Issue | Example (Before) | Example (After) |
|-------|------------------|-----------------|
| **Broken Structure** | `[ { "a": 1 } }` | `[ { "a": 1 } ]` (Mismatched bracket fix) |
| **Embedded Keys** | `"desc": "val,"key": "v"` | `"desc": "val", "key": "v"` |
| **Numeric Formats** | `{"val": 0xFF, "oct": 0o77}` | `{"val": 255, "oct": 63}` |
| **Invalid Booleans** | `{"active": yes}` | `{"active": true}` |
| **Leading Zeros** | `{"id": 030}` | `{"id": 30}` |
| **Trailing Commas** | `{"a": 1,}` | `{"a": 1}` |
| **Missing Commas** | `{"a": 1 "b": 2}` | `{"a": 1, "b": 2}` |
| **Unquoted Keys** | `{name: "John"}` | `{"name": "John"}` |
| **Single Quotes** | `{'name': 'John'}` | `{"name": "John"}` |
| **Unclosed Brackets** | `[1, 2, 3` | `[1, 2, 3]` |

### CSV Fixes

| Issue | Before | After |
|-------|--------|-------|
| **Padding Columns** | `A,B,C`<br>`1,2` | `A,B,C`<br>`1,2,` (Empty added) |
| **Trimming Columns** | `A,B`<br>`1,2,3,4` | `A,B`<br>`1,2` (Extras removed) |
| **Booleans** | `Yes, No` | `true, false` |
| **Whitespace** | `  Value  ` | `Value` |
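
To make the padding and trimming rows of the table concrete, here is a small illustration of that rule: every row is brought to the header's column count. This is not DataDoctor's implementation, only a sketch of the behavior the table describes. The `normalize_columns` name is hypothetical, and the naive comma split does not handle quoted fields that contain commas.

```bash
# Illustration only: pad short rows and trim long rows to the header width.
# Reads CSV on stdin, writes the normalized CSV to stdout.
normalize_columns() {
    awk -F',' -v OFS=',' '
        # The header row fixes the expected column count.
        NR == 1 { want = NF; print; next }
        {
            row = ""
            for (i = 1; i <= want; i++) {
                # Missing cells become empty; cells beyond "want" are dropped.
                cell = (i <= NF) ? $i : ""
                row = row (i > 1 ? OFS : "") cell
            }
            print row
        }
    '
}
```

For example, `printf 'A,B,C\n1,2\n' | normalize_columns` pads the second row to `1,2,`, matching the first table row.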

---

## 📊 JSON Reports


For integration with other tools (like dashboards), use `--report-json`.

**Command:**
```bash
data-doctor validate data.csv --report-json
```

**Output:**
```json
{
  "success": false,
  "total_records": 100,
  "invalid_records": 5,
  "issues": [
    {
      "severity": "Error",
      "code": "CSV_TYPE_MISMATCH",
      "message": "Invalid Integer value",
      "row": 42,
      "column": 2
    }
  ]
}
```
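
A report like the one above can be consumed from a script. The sketch below assumes that `jq` is installed, that the report is written to standard output, and that every issue carries the fields shown in the sample (`row`, `column`, `code`, `message`). The `summarize_report` name is hypothetical.

```bash
# Read a --report-json document on stdin, print one line per issue,
# and return non-zero when the "success" field is false.
summarize_report() {
    report=$(cat)
    # One human-readable line per entry in "issues".
    printf '%s\n' "$report" |
        jq -r '.issues[] | "row \(.row), column \(.column): \(.code): \(.message)"'
    # jq -e sets the exit status from the value: false or null gives non-zero.
    printf '%s\n' "$report" | jq -e '.success' >/dev/null
}
```

Under these assumptions, `data-doctor validate data.csv --report-json | summarize_report` would list the issues and propagate failure to the calling script.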

---

## 📄 License


This project is licensed under the MIT License.