# CLI Usage Guide
Complete reference for the `dataprof` command-line interface (v0.6.0).
## Installation
```bash
cargo install dataprof # default features
cargo install dataprof --features full-cli # includes Parquet + all databases
```
## Subcommands
### `analyze` -- Full profiling with quality metrics
Runs comprehensive ISO 8000/25012 quality analysis on a file.
```
dataprof analyze <FILE> [OPTIONS]
```
| Option | Description |
| --- | --- |
| `--detailed` | Show detailed metrics for all quality dimensions |
| `--format <FMT>` | Output format: `text` (default), `json`, `csv`, `plain` |
| `-o, --output <PATH>` | Save output to file |
| `--threshold-profile <PROFILE>` | ISO quality threshold: `default`, `strict`, `lenient` |
| `--sample <SIZE>` | Sample first N rows (useful for large files) |
| `--progress` | Show real-time progress bar |
| `--chunk-size <SIZE>` | Custom chunk size for streaming (e.g. `10000`) |
| `--config <PATH>` | Load TOML configuration file |
| `--metrics <PACK>` | Metric packs: `schema`, `statistics`, `patterns`, `quality` |
| `-v, -vv, -vvv` | Increase verbosity (info, debug, trace) |
**Examples:**
```bash
# Basic analysis
dataprof analyze data.csv
# Detailed quality report
dataprof analyze data.csv --detailed
# JSON output for CI pipelines
dataprof analyze data.csv --format json -o report.json
# Strict quality thresholds
dataprof analyze data.csv --detailed --threshold-profile strict
# Large file with sampling and progress
dataprof analyze huge.csv --sample 100000 --progress --chunk-size 5000
```
**Sample text output:**
```
File: data.csv
Size: 15.3 MB
Columns: 8
Rows: 50000
Scan time: 342 ms
Column Profiles
customer_id (Integer)
Records: 50000 | Nulls: 0 (0.0%)
Min: 1.00 | Max: 10000.00 | Mean: 5000.50
Std Dev: 2886.90 | Median: 5000.00
email (String)
Records: 50000 | Nulls: 234 (0.5%)
Min Length: 8 | Max Length: 45 | Avg Length: 22.3
Patterns: email (98.2%)
Quality Assessment (ISO 8000/25012)
Completeness: 95.2%
Consistency: 98.7%
Uniqueness: 99.1%
Accuracy: 96.5%
Timeliness: 100.0%
Overall Score: 97.9%
```
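In the sample above, the overall score happens to equal the unweighted mean of the five dimension scores. Whether dataprof actually uses a simple average (rather than a weighted one) is an assumption here, but the arithmetic is easy to verify:

```shell
# Mean of the five dimension scores from the sample output above.
# That the overall score is an unweighted mean is an assumption, not
# something the dataprof docs state.
printf '95.2 98.7 99.1 96.5 100.0' | awk '{ print ($1+$2+$3+$4+$5)/5 }'
# -> 97.9
```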
### `schema` -- Fast schema inference
Infers column names and data types by reading a small sample of rows. For Parquet files, reads metadata only (zero row scanning).
```
dataprof schema <FILE> [OPTIONS]
```
| Option | Description |
| --- | --- |
| `--format <FMT>` | Output format: `text` (default), `json` |
| `-o, --output <PATH>` | Save output to file |
**Examples:**
```bash
dataprof schema data.csv
dataprof schema data.parquet --format json
```
**Sample output:**
```
Schema for data.csv (100 rows sampled, 12ms):
customer_id Integer
name String
email String
age Integer
salary Float
signup_date Date
```
### `count` -- Quick row count
Returns an exact count for small files and for Parquet (via metadata); for large files it returns an estimate.
```
dataprof count <FILE> [OPTIONS]
```
| Option | Description |
| --- | --- |
| `--format <FMT>` | Output format: `text` (default), `json` |
**Examples:**
```bash
dataprof count data.csv
dataprof count data.parquet
```
**Sample output:**
```
Row count: 50000 (exact, full scan, 45ms)
```
### `database` -- Database profiling
> Requires compilation with database feature flags (`--features postgres`, `--features mysql`, `--features sqlite`, or `--features all-db`).
Profile data directly from a database connection.
```
dataprof database <CONNECTION_STRING> [OPTIONS]
```
| Option | Description |
| --- | --- |
| `--table <NAME>` | Table name to profile (generates `SELECT * FROM <table>`) |
| `--query <SQL>` | Custom SQL query to profile |
| `--batch-size <N>` | Streaming batch size (default: `10000`) |
| `--quality` | Compute ISO 8000/25012 quality metrics |
You must provide either `--table` or `--query`.
**Connection string formats:**
```
postgres://user:password@host:5432/database
mysql://user:password@host:3306/database
sqlite:///path/to/database.db
```
**Examples:**
```bash
# Profile a PostgreSQL table
dataprof database postgres://user:pass@localhost/mydb --table users --quality
# Custom query on MySQL
dataprof database mysql://root:pass@localhost/shop --query "SELECT * FROM orders WHERE date > '2024-01-01'"
# SQLite
dataprof database sqlite:///data.db --table events
```
## Output Formats
Subcommands that support `--format` accept the following values (note that `schema` and `count` support only `text` and `json`):
| Value | Description | Best for |
| --- | --- | --- |
| `text` | Human-readable with colors | Terminal, interactive use |
| `json` | Machine-readable JSON | CI/CD, pipelines, APIs |
| `csv` | Comma-separated values | Spreadsheets, data pipelines |
| `plain` | Text without ANSI colors | Piping, file redirection |
The CLI automatically adapts output when it detects a non-interactive context (piped output, file redirection).
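In CI, the JSON report can be used as a quality gate. The snippet below sketches the idea against a hand-written report file standing in for `dataprof analyze data.csv --format json -o report.json`; the field names (`quality`, `overall_score`) are assumptions for illustration, not dataprof's documented JSON schema, so inspect your version's actual output before wiring this into a pipeline.

```shell
# Hypothetical report -- the structure below is assumed, not taken from
# dataprof's actual JSON schema. In a real pipeline this file would come from:
#   dataprof analyze data.csv --format json -o report.json
cat > report.json <<'EOF'
{"quality": {"overall_score": 97.9}}
EOF

# Extract the overall score and fail the CI step if it drops below 95.
score=$(python3 -c "import json; print(json.load(open('report.json'))['quality']['overall_score'])")
awk -v s="$score" 'BEGIN { exit (s >= 95 ? 0 : 1) }' && echo "quality gate passed"
```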
## Configuration File
dataprof loads settings from a TOML configuration file. Use `--config <PATH>` to specify one explicitly, or place a `.dataprof.toml` in your working directory.
```toml
[output]
format = "json"
colored = true
show_progress = true
verbosity = 1
[quality]
enabled = true
[engine]
# Engine settings are applied automatically
```
**Auto-discovery paths** (checked in order):
1. `.dataprof.toml` in current directory
2. `.dataprof.toml` in parent directories (up to root)
3. `.dataprof.toml` in home directory
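The lookup order above can be sketched as a shell function. This is a simplified model of the search, not dataprof's actual implementation:

```shell
# Walk from the current directory up to the filesystem root, then fall back
# to $HOME, returning the first .dataprof.toml found -- mirroring the
# auto-discovery order listed above (a sketch, not dataprof's real code).
find_dataprof_config() {
  dir=$(pwd)
  while :; do
    if [ -f "$dir/.dataprof.toml" ]; then
      echo "$dir/.dataprof.toml"
      return 0
    fi
    [ "$dir" = "/" ] && break
    dir=$(dirname "$dir")
  done
  if [ -f "$HOME/.dataprof.toml" ]; then
    echo "$HOME/.dataprof.toml"
    return 0
  fi
  return 1
}
```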
## Supported File Formats
| Format | Extensions | Notes |
| --- | --- | --- |
| CSV | `.csv`, `.tsv`, `.txt` | Auto-detects delimiter (`,` `;` `\|` `\t`) |
| JSON | `.json` | Array of objects |
| JSONL | `.jsonl` | Newline-delimited JSON (one object per line) |
| Parquet | `.parquet` | Schema/count read from metadata without scanning rows |
## Environment Variables
| Variable | Effect |
| --- | --- |
| `NO_COLOR=1` | Disable colored output |
| `RUST_LOG=dataprof=debug` | Enable debug logging via env_logger |
## Getting Help
```bash
dataprof --help
dataprof analyze --help
dataprof schema --help
dataprof count --help
dataprof database --help # if compiled with database features
```