scatters 0.0.0

A CLI to instantly turn tabular data and audio files into interactive HTML scatter plots.
Documentation
# data_plotter


A Rust CLI that turns tabular data and audio files into interactive, self-contained HTML scatter plots powered by ECharts.

- Inputs: CSV, Parquet, JSON/JSONL/NDJSON, and audio (WAV/MP3/FLAC)
- Output: One HTML file per input, with zoom/pan, legend, and toolbox — no server required
- Directory mode: Pass a folder to process all supported files recursively

## Build and Run


Prerequisites: Rust toolchain with Cargo

- Build (debug):
```powershell
cargo build
```
- Build (release):
```powershell
cargo build --release
```
- Show CLI help:
```powershell
cargo run -- --help
```

## Usage


Basic examples:
```powershell
# Plot all numeric columns; auto-detect X when possible

cargo run -- sample/sample.csv

# Write outputs to a directory

cargo run -- sample/sample.csv -o plots/

# Set the X-axis explicitly

cargo run -- data.csv --index timestamp

# Choose specific Y columns

cargo run -- data.csv -c sensor_a,sensor_b

# Use first column as X and set a custom title

cargo run -- data.csv --use-first-column --title "My Custom Plot"

# Process an entire directory recursively

cargo run -- path/to/folder -o plots/

# Audio files (mono or multi-channel)

cargo run -- audio.wav

# Disable dynamic Y autoscaling (keep initial padded range)

cargo run -- sample/sample.csv --no-autoscale-y

```

Where outputs go:
- With `-o/--output-dir`, files are saved under that directory as `<stem>.html`.
- Without it, each plot is saved next to its input file.

## CLI

```
A tool to generate interactive scatter plots from various data formats.

Usage: data_plotter.exe [OPTIONS] <INPUT_PATH>

Arguments:
  <INPUT_PATH>  The input file or folder to scan for data

Options:
  -o, --output-dir <OUTPUT_DIR>  Directory to save the generated HTML plots. Defaults to saving next to each input file
      --index <INDEX>            Name of the column to use as the index (X-axis). Highest priority for index selection
      --use-first-column         Use the first column of the data as the index. Overridden by --index
  -c, --columns <COLUMNS>        Comma-separated list of columns to plot (Y-axis). If not provided, all numeric columns will be plotted
      --title <TITLE>            A custom title for the plot. Defaults to the input filename
      --no-autoscale-y           Disable dynamic Y-axis autoscaling on zoom (keeps initial Y range)
  -h, --help                     Print help
  -V, --version                  Print version
```

## How X and Y are chosen


X-axis selection priority:
1) `--index <name>`
2) `--use-first-column`
3) If a column named `sample_index` exists (audio), use it
4) First datetime column (string columns may be auto-cast to datetime)
5) Fallback to a generated `row_index`

Y-axis selection:
- If `-c/--columns` is provided, those columns are used
- Otherwise, all numeric columns except the chosen X column

## Supported formats and behavior


- CSV, Parquet
- JSON, JSONL/NDJSON (JSON is read as JSON Lines; for array-of-objects JSON, convert to NDJSON)
- Audio: WAV/MP3/FLAC via Symphonia
  - The first default track is decoded
  - DataFrame has `sample_index` (X) and `amplitude` (Y)
  - For multi-channel audio, samples are currently interleaved into the single `amplitude` series

Notes on parsing and type inference:
- String columns that look like datetimes are only cast to datetime if at least one value parses successfully. This avoids selecting an "all-null" datetime axis and producing empty plots.
- String columns that look numeric are trimmed and parsed into Float64 if at least ~50% of their values parse successfully. This helps for CSVs that don't have headers and/or have numeric values with whitespace.

## Output HTML

- Self-contained HTML with ECharts loaded from CDN
- Interactive features: zoom/pan (dataZoom), legend scroll, save-as-image, restore
- X-axis type is set automatically (time, category, or value) based on the chosen X series
- Themes: dark (default) and white (enable with `--white-theme`)
- Numeric formatting: `-m/--max-decimals` controls decimals; `-1` disables the limit; scientific notation when appropriate

## Project structure (brief)


- `src/cli.rs` — command-line interface (clap)
- `src/lib.rs` — orchestration (discover files, process each, write output)
- `src/data_loader.rs` — load DataFrames; audio decoding; best-effort datetime casting
- `src/processing.rs` — select X/Y series and plot title
- `src/plotter.rs` — generate HTML and embed series as JSON for ECharts
- `src/error.rs` — error types

For additional repository-specific guidance, see `WARP.md`.