# Converge Analytics Plan
This document is a checkpoint of what is implemented today versus what remains mocked or placeholder logic. It is meant to be revisited and updated as the stack matures.
## Implemented (Real Behavior)
- Dataset download for the training flow from the HuggingFace parquet shard `train/0000.parquet`.
- Dataset splitting into `train.parquet`, `val.parquet`, and `infer.parquet` based on a `TrainingPlan`.
- Feature extraction in `FeatureAgent` (Polars) that reads from Parquet/CSV and constructs a small feature vector.
- Baseline training that computes the mean of a numeric target column and persists a JSON model.
- Evaluation that computes MAE and a derived success ratio on the validation split.
- Sample inference that compares predictions vs. actuals on a small inference split.
- Basic convergence loop for both `agent_loop` and `training_flow`.
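The baseline training and evaluation steps above can be sketched with plain Rust (std only, no Polars). The function names, the JSON record shape, and the success-ratio formula are illustrative assumptions, not the actual implementation in `src/training.rs`:

```rust
/// Baseline "training": the model is just the mean of the numeric target.
fn train_mean_baseline(targets: &[f64]) -> f64 {
    targets.iter().sum::<f64>() / targets.len() as f64
}

/// Minimal JSON persistence without external crates (illustrative shape).
fn to_json_record(mean: f64) -> String {
    format!("{{\"model\":\"mean_baseline\",\"mean\":{}}}", mean)
}

/// Mean absolute error over the validation split.
fn mae(predictions: &[f64], actuals: &[f64]) -> f64 {
    predictions
        .iter()
        .zip(actuals)
        .map(|(p, a)| (p - a).abs())
        .sum::<f64>()
        / actuals.len() as f64
}

/// One plausible "success ratio": MAE normalized by the target scale,
/// clamped so 1.0 is perfect and 0.0 is a total miss.
fn success_ratio(mae: f64, target_mean: f64) -> f64 {
    if target_mean.abs() < f64::EPSILON {
        return 0.0;
    }
    (1.0 - mae / target_mean.abs()).max(0.0)
}

fn main() {
    let train = [1.0, 2.0, 3.0, 4.0];
    let val = [2.0, 3.0];
    let mean = train_mean_baseline(&train);
    let preds = vec![mean; val.len()];
    let err = mae(&preds, &val);
    println!("{} mae={} ratio={}", to_json_record(mean), err, success_ratio(err, mean));
}
```

A mean baseline like this is useful mainly as a floor: any learned model that cannot beat it on the validation split is not extracting signal from the features.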
## Implemented (Minimal / Simplified)
- Feature extraction selects the first two numeric columns (or explicitly configured columns) but reads only the first row.
- Model is a baseline mean predictor, not a learned model with features.
- Evaluation metrics are limited to MAE and a normalized "success ratio."
- Data validation currently produces missingness, simple numeric means, outlier counts (3-sigma), and a lightweight drift score.
- Feature engineering produces a spec (numeric/categorical lists, standardization, and a single interaction) but does not apply it.
- Hyperparameter search produces a plan and a fabricated "best" result without training models.
- Model registry writes a record derived from the latest evaluation only; no artifact store or versioning system exists.
- Monitoring emits a simple status based on evaluation success ratio.
- Deployment decision is a rule-based gate on the quality threshold.
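The monitoring status and deployment gate described above amount to a threshold comparison. A minimal sketch, assuming a hypothetical `QUALITY_THRESHOLD` constant and status names (the real config keys and enum variants may differ):

```rust
// Assumed threshold; in the real stack this would come from configuration.
const QUALITY_THRESHOLD: f64 = 0.8;

#[derive(Debug, PartialEq)]
enum MonitorStatus {
    Healthy,
    Degraded,
}

/// Monitoring emits a status derived directly from the evaluation success ratio.
fn monitor_status(success_ratio: f64) -> MonitorStatus {
    if success_ratio >= QUALITY_THRESHOLD {
        MonitorStatus::Healthy
    } else {
        MonitorStatus::Degraded
    }
}

/// Rule-based deployment gate: deploy only if quality clears the threshold.
fn should_deploy(success_ratio: f64) -> bool {
    success_ratio >= QUALITY_THRESHOLD
}

fn main() {
    println!("{:?} deploy={}", monitor_status(0.85), should_deploy(0.85));
}
```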
## Mock / Placeholder
- Hyperparameter search scoring is heuristic and does not run trials.
- Feature specs are not used by training or inference.
- Drift detection is a simple mean-delta; there is no statistical test or time-based windowing.
- Model registry is just a serialized record without actual artifact hosting, lineage graph, or signatures.
- Monitoring and deployment logic are not connected to any serving system.
- Inference model in `agent_loop` is a randomly initialized Burn MLP (no learned weights).
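For reference, the kind of validation arithmetic the current stack performs (missingness, column mean, population 3-sigma outlier count, and a mean-delta drift score) can be sketched in std-only Rust. The function names and the exact drift normalization are assumptions; the real logic lives in the Polars-based agents:

```rust
/// Fraction of missing values in a column.
fn missingness(col: &[Option<f64>]) -> f64 {
    col.iter().filter(|v| v.is_none()).count() as f64 / col.len() as f64
}

/// Population mean and standard deviation.
fn mean_std(values: &[f64]) -> (f64, f64) {
    let n = values.len() as f64;
    let mean = values.iter().sum::<f64>() / n;
    let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

/// Count values more than three standard deviations from the mean.
fn outliers_3sigma(values: &[f64]) -> usize {
    let (mean, std) = mean_std(values);
    values.iter().filter(|v| (*v - mean).abs() > 3.0 * std).count()
}

/// Mean-delta drift: absolute shift of the current mean relative to the
/// reference mean. No statistical test, no time windowing.
fn drift_score(reference: &[f64], current: &[f64]) -> f64 {
    let (r, _) = mean_std(reference);
    let (c, _) = mean_std(current);
    if r.abs() < f64::EPSILON {
        return 0.0;
    }
    ((c - r) / r).abs()
}

fn main() {
    let col = [Some(1.0), None, Some(3.0), Some(5.0)];
    println!("missing={}", missingness(&col));
    println!("drift={}", drift_score(&[1.0, 2.0, 3.0], &[2.0, 3.0, 4.0]));
}
```

Note the weakness called out above: with small samples, a single extreme value inflates the standard deviation enough that 3-sigma rarely fires, and a mean-delta score is blind to variance or shape changes.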
## Gaps to Close (Suggested Order)
1. Wire feature specs into training and inference (apply normalization, categorical encoding, interactions).
2. Replace baseline mean model with a learned model using the engineered features.
3. Implement real hyperparameter search tied to training/evaluation.
4. Add persistent model registry storage and evaluation artifacts.
5. Improve monitoring with drift, bias checks, and live performance tracking.
6. Add deployment orchestration and retraining triggers integrated with monitoring.
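For gap 1, the key property is that the same fitted spec is applied at training and inference time. A minimal sketch for a single standardized numeric column; this `FeatureSpec` is hypothetical and is not the spec struct the current feature-engineering agent emits:

```rust
/// Hypothetical per-column spec: standardization parameters fitted on train.
struct FeatureSpec {
    mean: f64,
    std: f64,
}

impl FeatureSpec {
    /// Fit standardization parameters on the training column only,
    /// so no information leaks from validation or inference data.
    fn fit(values: &[f64]) -> Self {
        let n = values.len() as f64;
        let mean = values.iter().sum::<f64>() / n;
        let var = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
        FeatureSpec {
            mean,
            // Guard against zero variance (constant columns).
            std: var.sqrt().max(f64::EPSILON),
        }
    }

    /// Apply the identical transform at training and inference time.
    fn transform(&self, v: f64) -> f64 {
        (v - self.mean) / self.std
    }
}

fn main() {
    let spec = FeatureSpec::fit(&[1.0, 2.0, 3.0]);
    println!("{}", spec.transform(2.0)); // centered value maps to 0
}
```

Categorical encodings and the interaction term would follow the same fit-on-train, apply-everywhere pattern, with the fitted spec persisted alongside the model record.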
## Quick References
- `examples/agent_loop.rs` — feature extraction + inference demo.
- `examples/training_flow.rs` — training pipeline with agents.
- `src/engine.rs` — feature extraction agent.
- `src/model.rs` — Burn inference agent (demo).
- `src/training.rs` — dataset, training, evaluation, and pipeline agents.