rust_trainer 0.1.4

# Rust Trainer Roadmap

This roadmap tracks the path from current production-candidate status to full production-grade training.

## Current baseline

Completed:

- end-to-end CPU training loop binary
- serializable AdamW optimizer state
- resume-safe checkpoints
- JSONL metrics logging
- expansion and freeze policies
- deterministic parity probe for save/load equivalence
- SIMD kernel parity checks
- validation loop with configurable eval cadence
- best-checkpoint tracking and early stopping
- gradient clipping + LR warmup/cosine scheduling controls
- non-finite guardrails for train-step updates
- streaming shard data loader with resumable cursor state
- atomic versioned generic trainer checkpoints

## Phase 1: Training correctness hardening

Goals:

- add cross-framework parity checks against Python/JAX on deterministic shared batches
- verify one-step parity for loss and gradient norms with strict tolerances
- run deterministic replay tests across resume boundaries

Acceptance criteria:

- parity report generated in CI and for local release checks
- parity drift bounds documented and enforced

## Phase 2: Data pipeline and scalability

Goals:

- add streaming token reader with shard support
- add shuffled packed-sequence batching
- avoid full-file memory loading by default

Acceptance criteria:

- training can run on datasets larger than RAM
- throughput and memory use are stable across long runs

## Phase 3: Stability and safety controls

Goals:

- add gradient clipping
- add LR schedules (warmup + cosine/linear decay)
- add non-finite gradient/loss guards and fail-safe checkpointing
- add atomic checkpoint writes and checkpoint schema versioning

Acceptance criteria:

- long-run smoke tests complete without NaN divergence
- restart from checkpoints is robust after forced interruptions

## Phase 4: Production observability

Goals:

- add validation loop and eval cadence
- add best-checkpoint tracking and early stopping hooks
- extend metrics with validation loss, update norms, and stability indicators

Acceptance criteria:

- training quality can be monitored and compared across runs
- operational dashboards can consume metrics.jsonl without custom parsing

## Phase 5: Packaging and ecosystem integration

Goals:

- provide stable public APIs for embedders
- add optional bindings/integration layers where needed
- add docs for multi-environment deployment and reproducibility

Acceptance criteria:

- crate is consumable as a reusable generic trainer package
- release process includes compatibility and migration notes

## Near-term implementation order

1. Cross-framework parity harness
2. Packed sequence batching for shard streams
3. Multi-worker/parallel data ingest for throughput scaling
4. Cross-platform CI matrix expansion for runtime validation
5. Optional PyO3 integration layer for hybrid orchestration