# TransXLab
**The training architect: validate and design LLM fine-tuning configs before you spend a dollar on GPU time.**
3.3 MB single binary. Zero Python dependencies. Catches the mistakes that cost you $665 and a weekend.
---
## Install
```bash
cargo install transxlab
```
## Quick Start
```bash
# Full pipeline: preflight + design + data strategy
transxlab full --config run.yaml
# Validate environment and config only
transxlab setup --config run.yaml
# Design architecture from a HuggingFace model
transxlab design --inputs design.yaml --hub-id meta-llama/Meta-Llama-3-8B
# Postmortem on a failed run
transxlab diagnose --log training.log
```
## What It Does
TransXLab runs a three-level pipeline over your training config:

| Level | Stage | Catches |
|-------|-------|---------|
| 1 | **Preflight** | Bad env, missing files, GPU/VRAM mismatches, config errors |
| 2 | **Design** | Wrong architecture choices, unsafe hyperparameters, cost blowouts |
| 3 | **Data Strategy** | Quality gaps, contamination risk, diversity issues |

Under the hood: **20 failure-mode signatures**, **25 hyperparameter rules**, cloud cost estimation across 7 GPU tiers and 4 providers (RunPod, Lambda, AWS, Vast.ai).
**HuggingFace Hub integration** -- pass a model ID and TransXLab auto-detects architecture, parameter count, and recommended PEFT config.
**Config generation** for HF Trainer, Axolotl, LLaMA-Factory, and PEFT -- validated configs you can hand straight to your training framework.
**CI/CD gating** with `--fail-on warn|fail` and `--json` for machine-readable output.
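
A minimal sketch of a CI gate built on those two flags. It rests on assumptions this README does not confirm: that `--fail-on` and `--json` can be combined with the `full` subcommand, that the JSON report is written to stdout, and that the process exits non-zero when the gate trips. The `run_gate` function and the `TRANSXLAB_BIN` variable are hypothetical names used only for illustration.

```bash
# CI gate sketch. Assumed, not documented here:
#   - `--fail-on warn` and `--json` work together on `transxlab full`
#   - the JSON report goes to stdout
#   - a tripped gate produces a non-zero exit code
TRANSXLAB_BIN="${TRANSXLAB_BIN:-transxlab}"   # override to point at another binary

run_gate() {
  local config="$1"   # training config to validate
  local report="$2"   # where to save the machine-readable report
  if "$TRANSXLAB_BIN" full --config "$config" --fail-on warn --json > "$report"; then
    echo "transxlab gate passed"
    return 0
  else
    echo "transxlab gate blocked the run; see $report" >&2
    return 1
  fi
}
```

In a pipeline step, something like `run_gate run.yaml transxlab-report.json || exit 1` would stop the job before any GPU is provisioned, and the saved report can be uploaded as a build artifact.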
## Example: Catching the AC-v2 Disaster
The config that started this project -- a Flan-T5-XL creative-generation run that burned $665 before anyone noticed the problems:
```
$ transxlab full --config examples/ac_v2_config.yaml
TransXLab v0.1.0 -- Full Pipeline
==================================
[PREFLIGHT] 6 checks passed, 0 warnings, 0 failures
[DESIGN]
FAIL lr=1e-4 exceeds safe threshold for full fine-tuning (max 5e-5)
WARN full fine-tuning on 3B params -- consider LoRA/QLoRA to cut VRAM 60-80%
WARN no diversity loss for creative-generation task
[DATA STRATEGY]
WARN diversity_loss_weight=0.0 with task_type=creative generation
--> high mode-collapse risk
[COST]
Estimated: $142-$310 across providers (A100 80GB recommended)
Result: 2 failures, 2 warnings -- BLOCKED
```
Every issue flagged here went undetected in the real AC-v2 run. TransXLab exists so it doesn't happen again.
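
The full fine-tuning warning in that output comes down to simple arithmetic. The sketch below uses the common mixed-precision AdamW accounting of 16 bytes per parameter. This is an assumption for illustration, not necessarily the formula TransXLab applies, and `estimate_full_ft_gb` is a hypothetical helper. It counts weights, gradients, and optimizer state only, so activations and framework overhead come on top.

```bash
# Rough VRAM floor for full fine-tuning with mixed-precision AdamW.
# Assumed breakdown per parameter (bytes):
#   2 (fp16 weights) + 2 (fp16 grads) + 4 (fp32 master weights)
#   + 4 (Adam first moment) + 4 (Adam second moment) = 16
# Activations and framework overhead are NOT included.
estimate_full_ft_gb() {
  local params_billions="$1"                      # model size in billions of parameters
  local bytes_per_param=$(( 2 + 2 + 4 + 4 + 4 ))  # 16 bytes per parameter
  # 1e9 params * N bytes = N GB (decimal), so the product is already in GB
  echo $(( params_billions * bytes_per_param ))
}

estimate_full_ft_gb 3   # 3B-parameter model, as in the AC-v2 run
```

For the 3B-parameter model this gives 48 GB before activations, which is consistent with the A100 80GB recommendation in the cost section and shows why adapter methods such as LoRA, which keep gradients and optimizer state only for a small set of added weights, reduce memory so sharply.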
## Companion: TransXform
TransXLab validates **before** training. [TransXform](https://github.com/zamfir70/TransXform) supervises **during** training -- detecting loss anomalies, checkpoint corruption, and resource exhaustion in real time.
## License
MIT