moe-platform 1.3.6

Stable API for MoE-13 ternary inference and model ingestion.
# MoE-Platform — Inference Runtime (Planned)

Future production inference API for Albert MoE-13. Not yet implemented — this crate is a placeholder for the deployment-facing interface that will wrap the trained ternary model.

---

## Planned Scope

The platform crate will decouple inference from the training code and provide:

- **Model loading** from `.safetensors` checkpoint + `config.json`
- **Batched inference** with top-k / temperature sampling
- **Ternary-native execution**: quantize weights to ternary values at load time (the forward pass of the quantizer trained with STE) and run integer-only matmuls
- **REST API** via Axum for serving Albert as a local endpoint
- **MCP server integration** — expose Albert as a tool callable from Claude/TernLang-MCP
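
To make the ternary-native item concrete, here is a minimal sketch of load-time quantization followed by an integer-only matrix-vector product. It is not taken from `moe-llm-core`: the function names `quantize_ternary` and `ternary_matvec` are hypothetical, and the absmean scaling rule is an assumption, since the quantizer used in training is not described here.

```rust
/// Quantize f32 weights to ternary values {-1, 0, +1} plus a per-tensor scale.
/// Assumption: absmean scaling (scale = mean of |w|); the real quantizer may differ.
fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    let n = weights.len().max(1) as f32;
    let scale = weights.iter().map(|w| w.abs()).sum::<f32>() / n;
    if scale == 0.0 {
        // All-zero (or empty) tensor: nothing to scale.
        return (vec![0; weights.len()], 0.0);
    }
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Integer-only matrix-vector product for a row-major `rows x cols` ternary matrix.
/// Each weight is -1, 0, or +1, so every term is an add, a subtract, or a skip.
fn ternary_matvec(w: &[i8], rows: usize, cols: usize, x: &[i32]) -> Vec<i32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            w[r * cols..(r + 1) * cols]
                .iter()
                .zip(x)
                .fold(0i32, |acc, (&wi, &xi)| match wi {
                    1 => acc + xi,
                    -1 => acc - xi,
                    _ => acc,
                })
        })
        .collect()
}
```

In this sketch the returned scale would be applied once to the integer output to recover approximate floating-point values, so the inner loop needs no multiplications.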

---

## Current State

Training and inference share the same `moe-llm-core` crate. The `Transformer::generate()` method in `transformer.rs` handles greedy/sampled generation for local testing. This is sufficient for research purposes.
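
For readers unfamiliar with the two decoding modes, the sketch below shows greedy selection and top-k / temperature sampling over a logit vector. It is illustrative only and does not reproduce `Transformer::generate()`: the function names are hypothetical, and the uniform random draw `u` is passed in by the caller so the example needs no external crates.

```rust
/// Greedy decoding: index of the largest logit.
fn argmax(logits: &[f32]) -> usize {
    assert!(!logits.is_empty());
    let mut best = 0;
    for (i, &l) in logits.iter().enumerate() {
        if l > logits[best] {
            best = i;
        }
    }
    best
}

/// Top-k / temperature sampling.
/// `u` is a uniform draw in [0, 1) supplied by the caller (hypothetical interface).
fn sample_top_k(logits: &[f32], k: usize, temperature: f32, u: f32) -> usize {
    assert!(!logits.is_empty() && k > 0 && temperature > 0.0);

    // Rank token indices by logit, highest first, and keep the top k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    idx.truncate(k.min(logits.len()));

    // Temperature-scaled softmax over the kept logits.
    // Subtracting the maximum keeps exp() numerically stable.
    let max = logits[idx[0]];
    let weights: Vec<f32> = idx
        .iter()
        .map(|&i| ((logits[i] - max) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();

    // Inverse-CDF sampling: walk the cumulative distribution until it exceeds `u`.
    let mut cum = 0.0;
    for (&i, w) in idx.iter().zip(&weights) {
        cum += w / total;
        if u < cum {
            return i;
        }
    }
    // Fallback for rounding error in the cumulative sum.
    *idx.last().unwrap()
}
```

With `k = 1` the sampler reduces to greedy decoding. Lower temperatures concentrate probability on the highest-ranked tokens, and higher temperatures flatten the distribution.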

The `moe-platform` crate will be built once the model reaches stable loss convergence and the architecture is frozen for a production release.

---

## Integration Target

```rust
// Future API (not yet implemented)
use moe_platform::Albert;

let albert = Albert::load("models/bible_ternary_v2.0.0")?;
let response = albert.generate("In the beginning", 128)?;
```

---

## See Also

- [Main README](../../README.md): current training setup
- [Architecture](../../docs/architecture.md): model internals
- [TernLang-MCP](../../../ternlang-root/ternlang-mcp/): MCP server (live)