MoE-Platform — Inference Runtime (Planned)

Future production inference API for Albert MoE-13. Not yet implemented — this crate is a placeholder for the deployment-facing interface that will wrap the trained ternary model.

Planned Scope

The platform crate will decouple inference from the training code and provide:

Model loading from .safetensors checkpoint + config.json
Batched inference with top-k / temperature sampling
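Ternary-native execution: quantize weights to ternary values once at load time (the straight-through estimator, STE, is only needed for gradients during training) and run integer-only matmuls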
REST API via Axum for serving Albert as a local endpoint
MCP server integration — expose Albert as a tool callable from Claude/TernLang-MCP
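The ternary-native item above can be illustrated with a minimal sketch. It assumes absmean scaling (one scale per tensor, equal to the mean absolute weight) and i8 activations; neither choice is confirmed by this crate, and the names `quantize_ternary` and `ternary_matvec` are hypothetical. The point is only that once weights are restricted to -1, 0, and +1, a matrix-vector product needs no multiplications.

```rust
/// Quantize f32 weights to ternary values {-1, 0, +1} with one per-tensor scale.
/// Assumption: absmean scaling (scale = mean |w|); the real model may use another rule.
pub fn quantize_ternary(weights: &[f32]) -> (Vec<i8>, f32) {
    if weights.is_empty() {
        return (Vec::new(), 1.0);
    }
    let mean_abs = weights.iter().map(|w| w.abs()).sum::<f32>() / weights.len() as f32;
    // Guard against an all-zero tensor so the division below stays finite.
    let scale = if mean_abs > 0.0 { mean_abs } else { 1.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-1.0, 1.0) as i8)
        .collect();
    (q, scale)
}

/// Multiply a row-major ternary matrix (rows x cols) by an i8 activation vector.
/// Every weight is -1, 0, or +1, so each term is an add, a subtract, or a skip.
pub fn ternary_matvec(weights: &[i8], rows: usize, cols: usize, x: &[i8]) -> Vec<i32> {
    assert!(cols > 0, "cols must be non-zero");
    assert_eq!(weights.len(), rows * cols);
    assert_eq!(x.len(), cols);
    weights
        .chunks_exact(cols)
        .map(|row| {
            row.iter().zip(x).fold(0i32, |acc, (&w, &xi)| match w {
                1 => acc + xi as i32,
                -1 => acc - xi as i32,
                _ => acc,
            })
        })
        .collect()
}
```

The i32 results would still need to be multiplied by the weight scale (and any activation scale) to recover real-valued outputs; that dequantization step is left out here.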

Current State

Training and inference share the same moe-llm-core crate. The Transformer::generate() method in transformer.rs handles greedy/sampled generation for local testing. This is sufficient for research purposes.
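For readers unfamiliar with the difference between greedy and sampled generation, the following is a minimal sketch of one decoding step with temperature scaling and top-k filtering. It is not the code in transformer.rs; the function name `sample_top_k` and its signature are hypothetical. To stay dependency-free, the caller passes in a uniform random number `u` in [0, 1) instead of the function drawing one itself.

```rust
/// Pick a token index from `logits` using temperature scaling and top-k filtering.
/// `u` is a uniform random number in [0, 1) supplied by the caller, which keeps the
/// function deterministic. A temperature of 0 (or below) falls back to greedy argmax.
pub fn sample_top_k(logits: &[f32], temperature: f32, k: usize, u: f32) -> Option<usize> {
    if logits.is_empty() || k == 0 {
        return None;
    }
    // Sort token indices by logit, highest first.
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    if temperature <= 0.0 {
        return Some(order[0]); // greedy decoding
    }
    // Keep only the k most likely tokens.
    order.truncate(k);
    // Softmax over the survivors, subtracting the max for numerical stability.
    let max = logits[order[0]];
    let weights: Vec<f32> = order
        .iter()
        .map(|&i| ((logits[i] - max) / temperature).exp())
        .collect();
    let total: f32 = weights.iter().sum();
    // Inverse-CDF sampling: walk the cumulative distribution until it passes u.
    let mut cumulative = 0.0;
    for (&i, w) in order.iter().zip(&weights) {
        cumulative += w / total;
        if u < cumulative {
            return Some(i);
        }
    }
    // Rounding can leave the cumulative sum slightly below 1.0.
    order.last().copied()
}
```

Lower temperatures concentrate probability on the highest logit, and smaller `k` values cut off the long tail of unlikely tokens; with `k = 1` the result is always the argmax regardless of `u`.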

The moe-platform crate will be built once the model reaches stable loss convergence and the architecture is frozen for a production release.

Integration Target

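// Future API (not yet implemented)
use moe_platform::Albert;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let albert = Albert::load("models/bible_ternary_v2.0.0")?;
    let response = albert.generate("In the beginning", 128)?;
    println!("{response}"); // assumes the response type implements Display
    Ok(())
}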

moe-platform 1.3.6
