moeflux 0.1.0-pre.3

Pure-Rust streaming-experts MoE inference on Metal. Forked from flash-moe; only the Metal kernels remain from upstream.

# moeflux

Pure-Rust streaming-experts Mixture-of-Experts decode for Apple
Silicon.

moeflux began as a fork of
[danveloper/flash-moe](https://github.com/danveloper/flash-moe) and
has since been rewritten in Rust to the point of being a distinct
codebase. The host-side inference engine is new Rust on `metal-rs`;
the original C/Objective-C functions served as differential oracles
during the rewrite, not as a line-by-line translation source. The
Metal streaming-experts kernels were authored by **Claude Opus 4.6**
(Anthropic) for flash-moe and carry over here. The math is the same
linear algebra every inference engine runs — nothing here is claimed
as novel.

## What's here

- `crates/moeflux/` — the Rust engine. `RsCtx::open` opens a model;
  `eval_prompt` / `eval_token` / `state_save` / `state_load` are the
  public surface. Kernels at `crates/moeflux/shaders/shaders.metal`
  are embedded via `include_str!` and compiled at runtime.
- `scripts/` — the model-prep pipeline (`extract_weights.py`,
  `export_vocab.py`, `export_tokenizer.py`). These scripts run once
  per target model, not at runtime, and will likely be replaced by
  Rust binaries.
- `tools/mlx_reference/` — an MLX-based reference diff harness;
  `crates/moeflux/tests/mlx_regression.rs` regenerates its golden
  fixtures from it.
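
To make the shape of that public surface concrete, here is a minimal sketch of the call flow: open a context, evaluate a prompt, decode token by token, and snapshot or restore state. It is not moeflux code. `ToyCtx`, its signatures, the byte layout of the snapshot, and the arithmetic standing in for the model are all assumptions made for illustration; only the method names come from the list above, and the real `RsCtx` signatures may differ.

```rust
/// Hypothetical stand-in for an engine context. Method names mirror the
/// README; every signature here is an assumption, not the real API.
struct ToyCtx {
    /// Tokens consumed so far. Stands in for the real decode state.
    history: Vec<u32>,
}

impl ToyCtx {
    /// Assumed shape: open a model from a path. The toy ignores the path.
    fn open(_model_dir: &str) -> Result<Self, String> {
        Ok(Self { history: Vec::new() })
    }

    /// Consume a whole prompt and return the next token.
    fn eval_prompt(&mut self, tokens: &[u32]) -> u32 {
        self.history.extend_from_slice(tokens);
        self.next_token()
    }

    /// Consume one token and return the next token.
    fn eval_token(&mut self, token: u32) -> u32 {
        self.history.push(token);
        self.next_token()
    }

    /// Serialize the state as little-endian u32 values.
    fn state_save(&self) -> Vec<u8> {
        self.history.iter().flat_map(|t| t.to_le_bytes()).collect()
    }

    /// Restore a state previously produced by `state_save`.
    fn state_load(&mut self, bytes: &[u8]) -> Result<(), String> {
        if bytes.len() % 4 != 0 {
            return Err("snapshot length is not a multiple of 4".to_string());
        }
        self.history = bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Ok(())
    }

    /// Deterministic toy "model": a rolling hash of the history.
    fn next_token(&self) -> u32 {
        self.history
            .iter()
            .fold(17u32, |acc, &t| acc.wrapping_mul(31).wrapping_add(t))
            % 1000
    }
}

/// Decode `n` tokens, feeding each output back in as the next input.
fn continue_decode(ctx: &mut ToyCtx, mut token: u32, n: usize) -> Vec<u32> {
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        token = ctx.eval_token(token);
        out.push(token);
    }
    out
}
```

In this toy, saving state after `eval_prompt`, decoding a few tokens, loading the snapshot, and decoding again from the same token yields the same continuation, which is the property a save/load pair is usually meant to provide.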

## Status

**Pre-alpha**, pre-`0.1`. The Rust engine is the only inference path.
The API will stabilize once runtime model-variant dispatch lands.

## License

MIT — see [`LICENSE`](LICENSE). See also [`CONTRIBUTORS.md`](CONTRIBUTORS.md).

## Acknowledgements

- **@danveloper** — for building the thing the hard way, writing it
  up, and publishing everything openly. moeflux started as flash-moe,
  and the git history reflects that.
- **Claude Opus 4.6** — for the Metal streaming-experts kernels and
  the architecture that made all of this run.
- **Anthropic** — for making Claude available to do work like this in
  the first place.