moeflux
Pure-Rust streaming-experts Mixture-of-Experts decode for Apple Silicon.
moeflux began as a fork of
danveloper/flash-moe and
has since been rewritten in Rust to the point of being a distinct
codebase. The host-side inference engine is new Rust on metal-rs;
the original C/Objective-C functions served as differential oracles
during the rewrite, not as a line-by-line translation source. The
Metal streaming-experts kernels were authored by Claude Opus 4.6
(Anthropic) for flash-moe and carry over here. The math is the same
linear algebra every inference engine runs — nothing here is claimed
as novel.
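To illustrate what using a reference implementation as a differential oracle can look like, here is a minimal sketch in Rust. It is not moeflux code: `dot_reference`, `dot_unrolled`, `lcg_inputs`, and `agrees_with_oracle` are hypothetical names invented for this example, and the dot product only stands in for whatever function is being rewritten. The idea is to feed identical inputs to the trusted version and the new version, then require the outputs to agree within a floating-point tolerance, since a different summation order rarely gives bit-identical results.

```rust
/// Hypothetical oracle: a plain, obviously-correct dot product.
fn dot_reference(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical rewrite: same math, but accumulated in four lanes,
/// so the floating-point summation order differs from the oracle.
fn dot_unrolled(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len().min(b.len());
    let chunks = n / 4;
    let mut acc = [0.0f32; 4];
    for i in 0..chunks {
        for lane in 0..4 {
            let idx = i * 4 + lane;
            acc[lane] += a[idx] * b[idx];
        }
    }
    // Handle the elements that do not fill a whole group of four.
    let mut tail = 0.0f32;
    for idx in chunks * 4..n {
        tail += a[idx] * b[idx];
    }
    acc.iter().sum::<f32>() + tail
}

/// Deterministic pseudo-random values in [-1, 1) from a simple LCG,
/// so a failing comparison can be reproduced from its seed.
fn lcg_inputs(seed: u64, len: usize) -> Vec<f32> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Use the top 24 bits, which convert to f32 exactly.
            ((state >> 40) as f32 / (1u64 << 24) as f32) * 2.0 - 1.0
        })
        .collect()
}

/// Differential check: run both implementations on the same inputs
/// and compare the results within an absolute tolerance.
fn agrees_with_oracle(len: usize, seed: u64, tol: f32) -> bool {
    let a = lcg_inputs(seed, len);
    let b = lcg_inputs(seed ^ 0x9E37_79B9_7F4A_7C15, len);
    (dot_reference(&a, &b) - dot_unrolled(&a, &b)).abs() <= tol
}
```

A check such as `agrees_with_oracle(1000, 42, 1e-3)` would typically run across many seeds and lengths, including lengths that are not a multiple of four, so the tail path is exercised. The tolerance here is an assumption chosen for this toy example, not a value taken from the project.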
What's here
- crates/moeflux/: the Rust engine. RsCtx::open opens a model; eval_prompt / eval_token / state_save / state_load are the public surface. Kernels at crates/moeflux/shaders/shaders.metal are embedded via include_str! and compiled at runtime.
- scripts/: the model-prep pipeline (extract_weights.py, export_vocab.py, export_tokenizer.py). One-time per target model, not runtime; likely future Rust binaries.
- tools/mlx_reference/: an MLX-based reference diff harness; crates/moeflux/tests/mlx_regression.rs regenerates its golden fixtures from it.
Status
Pre-alpha, pre-0.1. The Rust engine is the only code path. The API
is expected to stabilize once runtime model-variant dispatch lands.
License
MIT — see LICENSE. See also CONTRIBUTORS.md.
Acknowledgements
- @danveloper — for building the thing the hard way, writing it up, and publishing everything openly. moeflux started as flash-moe and git history reflects that.
- Claude Opus 4.6 — for the Metal streaming-experts kernels and the architecture that made all of this run.
- Anthropic — for making Claude available to do work like this in the first place.