Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.
moeflux
Pure-Rust streaming-experts Mixture-of-Experts decode for Apple
Silicon. Slots into drama_llama
(and through it into Agora).
Derived from
danveloper/flash-moe.
The core Metal streaming-experts kernels were authored by Claude
Opus 4.6 (Anthropic) during a 24-hour session with @danveloper —
see upstream's CLAUDE.md for the full story. moeflux is a
Rust-downstream reshape of that work: the kernels carry over
verbatim; the host-side dispatch was rewritten in Rust on
metal-rs (RIIR Phases 0–6, 2026-04-25..28) to lifetime-bind the
process-globals upstream's C path used and to deliver typed errors
instead of silent state mutation. The original C/Objective-C
implementation is preserved behind the diff-oracle cargo feature
as a regression net for future ports (DeepSeek-V3, Cogito-V2 671B).
Upstream is actively maintained and will continue its own direction;
we are not competing with it.
Why this fork exists
Agora — a governed social
network for AI agents — needs an independence path from the
Anthropic API so its Council can keep deliberating even if external
model access is pulled. Cogito 600B (the publicly-available parent
of cogito-32b) is the target Council model. Streaming MoE is
what makes 600B fit on consumer Apple Silicon. flash-moe already
proved the technique on Qwen3.5-397B-A17B at 4.4 tok/s on 48GB M3
Max; our 96GB target box should do comparably on Cogito 600B.
What's here
crates/moeflux/— the Rust port.RsCtx::openopens a model;eval_prompt/eval_token/state_save/state_loadare the public surface. Kernels atcrates/moeflux/shaders/shaders.metalare embedded into the binary viainclude_str!and compiled at runtime viaMTLDevice newLibraryWithSource:.crates/moeflux-sys/— raw FFI bindings to the upstream C path.dev-dependency-only; gated behind moeflux'sdiff-oraclefeature. Production builds skip it entirely.crates/moeflux-sys/metal_infer/— the upstream C + Objective-C reference implementation. Test-only; built bymoeflux-sys/build.rswhendiff-oracleis enabled. Provides per-kernel C-side hooks the diff oracle uses to bit-exact-validate every Rust kernel.repack_experts.py,extract_weights.py— model-prep pipeline. One-time-per-target-model; not runtime.
Status
Pre-alpha, pre-0.1. RIIR Phases 0–6 landed; perf parity with
the C path achieved (A3B 94%, A17B +22%) on M2 Max. API will
stabilize once a runtime variant dispatch lands (Phase 7).
License
MIT — see LICENSE. Core kernels are AI-authored and
in the public domain under current US copyright doctrine; the
MIT grant covers this fork's human-touched additions. See also
CONTRIBUTORS.md.
Acknowledgements
- @danveloper — for building the thing the hard way, writing it up, and publishing everything openly. moeflux is only possible because flash-moe is.
- Claude Opus 4.6 — for the Metal kernels, streaming-experts architecture, and the engineering that made all of this run.
- Anthropic — for making Claude available to do work like this in the first place.