moeflux 0.1.0-pre.1

Pure-Rust streaming-experts MoE inference on Metal. Forked from flash-moe; only the Metal kernels remain from upstream.
docs.rs failed to build moeflux-0.1.0-pre.1
Please check the build logs for more information.
See Builds for ideas on how to fix a failed build, or Metadata for how to configure docs.rs builds.
If you believe this is docs.rs' fault, open an issue.

moeflux

Pure-Rust streaming-experts Mixture-of-Experts decode for Apple Silicon. Slots into drama_llama (and through it into Agora).

Derived from danveloper/flash-moe. The core Metal streaming-experts kernels were authored by Claude Opus 4.6 (Anthropic) during a 24-hour session with @danveloper — see upstream's CLAUDE.md for the full story. moeflux is a Rust-downstream reshape of that work: the kernels carry over verbatim; the host-side dispatch was rewritten in Rust on metal-rs (RIIR Phases 0–6, 2026-04-25..28) to lifetime-bind the process-globals upstream's C path used and to deliver typed errors instead of silent state mutation. The original C/Objective-C implementation is preserved behind the diff-oracle cargo feature as a regression net for future ports (DeepSeek-V3, Cogito-V2 671B). Upstream is actively maintained and will continue its own direction; we are not competing with it.

Why this fork exists

Agora — a governed social network for AI agents — needs an independence path from the Anthropic API so its Council can keep deliberating even if external model access is pulled. Cogito 600B (the publicly-available parent of cogito-32b) is the target Council model. Streaming MoE is what makes 600B fit on consumer Apple Silicon. flash-moe already proved the technique on Qwen3.5-397B-A17B at 4.4 tok/s on 48GB M3 Max; our 96GB target box should do comparably on Cogito 600B.

What's here

  • crates/moeflux/ — the Rust port. RsCtx::open opens a model; eval_prompt / eval_token / state_save / state_load are the public surface. Kernels at crates/moeflux/shaders/shaders.metal are embedded into the binary via include_str! and compiled at runtime via MTLDevice newLibraryWithSource:.
  • crates/moeflux-sys/ — raw FFI bindings to the upstream C path. dev-dependency-only; gated behind moeflux's diff-oracle feature. Production builds skip it entirely.
  • crates/moeflux-sys/metal_infer/ — the upstream C + Objective-C reference implementation. Test-only; built by moeflux-sys/build.rs when diff-oracle is enabled. Provides per-kernel C-side hooks the diff oracle uses to bit-exact-validate every Rust kernel.
  • repack_experts.py, extract_weights.py — model-prep pipeline. One-time-per-target-model; not runtime.

Status

Pre-alpha, pre-0.1. RIIR Phases 0–6 landed; perf parity with the C path achieved (A3B 94%, A17B +22%) on M2 Max. API will stabilize once a runtime variant dispatch lands (Phase 7).

License

MIT — see LICENSE. Core kernels are AI-authored and in the public domain under current US copyright doctrine; the MIT grant covers this fork's human-touched additions. See also CONTRIBUTORS.md.

Acknowledgements

  • @danveloper — for building the thing the hard way, writing it up, and publishing everything openly. moeflux is only possible because flash-moe is.
  • Claude Opus 4.6 — for the Metal kernels, streaming-experts architecture, and the engineering that made all of this run.
  • Anthropic — for making Claude available to do work like this in the first place.