NVIDIA Nemotron 3 Nano runner.
Nemotron ships as several GGUF arch tags:
- nemotron: text-only, Llama-shaped attention stack; runs via the rlx_llama32::Llama32Runner delegate below.
- nemotron_h / nemotron_h_moe: hybrid Mamba2 + attention; the NemotronHybridRunner in runner.rs drives it via per-layer Mamba2StepStage interleaved with stateless attention blocks.
The Omni 30B variant (vision + audio) lives in rlx-nemotron-omni
and is wired independently.
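The routing and layer interleaving described above can be sketched roughly as follows. This is a toy illustration, not the crate's implementation: `RunnerFamily`, `route_arch`, `LayerKind`, and `ToyHybrid` are hypothetical names, the per-layer math is a placeholder, and "stateless" is assumed here to mean that the attention block carries nothing between decode steps.

```rust
// Illustrative sketch only. Every type and function here is a hypothetical
// stand-in and does not mirror the crate's actual definitions.

/// Which runner family an arch tag is routed to.
#[derive(Debug, PartialEq)]
enum RunnerFamily {
    /// Text-only `nemotron`: handled by the Llama-style delegate.
    LlamaDelegate,
    /// `nemotron_h` / `nemotron_h_moe`: handled by the hybrid runner.
    Hybrid,
}

/// Map a GGUF arch tag to a runner family; unknown tags yield `None`.
fn route_arch(tag: &str) -> Option<RunnerFamily> {
    match tag {
        "nemotron" => Some(RunnerFamily::LlamaDelegate),
        "nemotron_h" | "nemotron_h_moe" => Some(RunnerFamily::Hybrid),
        _ => None,
    }
}

/// Kind of block at a given layer index (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum LayerKind {
    Mamba2,
    Attention,
}

/// Toy hybrid stack: Mamba2 layers keep recurrent state across decode steps,
/// attention layers keep none.
struct ToyHybrid {
    layers: Vec<LayerKind>,
    /// One scalar of recurrent state per layer (unused for attention layers).
    mamba_state: Vec<f32>,
}

impl ToyHybrid {
    fn new(layers: Vec<LayerKind>) -> Self {
        let n = layers.len();
        Self { layers, mamba_state: vec![0.0; n] }
    }

    /// Run one decode step, walking the layers in order.
    fn step(&mut self, mut x: f32) -> f32 {
        for (i, kind) in self.layers.iter().enumerate() {
            match kind {
                LayerKind::Mamba2 => {
                    // Placeholder recurrence: decay the old state, add the input.
                    self.mamba_state[i] = 0.5 * self.mamba_state[i] + x;
                    x = self.mamba_state[i];
                }
                LayerKind::Attention => {
                    // Placeholder for an attention block; no state is kept.
                    x += 1.0;
                }
            }
        }
        x
    }
}
```

Feeding the same input to `ToyHybrid::step` twice gives different outputs, because the Mamba2 layers remember the previous step; that carried state is what a single-step hybrid decode loop has to thread through each call.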
Re-exports§
pub use config::NemotronHybridConfig;
pub use config::NemotronLayerKind;
pub use flow::mamba2_decode_layer_plugin_with_sink;
pub use flow::stateless_attention_layer_plugin;
pub use runner::NemotronHybridRunner;
pub use runner::NemotronHybridRunnerBuilder;
Modules§
- config
- Nemotron-H hybrid Mamba+attention configuration.
- flow
- Nemotron-H per-layer decode-step blocks.
- runner
- Nemotron-H hybrid runner — single-step decode with Mamba2 state.
Structs§
Constants§
Functions§
Type Aliases§
- Llama32ConfigSource
- Where to load the Llama 3.2 config from.