Crate rlx_nemotron

NVIDIA Nemotron 3 Nano runner.

Nemotron models ship under several GGUF architecture tags:

nemotron: text-only, Llama-shaped attention stack; runs via the rlx_llama32::Llama32Runner delegate below.
nemotron_h / nemotron_h_moe: hybrid Mamba2 + attention; driven by the NemotronHybridRunner in runner.rs, which interleaves a per-layer Mamba2StepStage with stateless attention blocks.
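
The tag-to-runner split described above can be pictured as a simple match on the architecture string. The following is a minimal sketch, not the crate's actual dispatch code: `RunnerKind` and `runner_for_arch` are hypothetical names introduced here for illustration, and only the three tag strings come from the description.

```rust
/// Hypothetical routing target; NOT a type exported by rlx_nemotron.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunnerKind {
    /// Text-only, Llama-shaped stack (handled by a Llama 3.2 style delegate).
    LlamaDelegate,
    /// Hybrid Mamba2 + attention stack.
    Hybrid,
}

/// Map a GGUF architecture tag to a runner kind (hypothetical helper).
/// Returns `None` for any tag this sketch does not recognize.
fn runner_for_arch(arch: &str) -> Option<RunnerKind> {
    match arch {
        "nemotron" => Some(RunnerKind::LlamaDelegate),
        "nemotron_h" | "nemotron_h_moe" => Some(RunnerKind::Hybrid),
        _ => None,
    }
}
```

Returning `Option` rather than panicking leaves the caller free to report an unsupported tag, for example one belonging to a variant that is wired in a separate crate.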

The Omni 30B variant (vision + audio) lives in rlx-nemotron-omni and is wired independently.

Re-exports

pub use config::NemotronHybridConfig;
pub use config::NemotronLayerKind;
pub use flow::mamba2_decode_layer_plugin_with_sink;
pub use flow::stateless_attention_layer_plugin;
pub use runner::NemotronHybridRunner;
pub use runner::NemotronHybridRunnerBuilder;

Modules

config: Nemotron-H hybrid Mamba+attention configuration.
flow: Nemotron-H per-layer decode-step blocks.
runner: Nemotron-H hybrid runner — single-step decode with Mamba2 state.
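
To illustrate what "single-step decode with Mamba2 state" means structurally, here is a toy sketch under stated assumptions. `LayerKind` and `HybridSketch` are hypothetical stand-ins, not the crate's `NemotronLayerKind` or `NemotronHybridRunner`, and the arithmetic is a placeholder rather than real Mamba2 or attention math. The only point it shows is that recurrent layers carry a state slot from one step to the next, while attention layers in this scheme are pure functions of their input.

```rust
/// Hypothetical layer tag; stands in for a per-layer kind enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LayerKind {
    Mamba2,
    Attention,
}

/// Toy hybrid stack: one scalar activation, one scalar state per layer.
struct HybridSketch {
    layers: Vec<LayerKind>,
    /// One state slot per layer; only Mamba2 layers read or write theirs.
    states: Vec<f32>,
}

impl HybridSketch {
    fn new(layers: Vec<LayerKind>) -> Self {
        let states = vec![0.0; layers.len()];
        Self { layers, states }
    }

    /// Run one decode step through every layer in order.
    fn step(&mut self, input: f32) -> f32 {
        let mut x = input;
        for (kind, state) in self.layers.iter().zip(self.states.iter_mut()) {
            x = match kind {
                // Placeholder recurrence: decay the old state, add the input.
                LayerKind::Mamba2 => {
                    *state = 0.5 * *state + x;
                    *state
                }
                // Placeholder stateless transform: no slot is touched.
                LayerKind::Attention => x + 1.0,
            };
        }
        x
    }
}
```

Feeding the same input twice to a `[Mamba2, Attention]` stack gives different outputs, because the Mamba2 slot remembers the first step; the attention slot stays at its initial value throughout.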

Structs

Llama32Runner
Llama32RunnerBuilder
NemotronRunner
NemotronRunnerBuilder

Constants

FAMILY
PLAN_MILESTONE

Functions

cli_run

Type Aliases

Llama32ConfigSource: Where to load the Llama 3.2 config from.