Module model_executor

Model execution interface with clear prefill/decode separation

This module provides the ModelExecutor trait that replaces the “fat” Model interface, focusing purely on tensor operations without tokenization or sampling.
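The prefill/decode split described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the field names, the method signatures, and the `ToyExecutor`, `argmax`, and `generate` helpers are hypothetical and are not taken from this crate's actual definitions. The point is only the shape of the design: the executor maps token IDs to logits, and tokenization and sampling stay with the caller.

```rust
// Hypothetical sketch: field names and signatures are assumptions,
// not the real definitions from this module.

/// Input for the prefill phase: the whole prompt, already tokenized.
pub struct PrefillInput {
    pub token_ids: Vec<u32>,
}

/// Output from prefill: logits for the last prompt position.
pub struct PrefillOutput {
    pub logits: Vec<f32>,
    pub cache_len: usize,
}

/// Input for the decode phase: exactly one new token.
pub struct DecodeInput {
    pub token_id: u32,
    pub position: usize,
}

/// Output from decode: logits for the next position.
pub struct DecodeOutput {
    pub logits: Vec<f32>,
    pub cache_len: usize,
}

/// Tensor-level interface only: no tokenizer, no sampler.
pub trait ModelExecutor {
    fn prefill(&mut self, input: &PrefillInput) -> PrefillOutput;
    fn decode(&mut self, input: &DecodeInput) -> DecodeOutput;
}

/// Toy stand-in for a model: always favors (last_token + 1) % vocab_size.
/// `vocab_size` must be greater than zero.
pub struct ToyExecutor {
    vocab_size: usize,
    cache: Vec<u32>, // stands in for a KV cache
}

impl ToyExecutor {
    pub fn new(vocab_size: usize) -> Self {
        ToyExecutor { vocab_size, cache: Vec::new() }
    }

    fn logits_for(&self, last: u32) -> Vec<f32> {
        let mut logits = vec![0.0; self.vocab_size];
        logits[(last as usize + 1) % self.vocab_size] = 1.0;
        logits
    }
}

impl ModelExecutor for ToyExecutor {
    fn prefill(&mut self, input: &PrefillInput) -> PrefillOutput {
        // Prefill consumes the full prompt in one call and fills the cache.
        self.cache = input.token_ids.clone();
        let last = *input.token_ids.last().expect("prompt must not be empty");
        PrefillOutput { logits: self.logits_for(last), cache_len: self.cache.len() }
    }

    fn decode(&mut self, input: &DecodeInput) -> DecodeOutput {
        // Decode appends a single token to the existing cache.
        self.cache.push(input.token_id);
        DecodeOutput { logits: self.logits_for(input.token_id), cache_len: self.cache.len() }
    }
}

/// Sampling lives outside the executor; here it is a greedy argmax.
pub fn argmax(logits: &[f32]) -> u32 {
    let mut best = 0;
    for (i, &v) in logits.iter().enumerate() {
        if v > logits[best] {
            best = i;
        }
    }
    best as u32
}

/// Caller-side loop: one prefill call, then one decode call per new token.
pub fn generate<E: ModelExecutor>(exec: &mut E, prompt: &[u32], max_new: usize) -> Vec<u32> {
    let out = exec.prefill(&PrefillInput { token_ids: prompt.to_vec() });
    let mut next = argmax(&out.logits);
    let mut position = out.cache_len;
    let mut generated = Vec::new();
    for _ in 0..max_new {
        generated.push(next);
        let step = exec.decode(&DecodeInput { token_id: next, position });
        position = step.cache_len;
        next = argmax(&step.logits);
    }
    generated
}
```

With this toy executor, `generate(&mut ToyExecutor::new(10), &[1, 2, 3], 3)` yields `[4, 5, 6]`. Because sampling is a free function outside the trait, the greedy `argmax` could be swapped for another strategy without touching the executor.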

Structs

DecodeInput
Input for decode phase (generating one token at a time)
DecodeOutput
Output from decode phase
ExecutorAttentionConfig
Runtime attention configuration for model executor
ExecutorCapabilities
Executor capabilities and configuration
ExecutorConfig
Executor configuration
ExecutorMemoryConfig
Memory configuration for executor
ExecutorMemoryUsage
Executor memory usage
ExecutorMetrics
Executor performance metrics
ExecutorStatus
Executor status information
MemoryRequirements
Memory requirements for model execution
OptimizationConfig
Optimization configuration
PrefillInput
Input for prefill phase (processing the initial prompt)
PrefillOutput
Output from prefill phase
SpeculativeDecodeOutput
Output from speculative decoding

Enums

AttentionType
Attention mechanism types
ExecutorState
Executor state
ExecutorType
Supported executor types

Traits

BatchModelExecutor
Batch model executor for processing multiple requests efficiently
ExecutorRegistry
Executor registry for managing multiple executors
ModelExecutor
Core model executor trait focusing on tensor operations
ModelExecutorFactory
Model executor factory
SpeculativeExecutor
Speculative execution support