Model execution interface with a clear prefill/decode separation
This module provides the ModelExecutor trait, which replaces the "fat" Model interface. It focuses purely on tensor operations, leaving tokenization and sampling to other components.
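As a rough illustration of the prefill/decode split, here is a minimal sketch of what such a trait might look like. The struct fields, method signatures, and the toy "tensor math" below are assumptions for illustration only; the actual types in this module carry real tensors and richer metadata.

```rust
// Hypothetical sketch of a prefill/decode executor trait.
// All names and fields here are simplified assumptions, not the module's real API.

/// Input for the prefill phase: the full prompt token ids.
struct PrefillInput { token_ids: Vec<u32> }
/// Output of prefill (simplified to a single scalar "logit").
struct PrefillOutput { last_logit: f32 }
/// Input for the decode phase: one token at a time.
struct DecodeInput { token_id: u32 }
/// Output of one decode step.
struct DecodeOutput { logit: f32 }

trait ModelExecutor {
    /// Process the entire prompt once, populating internal state (e.g. a KV cache).
    fn prefill(&mut self, input: PrefillInput) -> PrefillOutput;
    /// Generate one step, reusing the state built during prefill.
    fn decode(&mut self, input: DecodeInput) -> DecodeOutput;
}

/// Toy executor whose "state" is just a running sum, standing in for tensor ops.
struct ToyExecutor { state: f32 }

impl ModelExecutor for ToyExecutor {
    fn prefill(&mut self, input: PrefillInput) -> PrefillOutput {
        self.state = input.token_ids.iter().map(|&t| t as f32).sum();
        PrefillOutput { last_logit: self.state }
    }
    fn decode(&mut self, input: DecodeInput) -> DecodeOutput {
        self.state += input.token_id as f32;
        DecodeOutput { logit: self.state }
    }
}

fn main() {
    let mut exec = ToyExecutor { state: 0.0 };
    let p = exec.prefill(PrefillInput { token_ids: vec![1, 2, 3] });
    println!("prefill logit = {}", p.last_logit); // prints 6
    let d = exec.decode(DecodeInput { token_id: 4 });
    println!("decode logit = {}", d.logit); // prints 10
}
```

The key design point is that prefill runs once over the whole prompt while decode runs repeatedly with single-token inputs, so the two phases can be scheduled and optimized independently.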
Structs
- DecodeInput - Input for decode phase (generating one token at a time)
- DecodeOutput - Output from decode phase
- ExecutorAttentionConfig - Runtime attention configuration for model executor
- ExecutorCapabilities - Executor capabilities and configuration
- ExecutorConfig - Executor configuration
- ExecutorMemoryConfig - Memory configuration for executor
- ExecutorMemoryUsage - Executor memory usage
- ExecutorMetrics - Executor performance metrics
- ExecutorStatus - Executor status information
- MemoryRequirements - Memory requirements for model execution
- OptimizationConfig - Optimization configuration
- PrefillInput - Input for prefill phase (processing the initial prompt)
- PrefillOutput - Output from prefill phase
- SpeculativeDecodeOutput - Output from speculative decoding
Enums
- AttentionType - Attention mechanism types
- ExecutorState - Executor state
- ExecutorType - Supported executor types
Traits
- BatchModelExecutor - Batch model executor for processing multiple requests efficiently
- ExecutorRegistry - Executor registry for managing multiple executors
- ModelExecutor - Core model executor trait focusing on tensor operations
- ModelExecutorFactory - Model executor factory
- SpeculativeExecutor - Speculative execution support
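To illustrate the registry idea, here is a minimal sketch of how a registry managing multiple executors could be structured. The Executor trait, CpuExecutor type, and Registry methods below are assumptions for illustration; the module's ExecutorRegistry trait will differ in its actual API.

```rust
use std::collections::HashMap;

// Hypothetical sketch of an executor registry keyed by name.
// Names and signatures are illustrative assumptions, not the real ExecutorRegistry API.

trait Executor {
    /// Human-readable identifier for this executor.
    fn name(&self) -> &str;
}

/// Stand-in executor implementation for the sketch.
struct CpuExecutor;
impl Executor for CpuExecutor {
    fn name(&self) -> &str { "cpu" }
}

/// Registry owning boxed trait objects so heterogeneous executors can coexist.
struct Registry {
    executors: HashMap<String, Box<dyn Executor>>,
}

impl Registry {
    fn new() -> Self {
        Registry { executors: HashMap::new() }
    }
    fn register(&mut self, key: &str, exec: Box<dyn Executor>) {
        self.executors.insert(key.to_string(), exec);
    }
    fn get(&self, key: &str) -> Option<&dyn Executor> {
        self.executors.get(key).map(|b| b.as_ref())
    }
}

fn main() {
    let mut registry = Registry::new();
    registry.register("cpu", Box::new(CpuExecutor));
    if let Some(exec) = registry.get("cpu") {
        println!("found executor: {}", exec.name()); // prints: found executor: cpu
    }
}
```

Storing `Box<dyn Executor>` trait objects lets one registry hold executors of different concrete types behind a single lookup interface.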